Latest Papers on Radiology AI. Tags: Benchmark SOTA

Automated detection of lacunes in brain MR images using SAM with robust prompts using self-distillation and anatomy-informed priors.

Deepika P, Shanker G, Narayanan R, Sundaresan V

•papers•Aug 4 2025

Lacunes, which are small fluid-filled cavities in the brain, are signs of cerebral small vessel disease and have been clinically associated with various neurodegenerative and cerebrovascular diseases. Hence, accurate detection of lacunes is crucial and is one of the initial steps for the precise diagnosis of these diseases. However, developing a robust and consistently reliable method for detecting lacunes is challenging because of the heterogeneity in their appearance, contrast, shape, and size. In this study, we propose a lacune detection method using the Segment Anything Model (SAM), guided by point prompts from a candidate prompt generator. The prompt generator initially detects potential lacunes with high sensitivity using a composite loss function. The true lacunes are then selected using SAM by discriminating their characteristics from mimics such as the sulcus and enlarged perivascular spaces, imitating the clinicians' strategy of examining the potential lacunes along all three axes. False positives are further reduced by adaptive thresholds based on the region-wise prevalence of lacunes. We evaluated our method on two diverse, multi-centric MRI datasets, VALDO and ISLES, comprising only FLAIR sequences. Despite diverse imaging conditions and significant variations in slice thickness (0.5-6 mm), our method achieved sensitivities of 84% and 92%, with average false positive rates of 0.05 and 0.06 per slice in ISLES and VALDO datasets respectively. The proposed method demonstrates robust performance across varied imaging conditions and outperformed the state-of-the-art methods, demonstrating its effectiveness in lacune detection and quantification.

MRI Detection Neurological Methodology In Silico Benchmark SOTA
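The two headline figures in the lacune abstract above, sensitivity and false positives per slice, are simple ratios of detection counts. The sketch below shows that arithmetic only. The function name and the counts are hypothetical, and the paper may average per subject or per dataset in a way this does not reproduce.

```python
def detection_metrics(true_positives, false_negatives, false_positives, n_slices):
    """Return (sensitivity, false positives per slice) from raw detection counts."""
    n_true_lesions = true_positives + false_negatives
    if n_true_lesions == 0 or n_slices == 0:
        raise ValueError("need at least one true lesion and one slice")
    sensitivity = true_positives / n_true_lesions
    fp_per_slice = false_positives / n_slices
    return sensitivity, fp_per_slice


# Hypothetical counts, chosen only to illustrate the arithmetic (not from the paper).
sens, fp_rate = detection_metrics(
    true_positives=46, false_negatives=4, false_positives=12, n_slices=200
)
print(f"sensitivity={sens:.2f}, false positives per slice={fp_rate:.2f}")
```

Reporting both numbers together matters because a candidate generator tuned for high sensitivity will usually raise the false positive count unless a later filtering stage removes the mimics.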

AI-Driven Integration of Deep Learning with Lung Imaging, Functional Analysis, and Blood Gas Metrics for Perioperative Hypoxemia Prediction: Progress and Perspectives.

Huang K, Wu C, Fang J, Pi R

•papers•Aug 4 2025

This Perspective article explores the transformative role of artificial intelligence (AI) in predicting perioperative hypoxemia through the integration of deep learning (DL) with multimodal clinical data, including lung imaging, pulmonary function tests (PFTs), and arterial blood gas (ABG) analysis. Perioperative hypoxemia, defined as arterial oxygen partial pressure (PaO₂) <60 mmHg or oxygen saturation (SpO₂) <90%, poses significant risks of delayed recovery and organ dysfunction. Traditional diagnostic methods, such as radiological imaging and ABG analysis, often lack integrated predictive accuracy. AI frameworks, particularly convolutional neural networks (CNNs) and hybrid models like TD-CNNLSTM-LungNet, demonstrate exceptional performance in detecting pulmonary inflammation and stratifying hypoxemia risk, achieving up to 96.57% accuracy in pneumonia subtype differentiation and an AUC of 0.96 for postoperative hypoxemia prediction. Multimodal AI systems, such as DeepLung-Predict, unify CT scans, PFTs, and ABG parameters to enhance predictive precision, surpassing conventional methods by 22%. However, challenges persist, including dataset heterogeneity, model interpretability, and clinical workflow integration. Future directions emphasize multicenter validation, explainable AI (XAI) frameworks, and pragmatic trials to ensure equitable and reliable deployment. This AI-driven approach not only optimizes resource allocation but also mitigates financial burdens on healthcare systems by enabling early interventions and reducing ICU admission risks.

CT Classification Chest Review Concept Academic Lab Benchmark SOTA Ethics
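The hypoxemia definition quoted in the abstract above (PaO₂ <60 mmHg or SpO₂ <90%) can be written as a small labelling rule, which is the kind of target a predictive model would be trained against. This is a minimal sketch; the function and parameter names are assumptions, not something taken from the article.

```python
def is_hypoxemic(pao2_mmhg=None, spo2_percent=None):
    """Apply the abstract's definition: PaO2 < 60 mmHg or SpO2 < 90%.

    Either measurement may be missing (None), but at least one is required.
    """
    if pao2_mmhg is None and spo2_percent is None:
        raise ValueError("provide PaO2, SpO2, or both")
    low_pao2 = pao2_mmhg is not None and pao2_mmhg < 60
    low_spo2 = spo2_percent is not None and spo2_percent < 90
    return low_pao2 or low_spo2


# Hypothetical readings for illustration only.
print(is_hypoxemic(pao2_mmhg=58))                   # PaO2 below threshold
print(is_hypoxemic(pao2_mmhg=80, spo2_percent=95))  # both within range
print(is_hypoxemic(spo2_percent=89))                # SpO2 below threshold
```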

S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework

Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou

•preprint•Aug 4 2025

Radiology report generation (RRG) for diagnostic images, such as chest X-rays, plays a pivotal role in both clinical practice and AI. Traditional free-text reports suffer from redundancy and inconsistent language, complicating the extraction of critical clinical details. Structured radiology report generation (S-RRG) offers a promising solution by organizing information into standardized, concise formats. However, existing approaches often rely on classification or visual question answering (VQA) pipelines that require predefined label sets and produce only fragmented outputs. Template-based approaches, which generate reports by replacing keywords within fixed sentence patterns, further compromise expressiveness and often omit clinically important details. In this work, we present a novel approach to S-RRG that includes dataset construction, model training, and the introduction of a new evaluation framework. We first create a robust chest X-ray dataset (MIMIC-STRUC) that includes disease names, severity levels, probabilities, and anatomical locations, ensuring that the dataset is both clinically relevant and well-structured. We train an LLM-based model to generate standardized, high-quality reports. To assess the generated reports, we propose a specialized evaluation metric (S-Score) that not only measures disease prediction accuracy but also evaluates the precision of disease-specific details, thus offering a clinically meaningful metric for report quality that focuses on elements critical to clinical decision-making and demonstrates a stronger alignment with human assessments. Our approach highlights the effectiveness of structured reports and the importance of a tailored evaluation metric for S-RRG, providing a more clinically relevant measure of report quality.

X-Ray LLM Radiology Report Chest Methodology In Silico Open Dataset Benchmark SOTA GenAI

Scaling Artificial Intelligence for Prostate Cancer Detection on MRI towards Population-Based Screening and Primary Diagnosis in a Global, Multiethnic Population (Study Protocol)

Anindo Saha, Joeran S. Bosma, Jasper J. Twilt, Alexander B. C. D. Ng, Aqua Asif, Kirti Magudia, Peder Larson, Qinglin Xie, Xiaodong Zhang, Chi Pham Minh, Samuel N. Gitau, Ivo G. Schoots, Martijn F. Boomsma, Renato Cuocolo, Nikolaos Papanikolaou, Daniele Regge, Derya Yakar, Mattijs Elschot, Jeroen Veltman, Baris Turkbey, Nancy A. Obuchowski, Jurgen J. Fütterer, Anwar R. Padhani, Hashim U. Ahmed, Tobias Nordström, Martin Eklund, Veeru Kasivisvanathan, Maarten de Rooij, Henkjan Huisman

•preprint•Aug 4 2025

In this intercontinental, confirmatory study, we include a retrospective cohort of 22,481 MRI examinations (21,288 patients; 46 cities in 22 countries) to train and externally validate the PI-CAI-2B model, i.e., an efficient, next-generation iteration of the state-of-the-art AI system that was developed for detecting Gleason grade group ≥2 prostate cancer on MRI during the PI-CAI study. Of these examinations, 20,471 cases (19,278 patients; 26 cities in 14 countries) from two EU Horizon projects (ProCAncer-I, COMFORT) and 12 independent centers based in Europe, North America, Asia and Africa are used for training and internal testing. Additionally, 2,010 cases (2,010 patients; 20 external cities in 12 countries) from population-based screening (STHLM3-MRI, IP1-PROSTAGRAM trials) and primary diagnostic settings (PRIME trial) based in Europe, North and South America, Asia and Australia are used for external testing. The primary endpoint is the proportion of AI-based assessments in agreement with the standard of care diagnoses (i.e., clinical assessments made by expert uropathologists on histopathology, if available, or at least two expert urogenital radiologists in consensus; with access to patient history and peer consultation) in the detection of Gleason grade group ≥2 prostate cancer within the external testing cohorts. Our statistical analysis plan is prespecified with a hypothesis of diagnostic interchangeability to the standard of care at the PI-RADS ≥3 (primary diagnosis) or ≥4 (screening) cut-off, considering an absolute margin of 0.05 and reader estimates derived from the PI-CAI observer study (62 radiologists reading 400 cases). Secondary measures comprise the area under the receiver operating characteristic curve (AUROC) of the AI system stratified by imaging quality, patient age and patient ethnicity to identify underlying biases (if any).

MRI Detection Abdominal Retrospective Clinical In Silico Consortium Benchmark SOTA Open Dataset Ethics

Diagnostic Performance of Imaging-Based Artificial Intelligence Models for Preoperative Detection of Cervical Lymph Node Metastasis in Clinically Node-Negative Papillary Thyroid Carcinoma: A Systematic Review and Meta-Analysis.

Li B, Cheng G, Mo Y, Dai J, Cheng S, Gong S, Li H, Liu Y

•papers•Aug 4 2025

This systematic review and meta-analysis evaluated the performance of imaging-based artificial intelligence (AI) models in diagnosing preoperative cervical lymph node metastasis (LNM) in clinically node-negative (cN0) papillary thyroid carcinoma (PTC). We conducted a literature search in PubMed, Embase, and Web of Science until February 25, 2025. Studies were selected that focused on imaging-based AI models for predicting cervical LNM in cN0 PTC. The diagnostic performance metrics were analyzed using a bivariate random-effects model, and study quality was assessed with the QUADAS-2 tool. From 671 articles, 11 studies involving 3366 patients were included. Ultrasound (US)-based AI models showed pooled sensitivity of 0.79 and specificity of 0.82, significantly higher than radiologists (p < 0.001). CT-based AI models demonstrated sensitivity of 0.78 and specificity of 0.89. Imaging-based AI models, particularly US-based AI, show promising diagnostic performance. There is a need for further multicenter prospective studies for validation. PROSPERO: (CRD420251063416).

Mixed Modality Detection Meta Analysis In Silico Academic Lab Benchmark SOTA

Development and Validation of an Explainable MRI-Based Habitat Radiomics Model for Predicting p53-Abnormal Endometrial Cancer: A Multicentre Feasibility Study.

Jin W, Zhang H, Ning Y, Chen X, Zhang G, Li H, Zhang H

•papers•Aug 4 2025

We developed an MRI-based habitat radiomics model (HRM) to predict p53-abnormal (p53abn) molecular subtypes of endometrial cancer (EC). Patients with pathologically confirmed EC were retrospectively enrolled from three hospitals and categorized into a training cohort (n = 270), test cohort 1 (n = 70), and test cohort 2 (n = 154). The tumour was divided into habitat sub-regions using diffusion-weighted imaging (DWI) and contrast-enhanced (CE) images with the K-means algorithm. Radiomics features were extracted from T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), DWI, and CE images. Three machine learning classifiers (logistic regression, support vector machines, and random forests) were applied to develop predictive models for p53abn EC. Model performance was validated using receiver operating characteristic (ROC) curves, and the model with the best predictive performance was selected as the HRM. A whole-region radiomics model (WRM) was also constructed, and a clinical model (CM) with five clinical features was developed. The SHApley Additive ExPlanations (SHAP) method was used to explain the outputs of the models. DeLong's test evaluated and compared the performance across the cohorts. A total of 1920 habitat radiomics features were considered. Eight features were selected for the HRM, ten for the WRM, and three clinical features for the CM. The HRM achieved the highest AUC: 0.855 (training), 0.769 (test1), and 0.766 (test2). The AUCs of the WRM were 0.707 (training), 0.703 (test1), and 0.738 (test2). The AUCs of the CM were 0.709 (training), 0.641 (test1), and 0.665 (test2). The MRI-based HRM successfully predicted p53abn EC. The results indicate that habitat combined with machine learning, radiomics, and SHAP can effectively predict p53abn EC, providing clinicians with intuitive insights and interpretability regarding the impact of risk factors in the model.

MRI Classification Abdominal Retrospective Clinical In Silico Benchmark SOTA GenAI
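The habitat step in the abstract above groups tumour voxels into sub-regions by running K-means on their DWI and CE intensities. The sketch below is a plain NumPy K-means over two-channel voxel features, meant only to illustrate the idea. The number of clusters, the normalisation, and the synthetic intensities are assumptions; the abstract does not state the authors' settings.

```python
import numpy as np


def kmeans_habitats(features, k, n_iter=50, seed=0):
    """Cluster voxel feature vectors of shape (n_voxels, n_channels) into k habitats."""
    rng = np.random.default_rng(seed)
    # Start from k distinct voxels picked at random.
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every voxel to every center, then nearest-center assignment.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its voxels (keep it if the cluster is empty).
        new_centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic (DWI, CE) intensities for two well-separated voxel groups; illustration only.
rng = np.random.default_rng(1)
low = rng.normal([0.2, 0.2], 0.02, size=(50, 2))
high = rng.normal([0.8, 0.9], 0.02, size=(50, 2))
voxels = np.vstack([low, high])

labels, centers = kmeans_habitats(voxels, k=2)
print(np.bincount(labels))
```

Radiomics features would then be extracted separately inside each labelled sub-region rather than over the whole tumour, which is the difference between the HRM and the WRM described above.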

Deep Learning-Enabled Ultrasound for Advancing Anterior Talofibular Ligament Injuries Classification: A Multicenter Model Development and Validation Study.

Shi X, Zhang H, Yuan Y, Xu Z, Meng L, Xi Z, Qiao Y, Liu S, Sun J, Cui J, Du R, Yu Q, Wang D, Shen S, Gao C, Li P, Bai L, Xu H, Wang K

•papers•Aug 4 2025

Ultrasound (US) is the preferred modality for assessing anterior talofibular ligament (ATFL) injuries. We aimed to advance ATFL injury classification by developing a US-based deep learning (DL) model, and to explore how artificial intelligence (AI) could help radiologists improve diagnostic performance. Consecutive healthy controls and patients with acute ATFL injuries (mild strain, partial tear, complete tear, and avulsion fracture) at 10 hospitals were retrospectively included. A US-based DL model (ATFLNet) was trained (n=2566), internally validated (n=642), and externally validated (n=717 and 493). Surgical or radiological findings based on the majority consensus of three experts served as the reference standard. Prospective validation was conducted at three additional hospitals (n=472). The performance was compared to that of 12 radiologists at different levels (external validation sets 1 and 2); an ATFLNet-aided strategy was developed and compared with the radiologists when reviewing B-mode images (external validation set 2); the strategy was then tested in a simulated scenario (reviewing images alongside dynamic clips; prospective validation set). Statistical comparisons were performed using McNemar's test, while inter-reader agreement was evaluated with the multireader Fleiss κ statistic. ATFLNet obtained macro-average area under the curve ≥0.970 across all five classes in each dataset, indicating robust overall performance. Additionally, it consistently outperformed senior radiologists in external validation sets (all p<.05). The ATFLNet-aided strategy improved radiologists' average accuracy (0.707 vs. 0.811, p<.001) for image review. In the simulated scenario, it led to enhanced accuracy (0.794 to 0.864, p=.003) and a reduction in diagnostic variability, particularly for junior radiologists. Our US-based model outperformed human experts for ATFL injury evaluation. AI-aided strategies hold the potential to enhance diagnostic performance in real-world clinical scenarios.

Ultrasound Classification Musculoskeletal Retrospective Clinical Clinical Pilot Academic Lab Benchmark SOTA
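Inter-reader agreement in the ATFL study above is summarised with the Fleiss κ statistic. The sketch below implements the standard Fleiss κ formula from a table of per-case category counts. The input layout and the toy tables are assumptions for illustration, not data from the study.

```python
def fleiss_kappa(counts):
    """Fleiss kappa for a table where counts[i][j] is the number of raters
    who assigned case i to category j. Every case needs the same rater count."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Per-case agreement: fraction of rater pairs that agree on that case.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(p_i) / n_cases
    expected = sum(p * p for p in p_j)
    return (observed - expected) / (1 - expected)


# Toy tables: three raters, two categories (illustration only).
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
mixed = [[2, 1], [1, 2]]
print(fleiss_kappa(perfect), fleiss_kappa(mixed))
```

A value of 1 means every rater pair agrees on every case, while values at or below 0 mean agreement is no better than what the category frequencies alone would produce.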

M$^3$AD: Multi-task Multi-gate Mixture of Experts for Alzheimer's Disease Diagnosis with Conversion Pattern Modeling

Yufeng Jiang, Hexiao Ding, Hongzhao Chen, Jing Lan, Xinzhi Teng, Gerald W. Y. Cheng, Zongxi Li, Haoran Xie, Jung Sun Yoo, Jing Cai

•preprint•Aug 3 2025

Alzheimer's disease (AD) progression follows a complex continuum from normal cognition (NC) through mild cognitive impairment (MCI) to dementia, yet most deep learning approaches oversimplify this into discrete classification tasks. This study introduces M$^3$AD, a novel multi-task multi-gate mixture of experts framework that jointly addresses diagnostic classification and cognitive transition modeling using structural MRI. We incorporate three key innovations: (1) an open-source T1-weighted sMRI preprocessing pipeline, (2) a unified learning framework capturing NC-MCI-AD transition patterns with demographic priors (age, gender, brain volume) for improved generalization, and (3) a customized multi-gate mixture of experts architecture enabling effective multi-task learning with structural MRI alone. The framework employs specialized expert networks for diagnosis-specific pathological patterns while shared experts model common structural features across the cognitive continuum. A two-stage training protocol combines SimMIM pretraining with multi-task fine-tuning for joint optimization. Comprehensive evaluation across six datasets comprising 12,037 T1-weighted sMRI scans demonstrates superior performance: 95.13% accuracy for three-class NC-MCI-AD classification and 99.15% for binary NC-AD classification, representing improvements of 4.69% and 0.55% over state-of-the-art approaches. The multi-task formulation simultaneously achieves 97.76% accuracy in predicting cognitive transition. Our framework outperforms existing methods using fewer modalities and offers a clinically practical solution for early intervention. Code: https://github.com/csyfjiang/M3AD.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA Open Code
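The multi-gate mixture of experts idea named in the M$^3$AD abstract above gives each task its own softmax gate over a shared pool of experts. The sketch below is a generic forward pass of that pattern in NumPy with random weights. It does not reproduce the paper's split into specialized and shared experts, its image encoder, or its training; all shapes and names are assumptions. The authors' own code is at the URL given in the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mmoe_forward(x, expert_weights, gate_weights, head_weights):
    """Generic multi-gate mixture-of-experts forward pass.

    x:              (batch, d_in) input features
    expert_weights: (n_experts, d_in, d_hidden), one linear+ReLU expert each
    gate_weights:   (n_tasks, d_in, n_experts), one gate per task
    head_weights:   list of (d_hidden, n_classes_t), one classifier per task
    """
    # Every expert processes the same input.
    expert_out = np.maximum(np.einsum("bd,edh->beh", x, expert_weights), 0.0)
    outputs = []
    for gate_w, head_w in zip(gate_weights, head_weights):
        gate = softmax(x @ gate_w)                          # (batch, n_experts)
        mixed = np.einsum("be,beh->bh", gate, expert_out)   # task-specific blend
        outputs.append(softmax(mixed @ head_w))             # class probabilities
    return outputs


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                 # 4 hypothetical feature vectors
experts = rng.normal(size=(3, 16, 8)) * 0.1  # 3 experts
gates = rng.normal(size=(2, 16, 3)) * 0.1    # 2 tasks
heads = [rng.normal(size=(8, 3)) * 0.1,      # task 1: three-class diagnosis
         rng.normal(size=(8, 2)) * 0.1]      # task 2: binary transition label

outputs = mmoe_forward(x, experts, gates, heads)
print([o.shape for o in outputs])
```

Because each task has its own gate, the two heads can weight the same experts differently, which is what lets related tasks share features without being forced through one shared representation.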

LoRA-based methods on Unet for transfer learning in Subarachnoid Hematoma Segmentation

Cristian Minoccheri, Matthew Hodgman, Haoyuan Ma, Rameez Merchant, Emily Wittrup, Craig Williamson, Kayvan Najarian

•preprint•Aug 3 2025

Aneurysmal subarachnoid hemorrhage (SAH) is a life-threatening neurological emergency with mortality rates exceeding 30%. Transfer learning from related hematoma types represents a potentially valuable but underexplored approach. Although Unet architectures remain the gold standard for medical image segmentation due to their effectiveness on limited datasets, Low-Rank Adaptation (LoRA) methods for parameter-efficient transfer learning have been rarely applied to convolutional neural networks in medical imaging contexts. We implemented a Unet architecture pre-trained on computed tomography scans from 124 traumatic brain injury patients across multiple institutions, then fine-tuned on 30 aneurysmal SAH patients from the University of Michigan Health System using 3-fold cross-validation. We developed a novel CP-LoRA method based on tensor CP-decomposition and introduced DoRA variants (DoRA-C, convDoRA, CP-DoRA) that decompose weight matrices into magnitude and directional components. We compared these approaches against existing LoRA methods (LoRA-C, convLoRA) and standard fine-tuning strategies across different modules on a multi-view Unet model. LoRA-based methods consistently outperformed standard Unet fine-tuning. Performance varied by hemorrhage volume, with all methods showing improved accuracy for larger volumes. CP-LoRA achieved comparable performance to existing methods while using significantly fewer parameters. Over-parameterization with higher ranks consistently yielded better performance than strictly low-rank adaptations. This study demonstrates that transfer learning between hematoma types is feasible and that LoRA-based methods significantly outperform conventional Unet fine-tuning for aneurysmal SAH segmentation.

CT Segmentation Neurological Methodology In Silico Academic Lab Benchmark SOTA
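The LoRA and DoRA variants compared in the abstract above both adapt a frozen pre-trained weight through a low-rank update, and DoRA additionally separates magnitude from direction. The sketch below shows the two reparameterisations on a plain 2D weight matrix in NumPy. It is not the paper's CP-LoRA or its convolutional variants, and the shapes, rank, and scaling are assumptions.

```python
import numpy as np


def lora_weight(w, a, b, scale=1.0):
    """LoRA: frozen weight plus a scaled low-rank update b @ a."""
    return w + scale * (b @ a)


def dora_weight(w, a, b, magnitude, scale=1.0):
    """DoRA-style: normalise each column of the updated weight to unit length
    (direction), then rescale it by a learned per-column magnitude."""
    directed = w + scale * (b @ a)
    col_norm = np.linalg.norm(directed, axis=0, keepdims=True)
    return magnitude * directed / col_norm


rng = np.random.default_rng(0)
d_out, d_in, rank = 32, 64, 4
w = rng.normal(size=(d_out, d_in))           # frozen pre-trained weight
a = rng.normal(size=(rank, d_in)) * 0.01     # trainable, small random init
b = np.zeros((d_out, rank))                  # trainable, zero init
magnitude = np.linalg.norm(w, axis=0, keepdims=True)  # trainable, init from w

# With b = 0 both adapters reproduce the pre-trained weight exactly.
print(np.allclose(lora_weight(w, a, b), w), np.allclose(dora_weight(w, a, b, magnitude), w))

# Trainable parameters: low-rank factors versus the full matrix.
print(rank * (d_in + d_out), d_in * d_out)

# After b moves away from zero, DoRA columns keep the learned magnitudes.
b_trained = rng.normal(size=(d_out, rank)) * 0.1
dora = dora_weight(w, a, b_trained, magnitude)
```

The zero initialisation of `b` is what makes fine-tuning start from the pre-trained model's behaviour, and the parameter count shows why the approach is attractive when only a few dozen target-domain patients are available.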

Less is More: AMBER-AFNO -- a New Benchmark for Lightweight 3D Medical Image Segmentation

Andrea Dosi, Semanto Mondal, Rajib Chandra Ghosh, Massimo Brescia, Giuseppe Longo

•preprint•Aug 3 2025

This work presents the results of a methodological transfer from remote sensing to healthcare, adapting AMBER -- a transformer-based model originally designed for multiband images, such as hyperspectral data -- to the task of 3D medical datacube segmentation. In this study, we use the AMBER architecture with Adaptive Fourier Neural Operators (AFNO) in place of the multi-head self-attention mechanism. While existing models rely on various forms of attention to capture global context, AMBER-AFNO achieves this through frequency-domain mixing, enabling a drastic reduction in model complexity. This design reduces the number of trainable parameters by over 80% compared to UNETR++, while maintaining a FLOPs count comparable to other state-of-the-art architectures. Model performance is evaluated on two benchmark 3D medical datasets -- ACDC and Synapse -- using standard metrics such as Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD), demonstrating that AMBER-AFNO achieves competitive or superior accuracy with significant gains in training efficiency, inference speed, and memory usage.

CT Segmentation Abdominal Methodology In Silico Academic Lab Benchmark SOTA
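The AMBER-AFNO abstract above replaces self-attention with mixing in the frequency domain. The sketch below shows the simplest form of that idea: transform the token sequence with an FFT, multiply each frequency by a weight, and transform back. It is a deliberately reduced illustration and not the AFNO operator itself; the per-frequency weights, shapes, and function name are assumptions.

```python
import numpy as np


def fourier_token_mixing(tokens, freq_weights):
    """Mix information across tokens by reweighting their Fourier coefficients.

    tokens:       (n_tokens, channels) real-valued features
    freq_weights: (n_tokens // 2 + 1, channels) complex weights, one per frequency
    """
    spectrum = np.fft.rfft(tokens, axis=0)      # to the frequency domain
    mixed = spectrum * freq_weights             # element-wise reweighting
    return np.fft.irfft(mixed, n=tokens.shape[0], axis=0)  # back to token space


rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))

# All-ones weights leave the signal unchanged, a quick sanity check.
identity = np.ones((8 // 2 + 1, 4), dtype=complex)
unchanged = fourier_token_mixing(tokens, identity)
print(np.allclose(unchanged, tokens))

# Keeping only the zero-frequency term replaces every token with the sequence mean,
# showing that one multiplication can spread information across all tokens.
lowpass = np.zeros((8 // 2 + 1, 4), dtype=complex)
lowpass[0] = 1.0
averaged = fourier_token_mixing(tokens, lowpass)
```

The number of weights here grows linearly with the sequence length, whereas attention computes a score for every token pair, which is one intuition for the parameter and memory savings the abstract reports.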
