Latest Papers on Radiology AI. Tags: Benchmark SOTA

Evaluating the performance and potential bias of predictive models for the detection of transthyretin cardiac amyloidosis

Hourmozdi, J., Easton, N., Benigeri, S., Thomas, J. D., Narang, A., Ouyang, D., Duffy, G., Upton, R., Hawkes, W., Akerman, A., Okwuosa, I., Kline, A., Kho, A. N., Luo, Y., Shah, S. J., Ahmad, F. S.

•preprint•Jun 2 2025

Background: Delays in the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) contribute to the significant morbidity of the condition, especially in the era of disease-modifying therapies. Screening for ATTR-CM with AI and other algorithms may improve timely diagnosis, but these algorithms have not been directly compared. Objectives: The aim of this study was to compare the performance of four algorithms for ATTR-CM detection in a heart failure population and assess the risk for harms due to model bias. Methods: We identified patients in an integrated health system from 2010-2022 with ATTR-CM and age- and sex-matched them to controls with heart failure to target 5% prevalence. We compared the performance of a claims-based random forest model (Huda et al. model), a regression-based score (Mayo ATTR-CM), and two deep learning echo models (EchoNet-LVH and EchoGo® Amyloidosis). We evaluated for bias using standard fairness metrics. Results: The analytical cohort included 176 confirmed cases of ATTR-CM and 3192 control patients, with 79.2% self-identified as White and 9.0% as Black. The Huda et al. model performed poorly (AUC 0.49). Both deep learning echo models had a higher AUC when compared to the Mayo ATTR-CM Score (EchoNet-LVH 0.88; EchoGo Amyloidosis 0.92; Mayo ATTR-CM Score 0.79; DeLong P<0.001 for both). Bias auditing met fairness criteria for equal opportunity among patients who identified as Black. Conclusions: Deep learning, echo-based models to detect ATTR-CM demonstrated the best overall discrimination when compared to two other models in external validation, with low risk of harms due to racial bias.
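The equal-opportunity criterion mentioned in the abstract is commonly formulated as a comparison of true positive rates (sensitivity) between a subgroup and a reference group. The sketch below illustrates that formulation only. The abstract does not state which audit tool or tolerance was used, so the function names, the toy labels, and the group codes "A" and "B" are all hypothetical.

```python
from typing import Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity: fraction of actual positives that the model flags."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not flagged:
        raise ValueError("no positive cases in this group")
    return sum(flagged) / len(flagged)


def equal_opportunity_ratio(y_true, y_pred, group, subgroup, reference) -> float:
    """Ratio of subgroup TPR to reference-group TPR (1.0 means parity)."""
    def tpr_for(label):
        idx = [i for i, g in enumerate(group) if g == label]
        return true_positive_rate([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx])
    return tpr_for(subgroup) / tpr_for(reference)


# Toy data, invented for illustration (not from the study).
# Group A: 4 positives, 3 detected. Group B: 5 positives, 4 detected.
y_true = [1, 1, 1, 1, 0, 0,   1, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1,   1, 1, 1, 1, 0, 0]
group = ["A"] * 6 + ["B"] * 6

ratio = equal_opportunity_ratio(y_true, y_pred, group, subgroup="B", reference="A")
print(f"TPR ratio (B vs A): {ratio:.3f}")
```

A ratio close to 1.0 suggests the model detects true cases at a similar rate in both groups. What counts as "close enough" is a policy choice that the abstract does not specify.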

Ultrasound Classification Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA Ethics

Current trends in glioma tumor segmentation: A survey of deep learning modules.

Shoushtari FK, Elahi R, Valizadeh G, Moodi F, Salari HM, Rad HS

•papers•Jun 2 2025

Multiparametric Magnetic Resonance Imaging (mpMRI) is the gold standard for diagnosing brain tumors, especially gliomas, which are difficult to segment due to their heterogeneity and varied sub-regions. While manual segmentation is time-consuming and error-prone, Deep Learning (DL) automates the process with greater accuracy and speed. We conducted ablation studies on surveyed articles to evaluate the impact of "add-on" modules (which address challenges like spatial information loss, class imbalance, and overfitting) on glioma segmentation performance. Advanced modules, such as atrous (dilated) convolutions, inception, attention, transformer, and hybrid modules, significantly enhance segmentation accuracy, efficiency, multiscale feature extraction, and boundary delineation, while lightweight modules reduce computational complexity. Experiments on the Brain Tumor Segmentation (BraTS) dataset (comprising low- and high-grade gliomas) confirm their robustness, with top-performing models achieving high Dice scores for tumor sub-regions. This survey underscores the need for optimal module selection and placement to balance speed, accuracy, and interpretability in glioma segmentation. Future work should focus on improving model interpretability, lowering computational costs, and boosting generalizability. Tools like NeuroQuant® and Raidionics demonstrate potential for clinical translation. Further refinement could enable regulatory approval, advancing precision in brain tumor diagnosis and treatment planning.

MRI Segmentation Neurological Review In Silico Academic Lab Benchmark SOTA GenAI

Radiogenomics and Radiomics of Skull Base Chordoma: Classification of Novel Radiomic Subgroups and Prediction of Genetic Signatures and Clinical Outcomes.

Gersey ZC, Zenkin S, Mamindla P, Amjadzadeh M, Ak M, Plute T, Peddagangireddy V, Abdallah H, Muthiah N, Wang EW, Snyderman C, Gardner PA, Colen RR, Zenonos GA

•papers•Jun 2 2025

Chordomas are rare, aggressive tumors of notochordal origin, commonly affecting the spine and skull base. Skull Base Chordomas (SBCs) comprise approximately 39% of cases, with an incidence of less than 1 per million annually in the U.S. Prognosis remains poor due to resistance to chemotherapy, often requiring extensive surgical resection and adjuvant radiotherapy. Current classification methods based on chromosomal deletions are invasive and costly, presenting a need for alternative diagnostic tools. Radiomics allows for non-invasive SBC diagnosis and treatment planning. We developed and validated radiomic-based models using MRI data to predict Overall Survival (OS) and Progression-Free Survival following Surgery (PFSS) in SBC patients. Machine learning classifiers, including eXtreme Gradient Boosting (XGBoost), were employed along with feature selection techniques. Unsupervised clustering identified radiomic-based subgroups, which were correlated with chromosomal deletions and clinical outcomes. Our XGBoost model demonstrated superior predictive performance, achieving an area under the curve (AUC) of 83.33% for OS and 80.36% for PFSS, outperforming other classifiers. Radiomic clustering revealed two SBC groups with differing survival and molecular characteristics, strongly correlating with chromosomal deletion profiles. These findings indicate that radiomics can non-invasively characterize SBC phenotypes and stratify patients by prognosis. Radiomics shows promise as a reliable, non-invasive tool for the prognostication and classification of SBCs, minimizing the need for invasive genetic testing and supporting personalized treatment strategies.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Robust Uncertainty-Informed Glaucoma Classification Under Data Shift.

Rashidisabet H, Chan RVP, Leiderman YI, Vajaranant TS, Yi D

•papers•Jun 2 2025

Standard deep learning (DL) models often suffer significant performance degradation on out-of-distribution (OOD) data, where test data differs from training data, a common challenge in medical imaging due to real-world variations. We propose a unified self-censorship framework as an alternative to the standard DL models for glaucoma classification using deep evidential uncertainty quantification. Our approach detects OOD samples at both the dataset and image levels. Dataset-level self-censorship enables users to accept or reject predictions for an entire new dataset based on model uncertainty, whereas image-level self-censorship refrains from making predictions on individual OOD images rather than risking incorrect classifications. We validated our approach across diverse datasets. Our dataset-level self-censorship method outperforms the standard DL model in OOD detection, achieving an average 11.93% higher area under the curve (AUC) across 14 OOD datasets. Similarly, our image-level self-censorship model improves glaucoma classification accuracy by an average of 17.22% across 4 external glaucoma datasets against baselines while censoring 28.25% more data. Our approach addresses the challenge of generalization in standard DL models for glaucoma classification across diverse datasets by selectively withholding predictions when the model is uncertain. This method reduces misclassification errors compared to state-of-the-art baselines, particularly for OOD cases. This study introduces a tunable framework that explores the trade-off between prediction accuracy and data retention in glaucoma prediction. By managing uncertainty in model outputs, the approach lays a foundation for future decision support tools aimed at improving the reliability of automated glaucoma diagnosis.
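The accuracy-versus-retention trade-off behind image-level self-censorship can be illustrated with a simple threshold on a per-image uncertainty score. This is a minimal sketch, not the authors' method: it assumes uncertainty values are already available (the paper derives them from deep evidential learning, which is not implemented here), and the function name and toy values are hypothetical.

```python
import math


def selective_accuracy(y_true, y_pred, uncertainty, threshold):
    """Accuracy on retained images and the fraction of images retained.

    An image is retained (a prediction is issued) only when its
    uncertainty is at or below the threshold; otherwise it is censored.
    """
    kept = [(t, p) for t, p, u in zip(y_true, y_pred, uncertainty) if u <= threshold]
    retention = len(kept) / len(y_true)
    if not kept:
        return math.nan, 0.0
    accuracy = sum(t == p for t, p in kept) / len(kept)
    return accuracy, retention


# Toy values, invented for illustration: the wrong predictions
# happen to carry the highest uncertainty.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
uncertainty = [0.10, 0.20, 0.15, 0.30, 0.80, 0.90, 0.40, 0.70]

for threshold in (1.0, 0.5):
    acc, kept = selective_accuracy(y_true, y_pred, uncertainty, threshold)
    print(f"threshold={threshold}: accuracy={acc:.3f}, retained={kept:.3f}")
```

Lowering the threshold censors more images and, when uncertainty tracks errors well, raises accuracy on the images that remain. This is the tunable trade-off the abstract describes.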

OCT Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Multicycle Dosimetric Behavior and Dose-Effect Relationships in [177Lu]Lu-DOTATATE Peptide Receptor Radionuclide Therapy.

Kayal G, Roseland ME, Wang C, Fitzpatrick K, Mirando D, Suresh K, Wong KK, Dewaraja YK

•papers•Jun 2 2025

We investigated pharmacokinetics, dosimetric patterns, and absorbed dose (AD)-effect correlations in [177Lu]Lu-DOTATATE peptide receptor radionuclide therapy (PRRT) for metastatic neuroendocrine tumors (NETs) to develop strategies for future personalized dosimetry-guided treatments. Methods: Patients treated with standard [177Lu]Lu-DOTATATE PRRT were recruited for serial SPECT/CT imaging. Kidneys were segmented on CT using a deep learning algorithm, and tumors were segmented at each cycle using a SPECT gradient-based tool, guided by radiologist-defined contours on baseline CT/MRI. Dosimetry was performed using an automated workflow that included contour intensity-based SPECT-SPECT registration, generation of Monte Carlo dose-rate maps, and dose-rate fitting. Lesion-level response at first follow-up was evaluated using both radiologic (RECIST and modified RECIST) and [68Ga]Ga-DOTATATE PET-based criteria. Kidney toxicity was evaluated based on the estimated glomerular filtration rate (eGFR) at 9 mo after PRRT. Results: Dosimetry was performed after cycle 1 in 30 patients and after all cycles in 22 of 30 patients who completed SPECT/CT imaging after each cycle. Median cumulative tumor (n = 78) AD was 2.2 Gy/GBq (range, 0.1-20.8 Gy/GBq), whereas median kidney AD was 0.44 Gy/GBq (range, 0.25-0.96 Gy/GBq). The tumor-to-kidney AD ratio decreased with each cycle (median, 6.4, 5.7, 4.7, and 3.9 for cycles 1-4) because of a decrease in tumor AD, while kidney AD remained relatively constant. Higher-grade (grade 2) and pancreatic NETs showed a significantly larger drop in AD with each cycle, as well as significantly lower AD and effective half-life (Teff), than did low-grade (grade 1) and small intestinal NETs, respectively. Teff remained relatively constant with each cycle for both tumors and kidneys. Kidney Teff and AD were significantly higher in patients with low eGFR than in those with high eGFR. Tumor AD was not significantly associated with response measures. 
There was no nephrotoxicity higher than grade 2; however, a significant negative association was found in univariate analyses between eGFR at 9 mo and AD to the kidney, which improved in a multivariable model that also adjusted for baseline eGFR (cycle 1 AD, P = 0.020, adjusted R² = 0.57; cumulative AD, P = 0.049, adjusted R² = 0.65). The association between percentage change in eGFR and AD to the kidney was also significant in univariate analysis and after adjusting for baseline eGFR (cycle 1 AD, P = 0.006, adjusted R² = 0.21; cumulative AD, P = 0.019, adjusted R² = 0.21). Conclusion: The dosimetric behavior we report over different cycles and for different NET subgroups can be considered when optimizing PRRT to individual patients. The models we present for the relationship between eGFR and AD have potential for clinical use in predicting renal function early in the treatment course. Furthermore, reported pharmacokinetics for patient subgroups allow more appropriate selection of population parameters to be used in protocols with fewer imaging time points that facilitate more widespread adoption of dosimetry.

Mixed Modality Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Impact of Optic Nerve Tortuosity, Globe Proptosis, and Size on Retinal Ganglion Cell Thickness Across General, Glaucoma, and Myopic Populations.

Chiang CYN, Wang X, Gardiner SK, Buist M, Girard MJA

•papers•Jun 2 2025

The purpose of this study was to investigate the impact of optic nerve tortuosity (ONT), and the interaction of globe proptosis and size on retinal ganglion cell (RGC) thickness, using retinal nerve fiber layer (RNFL) thickness, across general, glaucoma, and myopic populations. This study analyzed 17,940 eyes from the UKBiobank cohort (ID 76442), including 72 glaucomatous and 2475 myopic eyes. Artificial intelligence models were developed to derive RNFL thickness corrected for ocular magnification from 3D optical coherence tomography scans and orbit features from 3D magnetic resonance images, including ONT, globe proptosis, axial length, and a novel feature: the interzygomatic line-to-posterior pole (ILPP) distance - a composite marker of globe proptosis and size. Generalized estimating equation (GEE) models evaluated associations between orbital and retinal features. RNFL thickness was positively correlated with ONT and ILPP distance (r = 0.065, P < 0.001 and r = 0.206, P < 0.001, respectively) in the general population. The same was true for glaucoma (r = 0.040, P = 0.74 and r = 0.224, P = 0.059), and for myopia (r = 0.069, P < 0.001 and r = 0.100, P < 0.001). GEE models revealed that straighter optic nerves and shorter ILPP distance were predictive of thinner RNFL in all populations. Straighter optic nerves and decreased ILPP distance could cause RNFL thinning, possibly due to greater traction forces. ILPP distance emerged as a potential biomarker of axonal health. These findings underscore the importance of orbit structures in RGC axonal health and warrant further research into orbit biomechanics.

Mixed Modality Segmentation Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Slim UNETR++: A lightweight 3D medical image segmentation network for medical image analysis.

Jin J, Yang S, Tong J, Zhang K, Wang Z

•papers•Jun 2 2025

Convolutional neural network (CNN) models, such as U-Net, V-Net, and DeepLab, have achieved remarkable results across various medical imaging modalities, including ultrasound. Additionally, hybrid Transformer-based segmentation methods have shown great potential in medical image analysis. Despite the breakthroughs in feature extraction through self-attention mechanisms, these methods are computationally intensive, especially for three-dimensional medical imaging, posing significant challenges to graphics processing unit (GPU) hardware. Consequently, the demand for lightweight models is increasing. To address this issue, we designed a high-accuracy yet lightweight model that combines the strengths of CNNs and Transformers. We introduce Slim UNEt TRansformers++ (Slim UNETR++), which builds upon Slim UNETR by incorporating Medical ConvNeXt (MedNeXt), Spatial-Channel Attention (SCA), and Efficient Paired-Attention (EPA) modules. This integration leverages the advantages of both CNN and Transformer architectures to enhance model accuracy. The core component of Slim UNETR++ is the Slim UNETR++ block, which facilitates efficient information exchange through a sparse self-attention mechanism and low-cost representation aggregation. We also introduced throughput as a performance metric to quantify data processing speed. Experimental results demonstrate that Slim UNETR++ outperforms other models in terms of accuracy and model size. On the BraTS2021 dataset, Slim UNETR++ achieved a Dice accuracy of 93.12% and a 95% Hausdorff distance (HD95) of 4.23 mm, significantly surpassing mainstream relevant methods such as Swin UNETR.
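For readers unfamiliar with the Dice metric reported above, it measures overlap between a predicted and a reference segmentation as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch that represents each mask as a set of voxel coordinates. The function name and the toy masks are hypothetical, and this is not the evaluation code used in the paper.

```python
def dice_coefficient(predicted: set, reference: set) -> float:
    """Dice overlap between two masks given as sets of voxel coordinates."""
    if not predicted and not reference:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Toy 3D masks, invented for illustration: two 2x2x2 cubes offset by
# one voxel along x, so they share 4 of their 8 voxels each.
reference_mask = {(x, y, z) for x in range(0, 2) for y in range(2) for z in range(2)}
predicted_mask = {(x, y, z) for x in range(1, 3) for y in range(2) for z in range(2)}

print(dice_coefficient(predicted_mask, reference_mask))
```

A Dice value of 1.0 indicates identical masks and 0.0 indicates no overlap. The handling of two empty masks is a convention that varies between toolkits.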

MRI Segmentation Neurological Methodology In Silico Academic Lab Benchmark SOTA

Multi-Organ metabolic profiling with [18F]F-FDG PET/CT predicts pathological response to neoadjuvant immunochemotherapy in resectable NSCLC.

Ma Q, Yang J, Guo X, Mu W, Tang Y, Li J, Hu S

•papers•Jun 2 2025

To develop and validate a novel nomogram combining multi-organ PET metabolic metrics for major pathological response (MPR) prediction in resectable non-small cell lung cancer (rNSCLC) patients receiving neoadjuvant immunochemotherapy. This retrospective cohort included rNSCLC patients who underwent baseline [18F]F-FDG PET/CT prior to neoadjuvant immunochemotherapy at Xiangya Hospital from April 2020 to April 2024. Patients were randomly stratified into training (70%) and validation (30%) cohorts. Using deep learning-based automated segmentation, we quantified metabolic parameters (SUVmean, SUVmax, SUVpeak, MTV, TLG) and their ratio to liver metabolic parameters for primary tumors and nine key organs. Feature selection employed a tripartite approach: univariate analysis, LASSO regression, and random forest optimization. The final multivariable model was translated into a clinically interpretable nomogram, with validation assessing discrimination, calibration, and clinical utility. Among 115 patients (MPR rate: 63.5%, n = 73), five metabolic parameters emerged as predictive biomarkers for MPR: Spleen_SUVmean, Colon_SUVpeak, Spine_TLG, Lesion_TLG, and Spleen-to-Liver SUVmax ratio. The nomogram demonstrated consistent performance across cohorts (training AUC = 0.78 [95%CI 0.67-0.88]; validation AUC = 0.78 [95%CI 0.62-0.94]), with robust calibration and enhanced clinical net benefit on decision curve analysis. Compared to tumor-only parameters, the multi-organ model showed higher specificity (100% vs. 92%) and positive predictive value (100% vs. 90%) in the validation set, maintaining 76% overall accuracy. This first-reported multi-organ metabolic nomogram noninvasively predicts MPR in rNSCLC patients receiving neoadjuvant immunochemotherapy, outperforming conventional tumor-centric approaches. 
By quantifying systemic host-tumor metabolic crosstalk, this tool could help guide personalized therapeutic decisions while mitigating treatment-related risks, representing a paradigm shift towards precision immuno-oncology management.

PET Segmentation Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Diagnostic Performance of ChatGPT-4o in Detecting Hip Fractures on Pelvic X-rays.

Erdem TE, Kirilmaz A, Kekec AF

•papers•Jun 1 2025

Hip fractures are a major orthopedic problem, especially in the elderly population. Hip fractures are usually diagnosed by clinical evaluation and imaging, especially X-rays. In recent years, new approaches to fracture detection have emerged with the use of artificial intelligence (AI) and deep learning techniques in medical imaging. In this study, we aimed to evaluate the diagnostic performance of ChatGPT-4o, an artificial intelligence model, in diagnosing hip fractures. A total of 200 anteroposterior pelvic X-ray images were retrospectively analyzed. Half of the images belonged to patients with surgically confirmed hip fractures, including both displaced and non-displaced types, while the other half represented patients with soft tissue trauma and no fractures. Each image was evaluated by ChatGPT-4o through a standardized prompt, and its predictions (fracture vs. no fracture) were compared against the gold standard diagnoses. Diagnostic performance metrics such as sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), receiver operating characteristic (ROC) curve, Cohen's kappa, and F1 score were calculated. ChatGPT-4o demonstrated an overall accuracy of 82.5% in detecting hip fractures on pelvic radiographs, with a sensitivity of 78.0% and specificity of 87.0%. PPVs and NPVs were 85.7% and 79.8%, respectively. The area under the ROC curve (AUC) was 0.825, indicating good discriminative performance. Among 22 false-negative cases, 68.2% were non-displaced fractures, suggesting the model had greater difficulty identifying subtle radiographic findings. Cohen's kappa coefficient was 0.65, showing substantial agreement with actual diagnoses. Chi-square analysis revealed a strong correlation (χ² = 82.59, P < 0.001), while McNemar's test (P = 0.176) showed no significant asymmetry in error distribution. 
ChatGPT-4o shows promising accuracy in identifying hip fractures on pelvic X-rays, especially when fractures are displaced. However, its sensitivity drops significantly for non-displaced fractures, leading to many false negatives. This highlights the need for caution when interpreting negative AI results, particularly when clinical suspicion remains high. While not a replacement for expert assessment, ChatGPT-4o may assist in settings with limited specialist access.
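The reported metrics can be cross-checked from a 2x2 confusion matrix. The abstract does not print the matrix, so the counts below are a reconstruction from three of its figures: 100 fracture and 100 non-fracture images, 22 false negatives, and 87.0% specificity. Whether the authors applied continuity corrections is also an assumption; the corrected forms are used here because they agree with the reported chi-square and McNemar values.

```python
import math

# Counts reconstructed from the abstract (assumption, not reported directly).
tp, fn = 78, 22  # 100 fracture images, 22 missed
tn, fp = 87, 13  # 100 non-fracture images, 87% specificity
n = tp + fn + tn + fp

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / n
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

# Cohen's kappa: observed agreement corrected for chance agreement.
p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
kappa = (accuracy - p_expected) / (1 - p_expected)

# Pearson chi-square for a 2x2 table with Yates continuity correction.
chi2_yates = (n * (abs(tp * tn - fp * fn) - n / 2) ** 2
              / ((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)))

# McNemar test on the discordant cells (fn vs fp) with continuity correction.
# For 1 degree of freedom, the chi-square tail is erfc(sqrt(x / 2)).
mcnemar_stat = (abs(fn - fp) - 1) ** 2 / (fn + fp)
mcnemar_p = math.erfc(math.sqrt(mcnemar_stat / 2))

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"PPV={ppv:.3f} NPV={npv:.3f} F1={f1:.3f} kappa={kappa:.2f}")
print(f"chi2 (Yates)={chi2_yates:.2f} McNemar P={mcnemar_p:.3f}")
```

Under these assumptions the sketch gives accuracy 0.825, PPV 0.857, NPV 0.798, kappa 0.65, chi-square 82.59, and McNemar P = 0.176, in line with the abstract. The F1 score it produces (about 0.817) is not stated in the abstract.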

X-Ray Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Integration of Deep Learning and Sub-regional Radiomics Improves the Prediction of Pathological Complete Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer Patients.

Wu X, Wang J, Chen C, Cai W, Guo Y, Guo K, Chen Y, Shi Y, Chen J, Lin X, Jiang X

•papers•Jun 1 2025

The precise prediction of response to neoadjuvant chemoradiotherapy is crucial for tailoring perioperative treatment in patients diagnosed with locally advanced rectal cancer (LARC). This retrospective study aims to develop and validate a model that integrates deep learning and sub-regional radiomics from MRI to predict pathological complete response (pCR) in patients with LARC. We retrospectively enrolled 768 eligible participants from three independent hospitals who had received neoadjuvant chemoradiotherapy followed by radical surgery. Pretreatment pelvic MRI scans (T2-weighted) were collected for annotation and feature extraction. The K-means approach was used to segment the tumor into sub-regions. Radiomics and deep learning features were extracted with PyRadiomics and a 3D ResNet50, respectively. The predictive models were developed using the radiomics, sub-regional radiomics, and deep learning features with a machine learning algorithm in the training cohort, and then validated in the external test cohorts. The models' performance was assessed using various metrics, including the area under the curve (AUC), decision curve analysis, and Kaplan-Meier survival analysis. We constructed a combined model, named SRADL, which includes deep learning with sub-regional radiomics signatures, enabling precise prediction of pCR in LARC patients. SRADL had satisfactory performance for the prediction of pCR in the training cohort (AUC 0.925 [95% CI 0.894 to 0.948]), in test 1 (AUC 0.915 [95% CI 0.869 to 0.949]), and in test 2 (AUC 0.902 [95% CI 0.846 to 0.945]). Using an optimal threshold of 0.486, the predicted pCR group had longer survival than the predicted non-pCR group across the three cohorts. SRADL also outperformed other single-modality prediction models. 
The novel SRADL, which integrates deep learning with sub-regional signatures, showed high accuracy and robustness in predicting pCR to neoadjuvant chemoradiotherapy using pretreatment MRI images, making it a promising tool for the personalized management of LARC.
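The sub-region step described above relies on K-means clustering. The abstract does not say which voxel features were clustered or how many clusters were used, so the following is only a sketch of the general idea: plain Lloyd's algorithm applied to one-dimensional voxel intensities, with a hypothetical function name, deterministic initialization, and toy values.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalar values; returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic initialization: evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Toy voxel intensities inside a tumor mask, invented for illustration.
intensities = [10, 12, 11, 50, 52, 49, 90, 91, 89]
labels, centers = kmeans_1d(intensities, k=3)
print(labels)
```

Each label identifies a candidate sub-region. In a radiomics pipeline, features would then be extracted separately from the voxels of each sub-region.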

MRI Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA
