Latest Papers on Radiology AI. Tags: Retrospective Clinical, Order: Best Match, Limit: 10.

Impact of Optic Nerve Tortuosity, Globe Proptosis, and Size on Retinal Ganglion Cell Thickness Across General, Glaucoma, and Myopic Populations.

Chiang CYN, Wang X, Gardiner SK, Buist M, Girard MJA

•papers•Jun 2 2025

The purpose of this study was to investigate the impact of optic nerve tortuosity (ONT), and the interaction of globe proptosis and size on retinal ganglion cell (RGC) thickness, using retinal nerve fiber layer (RNFL) thickness, across general, glaucoma, and myopic populations. This study analyzed 17,940 eyes from the UK Biobank cohort (ID 76442), including 72 glaucomatous and 2475 myopic eyes. Artificial intelligence models were developed to derive RNFL thickness corrected for ocular magnification from 3D optical coherence tomography scans and orbit features from 3D magnetic resonance images, including ONT, globe proptosis, axial length, and a novel feature: the interzygomatic line-to-posterior pole (ILPP) distance - a composite marker of globe proptosis and size. Generalized estimating equation (GEE) models evaluated associations between orbital and retinal features. RNFL thickness was positively correlated with ONT and ILPP distance (r = 0.065, P < 0.001 and r = 0.206, P < 0.001, respectively) in the general population. The same was true for glaucoma (r = 0.040, P = 0.74 and r = 0.224, P = 0.059), and for myopia (r = 0.069, P < 0.001 and r = 0.100, P < 0.001). GEE models revealed that straighter optic nerves and shorter ILPP distance were predictive of thinner RNFL in all populations. Straighter optic nerves and decreased ILPP distance could cause RNFL thinning, possibly due to greater traction forces. ILPP distance emerged as a potential biomarker of axonal health. These findings underscore the importance of orbit structures in RGC axonal health and warrant further research into orbit biomechanics.

Mixed Modality Segmentation Other Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

A Deep Learning-Based Artificial Intelligence Model Assisting Thyroid Nodule Diagnosis and Management: Pilot Results for Evaluating Thyroid Malignancy in Pediatric Cohorts.

Ha EJ, Lee JH, Mak N, Duh AK, Tong E, Yeom KW, Meister KD

•papers•Jun 2 2025

Purpose: Artificial intelligence (AI) models have shown promise in predicting malignant thyroid nodules in adults; however, research on deep learning (DL) for pediatric cases is limited. We evaluated the applicability of a DL-based model for assessing thyroid nodules in children. Methods: We retrospectively identified two pediatric cohorts (n = 128; mean age 15.5 ± 2.4 years; 103 girls) who had thyroid nodule ultrasonography (US) with histological confirmation at two institutions. The AI-Thyroid DL model, originally trained on adult data, was tested on pediatric nodules in three scenarios: axial US images, longitudinal US images, and both. We conducted a subgroup analysis based on the two pediatric cohorts and age groups (≥14 years vs. <14 years) and compared the model's performance with radiologist interpretations using the Thyroid Imaging Reporting and Data System (TIRADS). Results: Out of 156 nodules analyzed, 47 (30.1%) were malignant. AI-Thyroid demonstrated area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity values of 0.913-0.929, 78.7-89.4%, and 79.8-91.7%, respectively. The AUROC values did not significantly differ across the image planes (all p > 0.05) or between the two pediatric cohorts (p = 0.804). No significant differences were observed between age groups in terms of sensitivity and specificity (all p > 0.05), while the AUROC values were higher for patients aged <14 years compared to those aged ≥14 years (all p < 0.01). AI-Thyroid yielded the highest AUROC values, followed by ACR-TIRADS and K-TIRADS (p = 0.016 and p < 0.001, respectively). Conclusion: AI-Thyroid demonstrated high performance in diagnosing pediatric thyroid cancer. Future research should focus on optimizing AI-Thyroid for pediatric use and exploring its role alongside tissue sampling in clinical practice.

Ultrasound Classification Abdominal Retrospective Clinical In Silico None Academic Lab

Multi-Organ metabolic profiling with [18F]F-FDG PET/CT predicts pathological response to neoadjuvant immunochemotherapy in resectable NSCLC.

Ma Q, Yang J, Guo X, Mu W, Tang Y, Li J, Hu S

•papers•Jun 2 2025

To develop and validate a novel nomogram combining multi-organ PET metabolic metrics for major pathological response (MPR) prediction in resectable non-small cell lung cancer (rNSCLC) patients receiving neoadjuvant immunochemotherapy. This retrospective cohort included rNSCLC patients who underwent baseline [18F]F-FDG PET/CT prior to neoadjuvant immunochemotherapy at Xiangya Hospital from April 2020 to April 2024. Patients were randomly stratified into training (70%) and validation (30%) cohorts. Using deep learning-based automated segmentation, we quantified metabolic parameters (SUVmean, SUVmax, SUVpeak, MTV, TLG) and their ratio to liver metabolic parameters for primary tumors and nine key organs. Feature selection employed a tripartite approach: univariate analysis, LASSO regression, and random forest optimization. The final multivariable model was translated into a clinically interpretable nomogram, with validation assessing discrimination, calibration, and clinical utility. Among 115 patients (MPR rate: 63.5%, n = 73), five metabolic parameters emerged as predictive biomarkers for MPR: Spleen_SUVmean, Colon_SUVpeak, Spine_TLG, Lesion_TLG, and Spleen-to-Liver SUVmax ratio. The nomogram demonstrated consistent performance across cohorts (training AUC = 0.78 [95%CI 0.67-0.88]; validation AUC = 0.78 [95%CI 0.62-0.94]), with robust calibration and enhanced clinical net benefit on decision curve analysis. Compared to tumor-only parameters, the multi-organ model showed higher specificity (100% vs. 92%) and positive predictive value (100% vs. 90%) in the validation set, maintaining 76% overall accuracy. This first-reported multi-organ metabolic nomogram noninvasively predicts MPR in rNSCLC patients receiving neoadjuvant immunochemotherapy, outperforming conventional tumor-centric approaches. 
By quantifying systemic host-tumor metabolic crosstalk, this tool could help guide personalized therapeutic decisions while mitigating treatment-related risks, representing a paradigm shift towards precision immuno-oncology management.
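The abstract describes a stratified 70%/30% split of the 115 patients. The following is a minimal sketch of one way such a split can be done, not the authors' procedure: the function name `stratified_split`, the seed, and the rounding rule are assumptions, and the non-MPR count of 42 is simply 115 minus the reported 73 MPR patients.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/validation while keeping class proportions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    train_idx, val_idx = [], []
    for cls in sorted(set(labels)):
        # Shuffle each class separately, then cut it at train_frac.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        val_idx += idx[cut:]
    return sorted(train_idx), sorted(val_idx)

# 1 = major pathological response (73 patients), 0 = no MPR (115 - 73 = 42).
labels = [1] * 73 + [0] * 42
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Splitting within each class keeps the MPR rate similar in both cohorts, which matters when the validation cohort is small.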

PET Segmentation Chest Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

Fine-tuned large Language model for extracting newly identified acute brain infarcts based on computed tomography or magnetic resonance imaging reports.

Fujita N, Yasaka K, Kiryu S, Abe O

•papers•Jun 2 2025

This study aimed to develop an automated early warning system using a large language model (LLM) to identify acute to subacute brain infarction from free-text computed tomography (CT) or magnetic resonance imaging (MRI) radiology reports. In this retrospective study, 5,573, 1,883, and 834 patients were included in the training (mean age, 67.5 ± 17.2 years; 2,831 males), validation (mean age, 61.5 ± 18.3 years; 994 males), and test (mean age, 66.5 ± 16.1 years; 488 males) datasets. An LLM (Japanese Bidirectional Encoder Representations from Transformers model) was fine-tuned to classify the CT and MRI reports into three groups (group 0, newly identified acute to subacute infarction; group 1, known acute to subacute infarction or old infarction; group 2, without infarction). The training and validation processes were repeated 15 times, and the best-performing model on the validation dataset was selected to further evaluate its performance on the test dataset. The best fine-tuned model exhibited sensitivities of 0.891, 0.905, and 0.959 for groups 0, 1, and 2, respectively, in the test dataset. The macrosensitivity (the average of sensitivity for all groups) and accuracy were 0.918 and 0.923, respectively. The model's performance in extracting newly identified acute brain infarcts was high, with an area under the receiver operating characteristic curve of 0.979 (95% confidence interval, 0.956-1.000). The average prediction time was 0.115 ± 0.037 s per patient. A fine-tuned LLM could extract newly identified acute to subacute brain infarcts based on CT or MRI findings with high performance.
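The macrosensitivity defined in the abstract is the unweighted mean of the per-group sensitivities. A minimal sketch that recomputes it from the three reported values (the function name `macro_sensitivity` is illustrative and not from the study):

```python
def macro_sensitivity(per_group):
    """Unweighted mean of per-group sensitivities."""
    return sum(per_group) / len(per_group)

# Sensitivities reported for groups 0, 1, and 2 on the test dataset.
reported = [0.891, 0.905, 0.959]
macro = macro_sensitivity(reported)
print(f"{macro:.3f}")
```

Because each group counts equally regardless of its size, the macro average can differ from overall accuracy, which weights groups by how many reports they contain.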

Mixed Modality LLM Radiology Report Neurological Retrospective Clinical In Silico None Academic Lab GenAI

Decision support using machine learning for predicting adequate bladder filling in prostate radiotherapy: a feasibility study.

Saiyo N, Assawanuwat K, Janthawanno P, Paduka S, Prempetch K, Chanphol T, Sakchatchawan B, Thongsawad S

•papers•Jun 2 2025

This study aimed to develop a model for predicting the bladder volume ratio between daily CBCT and CT to determine adequate bladder filling in patients undergoing treatment for prostate cancer with external beam radiation therapy (EBRT). The model was trained using 465 datasets obtained from 34 prostate cancer patients. A total of 16 features were collected as input data, which included basic patient information, patient health status, blood examination laboratory results, and specific radiation therapy information. The ratio of the bladder volume between daily CBCT (dCBCT) and planning CT (pCT) was used as the model response. The model was trained using a bootstrap aggregation (bagging) algorithm with two machine learning (ML) approaches: classification and regression. The model accuracy was validated using another 93 datasets. For the regression approach, the accuracy of the model was evaluated based on the root mean square error (RMSE) and mean absolute error (MAE). By contrast, the model performance of the classification approach was assessed using sensitivity, specificity, and accuracy scores. The ML model showed promising results in the prediction of the bladder volume ratio between dCBCT and pCT, with an RMSE of 0.244 and MAE of 0.172 for the regression approach, and a sensitivity of 95.24%, specificity of 92.16%, and accuracy of 93.55% for the classification approach. The prediction model could potentially help the radiological technologist determine whether the bladder is full before treatment, thereby reducing the need for repeat CBCT scans. HIGHLIGHTS: The bagging model demonstrates strong performance in predicting optimal bladder filling. The model achieves promising results with 95.24% sensitivity and 92.16% specificity. It supports therapists in assessing bladder fullness prior to treatment. It helps reduce the risk of requiring repeat CBCT scans.
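Sensitivity, specificity, and accuracy all derive from a single confusion matrix. The sketch below shows the calculation; the abstract does not report the underlying counts, so the counts used here are hypothetical, chosen only because they sum to the 93 validation datasets and reproduce the reported percentages. Which class counts as "positive" is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # true positive rate
    specificity = tn / (tn + fp)               # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# HYPOTHETICAL counts (not from the study): 42 positives + 51 negatives = 93.
sens, spec, acc = classification_metrics(tp=40, fn=2, tn=47, fp=4)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")
```

Other splits of 93 cases may match the rounded figures less exactly, so this is a consistency check and not a reconstruction of the study's results.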

CT Classification Abdominal Retrospective Clinical In Silico None Academic Lab

Performance Comparison of Machine Learning Using Radiomic Features and CNN-Based Deep Learning in Benign and Malignant Classification of Vertebral Compression Fractures Using CT Scans.

Yeom JC, Park SH, Kim YJ, Ahn TR, Kim KG

•papers•Jun 2 2025

Distinguishing benign from malignant vertebral compression fractures is critical for clinical management but remains challenging on contrast-enhanced abdominal CT, which lacks the soft tissue contrast of MRI. This study evaluates and compares radiomic feature-based machine learning and convolutional neural network-based deep learning models for classifying VCFs using abdominal CT. A retrospective cohort of 447 vertebral compression fractures (196 benign, 251 malignant) from 286 patients was analyzed. Radiomic features were extracted using PyRadiomics, with Recursive Feature Elimination selecting six key texture-based features (e.g., Run Variance, Dependence Non-Uniformity Normalized), highlighting textural heterogeneity as a malignancy marker. Machine learning models (XGBoost, SVM, KNN, Random Forest) and a 3D CNN were trained on CT data, with performance assessed via precision, recall, F1 score, accuracy, and AUC. The deep learning model achieved marginally superior overall performance, with a statistically significant higher AUC (77.66% vs. 75.91%, p < 0.05) and better precision, F1 score, and accuracy compared to the top-performing machine learning model (XGBoost). Deep learning's attention maps localized diagnostically relevant regions, mimicking radiologists' focus, whereas radiomics lacked spatial interpretability despite offering quantifiable biomarkers. This study underscores the complementary strengths of machine learning and deep learning: radiomics provides interpretable features tied to tumor heterogeneity, while DL autonomously extracts high-dimensional patterns with spatial explainability. Integrating both approaches could enhance diagnostic accuracy and clinician trust in abdominal CT-based VCF assessment. Limitations include retrospective single-center data and potential selection bias. Future multi-center studies with diverse protocols and histopathological validation are warranted to generalize these findings.

CT Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab

Multicycle Dosimetric Behavior and Dose-Effect Relationships in [177Lu]Lu-DOTATATE Peptide Receptor Radionuclide Therapy.

Kayal G, Roseland ME, Wang C, Fitzpatrick K, Mirando D, Suresh K, Wong KK, Dewaraja YK

•papers•Jun 2 2025

We investigated pharmacokinetics, dosimetric patterns, and absorbed dose (AD)-effect correlations in [177Lu]Lu-DOTATATE peptide receptor radionuclide therapy (PRRT) for metastatic neuroendocrine tumors (NETs) to develop strategies for future personalized dosimetry-guided treatments. Methods: Patients treated with standard [177Lu]Lu-DOTATATE PRRT were recruited for serial SPECT/CT imaging. Kidneys were segmented on CT using a deep learning algorithm, and tumors were segmented at each cycle using a SPECT gradient-based tool, guided by radiologist-defined contours on baseline CT/MRI. Dosimetry was performed using an automated workflow that included contour intensity-based SPECT-SPECT registration, generation of Monte Carlo dose-rate maps, and dose-rate fitting. Lesion-level response at first follow-up was evaluated using both radiologic (RECIST and modified RECIST) and [68Ga]Ga-DOTATATE PET-based criteria. Kidney toxicity was evaluated based on the estimated glomerular filtration rate (eGFR) at 9 mo after PRRT. Results: Dosimetry was performed after cycle 1 in 30 patients and after all cycles in 22 of 30 patients who completed SPECT/CT imaging after each cycle. Median cumulative tumor (n = 78) AD was 2.2 Gy/GBq (range, 0.1-20.8 Gy/GBq), whereas median kidney AD was 0.44 Gy/GBq (range, 0.25-0.96 Gy/GBq). The tumor-to-kidney AD ratio decreased with each cycle (median, 6.4, 5.7, 4.7, and 3.9 for cycles 1-4) because of a decrease in tumor AD, while kidney AD remained relatively constant. Higher-grade (grade 2) and pancreatic NETs showed a significantly larger drop in AD with each cycle, as well as significantly lower AD and effective half-life (Teff), than did low-grade (grade 1) and small intestinal NETs, respectively. Teff remained relatively constant with each cycle for both tumors and kidneys. Kidney Teff and AD were significantly higher in patients with low eGFR than in those with high eGFR. Tumor AD was not significantly associated with response measures. 
There was no nephrotoxicity higher than grade 2; however, a significant negative association was found in univariate analyses between eGFR at 9 mo and AD to the kidney, which improved in a multivariable model that also adjusted for baseline eGFR (cycle 1 AD, P = 0.020, adjusted R² = 0.57; cumulative AD, P = 0.049, adjusted R² = 0.65). The association between percentage change in eGFR and AD to the kidney was also significant in univariate analysis and after adjusting for baseline eGFR (cycle 1 AD, P = 0.006, adjusted R² = 0.21; cumulative AD, P = 0.019, adjusted R² = 0.21). Conclusion: The dosimetric behavior we report over different cycles and for different NET subgroups can be considered when optimizing PRRT to individual patients. The models we present for the relationship between eGFR and AD have potential for clinical use in predicting renal function early in the treatment course. Furthermore, reported pharmacokinetics for patient subgroups allow more appropriate selection of population parameters to be used in protocols with fewer imaging time points that facilitate more widespread adoption of dosimetry.

Mixed Modality Segmentation Abdominal Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

Referenceless 4D Flow Cardiovascular Magnetic Resonance with deep learning.

Trenti C, Ylipää E, Ebbers T, Carlhäll CJ, Engvall J, Dyverfeldt P

•papers•Jun 2 2025

Despite its potential to improve the assessment of cardiovascular diseases, 4D Flow CMR is hampered by long scan times. 4D Flow CMR is conventionally acquired with three motion encodings and one reference encoding, as the 3-dimensional velocity data are obtained by subtracting the phase of the reference from the phase of the motion encodings. In this study, we aim to use deep learning to predict the reference encoding from the three motion encodings for cardiovascular 4D Flow. A U-Net was trained with adversarial learning (U-NetADV) and with a velocity frequency-weighted loss function (U-NetVEL) to predict the reference encoding from the three motion encodings obtained with a non-symmetric velocity-encoding scheme. Whole-heart 4D Flow datasets from 126 patients with different types of cardiomyopathies were retrospectively included. The models were trained on 113 patients with a 5-fold cross-validation, and tested on 13 patients. Flow volumes in the aorta and pulmonary artery, mean and maximum velocity, total and maximum turbulent kinetic energy at peak systole in the cardiac chambers and main vessels were assessed. 3-dimensional velocity data reconstructed with the reference encoding predicted by deep learning agreed well with the velocities obtained with the reference encoding acquired at the scanner for both models. U-NetADV performed more consistently throughout the cardiac cycle and across the test subjects, while U-NetVEL performed better for systolic velocities. Comprehensively, the largest error for flow volumes, maximum and mean velocities was -6.031% for maximum velocities in the right ventricle for the U-NetADV, and -6.92% for mean velocities in the right ventricle for U-NetVEL. 
For total turbulent kinetic energy, the highest errors were in the left ventricle (-77.17%) for the U-NetADV, and in the right ventricle (24.96%) for the U-NetVEL, while for maximum turbulent kinetic energy the highest errors were in the pulmonary artery for both models, with values of -15.5% for U-NetADV and 15.38% for U-NetVEL. Deep learning-enabled referenceless 4D Flow CMR permits quantification of velocities and flow volumes comparable to conventional 4D Flow. Omitting the reference encoding reduces the amount of acquired data by 25%, thus allowing shorter scan times or improved resolution, which is valuable for use in clinical routine.
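The abstract's description of reference subtraction and the 25% data reduction can be illustrated with a short sketch. It assumes the common phase-contrast convention that a phase difference of π corresponds to the velocity-encoding value (venc); the abstract does not state the venc or the exact scaling, and the function name and example values are illustrative.

```python
import cmath
import math

def velocity_from_phases(phase_motion, phase_ref, venc):
    """Velocity along one direction from a motion-encoded and a reference phase (radians)."""
    # Subtract via complex exponentials so the difference is wrapped to (-pi, pi].
    dphi = cmath.phase(cmath.exp(1j * (phase_motion - phase_ref)))
    # Assumed convention: a phase difference of pi maps to venc.
    return venc * dphi / math.pi

# Conventional acquisition: three motion encodings plus one reference encoding.
ENCODINGS_CONVENTIONAL = 4
ENCODINGS_REFERENCELESS = 3
data_reduction = 1 - ENCODINGS_REFERENCELESS / ENCODINGS_CONVENTIONAL

v = velocity_from_phases(math.pi / 2, 0.0, venc=150.0)  # illustrative venc in cm/s
print(v, data_reduction)
```

In the referenceless setting described in the abstract, `phase_ref` would come from the network's prediction instead of an acquired encoding; the subtraction step itself is unchanged.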

MRI Reconstruction Cardiac Retrospective Clinical In Silico None Academic Lab

Validation of a Dynamic Risk Prediction Model Incorporating Prior Mammograms in a Diverse Population.

Jiang S, Bennett DL, Colditz GA

•papers•Jun 2 2025

For breast cancer risk prediction to be clinically useful, it must be accurate and applicable to diverse groups of women across multiple settings. The objective was to examine whether a dynamic risk prediction model incorporating prior mammograms, previously validated in Black and White women, could predict future risk of breast cancer across a racially and ethnically diverse population in a population-based screening program. This prognostic study included women aged 40 to 74 years with 1 or more screening mammograms drawn from the British Columbia Breast Screening Program from January 1, 2013, to December 31, 2019, with follow-up via linkage to the British Columbia Cancer Registry through June 2023. This provincial, organized screening program offers screening mammography with full field digital mammography (FFDM) every 2 years. Data were analyzed from May to August 2024. The exposure was an FFDM-based, artificial intelligence-generated mammogram risk score (MRS) that included up to 4 years of prior mammograms. The primary outcomes were 5-year risk of breast cancer (measured with the area under the receiver operating characteristic curve [AUROC]) and absolute risk of breast cancer calibrated to the US Surveillance, Epidemiology, and End Results incidence rates. Among 206 929 women (mean [SD] age, 56.1 [9.7] years; of 118 093 with data on race, there were 34 266 East Asian; 1946 Indigenous; 6116 South Asian; and 66 742 White women), there were 4168 pathology-confirmed incident breast cancers diagnosed through June 2023. Mean (SD) follow-up time was 5.3 (3.0) years. Using up to 4 years of prior mammogram images in addition to the most current mammogram, a 5-year AUROC of 0.78 (95% CI, 0.77-0.80) was obtained based on analysis of images alone. Performance was consistent across subgroups defined by race and ethnicity in East Asian (AUROC, 0.77; 95% CI, 0.75-0.79), Indigenous (AUROC, 0.77; 95% CI, 0.71-0.83), and South Asian (AUROC, 0.75; 95% CI, 0.71-0.79) women. 
Stratification by age gave a 5-year AUROC of 0.76 (95% CI, 0.74-0.78) for women aged 50 years or younger and 0.80 (95% CI, 0.78-0.82) for women older than 50 years. There were 18 839 participants (9.0%) with a 5-year risk greater than 3%, and the positive predictive value was 4.9% with an incidence of 11.8 per 1000 person-years. A dynamic MRS generated from both current and prior mammograms showed robust performance across diverse racial and ethnic populations in a province-wide screening program starting from age 40 years, reflecting improved accuracy for racially and ethnically diverse populations.

Mammography Classification Breast Retrospective Clinical In Silico None Academic Lab

Evaluating the performance and potential bias of predictive models for the detection of transthyretin cardiac amyloidosis

Hourmozdi, J., Easton, N., Benigeri, S., Thomas, J. D., Narang, A., Ouyang, D., Duffy, G., Upton, R., Hawkes, W., Akerman, A., Okwuosa, I., Kline, A., Kho, A. N., Luo, Y., Shah, S. J., Ahmad, F. S.

•preprint•Jun 2 2025

Background: Delays in the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) contribute to the significant morbidity of the condition, especially in the era of disease-modifying therapies. Screening for ATTR-CM with AI and other algorithms may improve timely diagnosis, but these algorithms have not been directly compared. Objectives: The aim of this study was to compare the performance of four algorithms for ATTR-CM detection in a heart failure population and assess the risk for harms due to model bias. Methods: We identified patients in an integrated health system from 2010-2022 with ATTR-CM and age- and sex-matched them to controls with heart failure to target 5% prevalence. We compared the performance of a claims-based random forest model (Huda et al. model), a regression-based score (Mayo ATTR-CM), and two deep learning echo models (EchoNet-LVH and EchoGo® Amyloidosis). We evaluated for bias using standard fairness metrics. Results: The analytical cohort included 176 confirmed cases of ATTR-CM and 3192 control patients with 79.2% self-identified as White and 9.0% as Black. The Huda et al. model performed poorly (AUC 0.49). Both deep learning echo models had a higher AUC when compared to the Mayo ATTR-CM Score (EchoNet-LVH 0.88; EchoGo Amyloidosis 0.92; Mayo ATTR-CM Score 0.79; DeLong P<0.001 for both). Bias auditing met fairness criteria for equal opportunity among patients who identified as Black. Conclusions: Deep learning, echo-based models to detect ATTR-CM demonstrated best overall discrimination when compared to two other models in external validation with low risk of harms due to racial bias.
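Two quantities in this abstract lend themselves to a short worked example: the achieved case prevalence, and the equal-opportunity criterion, which is commonly defined as parity of true positive rates between groups. The sketch below computes the prevalence from the reported cohort sizes. The per-group counts in the fairness part are hypothetical, since the abstract reports no group-level detection counts, and the function names are illustrative.

```python
def prevalence(cases, controls):
    """Fraction of the cohort that are confirmed cases."""
    return cases / (cases + controls)

def true_positive_rate(tp, fn):
    """Share of true cases that the model flags."""
    return tp / (tp + fn)

def equal_opportunity_gap(tpr_group, tpr_reference):
    """Difference in true positive rate between a group and a reference group."""
    return tpr_group - tpr_reference

# Reported cohort: 176 ATTR-CM cases and 3192 matched heart failure controls.
prev = prevalence(176, 3192)

# HYPOTHETICAL per-group counts, for illustration only (not from the study).
gap = equal_opportunity_gap(
    true_positive_rate(tp=15, fn=5),    # hypothetical group of interest
    true_positive_rate(tp=80, fn=20),   # hypothetical reference group
)
print(f"{prev:.1%} {gap:+.2f}")
```

A gap near zero indicates that cases in both groups are detected at similar rates; what tolerance counts as "meeting" the criterion is a choice of the audit and is not stated in the abstract.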

Ultrasound Classification Cardiac Retrospective Clinical In Silico None Academic Lab Benchmark SOTA Ethics
