Latest Papers on Radiology AI. Category: papers, Sources: pubmed, Order: Best Match, Limit: 10.

Predicting progression-free survival in sarcoma using MRI-based automatic segmentation models and radiomics nomograms: a preliminary multicenter study.

Zhu N, Niu F, Fan S, Meng X, Hu Y, Han J, Wang Z

•papers•Jul 1 2025

Some sarcomas are highly malignant and recur frequently despite treatment. This multicenter study aimed to develop and validate a radiomics signature to estimate progression-free survival (PFS) in sarcoma. The study retrospectively enrolled 202 consecutive patients with pathologically diagnosed sarcoma who had pre-treatment axial fat-suppressed T2-weighted images (FS-T2WI); these patients were used to train the ROI-Net model. Of them, 120 patients who had both pre-treatment axial T1-weighted and transverse FS-T2WI images were included in the radiomics analysis and randomly divided into a development group (n = 96) and a validation group (n = 24). In the development cohort, Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression was used to select the radiomics features for PFS prediction. A nomogram combining significant clinical features with the radiomics features was constructed using Cox regression. The proposed ROI-Net framework achieved a Dice coefficient of 0.820 (0.791-0.848). The radiomics signature, based on 21 features, could distinguish high-risk patients with poor PFS. Univariate Cox analysis revealed that peritumoral edema, metastases, and the radiomics score were associated with poor PFS, and these were included in the nomogram. The Radiomics-T1WI-Clinical model exhibited the best performance, with AUC values of 0.947, 0.907, and 0.924 at 300, 600, and 900 days, respectively. The segmentation results of the proposed ROI-Net framework were highly consistent with expert annotations. The radiomics features and the combined nomogram have the potential to aid in predicting PFS for patients with sarcoma.
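
The Dice coefficient reported for ROI-Net measures overlap between a predicted mask and a reference mask. A minimal sketch of the standard definition follows; the flattened masks are made-up toy values, not data from the study, and the function name is hypothetical.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy flattened masks (1 = tumor voxel, 0 = background); illustrative only.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
truth = [0, 1, 1, 0, 0, 1, 1, 0]
print(round(dice_coefficient(pred, truth), 3))  # 3 shared voxels, 4 + 4 total -> 0.75
```

A value of 1.0 means identical masks and 0.0 means no overlap.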

MRI Segmentation Musculoskeletal Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

Implementing an AI algorithm in the clinical setting: a case study for the accuracy paradox.

Scaringi JA, McTaggart RA, Alvin MD, Atalay M, Bernstein MH, Jayaraman MV, Jindal G, Movson JS, Swenson DW, Baird GL

•papers•Jul 1 2025

We report our experience implementing an algorithm for the detection of large vessel occlusion (LVO) in suspected stroke in the emergency setting, including its performance, and offer an explanation for why it was poorly received by radiologists. An algorithm was deployed in the emergency room at a single tertiary care hospital for the detection of LVO on CT angiography (CTA) between September 1 and 27, 2021. A retrospective analysis of the algorithm's accuracy was performed. During the study period, 48 patients underwent CTA examination in the emergency department to evaluate for emergent LVO, with 2 positive cases (60.3 years ± 18.2; 32 women). The LVO algorithm demonstrated a sensitivity and specificity of 100% and 92%, respectively. While the sensitivity of the algorithm at our institution was even higher than the manufacturer's reported values, the false discovery rate was 67%, leading to the perception that the algorithm was inaccurate. In addition, the positive predictive value at our institution was 33% compared with the manufacturer's reported values of 95-98%. This disparity can be attributed to the difference in disease prevalence: 4.1% at our institution compared with 45.0-62.2% in the manufacturer's reported data. Although the LVO algorithm's accuracy was as advertised, it was perceived as inaccurate because it produced more false positives than anticipated, and it was removed from clinical practice. This was likely due to a cognitive bias called the accuracy paradox. To mitigate the accuracy paradox, radiologists should be presented with metrics based on a disease prevalence similar to that of their practice when evaluating and utilizing artificial intelligence tools.
Question: An artificial intelligence algorithm for detecting emergent LVOs was implemented in an emergency department, but it was perceived to be inaccurate.
Findings: Although the algorithm's accuracy was both high and as advertised, the algorithm demonstrated a high false discovery rate.
Clinical relevance: The misperception of the algorithm's inaccuracy was likely due to a special case of the base rate fallacy, the accuracy paradox. Equipping radiologists with an algorithm's false discovery rate based on local prevalence will ensure realistic expectations for real-world performance.
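
The dependence of positive predictive value (PPV) on prevalence follows from Bayes' rule. The sketch below plugs in the sensitivity, specificity, and prevalence figures quoted in the abstract; the function names are hypothetical. Because the abstract's 33% PPV comes from raw case counts while this uses rounded percentages, the result is close to but not exactly the reported value.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def false_discovery_rate(sensitivity, specificity, prevalence):
    """FDR is the complement of PPV: P(no disease | positive test)."""
    return 1.0 - positive_predictive_value(sensitivity, specificity, prevalence)


# Same sensitivity and specificity, evaluated at the local prevalence (4.1%)
# and at the lower end of the manufacturer's range (45.0%).
for prev in (0.041, 0.45):
    ppv = positive_predictive_value(1.00, 0.92, prev)
    print(f"prevalence={prev:.3f}  PPV={ppv:.2f}  FDR={1 - ppv:.2f}")
```

At the local prevalence the PPV comes out at roughly one third, while at 45% prevalence the same test yields a PPV above 90%, which illustrates the paradox described above.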

CT Detection Neurological Retrospective Clinical Clinical Pilot None Academic Lab

Image quality assessment of artificial intelligence iterative reconstruction for low dose unenhanced abdomen: comparison with hybrid iterative reconstruction.

Qi H, Cui D, Xu S, Li W, Zeng Q

•papers•Jul 1 2025

To assess the impact of artificial intelligence iterative reconstruction (AIIR) algorithms on image quality in phantom and clinical studies. The phantom images were reconstructed with the hybrid iterative reconstruction algorithm (HIR: Karl 3D-3, 5, 7, 9) and the AIIR algorithm (grades 1-5). Noise power spectra (NPS) and task transfer functions (TTF) were measured, and sharpness was additionally assessed using a "blur metric" procedure. Sixty-two consecutive patients underwent standard-dose and low-dose unenhanced abdominal computed tomography (CT) scans (the SDCT and LDCT groups, respectively). The SDCT images were reconstructed using Karl 3D-5, and the LDCT images were reconstructed using Karl 3D-5, AIIR-3, and AIIR-5. CT values, standard deviation (SD), signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR) were assessed for hepatic parenchyma and paravertebral muscles. Images were independently evaluated by two radiologists for image quality, noise, sharpness, and lesion diagnostic confidence. In the phantom study, the AIIR algorithm provided higher TTF50% and NPS average spatial frequency than HIR. In the clinical study, there was no statistically significant difference in CT values among the four reconstructions (p > 0.05). The LDCT group AIIR-3 obtained the lowest SD values and the highest mean CNR and SNR values compared to the other three groups (p < 0.05). In the qualitative assessment, the subjective image scores of AIIR-5 in the LDCT group did not differ significantly from those of the SDCT group (p > 0.05). AIIR reduces radiation dose levels by approximately 78% and still maintains the image quality of unenhanced abdominal CT compared to HIR with SDCT. ClinicalTrials.gov identifier: NCT06142539.
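
The abstract does not spell out how SNR and CNR were computed. A minimal sketch using one common ROI-based convention follows: SNR as mean attenuation divided by its SD, and CNR as the absolute difference of two ROI means divided by a noise SD. Both the definitions and the HU samples are assumptions for illustration, not the authors' protocol or data.

```python
from statistics import mean, stdev


def snr(roi_hu):
    """Signal-to-noise ratio of one ROI: mean HU divided by sample SD (assumed definition)."""
    return mean(roi_hu) / stdev(roi_hu)


def cnr(roi_a_hu, roi_b_hu, noise_sd):
    """Contrast-to-noise ratio: |mean(A) - mean(B)| / noise SD (assumed definition)."""
    return abs(mean(roi_a_hu) - mean(roi_b_hu)) / noise_sd


# Hypothetical HU samples from a liver ROI and a paravertebral muscle ROI.
liver = [58, 62, 60, 61, 59]
muscle = [50, 52, 48, 51, 49]

print(f"liver SNR = {snr(liver):.2f}")
# Here the muscle ROI's SD stands in for image noise; other choices exist.
print(f"liver-muscle CNR = {cnr(liver, muscle, stdev(muscle)):.2f}")
```

Lower SD at the same mean attenuation raises both ratios, which is why a reconstruction that suppresses noise scores higher on these metrics.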

CT Reconstruction Abdominal Retrospective Clinical Clinical Pilot None Academic Lab

Deep learning model for low-dose CT late iodine enhancement imaging and extracellular volume quantification.

Yu Y, Wu D, Lan Z, Dai X, Yang W, Yuan J, Xu Z, Wang J, Tao Z, Ling R, Zhang S, Zhang J

•papers•Jul 1 2025

To develop and validate deep learning (DL) models that denoise late iodine enhancement (LIE) images and enable accurate extracellular volume (ECV) quantification. This study retrospectively included patients with chest discomfort who underwent CT myocardial perfusion + CT angiography + LIE at two hospitals. Two DL models, a residual dense network (RDN) and a conditional generative adversarial network (cGAN), were developed and validated. A total of 423 patients were randomly divided into training (182 patients), tuning (48 patients), internal validation (92 patients), and external validation (101 patients) groups. LIE_single (single-stack image), LIE_averaging (averaging multiple-stack images), LIE_RDN (single-stack image denoised by RDN), and LIE_GAN (single-stack image denoised by cGAN) were generated. We compared the image quality score, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR) of the four LIE sets. The identifiability of denoised images for positive LIE and increased ECV (> 30%) was assessed. The image quality of LIE_GAN (SNR: 13.3 ± 1.9; CNR: 4.5 ± 1.1) and LIE_RDN (SNR: 20.5 ± 4.7; CNR: 7.5 ± 2.3) images was markedly better than that of LIE_single (SNR: 4.4 ± 0.7; CNR: 1.6 ± 0.4). At the per-segment level, the area under the curve (AUC) of LIE_RDN images for LIE evaluation was significantly improved compared with those of LIE_GAN and LIE_single images (p = 0.040 and p < 0.001, respectively). Meanwhile, the AUC and accuracy of ECV_RDN were significantly higher than those of ECV_GAN and ECV_single at the per-segment level (p < 0.001 for all). The RDN model generated denoised LIE images with markedly higher SNR and CNR than the cGAN model and the original images, which significantly improved the identifiability of visual analysis. Moreover, using denoised single-stack images led to accurate CT-ECV quantification.
Question: Can the developed models denoise CT-derived late iodine enhancement images and improve signal-to-noise ratio?
Findings: The residual dense network model significantly improved the image quality for late iodine enhancement and enabled accurate CT-extracellular volume quantification.
Clinical relevance: The residual dense network model generates denoised late iodine enhancement images with the highest signal-to-noise ratio and enables accurate quantification of extracellular volume.
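
For context on the ECV figures, CT-derived ECV is commonly computed from the hematocrit and the change in attenuation of myocardium and blood pool between pre-contrast and late-enhancement images. The sketch below uses that widely used formula; whether the authors applied exactly this form is an assumption, and all input values and function names are hypothetical.

```python
def ct_ecv(hematocrit, myo_pre_hu, myo_late_hu, blood_pre_hu, blood_late_hu):
    """ECV = (1 - Hct) * (delta HU myocardium / delta HU blood pool)."""
    delta_myo = myo_late_hu - myo_pre_hu
    delta_blood = blood_late_hu - blood_pre_hu
    if delta_blood <= 0:
        raise ValueError("blood pool must enhance for ECV to be defined")
    return (1.0 - hematocrit) * delta_myo / delta_blood


# Hypothetical segment: Hct 0.42, myocardium 40 -> 70 HU, blood pool 45 -> 120 HU.
ecv = ct_ecv(0.42, 40.0, 70.0, 45.0, 120.0)
print(f"ECV = {ecv:.1%}")
# The abstract treats ECV > 30% as increased.
print("increased ECV" if ecv > 0.30 else "ECV not increased")
```

Because the formula divides two small attenuation differences, noise in the late-enhancement image propagates directly into ECV, which is why denoising matters for this measurement.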

CT Reconstruction Cardiac Retrospective Clinical In Silico None Academic Lab

Preoperative prediction of post hepatectomy liver failure after surgery for hepatocellular carcinoma on CT-scan by machine learning and radiomics analyses.

Famularo S, Maino C, Milana F, Ardito F, Rompianesi G, Ciulli C, Conci S, Gallotti A, La Barba G, Romano M, De Angelis M, Patauner S, Penzo C, De Rose AM, Marescaux J, Diana M, Ippolito D, Frena A, Boccia L, Zanus G, Ercolani G, Maestri M, Grazi GL, Ruzzenente A, Romano F, Troisi RI, Giuliante F, Donadon M, Torzilli G

•papers•Jul 1 2025

No instruments are available to preoperatively predict the risk of posthepatectomy liver failure (PHLF) in HCC patients. The aim was to predict the occurrence of PHLF preoperatively from radiomics and clinical data using machine-learning algorithms. Clinical data and three-phase CT scans were retrospectively collected from 13 Italian centres between 2008 and 2022. Radiomics features were extracted from the non-tumoral liver area. Data were split into training (70%) and test (30%) sets. Oversampling (ADASYN) was applied to the training set. Random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM) models were fitted to predict PHLF. Final evaluation of the metrics was run on the test set. The best models were included in an averaging ensemble model (AEM). Five hundred consecutive preoperative CT scans were collected with the corresponding clinical data. Of these patients, 17 (3.4%) experienced PHLF. Two hundred sixteen radiomics features per patient were extracted. PCA selected 19 dimensions explaining >75% of the variance. Associated clinical variables were: size, macrovascular invasion, cirrhosis, major resection, and MELD score. Data were split into a training cohort (70%, n = 351) and a test cohort (30%, n = 149). The RF model obtained an AUC = 89.1% (Spec. = 70.1%, Sens. = 100%, accuracy = 71.1%, PPV = 10.4%, NPV = 100%). The XGB model showed an AUC = 89.4% (Spec. = 100%, Sens. = 20.0%, accuracy = 97.3%, PPV = 20%, NPV = 97.3%). The AEM combined the XGB and RF models, obtaining an AUC = 90.1% (Spec. = 89.5%, Sens. = 80.0%, accuracy = 89.2%, PPV = 21.0%, NPV = 99.2%). The AEM obtained the best results in terms of discrimination and true positive identification. This could help better identify patients who are fit or unfit for liver resection.
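
With so few events, PPV stays low even when sensitivity and specificity look good. The sketch below derives the usual metrics from a 2x2 confusion matrix. The counts (TP = 4, FN = 1, FP = 15, TN = 129) are not reported in the abstract; they are a reconstruction chosen to sum to the 149 test patients and to approximately match the AEM percentages, so treat them as an assumption.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard diagnostic metrics from the four cells of a confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed counts, reconstructed to approximate the reported AEM results.
aem = confusion_metrics(tp=4, fn=1, fp=15, tn=129)
for name, value in aem.items():
    print(f"{name:12s} {value:.1%}")
```

Under these assumed counts, four of five PHLF cases are caught, yet only about one in five positive predictions is a true PHLF case, which is consistent with the low PPV reported above.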

CT Classification Abdominal Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

Accuracy of machine learning models for pre-diagnosis and diagnosis of pancreatic ductal adenocarcinoma in contrast-CT images: a systematic review and meta-analysis.

Lopes Costa GL, Tasca Petroski G, Machado LG, Eulalio Santos B, de Oliveira Ramos F, Feuerschuette Neto LM, De Luca Canto G

•papers•Jul 1 2025

To evaluate the diagnostic ability and methodological quality of ML models in detecting pancreatic ductal adenocarcinoma (PDAC) in contrast-enhanced CT images. Included studies assessed adults diagnosed with PDAC, confirmed by histopathology. Metrics of tests were interpreted by ML algorithms. Studies provided data on sensitivity and specificity. Studies that did not meet the inclusion criteria, segmentation-focused studies, studies using multiple classifiers, and non-diagnostic studies were excluded. PubMed, Cochrane Central Register of Controlled Trials, and Embase were searched without restrictions. Risk of bias was assessed using QUADAS-2; methodological quality was evaluated using the Radiomics Quality Score (RQS) and the Checklist for AI in Medical Imaging (CLAIM). Bivariate random-effects models were used for meta-analysis of sensitivity and specificity; I² values and subgroup analyses were used to assess heterogeneity. Nine studies were included and 12,788 participants were evaluated, of whom 3,997 were included in the meta-analysis. AI models based on CT scans showed an accuracy of 88.7% (95% CI, 87.7%-89.7%), sensitivity of 87.9% (95% CI, 82.9%-91.6%), and specificity of 92.2% (95% CI, 86.8%-95.5%). The average score of six radiomics studies was 17.83 RQS points. Nine ML methods had an average CLAIM score of 30.55 points. Our study is the first to quantitatively interpret various independent studies, offering insights for clinical application. Despite favorable sensitivity and specificity results, the studies were of low quality, limiting definitive conclusions. Further research is necessary to validate these models before widespread adoption.

CT Classification Abdominal Meta Analysis In Silico None Academic Lab Benchmark SOTA

Identifying threshold of CT-defined muscle loss after radiotherapy for survival in oral cavity cancer using machine learning.

Lee J, Lin JB, Lin WC, Jan YT, Leu YS, Chen YJ, Wu KP

•papers•Jul 1 2025

Muscle loss after radiotherapy is associated with poorer survival in patients with oral cavity squamous cell carcinoma (OCSCC). However, the threshold of muscle loss remains unclear. This study aimed to utilize explainable artificial intelligence to identify the threshold of muscle loss associated with survival in OCSCC. We enrolled 1087 patients with OCSCC treated with surgery and adjuvant radiotherapy at two tertiary centers (660 in the derivation cohort and 427 in the external validation cohort). Skeletal muscle index (SMI) was measured using pre- and post-radiotherapy computed tomography (CT) at the C3 vertebral level. Random forest (RF), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost) models were developed to predict all-cause mortality, and their performances were evaluated using the area under the curve (AUC). Muscle loss threshold was identified using the SHapley Additive exPlanations (SHAP) method and validated using Cox regression analysis. In the external validation cohort, the RF, XGBoost, and CatBoost models achieved favorable performance in predicting all-cause mortality (AUC: 0.898, 0.859, and 0.842). The SHAP method demonstrated that SMI change after radiotherapy was the most important feature for predicting all-cause mortality and consistently identified SMI loss ≥ 4.2% as the threshold in all three models. In multivariable analysis, SMI loss ≥ 4.2% was independently associated with increased all-cause mortality risk in both cohorts (derivation cohort: hazard ratio: 6.66, p < 0.001; external validation cohort: hazard ratio: 8.46, p < 0.001). This study can assist clinicians in identifying patients with considerable muscle loss after treatment and guide interventions to improve muscle mass.
Question: Muscle loss after radiotherapy is associated with poorer survival in patients with oral cavity cancer; however, the threshold of muscle loss remains unclear.
Findings: Explainable artificial intelligence identified muscle loss ≥ 4.2% as the threshold of increased all-cause mortality risk in both derivation and external validation cohorts.
Clinical relevance: Muscle loss ≥ 4.2% may be the optimal threshold for survival in patients who receive adjuvant radiotherapy for oral cavity cancer. This threshold can guide clinicians in improving muscle mass after radiotherapy.
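
Applying the reported threshold amounts to computing the relative SMI change between the pre- and post-radiotherapy scans. The sketch below assumes SMI loss is expressed as a percentage of the pre-radiotherapy value; the abstract does not state the exact formula, and the patient values and function names are hypothetical.

```python
SMI_LOSS_THRESHOLD_PCT = 4.2  # threshold reported in the abstract


def smi_loss_percent(smi_pre, smi_post):
    """Relative SMI loss in percent of the pre-radiotherapy value (assumed definition)."""
    if smi_pre <= 0:
        raise ValueError("pre-radiotherapy SMI must be positive")
    return (smi_pre - smi_post) / smi_pre * 100.0


def exceeds_threshold(smi_pre, smi_post, threshold=SMI_LOSS_THRESHOLD_PCT):
    """True when the loss is at or above the threshold (the >= 4.2% rule)."""
    return smi_loss_percent(smi_pre, smi_post) >= threshold


# Hypothetical patients, SMI in cm^2/m^2 measured at C3 before and after radiotherapy.
for smi_pre, smi_post in [(50.0, 47.5), (50.0, 48.5)]:
    loss = smi_loss_percent(smi_pre, smi_post)
    print(f"loss = {loss:.1f}%  flagged = {exceeds_threshold(smi_pre, smi_post)}")
```

A gain in muscle yields a negative loss and is never flagged.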

CT Classification Other Retrospective Clinical In Silico None Academic Lab Benchmark SOTA

A multimodal deep-learning model based on multichannel CT radiomics for predicting pathological grade of bladder cancer.

Zhao T, He J, Zhang L, Li H, Duan Q

•papers•Jul 1 2025

To construct a predictive model using deep-learning radiomics and clinical risk factors for preoperatively assessing the histopathological grade of bladder cancer from computed tomography (CT) images. A retrospective analysis was conducted involving 201 bladder cancer patients with definite pathological grading results after surgical excision at the authors' institution between January 2019 and June 2023. The cohort was divided into a training set of 120 cases and a test set of 81 cases. Hand-crafted radiomics (HCR) features and deep-learning (DL) features were extracted from the CT images. A prediction model was built using 12 machine-learning classifiers that integrate HCR features, DL features, and clinical data. Model performance was evaluated using the area under the curve (AUC), calibration curves, and decision-curve analysis (DCA). Among the classifiers tested, the logistic regression model that combined DL and HCR features demonstrated the best performance, with AUC values of 0.912 (training set) and 0.777 (test set). The clinical model achieved AUC values of 0.850 (training set) and 0.804 (test set). The AUC values of the combined model were 0.933 (training set) and 0.824 (test set), outperforming both the clinical and HCR-only models. The CT-based combined model demonstrated considerable diagnostic capability in differentiating high-grade from low-grade bladder cancer, serving as a valuable noninvasive instrument for preoperative pathological evaluation.

CT Classification Abdominal Retrospective Clinical In Silico None Academic Lab

Response prediction for neoadjuvant treatment in locally advanced rectal cancer patients-improvement in decision-making: A systematic review.

Boldrini L, Charles-Davies D, Romano A, Mancino M, Nacci I, Tran HE, Bono F, Boccia E, Gambacorta MA, Chiloiro G

•papers•Jul 1 2025

Predicting pathological complete response (pCR) from pre- or post-treatment features could significantly improve clinical decision-making and enable a more personalized treatment approach with better outcomes. However, the lack of external validation, which is missing in several published articles, is a major issue that can limit the reliability and applicability of predictive models in clinical settings. Therefore, this systematic review describes externally validated methods of predicting response to neoadjuvant chemoradiotherapy (nCRT) in locally advanced rectal cancer (LARC) patients and how they could improve clinical decision-making. An extensive search for eligible articles published between 2018 and 2023 was performed on PubMed, Cochrane, and Scopus, using the keywords: (Response OR outcome) prediction AND (neoadjuvant OR chemoradiotherapy) treatment in 'locally advanced Rectal Cancer'. Inclusion criteria: (i) studies including patients diagnosed with LARC (T3/4 and N- or any T and N+) by pre-medical imaging and pathological examination or as stated by the author; (ii) standardized nCRT completed; (iii) treatment with long- or short-course radiotherapy; (iv) studies reporting on the prediction of response to nCRT with pCR as the primary outcome; (v) studies reporting external validation results for response prediction; (vi) articles in English only. Exclusion criteria: (i) case reports, conference abstracts, reviews, and studies reporting patients with distant metastases at diagnosis; (ii) studies reporting response prediction with only internally validated approaches. Three researchers (DC-D, FB, HT) independently reviewed and screened the titles and abstracts of all articles retrieved after de-duplication. Disagreements were resolved through discussion among the three researchers. If necessary, three other researchers (LB, GC, MG) were consulted to make the final decision. Data extraction was performed using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) template, and quality assessment was done using the Prediction model Risk Of Bias Assessment Tool (PROBAST).
A total of 4547 records were identified from the three databases. After excluding 392 duplicate results, 4155 records underwent title and abstract screening. A total of 3800 articles were excluded after title and abstract screening, and 355 articles were retrieved. Out of the 355 retrieved articles, 51 studies were assessed for eligibility. Nineteen reports were then excluded because they did not report external validation, while 4 were excluded because pCR was not evaluated as the primary outcome. Twenty-eight articles were eligible and included in this systematic review. In terms of quality assessment, 89% of the models had low concerns in the participants domain, while 11% had an unclear rating. In both the predictors and outcome domains, 96% of the models were of low concern. The overall rating showed high applicability potential of the models, with 82% showing low concern and 18% deemed unclear. Most of the externally validated techniques showed promising performance and the potential to be applied in clinical settings, which is a crucial step towards evidence-based medicine. However, more studies focused on the external validation of these models in larger cohorts are necessary to ensure that they can reliably predict outcomes in diverse populations.
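
The selection-flow numbers can be checked for internal consistency with a few subtractions. The sketch below uses only figures stated in the abstract; the variable names are hypothetical. The abstract does not explain how 355 retrieved articles were narrowed to the 51 assessed for eligibility, so that step is taken as given rather than derived.

```python
# Counts taken directly from the abstract.
identified = 4547
duplicates = 392
excluded_at_screening = 3800
assessed_for_eligibility = 51  # stated, not derivable from the other counts
excluded_no_external_validation = 19
excluded_pcr_not_primary = 4

# Each stage is the previous stage minus its exclusions.
screened = identified - duplicates
retrieved = screened - excluded_at_screening
included = (
    assessed_for_eligibility
    - excluded_no_external_validation
    - excluded_pcr_not_primary
)

print(screened, retrieved, included)  # 4155 355 28
```

All three derived values agree with the counts reported in the abstract.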

Mixed Modality Classification Abdominal Meta Analysis In Silico None Academic Lab Benchmark SOTA

Deep learning algorithm enables automated Cobb angle measurements with high accuracy.

Hayashi D, Regnard NE, Ventre J, Marty V, Clovis L, Lim L, Nitche N, Zhang Z, Tournier A, Ducarouge A, Kompel AJ, Tannoury C, Guermazi A

•papers•Jul 1 2025

To determine the accuracy of automatic Cobb angle measurements by deep learning (DL) on full spine radiographs. Full spine radiographs of patients aged > 2 years were screened using the radiology reports to identify radiographs for performing Cobb angle measurements. Two senior musculoskeletal radiologists and one senior orthopedic surgeon independently annotated Cobb angles exceeding 7°, indicating the angle location as either proximal thoracic (apices between T3 and T5), main thoracic (apices between T6 and T11), or thoraco-lumbar (apices between T12 and L4). If at least two readers agreed on the number and location of the angles and the difference between comparable angles was < 8°, the ground truth was defined as the mean of their measurements. Otherwise, the radiographs were reviewed by the three annotators in consensus. The DL software (BoneMetrics, Gleamer) was evaluated against the manual annotation in terms of mean absolute error (MAE). A total of 345 patients were included in the study (age 33 ± 24 years, 221 women): 179 pediatric patients (< 22 years old) and 166 adult patients (22 to 85 years old). Fifty-three cases were reviewed in consensus. The MAE of the DL algorithm for the main curvature was 2.6° (95% CI [2.0; 3.3]). For the subgroup of pediatric patients, the MAE was 1.9° (95% CI [1.6; 2.2]) versus 3.3° (95% CI [2.2; 4.8]) for adults. The DL algorithm predicted the Cobb angle of scoliotic patients with high accuracy.
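
The evaluation metric here, mean absolute error, is the average unsigned difference between the algorithm's angle and the reference angle. A minimal sketch follows; the angle values are invented for illustration and the function name is hypothetical.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired measurements."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical main-curve Cobb angles in degrees: algorithm vs. ground truth.
algorithm = [24.0, 31.5, 12.0, 45.0]
ground_truth = [22.0, 33.0, 13.5, 44.0]
print(f"MAE = {mean_absolute_error(algorithm, ground_truth):.1f} degrees")  # 1.5
```

Because errors are taken in absolute value, over- and under-estimates do not cancel, so MAE reflects the typical size of a measurement error rather than systematic bias.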

X-Ray Segmentation Musculoskeletal Retrospective Clinical Clinical Pilot None Startup
