Latest Papers on Radiology AI. Tags: In Silico

Identifying threshold of CT-defined muscle loss after radiotherapy for survival in oral cavity cancer using machine learning.

Lee J, Lin JB, Lin WC, Jan YT, Leu YS, Chen YJ, Wu KP

•papers•Jul 1 2025

Muscle loss after radiotherapy is associated with poorer survival in patients with oral cavity squamous cell carcinoma (OCSCC). However, the threshold of muscle loss remains unclear. This study aimed to utilize explainable artificial intelligence to identify the threshold of muscle loss associated with survival in OCSCC. We enrolled 1087 patients with OCSCC treated with surgery and adjuvant radiotherapy at two tertiary centers (660 in the derivation cohort and 427 in the external validation cohort). Skeletal muscle index (SMI) was measured using pre- and post-radiotherapy computed tomography (CT) at the C3 vertebral level. Random forest (RF), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost) models were developed to predict all-cause mortality, and their performances were evaluated using the area under the curve (AUC). Muscle loss threshold was identified using the SHapley Additive exPlanations (SHAP) method and validated using Cox regression analysis. In the external validation cohort, the RF, XGBoost, and CatBoost models achieved favorable performance in predicting all-cause mortality (AUC: 0.898, 0.859, and 0.842). The SHAP method demonstrated that SMI change after radiotherapy was the most important feature for predicting all-cause mortality and consistently identified SMI loss ≥ 4.2% as the threshold in all three models. In multivariable analysis, SMI loss ≥ 4.2% was independently associated with increased all-cause mortality risk in both cohorts (derivation cohort: hazard ratio: 6.66, p < 0.001; external validation cohort: hazard ratio: 8.46, p < 0.001). This study can assist clinicians in identifying patients with considerable muscle loss after treatment and guide interventions to improve muscle mass.

Question Muscle loss after radiotherapy is associated with poorer survival in patients with oral cavity cancer; however, the threshold of muscle loss remains unclear.

Findings Explainable artificial intelligence identified muscle loss ≥ 4.2% as the threshold of increased all-cause mortality risk in both derivation and external validation cohorts.

Clinical relevance Muscle loss ≥ 4.2% may be the optimal threshold for survival in patients who receive adjuvant radiotherapy for oral cavity cancer. This threshold can guide clinicians in improving muscle mass after radiotherapy.

CT Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA
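The muscle-loss threshold in the entry above reduces to simple arithmetic on the pre- and post-radiotherapy SMI values. The following is a minimal sketch, assuming the loss is expressed relative to the pre-radiotherapy SMI (the abstract does not define the denominator); the function names and the example values are hypothetical.

```python
LOSS_THRESHOLD_PERCENT = 4.2  # threshold reported in the abstract


def smi_change_percent(pre_smi: float, post_smi: float) -> float:
    """Relative SMI change in percent; negative values mean muscle loss.

    Assumption: the change is taken relative to the pre-radiotherapy value.
    """
    if pre_smi <= 0:
        raise ValueError("pre_smi must be positive")
    return (post_smi - pre_smi) / pre_smi * 100.0


def exceeds_loss_threshold(pre_smi: float, post_smi: float,
                           threshold: float = LOSS_THRESHOLD_PERCENT) -> bool:
    """True when the relative SMI loss is at or above the threshold."""
    loss_percent = -smi_change_percent(pre_smi, post_smi)
    return loss_percent >= threshold


# Hypothetical patients (SMI in cm^2/m^2): a 6% loss and a 2% loss.
print(exceeds_loss_threshold(50.0, 47.0))
print(exceeds_loss_threshold(50.0, 49.0))
```

A patient who gains muscle yields a negative loss and is never flagged.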

Accuracy of machine learning models for pre-diagnosis and diagnosis of pancreatic ductal adenocarcinoma in contrast-CT images: a systematic review and meta-analysis.

Lopes Costa GL, Tasca Petroski G, Machado LG, Eulalio Santos B, de Oliveira Ramos F, Feuerschuette Neto LM, De Luca Canto G

•papers•Jul 1 2025

To evaluate the diagnostic ability and methodological quality of ML models in detecting pancreatic ductal adenocarcinoma (PDAC) in contrast CT images. Included studies assessed adults diagnosed with PDAC, confirmed by histopathology. Metrics of tests were interpreted by ML algorithms. Studies provided data on sensitivity and specificity. Studies that did not meet the inclusion criteria, segmentation-focused studies, multiple classifiers or non-diagnostic studies were excluded. PubMed, Cochrane Central Register of Controlled Trials, and Embase were searched without restrictions. Risk of bias was assessed using QUADAS-2; methodological quality was evaluated using the Radiomics Quality Score (RQS) and the Checklist for AI in Medical Imaging (CLAIM). Bivariate random-effects models were used for meta-analysis of sensitivity and specificity, and I² values and subgroup analysis were used to assess heterogeneity. Nine studies were included and 12,788 participants were evaluated, of which 3,997 were included in the meta-analysis. AI models based on CT scans showed an accuracy of 88.7% (95% CI, 87.7%-89.7%), sensitivity of 87.9% (95% CI, 82.9%-91.6%), and specificity of 92.2% (95% CI, 86.8%-95.5%). The average score of six radiomics studies was 17.83 RQS points. Nine ML methods had an average CLAIM score of 30.55 points. Our study is the first to quantitatively interpret various independent studies, offering insights for clinical application. Despite favorable sensitivity and specificity results, the studies were of low quality, limiting definitive conclusions. Further research is necessary to validate these models before widespread adoption.

CT Classification Abdominal Meta Analysis In Silico Academic Lab Benchmark SOTA
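As a rough check on how a confidence interval such as the one reported for accuracy can arise, here is a minimal sketch of a normal-approximation (Wald) interval for a proportion. It assumes the accuracy was pooled over the 3,997 participants in the meta-analysis and that a Wald interval was used; the abstract states neither, and the sensitivity and specificity intervals come from a bivariate random-effects model that this sketch does not reproduce.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion p over n cases."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Accuracy and sample size taken from the abstract; pooling over n is an assumption.
low, high = wald_ci(0.887, 3997)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval rounds to 87.7% and 89.7%, which agrees with the reported range.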

Preoperative prediction of post hepatectomy liver failure after surgery for hepatocellular carcinoma on CT-scan by machine learning and radiomics analyses.

Famularo S, Maino C, Milana F, Ardito F, Rompianesi G, Ciulli C, Conci S, Gallotti A, La Barba G, Romano M, De Angelis M, Patauner S, Penzo C, De Rose AM, Marescaux J, Diana M, Ippolito D, Frena A, Boccia L, Zanus G, Ercolani G, Maestri M, Grazi GL, Ruzzenente A, Romano F, Troisi RI, Giuliante F, Donadon M, Torzilli G

•papers•Jul 1 2025

No instruments are available to predict preoperatively the risk of posthepatectomy liver failure (PHLF) in HCC patients. The aim was to predict the occurrence of PHLF preoperatively by radiomics and clinical data through machine-learning algorithms. Clinical data and three-phase CT scans were retrospectively collected among 13 Italian centres between 2008 and 2022. Radiomics features were extracted in the non-tumoral liver area. Data were split between training (70%) and test (30%) sets. Oversampling (ADASYN) was run in the training set. Random forest (RF), extreme gradient boosting (XGB) and support vector machine (SVM) models were fitted to predict PHLF. Final evaluation of the metrics was run in the test set. The best models were included in an averaging ensemble model (AEM). Five hundred consecutive preoperative CT scans were collected with the relative clinical data. Of these patients, 17 (3.4%) experienced PHLF. Two hundred sixteen radiomics features per patient were extracted. PCA selected 19 dimensions explaining >75% of the variance. Associated clinical variables were: size, macrovascular invasion, cirrhosis, major resection and MELD score. Data were split into a training cohort (70%, n = 351) and a test cohort (30%, n = 149). The RF model obtained an AUC = 89.1% (Spec. = 70.1%, Sens. = 100%, accuracy = 71.1%, PPV = 10.4%, NPV = 100%). The XGB model showed an AUC = 89.4% (Spec. = 100%, Sens. = 20.0%, accuracy = 97.3%, PPV = 20%, NPV = 97.3%). The AEM combined the XGB and RF models, obtaining an AUC = 90.1% (Spec. = 89.5%, Sens. = 80.0%, accuracy = 89.2%, PPV = 21.0%, NPV = 99.2%). The AEM obtained the best results in terms of discrimination and true positive identification. This could help to better define which patients are fit or unfit for liver resection.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA
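The metrics quoted in the entry above all derive from a 2x2 confusion matrix, and the low PPV alongside a high NPV follows from how rare PHLF is. The sketch below computes them from counts. The counts are an assumption: the abstract reports only percentages, and 5 PHLF cases among the 149 test patients is merely one split that reproduces the RF figures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # PHLF correctly predicted
    fp: int  # PHLF predicted, did not occur
    tn: int  # no PHLF, correctly predicted
    fn: int  # PHLF missed


def _ratio(numerator: int, denominator: int) -> float:
    """Return NaN instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical counts (n = 149, 5 events) chosen to match the reported RF percentages.
rf_like = ConfusionCounts(tp=5, fp=43, tn=101, fn=0)
for name, value in diagnostic_metrics(rf_like).items():
    print(f"{name}: {value:.1%}")
```

With only a handful of events, a single additional false positive or false negative shifts sensitivity and PPV by several points, which is worth keeping in mind when comparing the three models.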

Deep learning model for low-dose CT late iodine enhancement imaging and extracellular volume quantification.

Yu Y, Wu D, Lan Z, Dai X, Yang W, Yuan J, Xu Z, Wang J, Tao Z, Ling R, Zhang S, Zhang J

•papers•Jul 1 2025

To develop and validate deep learning (DL) models that denoise late iodine enhancement (LIE) images and enable accurate extracellular volume (ECV) quantification. This study retrospectively included patients with chest discomfort who underwent CT myocardial perfusion + CT angiography + LIE from two hospitals. Two DL models, residual dense network (RDN) and conditional generative adversarial network (cGAN), were developed and validated. 423 patients were randomly divided into training (182 patients), tuning (48 patients), internal validation (92 patients) and external validation group (101 patients). LIEsingle (single-stack image), LIEaveraging (averaging multiple-stack images), LIERDN (single-stack image denoised by RDN) and LIEGAN (single-stack image denoised by cGAN) were generated. We compared image quality score, signal-to-noise (SNR) and contrast-to-noise (CNR) of four LIE sets. The identifiability of denoised images for positive LIE and increased ECV (> 30%) was assessed. The image quality of LIEGAN (SNR: 13.3 ± 1.9; CNR: 4.5 ± 1.1) and LIERDN (SNR: 20.5 ± 4.7; CNR: 7.5 ± 2.3) images was markedly better than that of LIEsingle (SNR: 4.4 ± 0.7; CNR: 1.6 ± 0.4). At per-segment level, the area under the curve (AUC) of LIERDN images for LIE evaluation was significantly improved compared with those of LIEGAN and LIEsingle images (p = 0.040 and p < 0.001, respectively). Meanwhile, the AUC and accuracy of ECVRDN were significantly higher than those of ECVGAN and ECVsingle at per-segment level (p < 0.001 for all). RDN model generated denoised LIE images with markedly higher SNR and CNR than the cGAN model and original images, which significantly improved the identifiability of visual analysis. Moreover, using denoised single-stack images led to accurate CT-ECV quantification.

Question Can the developed models denoise CT-derived late iodine enhancement high images and improve signal-to-noise ratio?

Findings The residual dense network model significantly improved the image quality for late iodine enhancement and enabled accurate CT-extracellular volume quantification.

Clinical relevance The residual dense network model generates denoised late iodine enhancement images with the highest signal-to-noise ratio and enables accurate quantification of extracellular volume.

CT Reconstruction Cardiac Retrospective Clinical In Silico Academic Lab
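SNR and CNR, the image-quality measures compared in the entry above, are commonly computed from region-of-interest (ROI) statistics. The sketch below uses one widespread convention (mean over standard deviation for SNR, absolute mean difference over noise standard deviation for CNR). The abstract does not give the study's exact definitions or ROI placement, so both the formulas and the pixel values here are assumptions.

```python
from statistics import mean, stdev


def snr(roi: list[float]) -> float:
    """Signal-to-noise ratio: mean attenuation divided by its standard deviation."""
    return mean(roi) / stdev(roi)


def cnr(roi_a: list[float], roi_b: list[float], noise_roi: list[float]) -> float:
    """Contrast-to-noise ratio between two ROIs, normalised by noise in a reference ROI."""
    return abs(mean(roi_a) - mean(roi_b)) / stdev(noise_roi)


# Hypothetical attenuation samples (HU) for two tissue ROIs.
enhanced = [98.0, 100.0, 102.0]
remote = [58.0, 60.0, 62.0]

print(snr(enhanced))
print(cnr(enhanced, remote, noise_roi=enhanced))
```

Denoising lowers the standard deviation in the denominator, which is why both ratios rise for the denoised image sets.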

Predicting progression-free survival in sarcoma using MRI-based automatic segmentation models and radiomics nomograms: a preliminary multicenter study.

Zhu N, Niu F, Fan S, Meng X, Hu Y, Han J, Wang Z

•papers•Jul 1 2025

Some sarcomas are highly malignant, associated with high recurrence despite treatment. This multicenter study aimed to develop and validate a radiomics signature to estimate sarcoma progression-free survival (PFS). The study retrospectively enrolled 202 consecutive patients with pathologically diagnosed sarcoma, who had pre-treatment axial fat-suppressed T2-weighted images (FS-T2WI), and included them in the ROI-Net model for training. Among them, 120 patients were included in the radiomics analysis, all of whom had pre-treatment axial T1-weighted and transverse FS-T2WI images, and were randomly divided into a development group (n = 96) and a validation group (n = 24). In the development cohort, Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression was used to develop the radiomics features for PFS prediction. By combining significant clinical features with radiomics features, a nomogram was constructed using Cox regression. The proposed ROI-Net framework achieved a Dice coefficient of 0.820 (0.791-0.848). The radiomics signature based on 21 features could distinguish high-risk patients with poor PFS. Univariate Cox analysis revealed that peritumoral edema, metastases, and the radiomics score were associated with poor PFS and were included in the construction of the nomogram. The Radiomics-T1WI-Clinical model exhibited the best performance, with AUC values of 0.947, 0.907, and 0.924 at 300 days, 600 days, and 900 days, respectively. The proposed ROI-Net framework demonstrated high consistency between its segmentation results and expert annotations. The radiomics features and the combined nomogram have the potential to aid in predicting PFS for patients with sarcoma.

MRI Segmentation Musculoskeletal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Generalizability, robustness, and correction bias of segmentations of thoracic organs at risk in CT images.

Guérendel C, Petrychenko L, Chupetlovska K, Bodalal Z, Beets-Tan RGH, Benson S

•papers•Jul 1 2025

This study aims to assess and compare two state-of-the-art deep learning approaches for segmenting four thoracic organs at risk (OAR), namely the esophagus, trachea, heart, and aorta, in CT images in the context of radiotherapy planning. We compare a multi-organ segmentation approach and the fusion of multiple single-organ models, each dedicated to one OAR. All were trained using nnU-Net with the default parameters and the full-resolution configuration. We evaluate their robustness with adversarial perturbations, and their generalizability on external datasets, and explore potential biases introduced by expert corrections compared to fully manual delineations. The two approaches show excellent performance with an average Dice score of 0.928 for the multi-class setting and 0.930 when fusing the four single-organ models. The evaluation of external datasets and common procedural adversarial noise demonstrates the good generalizability of these models. In addition, expert corrections of both models show significant bias to the original automated segmentation. The average Dice score between the two corrections is 0.93, ranging from 0.88 for the trachea to 0.98 for the heart. Both approaches demonstrate excellent performance and generalizability in segmenting four thoracic OARs, potentially improving efficiency in radiotherapy planning. However, the multi-organ setting proves advantageous for its efficiency, requiring less training time and fewer resources, making it a preferable choice for this task. Moreover, corrections of AI segmentation by clinicians may lead to biases in the results of AI approaches. A test set, manually annotated, should be used to assess the performance of such methods.

Question While manual delineation of thoracic organs at risk is labor-intensive, prone to errors, and time-consuming, evaluation of AI models performing this task lacks robustness.

Findings The deep-learning model using the nnU-Net framework showed excellent performance, generalizability, and robustness in segmenting thoracic organs in CT, enhancing radiotherapy planning efficiency.

Clinical relevance Automatic segmentation of thoracic organs at risk can save clinicians time without compromising the quality of the delineations, and extensive evaluation across diverse settings demonstrates the potential of integrating such models into clinical practice.

CT Segmentation Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA
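The Dice score used throughout the entry above measures overlap between two segmentations as twice the intersection divided by the sum of the two sizes. A minimal sketch follows, representing each mask as a set of voxel coordinates; this representation and the toy masks are illustrative choices, not the study's evaluation code.

```python
Voxel = tuple[int, int]


def dice(mask_a: set[Voxel], mask_b: set[Voxel]) -> float:
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).

    Two empty masks are treated as a perfect match (a common convention).
    """
    if not mask_a and not mask_b:
        return 1.0
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


# Toy 2D masks: two 2x2 squares overlapping in one column of two voxels.
automated = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice(automated, manual))
```

The same function applies to the correction-bias question raised in the abstract: comparing an expert-corrected mask against a fully manual one, rather than against the automated output it started from, avoids rewarding the model for its own influence on the reference.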

Risk prediction for elderly cognitive impairment by radiomic and morphological quantification analysis based on a cerebral MRA imaging cohort.

Xu X, Zhou Y, Sun S, Cui L, Chen Z, Guo Y, Jiang J, Wang X, Sun T, Yang Q, Wang Y, Yuan Y, Fan L, Yang G, Cao F

•papers•Jul 1 2025

To establish morphological and radiomic models for early prediction of cognitive impairment associated with cerebrovascular disease (CI-CVD) in an elderly cohort based on cerebral magnetic resonance angiography (MRA). One hundred four patients with CI-CVD and 107 control subjects were retrospectively recruited from the 14-year elderly MRA cohort, and 63 subjects were enrolled for external validation. Automated quantitative analysis was applied to analyse the morphological features, including the stenosis score, length, relative length, twisted angle, and maximum deviation of cerebral arteries. Clinical and morphological risk factors were screened using univariate logistic regression. Radiomic features were extracted via least absolute shrinkage and selection operator (LASSO) regression. The predictive models of CI-CVD were established in the training set and verified in the external testing set. A history of stroke was demonstrated to be a clinical risk factor (OR 2.796, 1.359-5.751). Stenosis ≥ 50% in the right middle cerebral artery (RMCA) and left posterior cerebral artery (LPCA), maximum deviation of the left internal carotid artery (LICA), and twisted angles of the right internal carotid artery (RICA) and LICA were identified as morphological risk factors, with ORs of 4.522 (1.237-16.523), 2.851 (1.438-5.652), 1.373 (1.136-1.661), 0.981 (0.966-0.997) and 0.976 (0.958-0.994), respectively. Overall, 33 radiomic features were screened as risk factors. The clinical-morphological-radiomic model demonstrated optimal performance, with an AUC of 0.883 (0.838-0.928) in the training set and 0.843 (0.743-0.943) in the external testing set. Radiomics features combined with morphological indicators of cerebral arteries were effective indicators for early signs of CI-CVD in elderly individuals.

Question The relationship between morphological features of cerebral arteries and cognitive impairment associated with cerebrovascular disease (CI-CVD) deserves to be explored.

Findings The multipredictor model combining stroke history, vascular morphological indicators and radiomic features of cerebral arteries demonstrated optimal performance for the early warning of CI-CVD.

Clinical relevance Stenosis percentage and tortuosity score of the cerebral arteries are important risk factors for cognitive impairment. The radiomic features combined with morphological quantification analysis based on cerebral MRA provide higher predictive performance of CI-CVD.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab
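Odds ratios with intervals like those listed in the entry above are conventionally obtained from a logistic regression coefficient by exponentiating the estimate and its confidence bounds. The sketch below shows that conversion. The coefficient and standard error are hypothetical, and treating the reported intervals as 95% intervals is an assumption, since the abstract does not state the level.

```python
import math


def odds_ratio_with_ci(beta: float, se: float,
                       z: float = 1.959964) -> tuple[float, float, float]:
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a confidence interval (default 95%)."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient: log(2) corresponds to a doubling of the odds.
or_, lower, upper = odds_ratio_with_ci(beta=math.log(2.0), se=0.5)
print(f"OR {or_:.3f} ({lower:.3f}-{upper:.3f})")
```

Because the interval is symmetric on the log scale, the odds ratio is the geometric mean of its bounds; an interval that excludes 1, as with the twisted-angle ORs just below 1, indicates an association in either direction.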

Deep learning-based image domain reconstruction enhances image quality and pulmonary nodule detection in ultralow-dose CT with adaptive statistical iterative reconstruction-V.

Ye K, Xu L, Pan B, Li J, Li M, Yuan H, Gong NJ

•papers•Jul 1 2025

To evaluate the image quality and lung nodule detectability of ultralow-dose CT (ULDCT) with adaptive statistical iterative reconstruction-V (ASiR-V) post-processed using a deep learning image reconstruction (DLIR)-based image domain compared to low-dose CT (LDCT) and ULDCT without DLIR. A total of 210 patients undergoing lung cancer screening underwent LDCT (mean ± SD, 0.81 ± 0.28 mSv) and ULDCT (0.17 ± 0.03 mSv) scans. ULDCT images were reconstructed with ASiR-V (ULDCT-ASiR-V) and post-processed using DLIR (ULDCT-DLIR). The quality of the three CT images was analyzed. Three radiologists detected and measured pulmonary nodules on all CT images, with LDCT results serving as references. Nodule conspicuity was assessed using a five-point Likert scale, followed by further statistical analyses. A total of 463 nodules were detected using LDCT. The image noise of ULDCT-DLIR decreased by 60% compared to that of ULDCT-ASiR-V and was lower than that of LDCT (p < 0.001). The subjective image quality scores for ULDCT-DLIR (4.4 [4.1, 4.6]) were also higher than those for ULDCT-ASiR-V (3.6 [3.1, 3.9]) (p < 0.001). The overall nodule detection rates for ULDCT-ASiR-V and ULDCT-DLIR were 82.1% (380/463) and 87.0% (403/463), respectively (p < 0.001). The percentage difference between diameters > 1 mm was 2.9% (ULDCT-ASiR-V vs. LDCT) and 0.5% (ULDCT-DLIR vs. LDCT) (p = 0.009). Scores of nodule imaging sharpness on ULDCT-DLIR (4.0 ± 0.68) were significantly higher than those on ULDCT-ASiR-V (3.2 ± 0.50) (p < 0.001). DLIR-based image domain improves image quality, nodule detection rate, nodule imaging sharpness, and nodule measurement accuracy of ASiR-V on ULDCT.

Question Deep learning post-processing is simple and cheap compared with raw data processing, but its performance is not clear on ultralow-dose CT.

Findings Deep learning post-processing enhanced image quality and improved the nodule detection rate and accuracy of nodule measurement of ultralow-dose CT.

Clinical relevance Deep learning post-processing improves the practicability of ultralow-dose CT and enables lower radiation exposure for patients during lung cancer screening.

CT Detection Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA
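The detection rates in the entry above can be reproduced directly from the nodule counts, with the 463 LDCT nodules as the reference. A minimal sketch; the function name is illustrative, and the counts are those quoted in the abstract.

```python
def detection_rate_percent(detected: int, reference_total: int) -> float:
    """Share of reference nodules that were also detected, in percent."""
    if reference_total <= 0 or not 0 <= detected <= reference_total:
        raise ValueError("need 0 <= detected <= reference_total and reference_total > 0")
    return detected / reference_total * 100.0


REFERENCE_NODULES = 463  # nodules detected on LDCT, used as reference

asir_v = detection_rate_percent(380, REFERENCE_NODULES)
dlir = detection_rate_percent(403, REFERENCE_NODULES)
print(f"ULDCT-ASiR-V: {asir_v:.1f}%  ULDCT-DLIR: {dlir:.1f}%")
print(f"additional nodules detected with DLIR: {403 - 380}")
```

The difference between the two reconstructions therefore corresponds to 23 of the 463 reference nodules.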

Malignancy risk stratification for pulmonary nodules: comparing a deep learning approach to multiparametric statistical models in different disease groups.

Piskorski L, Debic M, von Stackelberg O, Schlamp K, Welzel L, Weinheimer O, Peters AA, Wielpütz MO, Frauenfelder T, Kauczor HU, Heußel CP, Kroschke J

•papers•Jul 1 2025

Incidentally detected pulmonary nodules present a challenge in clinical routine with demand for reliable support systems for risk classification. We aimed to evaluate the performance of the lung-cancer-prediction-convolutional-neural-network (LCP-CNN), a deep learning-based approach, in comparison to multiparametric statistical methods (Brock model and Lung-RADS®) for risk classification of nodules in cohorts with different risk profiles and underlying pulmonary diseases. Retrospective analysis was conducted on non-contrast and contrast-enhanced CT scans containing pulmonary nodules measuring 5-30 mm. Ground truth was defined by histology or follow-up stability. The final analysis was performed on 297 patients with 422 eligible nodules, of which 105 nodules were malignant. Classification performance of the LCP-CNN, Brock model, and Lung-RADS® was evaluated in terms of diagnostic accuracy measurements including ROC-analysis for different subcohorts (total, screening, emphysema, and interstitial lung disease). LCP-CNN demonstrated superior performance compared to the Brock model in total and screening cohorts (AUC 0.92 (95% CI: 0.89-0.94) and 0.93 (95% CI: 0.89-0.96)). Superior sensitivity of LCP-CNN was demonstrated compared to the Brock model and Lung-RADS® in total, screening, and emphysema cohorts for a risk threshold of 5%. Superior sensitivity of LCP-CNN was also shown across all disease groups compared to the Brock model at a threshold of 65%; compared to Lung-RADS®, sensitivity was better or equal. No significant differences in the performance of LCP-CNN were found between subcohorts. This study offers further evidence of the potential to integrate deep learning-based decision support systems into pulmonary nodule classification workflows, irrespective of the individual patient risk profile and underlying pulmonary disease.

Question Is a deep-learning approach (LCP-CNN) superior to multiparametric models (Brock model, Lung-RADS®) in classifying pulmonary nodule risk across varied patient profiles?

Findings LCP-CNN shows superior performance in risk classification of pulmonary nodules compared to multiparametric models with no significant impact on risk profiles and structural pulmonary diseases.

Clinical relevance LCP-CNN offers efficiency and accuracy, addressing limitations of traditional models, such as variations in manual measurements or lack of patient data, while producing robust results. Such approaches may therefore impact clinical work by complementing or even replacing current approaches.

CT Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA
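The entry above compares sensitivity at fixed risk thresholds of 5% and 65%. The sketch below shows how a continuous malignancy risk score is turned into a positive or negative call at a threshold and how sensitivity follows. The scores and labels are hypothetical, and counting a score equal to the threshold as positive is an assumed convention.

```python
def sensitivity_at_threshold(risk_scores: list[float],
                             is_malignant: list[bool],
                             threshold: float) -> float:
    """Fraction of malignant nodules whose predicted risk reaches the threshold."""
    if len(risk_scores) != len(is_malignant):
        raise ValueError("scores and labels must have the same length")
    malignant_scores = [s for s, m in zip(risk_scores, is_malignant) if m]
    if not malignant_scores:
        raise ValueError("no malignant nodules to evaluate")
    flagged = sum(1 for s in malignant_scores if s >= threshold)
    return flagged / len(malignant_scores)


# Hypothetical predicted risks for five nodules; four are malignant.
scores = [0.02, 0.10, 0.70, 0.04, 0.90]
labels = [False, True, True, True, True]

print(sensitivity_at_threshold(scores, labels, 0.05))  # rule-out threshold
print(sensitivity_at_threshold(scores, labels, 0.65))  # rule-in threshold
```

Raising the threshold can only keep sensitivity the same or lower it, which is why the two operating points are reported separately.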

Multi-site, multi-vendor development and validation of a deep learning model for liver stiffness prediction using abdominal biparametric MRI.

Ali R, Li H, Zhang H, Pan W, Reeder SB, Harris D, Masch W, Aslam A, Shanbhogue K, Bernieh A, Ranganathan S, Parikh N, Dillman JR, He L

•papers•Jul 1 2025

Chronic liver disease (CLD) is a substantial cause of morbidity and mortality worldwide. Liver stiffness, as measured by MR elastography (MRE), is well-accepted as a surrogate marker of liver fibrosis. To develop and validate deep learning (DL) models for predicting MRE-derived liver stiffness using routine clinical non-contrast abdominal T1-weighted (T1w) and T2-weighted (T2w) data from multiple institutions/system manufacturers in pediatric and adult patients. We identified pediatric and adult patients with known or suspected CLD from four institutions, who underwent clinical MRI with MRE from 2011 to 2022. We used T1w and T2w data to train DL models for liver stiffness classification. Patients were categorized into two groups for binary classification using liver stiffness thresholds (≥ 2.5 kPa, ≥ 3.0 kPa, ≥ 3.5 kPa, ≥ 4 kPa, or ≥ 5 kPa), reflecting various degrees of liver stiffening. We identified 4695 MRI examinations from 4295 patients (mean ± SD age, 47.6 ± 18.7 years; 428 (10.0%) pediatric; 2159 males [50.2%]). With a primary liver stiffness threshold of 3.0 kPa, our model correctly classified patients into no/minimal (< 3.0 kPa) vs moderate/severe (≥ 3.0 kPa) liver stiffness with AUROCs of 0.83 (95% CI: 0.82, 0.84) in our internal multi-site cross-validation (CV) experiment, 0.82 (95% CI: 0.80, 0.84) in our temporal hold-out validation experiment, and 0.79 (95% CI: 0.75, 0.81) in our external leave-one-site-out CV experiment. The developed model is publicly available ( https://github.com/almahdir1/Multi-channel-DeepLiverNet2.0.git ). Our DL models exhibited reasonable diagnostic performance for categorical classification of liver stiffness on a large diverse dataset using T1w and T2w MRI data.

Question Can DL models accurately predict liver stiffness using routine clinical biparametric MRI in pediatric and adult patients with CLD?

Findings DeepLiverNet2.0 used biparametric MRI data to classify liver stiffness, achieving AUROCs of 0.83, 0.82, and 0.79 for multi-site CV, hold-out validation, and external CV.

Clinical relevance Our DeepLiverNet2.0 AI model can categorically classify the severity of liver stiffening using anatomic biparametric MR images in children and young adults. Model refinements and incorporation of clinical features may decrease the need for MRE.

MRI Classification Abdominal Retrospective Clinical In Silico Academic Lab Open Code
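Two steps described in the entry above lend themselves to a short illustration: binarizing MRE stiffness at a threshold such as 3.0 kPa, and the leave-one-site-out split used for external validation. The sketch below is a generic version of both and is not taken from the published DeepLiverNet2.0 repository; the record layout, site names, and stiffness values are hypothetical.

```python
from collections import defaultdict
from typing import Iterator


def binarize_stiffness(stiffness_kpa: float, threshold_kpa: float = 3.0) -> int:
    """1 for moderate/severe stiffness (>= threshold), 0 for no/minimal."""
    return int(stiffness_kpa >= threshold_kpa)


def leave_one_site_out(exams: list[dict]) -> Iterator[tuple[str, list[dict], list[dict]]]:
    """Yield (held_out_site, train_exams, test_exams), one fold per site."""
    by_site: dict[str, list[dict]] = defaultdict(list)
    for exam in exams:
        by_site[exam["site"]].append(exam)
    for held_out in sorted(by_site):
        train = [e for site, group in by_site.items() if site != held_out for e in group]
        yield held_out, train, by_site[held_out]


# Hypothetical examinations from three sites.
exams = [
    {"site": "A", "stiffness_kpa": 2.1},
    {"site": "A", "stiffness_kpa": 3.4},
    {"site": "B", "stiffness_kpa": 5.2},
    {"site": "C", "stiffness_kpa": 2.9},
]

for site, train, test in leave_one_site_out(exams):
    labels = [binarize_stiffness(e["stiffness_kpa"]) for e in test]
    print(site, len(train), labels)
```

Holding out a whole site, rather than random examinations, keeps scanner and protocol differences out of the training data for that fold, which is what makes the resulting AUROC an estimate of cross-site generalization.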
