Page 19 of 2352341 results

Machine learning predicts severe adverse events and salvage success of CT-guided lung biopsy after nondiagnostic transbronchial lung biopsy.

Yang S, Hua Z, Chen Y, Liu L, Wang Z, Cheng Y, Wang J, Xu Z, Chen C

PubMed · Sep 22 2025
To address the unmet clinical need for validated risk stratification tools in salvage CT-guided percutaneous lung biopsy (PNLB) following nondiagnostic transbronchial lung biopsy (TBLB), we aimed to develop machine learning models predicting severe adverse events (SAEs) in PNLB (Model 1) and diagnostic success of salvage PNLB after TBLB failure (Model 2). This multicenter predictive modeling study enrolled 2910 cases undergoing PNLB across two centers (Center 1: n = 2653, 2016-2020; Center 2: n = 257, 2017-2022) with complete imaging and clinical documentation meeting predefined inclusion and exclusion criteria. Key variables were selected via LASSO regression, followed by development and validation of Model 1 (incorporating sex, smoking, pleural contact, lesion size, and puncture depth) and Model 2 (incorporating age, lesion size, lesion characteristics, and post-bronchoscopic pathological categories (PBPCs)) using ten machine learning algorithms. Model performance was evaluated through discrimination metrics, calibration curves, and decision curve analysis to assess clinical applicability. A total of 2653 and 257 PNLB cases were included from the two centers. Model 1 achieved an external validation ROC-AUC of 0.717 (95% CI: 0.609-0.825) and PR-AUC of 0.258 (95% CI: 0.0365-0.708), while Model 2 exhibited a ROC-AUC of 0.884 (95% CI: 0.784-0.984) and PR-AUC of 0.852 (95% CI: 0.784-0.896), with XGBoost outperforming the other algorithms. The dual XGBoost system stratifies salvage PNLB candidates by quantifying SAE risk (AUC = 0.717) versus diagnostic yield (AUC = 0.884), addressing the unmet need for personalized biopsy pathway optimization.
Question: Current tools cannot quantify severe adverse event (SAE) risks versus salvage diagnostic success for CT-guided lung biopsy (PNLB) after failed transbronchial biopsy (TBLB).
Findings: Dual XGBoost models predicted the risks of PNLB SAEs (AUC = 0.717) and diagnostic success after TBLB failure (AUC = 0.884), with validated clinical stratification benefits.
Clinical relevance: The dual XGBoost system guides clinical decision-making by integrating individual SAE risk with predictors of diagnostic success, enabling personalized salvage biopsy strategies that balance safety and diagnostic yield.
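The modeling pipeline described here, LASSO-based variable selection followed by a boosted-tree classifier evaluated with ROC-AUC and PR-AUC, can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' code: it uses scikit-learn's L1-penalized logistic regression as the LASSO-style selector and `GradientBoostingClassifier` as a stand-in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 600, 20
X = rng.normal(size=(n, p))
# outcome driven by a few features, mimicking a sparse set of true predictors
y = (X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2] + rng.normal(size=n) > 0).astype(int)

# L1-penalized logistic regression as the LASSO-style variable selector
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1), max_features=5
).fit(X, y)
X_sel = selector.transform(X)

# gradient boosting as a stand-in for XGBoost; score by ROC-AUC and PR-AUC
X_tr, X_te, y_tr, y_te = train_test_split(
    X_sel, y, test_size=0.3, random_state=0, stratify=y
)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
roc_auc = roc_auc_score(y_te, proba)
pr_auc = average_precision_score(y_te, proba)
```

Reporting PR-AUC alongside ROC-AUC, as the study does, matters when the positive class (SAEs) is rare.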

Deep-learning-based prediction of significant portal hypertension with single cross-sectional non-enhanced CT.

Yamamoto A, Sato S, Ueda D, Walston SL, Kageyama K, Jogo A, Nakano M, Kotani K, Uchida-Kobayashi S, Kawada N, Miki Y

PubMed · Sep 22 2025
The purpose of this study was to establish a deep learning (DL) model for predicting clinically significant portal hypertension (CSPH) from a single cross-sectional non-contrast CT image and to compare four representative positional images to determine the most suitable for detecting CSPH. The study included 421 patients with chronic liver disease who underwent hepatic venous pressure gradient measurement at our institution between May 2007 and January 2024. Patients were randomly divided into training, validation, and test datasets at a ratio of 8:1:1. Non-contrast cross-sectional CT images from four target areas of interest were used to create four deep-learning-based models for predicting CSPH. The areas of interest were the umbilical portion of the portal vein (PV), the first right branch of the PV, the confluence of the splenic vein and PV, and the maximum cross-section of the spleen. The models were implemented as convolutional neural networks with a multilayer perceptron classifier. The model with the best predictive ability for CSPH was then compared with 13 conventional evaluation methods. Among the four areas, the umbilical portion of the PV had the highest predictive ability for CSPH (area under the curve [AUC]: 0.80). At the threshold maximizing the Youden index, sensitivity and specificity were 0.867 and 0.615, respectively. This DL model outperformed the ANTICIPATE model. We developed an algorithm that can predict CSPH immediately from a single slice of non-contrast CT using the most suitable image, that of the umbilical portion of the PV.
Question: CSPH predicts complications but requires invasive hepatic venous pressure gradient measurement for diagnosis.
Findings: At the threshold maximizing the Youden index, sensitivity and specificity were 0.867 and 0.615, respectively; the DL model outperformed the ANTICIPATE model.
Clinical relevance: A DL model can accurately predict CSPH from a single non-contrast CT image, providing a non-invasive alternative to invasive methods and aiding early detection and risk stratification in chronic liver disease without image manipulation.
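The operating point reported above comes from maximizing the Youden index, J = sensitivity + specificity - 1, over the ROC curve. A minimal sketch of that step on synthetic scores, using scikit-learn's `roc_curve` (an illustration, not the study's code):

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
# toy model scores: positives (CSPH) shifted upward relative to negatives
y_true = np.concatenate([np.ones(100), np.zeros(200)])
scores = np.concatenate([rng.normal(1.0, 1.0, 100), rng.normal(0.0, 1.0, 200)])

fpr, tpr, thresholds = roc_curve(y_true, scores)
j = tpr - fpr                        # Youden's J = sensitivity + specificity - 1
best = int(np.argmax(j))             # index of the J-maximizing threshold
threshold = thresholds[best]
sensitivity = float(tpr[best])
specificity = float(1 - fpr[best])
```

The chosen threshold trades sensitivity against specificity without reference to class prevalence, which is why the study reports both values at that single point.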

Multitask radioclinical decision stratification in non-metastatic colon cancer: integrating MMR status, pT staging, and high-risk pathological factors.

Yang R, Liu J, Li L, Fan Y, Shu Y, Wu W, Shu J

PubMed · Sep 22 2025
To construct a multi-task global decision support system based on preoperative enhanced CT features that predicts the mismatch repair (MMR) status, T stage, and pathological risk factors (e.g., histological differentiation, lymphovascular invasion) of patients with non-metastatic colon cancer. A total of 372 eligible non-metastatic colon cancer (NMCC) participants (training cohort: n = 260; testing cohort: n = 112) were enrolled from two institutions. The 34 features (imaging features: n = 27; clinical features: n = 7) were subjected to feature selection using LASSO, Boruta, ReliefF, mRMR, and XGBoost-RFE. In each of the three categories (MMR, pT staging, and pathological risk factors), four features were selected to construct the total feature set. The multitask model was then built with 14 machine learning algorithms, and predictive performance was evaluated using the area under the receiver operating characteristic curve (AUC). The final feature set for constructing the model was based on the mRMR feature screening method. For MMR classification, pT staging, and pathological risk factors, the SVC, Bernoulli NB, and Decision Tree algorithms were selected, respectively, with AUCs of 0.80 [95% CI 0.71-0.89], 0.82 [95% CI 0.71-0.94], and 0.85 [95% CI 0.77-0.93] on the test set. Furthermore, a direct multiclass model constructed from the total feature set achieved an average AUC of 0.77 across four management plans in the test set. The multi-task machine learning model proposed in this study enables non-invasive and precise preoperative stratification of patients with NMCC by MMR status, pT stage, and pathological risk factors. This predictive tool demonstrates significant potential for preoperative risk stratification and for guiding individualized therapeutic strategies.
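The per-task design, four selected features and a dedicated algorithm (SVC, Bernoulli NB, Decision Tree) for each of the MMR, pT, and risk-factor endpoints, can be sketched as below. Synthetic data and hard-coded column indices stand in for the mRMR output; this is a structural illustration, not the study's code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 12))
# three synthetic binary targets, one per task
y = {
    "MMR": (X[:, 0] + rng.normal(size=400) > 0).astype(int),
    "pT": (X[:, 4] + rng.normal(size=400) > 0).astype(int),
    "risk": (X[:, 8] + rng.normal(size=400) > 0).astype(int),
}
# four selected feature indices per task (stand-in for the mRMR selection)
cols = {"MMR": [0, 1, 2, 3], "pT": [4, 5, 6, 7], "risk": [8, 9, 10, 11]}
models = {
    "MMR": SVC(probability=True, random_state=0),
    "pT": BernoulliNB(),
    "risk": DecisionTreeClassifier(max_depth=3, random_state=0),
}

aucs = {}
for task in y:
    Xt = X[:, cols[task]]
    X_tr, X_te, y_tr, y_te = train_test_split(Xt, y[task], test_size=0.3,
                                              random_state=0)
    m = models[task].fit(X_tr, y_tr)
    aucs[task] = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
```

Keeping a separate feature subset and estimator per endpoint is what lets each task use the algorithm that suits it best, which is the core of the paper's multitask stratification.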

The optimal diagnostic assistance system for predicting three-dimensional contact between mandibular third molars and the mandibular canal on panoramic radiographs.

Fukuda M, Nomoto D, Nozawa M, Kise Y, Kuwada C, Kubo H, Ariji E, Ariji Y

PubMed · Sep 22 2025
This study aimed to identify the most effective diagnostic assistance system for assessing the relationship between mandibular third molars (M3M) and mandibular canals (MC) using panoramic radiographs. In total, 2,103 M3M were included from patients in whom the M3M and MC overlapped on panoramic radiographs. All M3M were classified into high-risk and low-risk groups based on the degree of contact with the MC observed on computed tomography. The contact classification was evaluated using four machine learning models (Prediction One software, AdaBoost, XGBoost, and random forest), three convolutional neural networks (CNNs) (EfficientNet-B0, ResNet18, and Inception v3), and three human observers (two radiologists and one oral surgery resident). Receiver operating characteristic curves were plotted; the area under the curve (AUC), accuracy, sensitivity, and specificity were calculated. Factors contributing to prediction of high-risk cases by machine learning models were identified. Machine learning models demonstrated AUC values ranging from 0.84 to 0.88, with accuracy ranging from 0.81 to 0.88 and sensitivity of 0.80, indicating consistently strong performance. Among the CNNs, ResNet18 achieved the best performance, with an AUC of 0.83. The human observers exhibited AUC values between 0.67 and 0.80. Three factors were identified as contributing to prediction of high-risk cases by machine learning models: increased root radiolucency, diversion of the MC, and narrowing of the MC. Machine learning models demonstrated strong performance in predicting the three-dimensional relationship between the M3M and MC.
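The comparison above rests on accuracy, sensitivity, and specificity computed from a binary confusion matrix. A small self-contained helper showing exactly how those quantities relate (toy labels, not study data):

```python
import numpy as np

def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels/predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate (high-risk caught)
        "specificity": tn / (tn + fp),   # true-negative rate (low-risk cleared)
    }

# toy example: 4 high-risk (1) and 6 low-risk (0) cases
m = diagnostic_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
```

For a screening task like M3M/MC contact, sensitivity (not missing high-risk cases) is usually the metric clinicians weigh most heavily.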

Linking dynamic connectivity states to cognitive decline and anatomical changes in Alzheimer's disease.

Tessadori J, Galazzo IB, Storti SF, Pini L, Brusini L, Cruciani F, Sona D, Menegaz G, Murino V

PubMed · Sep 22 2025
Alterations in brain connectivity provide early indications of neurodegenerative diseases like Alzheimer's disease (AD). Here, we present a novel framework that integrates a Hidden Markov Model (HMM) within the architecture of a convolutional neural network (CNN) to analyze dynamic functional connectivity (dFC) in resting-state functional magnetic resonance imaging (rs-fMRI). Our unsupervised approach captures recurring connectivity states in a large cohort of subjects spanning the Alzheimer's disease continuum, including healthy controls, individuals with mild cognitive impairment (MCI), and patients with clinically diagnosed AD. We propose a deep neural model with embedded HMM dynamics to identify stable recurring brain states from resting-state fMRI. These states exhibit distinct connectivity patterns and are differentially expressed across the Alzheimer's disease continuum. Our analysis shows that the fraction of time each state is active varies systematically with disease severity, highlighting dynamic network alterations that track neurodegeneration. Our findings suggest that the disruption of dynamic connectivity patterns in AD may follow a two-stage trajectory, where early shifts toward integrative network states give way to reduced connectivity organization as the disease progresses. This framework offers a promising tool for early diagnosis and monitoring of AD, and may have broader applications in the study of other neurodegenerative conditions.
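The key readout here, the fraction of time each hidden state is active per subject (often called fractional occupancy), is simple to compute once the HMM has assigned a state to every fMRI frame. A minimal NumPy sketch on a toy decoded state sequence (hypothetical data, not the study's pipeline):

```python
import numpy as np

def fractional_occupancy(states, n_states):
    """Fraction of time points spent in each hidden state."""
    states = np.asarray(states)
    return np.bincount(states, minlength=n_states) / states.size

# toy per-frame state sequence for one subject (e.g. Viterbi-decoded)
seq = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 1])
fo = fractional_occupancy(seq, n_states=3)
```

Comparing these occupancy vectors across controls, MCI, and AD groups is what reveals the systematic shift in state expression with disease severity described above.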

Comprehensive Assessment of Tumor Stromal Heterogeneity in Bladder Cancer by Deep Learning and Habitat Radiomics.

Du Y, Sui Y, Tao Y, Cao J, Jiang X, Yu J, Wang B, Wang Y, Li H

PubMed · Sep 22 2025
Tumor stromal heterogeneity plays a pivotal role in bladder cancer progression. The tumor-stroma ratio (TSR) is a key pathological marker reflecting stromal heterogeneity. This study aimed to develop a preoperative, CT-based machine learning model for predicting TSR in bladder cancer, comparing various radiomic approaches, and evaluating their utility in prognostic assessment and immunotherapy response prediction. A total of 477 bladder urothelial carcinoma patients from two centers were retrospectively included. Tumors were segmented on preoperative contrast-enhanced CT, and radiomic features were extracted. K-means clustering was used to divide tumors into subregions. Radiomics models were constructed: a conventional model (Intra), a multi-subregion model (Habitat), and single-subregion models (HabitatH1/H2/H3). A deep transfer learning model (DeepL) based on the largest tumor cross-section was also developed. Model performance was evaluated in training, testing, and external validation cohorts, and associations with recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response were analyzed. The HabitatH1 model demonstrated robust diagnostic performance with favorable calibration and clinical utility. The DeepL model surpassed all radiomics models in predictive accuracy. A nomogram combining DeepL and clinical variables effectively predicted recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response. Imaging-predicted TSR showed significant associations with the tumor immune microenvironment and treatment outcomes. CT-based habitat radiomics and deep learning models enable non-invasive, quantitative assessment of TSR in bladder cancer. The DeepL model provides superior diagnostic and prognostic value, supporting personalized treatment decisions and prediction of immunotherapy response.
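The habitat step described above, K-means clustering of per-voxel features into tumor subregions, can be sketched as follows. The two-column voxel table and the three synthetic subpopulations are illustrative assumptions, not the study's actual radiomic features.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# toy per-voxel feature table for one tumor: (intensity, enhancement) rows,
# drawn from three well-separated subpopulations standing in for habitats
voxels = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[3.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[0.0, 3.0], scale=0.3, size=(100, 2)),
])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(voxels)
labels = km.labels_                    # habitat assignment per voxel
sizes = np.bincount(labels)            # voxel count per habitat subregion
```

Radiomic features extracted separately from each labeled subregion are what distinguish the Habitat and HabitatH1/H2/H3 models from the conventional whole-tumor (Intra) model.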

Enhancing Instance Feature Representation: A Foundation Model-Based Multi-Instance Approach for Neonatal Retinal Screening.

Guo J, Wang K, Tan G, Li G, Zhang X, Chen J, Hu J, Liang Y, Jiang B

PubMed · Sep 22 2025
Automated analysis of neonatal fundus images presents a uniquely intricate challenge in medical imaging. Existing methodologies predominantly focus on diagnosing abnormalities from individual images, often leading to inaccuracies due to the diverse and subtle nature of neonatal retinal features. Consequently, clinical standards frequently mandate the acquisition of retinal images from multiple angles to ensure the detection of minute lesions. To accommodate this, we propose leveraging multiple fundus images captured from various regions of the retina to comprehensively screen for a wide range of neonatal ocular pathologies. We employ Multiple Instance Learning (MIL) for this task and introduce a simple yet effective learnable structure on top of existing MIL methods, called Learnable Dense to Global (LD2G-MIL). Unlike methods that focus on instance-to-bag feature aggregation, the proposed method focuses on generating better instance-level representations that are co-optimized with downstream MIL targets in a learnable way. It also incorporates a bag prior-based similarity loss (BP loss) mechanism, leveraging prior knowledge to enhance performance in neonatal retinal screening. To validate the efficacy of our LD2G-MIL method, we compiled the Neonatal Fundus Images (NFI) dataset, an extensive collection comprising 115,621 retinal images from 8,886 neonatal clinical episodes. Empirical evaluations on this dataset demonstrate that our approach consistently outperforms state-of-the-art (SOTA) generic and specialized methods. The code and trained models are publicly available at https://github.com/CVIU-CSU/LD2G-MIL.
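For readers unfamiliar with MIL, the baseline instance-to-bag step that LD2G-MIL builds on (the exact method is in the linked repository) is commonly an attention-weighted pooling of instance embeddings. A generic NumPy sketch with made-up dimensions, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_pool(instances, w, v):
    """Score each instance embedding, softmax the scores, and return the
    attention-weighted bag embedding plus the weights themselves."""
    scores = np.tanh(instances @ v) @ w      # one scalar score per instance
    alpha = softmax(scores)                  # attention weights sum to 1
    return alpha @ instances, alpha          # bag embedding, weights

rng = np.random.default_rng(4)
bag = rng.normal(size=(6, 8))     # 6 fundus-image embeddings of dim 8 (toy)
v = rng.normal(size=(8, 4))       # learned projection (random here)
w = rng.normal(size=4)            # learned scoring vector (random here)
bag_emb, alpha = attention_mil_pool(bag, w, v)
```

The attention weights double as a per-image relevance map, indicating which of the multi-angle fundus images drove a positive screen.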

SeruNet-MS: A Two-Stage Interpretable Framework for Multiple Sclerosis Risk Prediction with SHAP-Based Explainability.

Aksoy S, Demircioglu P, Bogrekci I

PubMed · Sep 22 2025
Background/Objectives: Multiple sclerosis (MS) is a chronic demyelinating disease where early identification of patients at risk of conversion from clinically isolated syndrome (CIS) to clinically definite MS remains a critical unmet clinical need. Existing machine learning approaches often lack interpretability, limiting clinical trust and adoption. The objective of this research was to develop a novel two-stage machine learning framework with comprehensive explainability to predict CIS-to-MS conversion while addressing demographic bias and interpretability limitations.
Methods: A cohort of 177 CIS patients from the National Institute of Neurology and Neurosurgery in Mexico City was analyzed using SeruNet-MS, a two-stage framework that separates demographic baseline risk from clinical risk modification. Stage 1 applied logistic regression to demographic features, while Stage 2 incorporated 25 clinical and symptom features, including MRI lesions, cerebrospinal fluid biomarkers, electrophysiological tests, and symptom characteristics. Patient-level interpretability was achieved through SHAP (SHapley Additive exPlanations) analysis, providing transparent attribution of each factor's contribution to risk assessment.
Results: The two-stage model achieved a ROC-AUC of 0.909, accuracy of 0.806, precision of 0.842, and recall of 0.800, outperforming baseline machine learning methods. Cross-validation confirmed stable performance (0.838 ± 0.095 AUC) with appropriate generalization. SHAP analysis identified periventricular lesions, oligoclonal bands, and symptom complexity as the strongest predictors, with clinical examples illustrating transparent patient-specific risk communication.
Conclusions: The two-stage approach effectively mitigates demographic bias by separating non-modifiable factors from actionable clinical findings. SHAP explanations provide clinicians with clear, individualized insights into prediction drivers, enhancing trust and supporting decision making. This framework demonstrates that high predictive performance can be achieved without sacrificing interpretability, representing a significant step forward for explainable AI in MS risk stratification and real-world clinical adoption.
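One simple way to realize the two-stage separation described above (the abstract does not give SeruNet-MS's exact mechanics) is to fit a demographic-only logistic model first, then feed its log-odds into a second model alongside the clinical features. A hedged sketch on synthetic data, with invented stand-in variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 300
demo = rng.normal(size=(n, 2))       # stand-ins for demographic features
clin = rng.normal(size=(n, 4))       # stand-ins for MRI / CSF / symptom features
logit = 0.5 * demo[:, 0] + 1.2 * clin[:, 0] + 0.8 * clin[:, 1]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Stage 1: demographic baseline risk only
stage1 = LogisticRegression().fit(demo, y)
base_logodds = stage1.decision_function(demo).reshape(-1, 1)

# Stage 2: clinical features modify the demographic baseline
stage2 = LogisticRegression().fit(np.hstack([base_logodds, clin]), y)
risk = stage2.predict_proba(np.hstack([base_logodds, clin]))[:, 1]
```

Keeping the non-modifiable demographic contribution in its own stage is what makes the clinical-stage SHAP attributions read as actionable risk modifiers rather than a mix of fixed and modifiable factors.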

Explainable AI-driven analysis of radiology reports using text and image data: An experimental study.

Zamir MT, Khan SU, Gelbukh A, Felipe Riverón EM, Gelbukh I

PubMed · Sep 22 2025
Artificial intelligence is increasingly being integrated into clinical diagnostics, yet its lack of transparency hinders trust and adoption among healthcare professionals. Explainable AI (XAI) has the potential to improve the interpretability and reliability of AI-based decisions in clinical practice. This study evaluates the use of XAI for interpreting radiology reports to improve healthcare practitioners' confidence in, and comprehension of, AI-assisted diagnostics. The study employed the Indiana University chest X-ray dataset, containing 3169 textual reports and 6471 images. Textual reports were classified as either normal or abnormal using a range of machine learning approaches, including traditional machine learning models and ensemble methods, deep learning models (LSTM), and advanced transformer-based language models (GPT-2, T5, LLaMA-2, LLaMA-3.1). For image-based classification, convolutional neural networks (CNNs), including DenseNet121 and DenseNet169, were used. Top-performing models were interpreted using the XAI methods SHAP and LIME to support clinical decision-making by enhancing the transparency and trustworthiness of model predictions. The LLaMA-3.1 model achieved the highest accuracy, 98%, in classifying the textual radiology reports. Statistical analysis confirmed the model's robustness: Cohen's kappa (k = 0.981) indicated near-perfect agreement beyond chance, and both the Chi-Square and Fisher's Exact tests revealed a highly significant association between actual and predicted labels (p < 0.0001), while McNemar's test yielded a non-significant result (p = 0.25), suggesting balanced class performance. For the imaging data, the highest accuracy, 84%, was achieved by the DenseNet169 and DenseNet121 models. To assess explainability, LIME and SHAP were applied to the best-performing models.
These models consistently highlighted medical terms such as "opacity", "consolidation", and "pleural" as clear indications of abnormal findings in textual reports. The research underscores that explainability is an essential component of any AI system used in diagnostics and is helpful in the design and implementation of AI in the healthcare sector. Such an approach improves diagnostic accuracy and builds confidence among the health workers who will use explainable AI in clinical settings, particularly in applications of AI explainability for medical purposes.
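The kind of word-level evidence described above can be approximated without the LIME library by leave-one-word-out occlusion: drop each word and measure how much the predicted abnormality probability falls. A toy normal/abnormal report classifier (invented mini-corpus, not the Indiana University data; LIME proper additionally fits a local surrogate model):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# tiny invented corpus standing in for radiology report text
reports = [
    "clear lungs no acute disease", "no focal consolidation normal heart",
    "lungs are clear no effusion", "normal study no abnormality",
    "opacity in right lower lobe", "pleural effusion with consolidation",
    "left lobe opacity and consolidation", "large pleural effusion seen",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]      # 0 = normal, 1 = abnormal

vec = CountVectorizer()
X = vec.fit_transform(reports)
clf = LogisticRegression().fit(X, labels)

def occlusion_importance(text):
    """Drop each word in turn; importance = fall in P(abnormal)."""
    words = text.split()
    p_full = clf.predict_proba(vec.transform([text]))[0, 1]
    imp = {}
    for i, w in enumerate(words):
        masked = " ".join(words[:i] + words[i + 1:])
        imp[w] = p_full - clf.predict_proba(vec.transform([masked]))[0, 1]
    return imp

imp = occlusion_importance("pleural opacity in the lung")
top = max(imp, key=imp.get)
```

Terms carrying abnormal evidence ("pleural", "opacity") receive large positive importance, while out-of-vocabulary filler ("the") contributes nothing, mirroring the SHAP/LIME highlights reported in the study.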

Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models

Dingxin Lu, Shurui Wu, Xinyi Huang

arXiv preprint · Sep 22 2025
With the rising global burden of chronic diseases and the multimodal and heterogeneous clinical data (medical imaging, free-text recordings, wearable sensor streams, etc.), there is an urgent need for a unified multimodal AI framework that can proactively predict individual health risks. We propose VL-RiskFormer, a hierarchical stacked visual-language multimodal Transformer with a large language model (LLM) inference head embedded in its top layer. The system builds on the dual-stream architecture of existing visual-linguistic models (e.g., PaLM-E, LLaVA) with four key innovations: (i) pre-training with cross-modal comparison and fine-grained alignment of radiological images, fundus maps, and wearable device photos with corresponding clinical narratives using momentum update encoders and debiased InfoNCE losses; (ii) a time fusion block that integrates irregular visit sequences into the causal Transformer decoder through adaptive time interval position coding; (iii) a disease ontology map adapter that injects ICD-10 codes into visual and textual channels in layers and infers comorbid patterns with the help of a graph attention mechanism. On the MIMIC-IV longitudinal cohort, VL-RiskFormer achieved an average AUROC of 0.90 with an expected calibration error of 2.7 percent.
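The contrastive pre-training objective named in innovation (i) builds on InfoNCE over paired image/text embeddings. The paper's debiased variant adds correction terms not reproduced here; this NumPy sketch shows only the plain symmetric InfoNCE baseline on random embeddings:

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (B, B) cosine similarities
    diag = np.arange(len(logits))               # matching pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # stabilized log-softmax
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-logp[diag, diag].mean())

    # average the image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(6)
emb = rng.normal(size=(8, 32))
loss_aligned = info_nce(emb, emb)                        # perfectly paired
loss_random = info_nce(emb, rng.normal(size=(8, 32)))    # unpaired text
```

Aligned image/text pairs yield a much lower loss than mismatched ones, which is exactly the pressure that pulls radiological images and their clinical narratives into a shared embedding space.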