Latest Papers on Radiology AI. Tags: Classification

AI-driven reclassification of multiple sclerosis progression.

Ganjgahi H, Häring DA, Aarden P, Graham G, Sun Y, Gardiner S, Su W, Berge C, Bischof A, Fisher E, Gaetano L, Thoma SP, Kieseier BC, Nichols TE, Thompson AJ, Montalban X, Lublin FD, Kappos L, Arnold DL, Bermel RA, Wiendl H, Holmes CC

•papers•Aug 20 2025

Multiple sclerosis (MS) affects 2.9 million people. Traditional classification of MS into distinct subtypes poorly reflects its pathobiology and has limited value for prognosticating disease evolution and treatment response, thereby hampering drug discovery. Here we report a data-driven classification of MS disease evolution by analyzing a large clinical trial database (approximately 8,000 patients, 118,000 patient visits and more than 35,000 magnetic resonance imaging scans) using probabilistic machine learning. Four dimensions define MS disease states: physical disability, brain damage, relapse and subclinical disease activity. Early/mild/evolving (EME) MS and advanced MS represent two poles of a disease severity spectrum. Patients with EME MS show limited clinical impairment and minor brain damage. Transitions to advanced MS occur via brain damage accumulation through inflammatory states, with or without accompanying symptoms. Advanced MS is characterized by moderate to high disability levels, radiological disease burden and risk of disease progression independent of relapses, with little probability of returning to earlier MS states. We validated these results in an independent clinical trial database and a real-world cohort, totaling more than 4,000 patients with MS. Our findings support viewing MS as a disease continuum. We propose a streamlined disease classification to offer a unifying understanding of the disease, improve patient management and enhance drug discovery efficiency and precision.

MRI Classification Neurological Retrospective Clinical In Silico Benchmark SOTA

Deep learning approach for screening neonatal cerebral lesions on ultrasound in China.

Lin Z, Zhang H, Duan X, Bai Y, Wang J, Liang Q, Zhou J, Xie F, Shentu Z, Huang R, Chen Y, Yu H, Weng Z, Ni D, Liu L, Zhou L

•papers•Aug 20 2025

Timely and accurate diagnosis of severe neonatal cerebral lesions is critical for preventing long-term neurological damage and addressing life-threatening conditions. Cranial ultrasound is the primary screening tool, but the process is time-consuming and reliant on the operator's proficiency. In this study, a deep-learning-powered screening system for neonatal cerebral lesions is developed based on 8,757 neonatal cranial ultrasound images; it automatically extracts standard views from cranial ultrasound videos and identifies cases with severe cerebral lesions. The system demonstrates an area under the curve of 0.982 and 0.944, with sensitivities of 0.875 and 0.962 on internal and external video datasets, respectively. Furthermore, the system outperforms junior radiologists and performs on par with mid-level radiologists, with 55.11% faster examination efficiency. In conclusion, the developed system can automatically extract standard views from cranial ultrasound videos and make accurate diagnoses efficiently, and might be useful to deploy in multiple application scenarios.

Ultrasound Classification Neurological Retrospective Clinical In Silico Academic Lab

Differentiation of Suspicious Microcalcifications Using Deep Learning: DCIS or IDC.

Xu W, Deng S, Mao G, Wang N, Huang Y, Zhang C, Sa G, Wu S, An Y

•papers•Aug 20 2025

This study explores the value of a deep learning-based model in distinguishing between ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC) manifesting suspicious microcalcifications on mammography. A total of 294 breast cancer cases (106 DCIS and 188 IDC) from two centers were randomly allocated into training, internal validation and external validation sets in this retrospective study. Clinical variables differentiating DCIS from IDC were identified through univariate and multivariate analyses and used to build a clinical model. Deep learning features were extracted using ResNet101 and selected by minimum redundancy maximum relevance (mRMR) and least absolute shrinkage and selection operator (LASSO). A deep learning model was developed using deep learning features, and a combined model was constructed by combining these features with clinical variables. The area under the receiver operating characteristic curve (AUC) was used to assess the performance of each model. Multivariate logistic regression identified lesion type and BI-RADS category as independent predictors for differentiating DCIS from IDC. The clinical model incorporating these factors achieved an AUC of 0.67, sensitivity of 0.53, specificity of 0.81, and accuracy of 0.63 in the external validation set. In comparison, the deep learning model showed an AUC of 0.97, sensitivity of 0.94, specificity of 0.92, and accuracy of 0.93. For the combined model, the AUC, sensitivity, specificity and accuracy were 0.97, 0.96, 0.92 and 0.95, respectively. The diagnostic efficacy of the deep learning model and combined model was comparable (p>0.05), and both models outperformed the clinical model (p<0.05). Deep learning provides an effective non-invasive approach to differentiate DCIS from IDC presenting as suspicious microcalcifications on mammography.

Mammography Classification Breast Retrospective Clinical In Silico Academic Lab

Cohort-Aware Agents for Individualized Lung Cancer Risk Prediction Using a Retrieval-Augmented Model Selection Framework

Chongyu Qu, Allen J. Luna, Thomas Z. Li, Junchao Zhu, Junlin Guo, Juming Xiong, Kim L. Sandler, Bennett A. Landman, Yuankai Huo

•preprint•Aug 20 2025

Accurate lung cancer risk prediction remains challenging due to substantial variability across patient populations and clinical settings -- no single model performs best for all cohorts. To address this, we propose a personalized lung cancer risk prediction agent that dynamically selects the most appropriate model for each patient by combining cohort-specific knowledge with modern retrieval and reasoning techniques. Given a patient's CT scan and structured metadata -- including demographic, clinical, and nodule-level features -- the agent first performs cohort retrieval using FAISS-based similarity search across nine diverse real-world cohorts to identify the most relevant patient population from a multi-institutional database. Second, a Large Language Model (LLM) is prompted with the retrieved cohort and its associated performance metrics to recommend the optimal prediction algorithm from a pool of eight representative models, including classical linear risk models (e.g., Mayo, Brock), temporally-aware models (e.g., TDVIT, DLSTM), and multi-modal computer vision-based approaches (e.g., Liao, Sybil, DLS, DLI). This two-stage agent pipeline -- retrieval via FAISS and reasoning via LLM -- enables dynamic, cohort-aware risk prediction personalized to each patient's profile. Building on this architecture, the agent supports flexible and cohort-driven model selection across diverse clinical populations, offering a practical path toward individualized risk assessment in real-world lung cancer screening.
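
The two-stage routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: a brute-force L2 nearest-centroid search stands in for FAISS, a simple best-AUC lookup stands in for the LLM recommendation step, and the cohort names, feature vectors, and AUC values are all invented for illustration.

```python
import math

# Hypothetical cohort centroids in a toy, pre-normalized 3-feature space.
# The paper searches nine real cohorts with FAISS; these two are made up.
COHORT_CENTROIDS = {
    "cohort_a": [0.2, 0.1, 0.3],
    "cohort_b": [0.8, 0.7, 0.6],
}

# Hypothetical per-cohort AUCs for three of the model names in the abstract.
COHORT_MODEL_AUC = {
    "cohort_a": {"Mayo": 0.70, "Brock": 0.74, "Sybil": 0.81},
    "cohort_b": {"Mayo": 0.77, "Brock": 0.72, "Sybil": 0.69},
}


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_cohort(patient_vec):
    """Stage 1: return the cohort whose centroid is closest to the patient."""
    return min(
        COHORT_CENTROIDS,
        key=lambda name: l2_distance(patient_vec, COHORT_CENTROIDS[name]),
    )


def select_model(cohort):
    """Stage 2 stand-in: pick the model with the best recorded AUC."""
    metrics = COHORT_MODEL_AUC[cohort]
    return max(metrics, key=metrics.get)


def route_patient(patient_vec):
    """Run both stages and return (retrieved cohort, selected model)."""
    cohort = retrieve_cohort(patient_vec)
    return cohort, select_model(cohort)
```

Calling `route_patient([0.25, 0.15, 0.3])` retrieves the nearer toy cohort and then returns whichever model has the highest recorded AUC for it. In the described system the second stage is a prompted LLM that reasons over the cohort's performance metrics rather than a fixed argmax.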

CT Classification Chest Methodology In Silico GenAI

A machine learning-based decision support tool for standardizing intracavitary versus interstitial brachytherapy technique selection in high-dose-rate cervical cancer.

Kajikawa T, Masui K, Sakai K, Takenaka T, Suzuki G, Yoshino Y, Nemoto H, Yamazaki H, Yamada K

•papers•Aug 20 2025

This study develops and evaluates a machine-learning (ML) decision-support tool that standardizes selection of intracavitary brachytherapy (ICBT) versus hybrid intracavitary/interstitial brachytherapy (IC/ISBT) in high-dose-rate (HDR) cervical cancer. We retrospectively analyzed 159 HDR brachytherapy plans from 50 consecutive patients treated between April 2022 and June 2024. Brachytherapy techniques (ICBT or IC/ISBT) were determined by an experienced radiation oncologist using CT/MRI-based 3-D image-guided brachytherapy. For each plan, 144 shape- and distance-based geometric features describing the high-risk clinical target volume (HR-CTV), bladder, rectum, and applicator were extracted. Nested five-fold cross-validation combined minimum-redundancy-maximum-relevance feature selection with five classifiers (k-nearest neighbors, logistic regression, naïve Bayes, random forest, support-vector classifier) and two voting ensembles (hard and soft voting). Model performance was benchmarked against single-factor rules (HR-CTV > 30 cm³; maximum lateral HR-CTV-tandem distance > 25 mm). Logistic regression achieved the highest test accuracy (0.849 ± 0.023) and a mean area under the curve (AUC) of 0.903 ± 0.033, outperforming the volume rule and matching the distance rule's AUC (0.907 ± 0.057) while exceeding its accuracy (0.805 ± 0.114). These differences were not statistically significant. Feature-importance analysis showed that the maximum HR-CTV-tandem lateral distance and the bladder's minimal short-axis length consistently dominated model decisions. A compact ML tool using two readily measurable geometric features can reliably assist clinicians in choosing between ICBT and IC/ISBT, thereby reducing inter-physician variability and promoting standardized HDR cervical brachytherapy technique selection.
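
The single-factor baselines mentioned above are simple threshold rules, and a short sketch shows how such rules can be scored against the physician's choice. The thresholds (30 cm³ and 25 mm) come from the abstract; the direction of the rule (exceeding the threshold suggests IC/ISBT), the `Plan` fields, and the toy plans are assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
HR_CTV_VOLUME_THRESHOLD_CM3 = 30.0
LATERAL_DISTANCE_THRESHOLD_MM = 25.0


@dataclass
class Plan:
    """Hypothetical record for one brachytherapy plan."""
    hr_ctv_volume_cm3: float
    max_lateral_distance_mm: float
    used_hybrid: bool  # True if the physician chose IC/ISBT, False for ICBT


def volume_rule(plan):
    """Assumed direction: a large HR-CTV volume suggests IC/ISBT."""
    return plan.hr_ctv_volume_cm3 > HR_CTV_VOLUME_THRESHOLD_CM3


def distance_rule(plan):
    """Assumed direction: a large lateral HR-CTV-tandem distance suggests IC/ISBT."""
    return plan.max_lateral_distance_mm > LATERAL_DISTANCE_THRESHOLD_MM


def accuracy(rule, plans):
    """Fraction of plans where the rule agrees with the physician's choice."""
    hits = sum(1 for plan in plans if rule(plan) == plan.used_hybrid)
    return hits / len(plans)


# Invented plans, not data from the study.
TOY_PLANS = [
    Plan(22.0, 18.0, False),
    Plan(35.0, 31.0, True),
    Plan(28.0, 27.0, True),
    Plan(33.0, 20.0, False),
]
```

`accuracy(volume_rule, TOY_PLANS)` and `accuracy(distance_rule, TOY_PLANS)` give the agreement of each rule with the recorded technique on the toy data. The study's ML models were compared against baselines of this kind under nested cross-validation.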

Mixed Modality Classification Abdominal Retrospective Clinical In Silico Academic Lab

AlzFormer: Multi-modal framework for Alzheimer's classification using MRI and graph-embedded demographics guided by adaptive attention gating.

Hussain SS, Degang X, Shah PM, Khan H, Zeb A

•papers•Aug 20 2025

Alzheimer's disease (AD) is the most common progressive neurodegenerative disorder and the fifth-leading cause of death in older people. Detecting AD is very challenging for clinicians and radiologists because of the complex nature of the disease, which motivates automatic data-driven machine-learning models that enhance diagnostic accuracy and support expert decision-making. However, machine learning models are hindered by three key limitations in AD classification: (i) diffuse and subtle structural changes in the brain that make it difficult to capture global pathology; (ii) non-uniform alterations across MRI planes, which limit single-view learning; and (iii) the lack of deep integration of demographic context, which is often ignored despite its clinical importance. To address these challenges, we propose a novel multi-modal deep learning framework, named AlzFormer, that dynamically integrates 3D MRI with demographic features represented as knowledge graph embeddings for AD classification. Specifically, (i) to capture global and volumetric features, a 3D CNN is employed; (ii) to model plane-specific information, three parallel 2D CNNs are used for tri-planar processing (axial, coronal, sagittal), combined with a Transformer encoder; and (iii) to incorporate demographic context, we integrate demographic features as knowledge graph embeddings through a novel Adaptive Attention Gating mechanism that balances contributions from both modalities (i.e., MRI and demographics). Comprehensive experiments on two real-world datasets, including generalization tests, ablation studies, and robustness evaluation under noisy conditions, demonstrate that the proposed model provides a robust and effective solution for AD diagnosis. These results suggest strong potential for integration into Clinical Decision Support Systems (CDSS), offering a more interpretable and personalized approach to early Alzheimer's detection.
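
The abstract does not spell out the Adaptive Attention Gating mechanism, so the following is only a generic sketch of gated fusion between two modalities, under the assumption that a single learned gate in (0, 1) weighs the MRI features against the demographic embedding. The function name, the scalar gate, and the weight layout are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(mri_feat, demo_feat, weights, bias=0.0):
    """Fuse two equal-length feature vectors with one scalar gate.

    The gate is computed from the concatenation of both vectors, then
    used as a convex mixing weight: gate * mri + (1 - gate) * demo.
    This is an assumed form, not the mechanism defined in the paper.
    """
    if len(mri_feat) != len(demo_feat):
        raise ValueError("feature vectors must have the same length")
    concat = list(mri_feat) + list(demo_feat)
    if len(weights) != len(concat):
        raise ValueError("need one weight per concatenated feature")
    gate = sigmoid(sum(w * x for w, x in zip(weights, concat)) + bias)
    fused = [gate * m + (1.0 - gate) * d for m, d in zip(mri_feat, demo_feat)]
    return gate, fused
```

With all-zero weights and zero bias the gate is 0.5 and the fused vector is the plain average of the two inputs; a strongly positive bias pushes the gate toward 1 so the MRI features dominate. In a trained model the weights would be learned jointly with the encoders.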

MRI Classification Neurological Methodology In Silico

Classification of familial and non-familial ADHD using auto-encoding network and binary hypothesis testing

Baboli, R., Martin, E., Qiu, Q., Zhao, L., Liu, T., Li, X.

•preprint•Aug 19 2025

Family history is one of the most powerful risk factors for attention-deficit/hyperactivity disorder (ADHD), yet no study has tested whether multimodal Magnetic Resonance Imaging (MRI) combined with deep learning can separate familial ADHD (ADHD-F) from non-familial ADHD (ADHD-NF). T1-weighted and diffusion-weighted MRI data from 438 children (129 ADHD-F, 159 ADHD-NF, and 150 controls) were parcellated into 425 cortical and white-matter metrics. Our pipeline combined three feature-selection steps (t-test filtering, mutual-information ranking, and Lasso) with an auto-encoder and applied the binary-hypothesis strategy throughout; each held-out subject was assigned both possible labels in turn and evaluated under leave-one-out testing nested within five-fold cross-validation. Accuracy, sensitivity, specificity, and area under the curve (AUC) quantified performance. The model achieved accuracies/AUCs of 0.66 / 0.67 for ADHD-F vs controls, 0.67 / 0.70 for ADHD-NF vs controls, and 0.62 / 0.67 for ADHD-F vs ADHD-NF. In classification between ADHD-F and controls, the most informative metrics were the mean diffusivity (MD) of the right fornix, the MD of the left parahippocampal cingulum, and the cortical thickness of the right inferior parietal cortex. In classification between ADHD-NF and controls, the key contributors were the fractional anisotropy (FA) of the left inferior fronto-occipital fasciculus, the MD of the right fornix, and the cortical thickness of the right medial orbitofrontal cortex. In classification between ADHD-F and ADHD-NF, the highlighted features were the volume of the left cingulate cingulum tract, the volume of the right parietal segment of the superior longitudinal fasciculus, and the cortical thickness of the right fusiform cortex.
Our binary hypothesis semi-supervised deep learning framework reliably separates familial and non-familial ADHD and shows that advanced semi-supervised deep learning techniques can deliver robust, generalizable neurobiological markers for neurodevelopmental disorders.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

A Multimodal Large Language Model as an End-to-End Classifier of Thyroid Nodule Malignancy Risk: Usability Study.

Sng GGR, Xiang Y, Lim DYZ, Tung JYM, Tan JH, Chng CL

•papers•Aug 19 2025

Thyroid nodules are common, with ultrasound imaging as the primary modality for their assessment. Risk stratification systems like the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) have been developed but suffer from interobserver variability and low specificity. Artificial intelligence, particularly large language models (LLMs) with multimodal capabilities, presents opportunities for efficient end-to-end diagnostic processes. However, their clinical utility remains uncertain. This study evaluates the accuracy and consistency of multimodal LLMs for thyroid nodule risk stratification using the ACR TI-RADS system, examining the effects of model fine-tuning, image annotation, prompt engineering, and comparing open-source versus commercial models. In total, 3 multimodal vision-language models were evaluated: Microsoft's open-source Large Language and Visual Assistant (LLaVA) model, its medically fine-tuned variant (Large Language and Vision Assistant for bioMedicine [LLaVA-Med]), and OpenAI's commercial o3 model. A total of 192 thyroid nodules from publicly available ultrasound image datasets were assessed. Each model was evaluated using 2 prompts (basic and modified) and 2 image scenarios (unlabeled vs radiologist-annotated), yielding 6912 responses. Model outputs were compared with expert ratings for accuracy and consistency. Statistical comparisons included Chi-square tests, Mann-Whitney U tests, and Fleiss' kappa for interrater reliability. Overall, 88.4% (6110/6912) of responses were valid, with the o3 model producing the highest validity rate (2273/2304, 98.6%), followed by LLaVA (2108/2304, 91.5%) and LLaVA-Med (1729/2304, 75%; P<.001). The o3 model demonstrated the highest accuracy overall, achieving up to 57.3% accuracy in Thyroid Imaging Reporting and Data System (TI-RADS) classification, although still remaining suboptimal. 
Labeled images improved accuracy marginally in nodule margin assessment only when evaluating LLaVA models (407/768, 53% to 447/768, 58.2%; P=.04). Prompt engineering improved accuracy for composition (649/1152, 56.3% vs 483/1152, 41.9%; P<.001), but significantly reduced accuracy for shape, margins, and overall classification. Consistency was the highest with the o3 model (up to 85.4%), but was comparable for LLaVA and significantly improved with image labeling and modified prompts across multiple TI-RADS categories (P<.001). Subgroup analysis for o3 alone showed prompt engineering did not affect accuracy significantly but markedly improved consistency across all TI-RADS categories (up to 97.1% for shape, P<.001). Interrater reliability was consistently poor across all combinations (Fleiss' kappa<0.60). The study demonstrates the comparative advantages and limitations of multimodal LLMs for thyroid nodule risk stratification. While the commercial model (o3) consistently outperformed open-source models in accuracy and consistency, even the best-performing model outputs remained suboptimal for direct clinical deployment. Prompt engineering significantly enhanced output consistency, particularly in the commercial model. These findings underline the importance of strategic model optimization techniques and highlight areas requiring further development before multimodal LLMs can be reliably used in clinical thyroid imaging workflows.
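
The response counts in this abstract can be cross-checked with a few lines of arithmetic. All figures below are taken from the abstract; the one inferred quantity is the number of repeated queries per condition, which the abstract does not state and which is derived here only as the factor needed to reach 6912 responses.

```python
# Design factors and totals quoted in the abstract.
MODELS = 3
NODULES = 192
PROMPTS = 2
IMAGE_SCENARIOS = 2
TOTAL_RESPONSES = 6912
VALID_RESPONSES = {"o3": 2273, "LLaVA": 2108, "LLaVA-Med": 1729}

# Unique (model, nodule, prompt, image scenario) combinations.
unique_conditions = MODELS * NODULES * PROMPTS * IMAGE_SCENARIOS

# Inferred, not stated in the abstract: repeats needed to reach the total.
implied_repeats = TOTAL_RESPONSES // unique_conditions

# Responses per model, matching the denominators reported per model.
responses_per_model = TOTAL_RESPONSES // MODELS

# Overall valid count and per-model validity rates.
total_valid = sum(VALID_RESPONSES.values())
validity_rates = {
    name: count / responses_per_model
    for name, count in VALID_RESPONSES.items()
}
```

The per-model denominator of 2304 and the overall valid count of 6110 both match the figures reported above, and the total divides evenly into three responses per condition, which is consistent with each query being repeated to measure consistency.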

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab GenAI

Ferroelectric/Antiferroelectric HfZrOx Artificial Synapses/Neurons for Convolutional Neural Network-Spiking Neural Network Neuromorphic Computing.

Zhang J, Xu K, Lu L, Lu C, Tao X, Liu Y, Yu J, Meng J, Zhang DW, Wang T, Chen L

•papers•Aug 19 2025

Brain-inspired neuromorphic computing offers significant potential for efficient and adaptive computational platforms. Emerging ferroelectric and antiferroelectric HfZrOx devices play key roles in convolutional neural network (CNN) and spiking neural network (SNN) computing owing to their unique polarization switching characteristics. Here, we present ferroelectric/antiferroelectric HfZrOx devices that realize the functions of artificial synapses/neurons through element doping engineering. The HfZrOx-based ferroelectric and antiferroelectric devices exhibit excellent endurance characteristics of 1 × 10^9 cycles. Based on the non-volatile polarization switching and spontaneous depolarization nature of ferroelectric and antiferroelectric devices, integrate-and-fire behaviors were constructed for neuromorphic computing. For the first time, a complementary ferroelectric/antiferroelectric HfZrOx artificial synapse/neuron-based hybrid CNN-SNN framework was constructed for energy-efficient cardiac magnetic resonance imaging (MRI) classification. The hybrid neural network breaks the limitation of pure SNN in 3D image recognition and improves the accuracy from 82.3% to 92.7% compared to pure CNN, highlighting the potential of composition-engineered ferroelectric materials to implement high-efficiency neuromorphic computing.
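
For readers unfamiliar with the integrate-and-fire behavior mentioned above, here is a minimal software abstraction of a leaky integrate-and-fire neuron. It is a textbook-style sketch, not a model of the HfZrOx devices: the threshold, leak factor, and reset value are arbitrary assumptions, and the leak term only loosely parallels the spontaneous depolarization described in the abstract.

```python
def simulate_integrate_and_fire(inputs, threshold=1.0, leak=0.9, reset=0.0):
    """Return a 0/1 spike train for a sequence of input currents.

    At each step the stored potential decays by `leak`, the input is
    added, and the neuron fires and resets once `threshold` is reached.
    """
    potential = reset
    spikes = []
    for current in inputs:
        potential = leak * potential + current  # leaky integration
        if potential >= threshold:
            spikes.append(1)   # fire
            potential = reset  # return to resting state
        else:
            spikes.append(0)
    return spikes
```

A constant sub-threshold input such as `[0.4] * 5` accumulates over several steps before producing a spike, after which the potential starts again from the reset value. In a hybrid CNN-SNN pipeline, units of this kind would consume features produced by the convolutional front end.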

MRI Classification Cardiac Methodology Concept GenAI

ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery

Mohammad Izadi, Mehran Safayani

•preprint•Aug 19 2025

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition marked by disruptions in brain connectivity. Functional MRI (fMRI) offers a non-invasive window into large-scale neural dynamics by measuring blood-oxygen-level-dependent (BOLD) signals across the brain. These signals can be modeled as interactions among Regions of Interest (ROIs), which are grouped into functional communities based on their underlying roles in brain function. Emerging evidence suggests that connectivity patterns within and between these communities are particularly sensitive to ASD-related alterations. Effectively capturing these patterns and identifying interactions that deviate from typical development is essential for improving ASD diagnosis and enabling biomarker discovery. In this work, we introduce ASDFormer, a Transformer-based architecture that incorporates a Mixture of Pooling-Classifier Experts (MoE) to capture neural signatures associated with ASD. By integrating multiple specialized expert branches with attention mechanisms, ASDFormer adaptively emphasizes different brain regions and connectivity patterns relevant to autism. This enables both improved classification performance and more interpretable identification of disorder-related biomarkers. Applied to the ABIDE dataset, ASDFormer achieves state-of-the-art diagnostic accuracy and reveals robust insights into functional connectivity disruptions linked to ASD, highlighting its potential as a tool for biomarker discovery.
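
The idea of a mixture of pooling-classifier experts can be sketched in a few lines. This is a generic illustration under stated assumptions, not ASDFormer itself: each expert is assumed to be a pooling function over ROI token features followed by a logistic classifier, and the gate scores are passed in as fixed numbers, whereas a real gate would compute them from the input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens):
    """Average each feature across all ROI tokens."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def max_pool(tokens):
    """Take the maximum of each feature across all ROI tokens."""
    return [max(column) for column in zip(*tokens)]


def expert_probability(pooled, weights, bias):
    """Logistic classifier applied to one pooled feature vector."""
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mixture_predict(tokens, experts, gate_scores):
    """Combine expert probabilities using softmax gate weights.

    `experts` is a list of (pooling_function, weights, bias) triples;
    `gate_scores` holds one hypothetical raw score per expert.
    """
    gate = softmax(gate_scores)
    probs = [
        expert_probability(pool(tokens), weights, bias)
        for pool, weights, bias in experts
    ]
    return sum(g * p for g, p in zip(gate, probs))
```

For example, `mixture_predict(tokens, [(mean_pool, w1, b1), (max_pool, w2, b2)], [0.0, 0.0])` weights both experts equally. Inspecting the gate weights per subject is one way such a model could indicate which pooled view of the connectivity features drove a prediction.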

MRI Classification Neurological Methodology In Silico Benchmark SOTA
