Latest Papers on Radiology AI. Tags: Classification.

A Multimodal Classification Method for Nasal Obstruction Severity Based on Computed Tomography and Nasal Resistance.

Wang Q, Li S, Sun H, Cui S, Song W

•papers•Oct 3 2025

Assessing the degree of nasal obstruction is valuable in disease diagnosis, quality-of-life assessment, and epidemiological studies. To this end, this article proposes a multimodal nasal obstruction severity classification model based on cone beam computed tomography (CBCT) images and nasal resistance measurements. The model consists of four modules: image feature extraction, tabular feature extraction, feature fusion, and classification. In the image feature extraction module, the parameters of the pre-trained MedicalNet model are transferred to a three-dimensional convolutional neural network (3D CNN) feature extractor. For the tabular nasal resistance measurement data, key features are selected by extreme gradient boosting (XGBoost) feature importance analysis to reduce the data dimension. To fuse the two modalities, a feature fusion method based on local and global features is designed. Finally, the fused features are classified using the tabular network (TabNet) model. Comparison and ablation experiments were designed to verify the effectiveness of the proposed method; the results show that the accuracy and recall of the proposed multimodal classification model reach 0.93 and 0.9, respectively, which are significantly higher than those of other methods.

CT Classification Methodology In Silico Academic Lab

Brain metabolic imaging with 18F-PET-CT and machine-learning clustering analysis reveal divergent metabolic phenotypes in patients with amyotrophic lateral sclerosis.

Zhang J, Han F, Wang X, Wu F, Song X, Liu Q, Wang J, Grecucci A, Zhang Y, Yi X, Chen BT

•papers•Oct 3 2025

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by significant clinicopathologic heterogeneity. This study aimed to identify distinct ALS phenotypes by integrating brain 18F-fluorodeoxyglucose positron emission tomography-computed tomography (18F-FDG PET-CT) metabolic imaging with consensus clustering data. This study prospectively enrolled 127 patients with ALS and 128 healthy controls. All participants underwent brain 18F-FDG PET-CT metabolic imaging, psychological questionnaires, and functional screening. K-means consensus clustering was applied to define neuroimaging-based phenotypes. Survival analyses were also performed. Whole exome sequencing (WES) was utilized to detect ALS-related genetic mutations, followed by GO/KEGG pathway enrichment and imaging-transcriptome analysis based on the brain metabolic activity on 18F-FDG PET-CT imaging. Consensus clustering identified two metabolic phenotypes, i.e., the metabolic attenuation phenotype and the metabolic non-attenuation phenotype, according to their glucose metabolic activity pattern. The metabolic attenuation phenotype was associated with worse survival (p = 0.022), poorer physical function (p = 0.005), more severe depression (p = 0.026), and greater anxiety level (p = 0.05). WES testing and neuroimaging-transcriptome analysis identified specific gene mutations and molecular pathways associated with each phenotype. We identified two distinct ALS phenotypes with varying clinicopathologic features, indicating that unsupervised machine learning applied to PET imaging may effectively classify metabolic subtypes of ALS. These findings contribute novel insights into the heterogeneous pathophysiology of ALS, which should inform personalized therapeutic strategies for patients with ALS.
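The consensus clustering step named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the scalar toy values, the 80% subsampling fraction, the number of runs, and the helper names `kmeans_1d` and `consensus_matrix` are all assumptions made for illustration. The idea is to run k-means repeatedly on random subsamples and record how often each pair of subjects lands in the same cluster.

```python
import random

def kmeans_1d(values, k, rng, n_iter=20):
    """Plain Lloyd's k-means on scalar values; returns one label per value."""
    centers = rng.sample(sorted(set(values)), k)  # k distinct starting points
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = sum(members) / len(members)
    return labels

def consensus_matrix(values, k=2, n_runs=50, frac=0.8, seed=0):
    """Fraction of runs in which items i and j were clustered together,
    counted only over runs where both were in the subsample."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(frac * n))
    for _ in range(n_runs):
        idx = rng.sample(range(n), size)
        labels = kmeans_1d([values[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Hypothetical toy "uptake" values: two well-separated groups of three subjects.
toy_values = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
consensus = consensus_matrix(toy_values)
```

Pairs with a consensus value near 1 are assigned to the same phenotype in nearly every run, and values near 0 indicate stable separation. In practice the final phenotypes would be derived by clustering this matrix.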

PET Classification Neurological Prospective Clinical Pilot Benchmark SOTA

Evaluating GPT-4o for emergency disposition of complex respiratory cases with pulmonology consultation: a diagnostic accuracy study.

Yıldırım C, Aykut A, Günsoy E, Öncül MV

•papers•Oct 2 2025

Large Language Models (LLMs), such as GPT-4o, are increasingly investigated for clinical decision support in emergency medicine. However, their real-world performance in disposition prediction remains insufficiently studied. This study evaluated the diagnostic accuracy of GPT-4o in predicting ED disposition (discharge, ward admission, or ICU admission) in complex emergency respiratory cases requiring pulmonology consultation and chest CT, representing a selective high-acuity subgroup of ED patients. We conducted a retrospective observational study in a tertiary ED between November 2024 and February 2025. We retrospectively included ED patients with complex respiratory presentations who underwent pulmonology consultation and chest CT, representing a selective high-acuity subgroup rather than the general ED respiratory population. GPT-4o was prompted to predict the most appropriate ED disposition using three progressively enriched input models: Model 1 (age, sex, oxygen saturation, home oxygen therapy, and venous blood gas parameters); Model 2 (Model 1 plus laboratory data); and Model 3 (Model 2 plus chest CT findings). Model performance was assessed using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Among the 221 patients included, 69.2% were admitted to the ward, 9.0% to the intensive care unit (ICU), and 21.7% were discharged. For hospital admission prediction, Model 3 demonstrated the highest sensitivity (91.9%) and overall accuracy (76.5%), but the lowest specificity (20.8%). In contrast, for discharge prediction, Model 3 achieved the highest specificity (91.9%) but the lowest sensitivity (20.8%). Numerical improvements were observed across models, but none reached statistical significance (all p > 0.22). Model 1 therefore performed comparably to Models 2-3 while being less complex.
Among patients who were discharged despite GPT-4o predicting admission, the 14-day ED re-presentation rates were 23.8% (5/21) for Model 1, 30.0% (9/30) for Model 2, and 28.9% (11/38) for Model 3. GPT-4o demonstrated high sensitivity in identifying ED patients requiring hospital admission, particularly those needing intensive care, when provided with progressively enriched clinical input. However, its low sensitivity for discharge prediction resulted in frequent overtriage, limiting its utility for autonomous decision-making. This proof-of-concept study demonstrates GPT-4o's capacity to stratify disposition decisions in complex respiratory cases under varying levels of limited input data. However, these findings should be interpreted in light of key limitations, including the selective high-acuity cohort and the absence of vital signs, and require prospective validation before clinical implementation.
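The mirrored figures for Model 3 (91.9% sensitivity and 20.8% specificity for admission, and the reverse for discharge) follow from treating the same binary confusion matrix with the positive class swapped. The sketch below illustrates this with counts back-calculated from the reported percentages. These counts are an assumption, not taken from the paper: 221 patients, of whom 48 were discharged and 173 admitted. They are consistent with the 38 patients reported as discharged despite a predicted admission.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Back-calculated (assumed) counts for Model 3, with "admission" as positive:
# 159 of 173 admitted patients predicted as admission,
# 10 of 48 discharged patients predicted as discharge.
admission = binary_metrics(tp=159, fn=14, tn=10, fp=38)

# Same predictions with "discharge" as the positive class: the roles of
# true positives and true negatives (and of the two error types) swap.
discharge = binary_metrics(tp=10, fn=38, tn=159, fp=14)

for name, metrics in (("admission", admission), ("discharge", discharge)):
    print(name, {key: round(100 * value, 1) for key, value in metrics.items()})
```

Under these assumed counts, sensitivity for admission equals specificity for discharge and vice versa, while accuracy is identical in both views.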

CT Classification Chest Retrospective Clinical In Silico GenAI

GFSR-Net: Guided Focus via Segment-Wise Relevance Network for Interpretable Deep Learning in Medical Imaging

Jhonatan Contreras, Thomas Bocklitz

•preprint•Oct 2 2025

Deep learning has achieved remarkable success in medical image analysis; however, its adoption in clinical practice is limited by a lack of interpretability. These models often make correct predictions without explaining their reasoning. They may also rely on image regions unrelated to the disease or visual cues, such as annotations, that are not present in real-world conditions. This can reduce trust and increase the risk of misleading diagnoses. We introduce the Guided Focus via Segment-Wise Relevance Network (GFSR-Net), an approach designed to improve interpretability and reliability in medical imaging. GFSR-Net uses a small number of human annotations to approximate where a person would intuitively focus within an image, without requiring precise boundaries or exhaustive markings, making the process fast and practical. During training, the model learns to align its focus with these areas, progressively emphasizing features that carry diagnostic meaning. This guidance works across different types of natural and medical images, including chest X-rays, retinal scans, and dermatological images. Our experiments demonstrate that GFSR achieves comparable or superior accuracy while producing saliency maps that better reflect human expectations. This reduces the reliance on irrelevant patterns and increases confidence in automated diagnostic tools.

X-Ray Classification Chest Methodology In Silico Ethics

Multilevel Correlation-aware and Modal-aware Graph Convolutional Network for Diagnosing Neurodevelopmental Disorders.

Zuo S, Li Y, Qi Y, Liu A

•papers•Oct 2 2025

Graph-based methods using resting-state functional magnetic resonance imaging demonstrate strong capabilities in modeling brain networks. However, existing graph-based methods often overlook inter-graph relationships, limiting their ability to capture the intrinsic features shared across individuals. Additionally, their simplistic integration strategies may fail to take full advantage of multimodal information. To address these challenges, this paper proposes a Multilevel Correlation-aware and Modal-aware Graph Convolutional Network (MCM-GCN) for the reliable diagnosis of neurodevelopmental disorders. At the individual level, we design a correlation-driven feature generation module that incorporates a pooling layer with external graph attention to perceive inter-graph correlations, generating discriminative brain embeddings and identifying disease-related regions. At the population level, to deeply integrate multimodal and multi-atlas information, a multimodal-decoupled feature enhancement module learns unique and shared embeddings from brain graphs and phenotypic data and then fuses them adaptively with graph channel attention for reliable disease classification. Extensive experiments on two public datasets for Autism Spectrum Disorder (ASD) and Attention Deficit Hyperactivity Disorder (ADHD) demonstrate that MCM-GCN outperforms other competing methods, with an accuracy of 92.88% for ASD and 76.55% for ADHD. The MCM-GCN framework integrates individual-level and population-level analyses, offering a comprehensive perspective for neurodevelopmental disorder diagnosis, significantly improving diagnostic accuracy while identifying key indicators. These findings highlight the potential of the MCM-GCN for imaging-assisted diagnosis of neurodevelopmental diseases, advancing interpretable deep learning in medical imaging analysis.

MRI Classification Neurological Methodology In Silico Academic Lab Benchmark SOTA

Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring

Han-Jay Shu, Wei-Ning Chiu, Shun-Ting Chang, Meng-Ping Huang, Takeshi Tohyama, Ahram Han, Po-Chih Kuo

•preprint•Oct 2 2025

Deep learning models achieve strong performance in chest radiograph (CXR) interpretation, yet fairness and reliability concerns persist. Models often show uneven accuracy across patient subgroups, leading to hidden failures not reflected in aggregate metrics. Existing error detection approaches, based on confidence calibration or out-of-distribution (OOD) detection, struggle with subtle within-distribution errors, while image- and representation-level consistency-based methods remain underexplored in medical imaging. We propose an augmentation-sensitivity risk scoring (ASRS) framework to identify error-prone CXR cases. ASRS applies clinically plausible rotations ($\pm 15^\circ$/$\pm 30^\circ$) and measures embedding shifts with the RAD-DINO encoder. Sensitivity scores stratify samples into stability quartiles, where highly sensitive cases show substantially lower recall ($-0.2$ to $-0.3$) despite high AUROC and confidence. ASRS provides a label-free means for selective prediction and clinician review, improving fairness and safety in medical AI.
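The scoring idea in this abstract (embed the original and rotated views, measure the shift, then bin samples into quartiles) can be sketched as follows. This is a hedged illustration only: the abstract does not state the distance measure, so cosine distance is an assumption, and the two-dimensional vectors below are hypothetical stand-ins for RAD-DINO embeddings of the original and rotated images. The function names are likewise hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def sensitivity_score(original, augmented):
    """Mean embedding shift across the augmented views of one image."""
    return sum(cosine_distance(original, a) for a in augmented) / len(augmented)

def stability_quartiles(scores):
    """Quartile index per sample: 0 = most stable, 3 = most sensitive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    quartile = [0] * len(scores)
    for rank, i in enumerate(order):
        quartile[i] = min(3, rank * 4 // len(scores))
    return quartile

# Hypothetical (original embedding, embeddings of rotated views) pairs.
toy = [
    ([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]]),    # embedding does not move
    ([1.0, 0.0], [[1.0, 0.1], [1.0, -0.1]]),   # small shift
    ([1.0, 0.0], [[1.0, 0.5], [1.0, -0.5]]),   # moderate shift
    ([1.0, 0.0], [[0.0, 1.0], [0.0, -1.0]]),   # large shift
]
scores = [sensitivity_score(orig, aug) for orig, aug in toy]
quartiles = stability_quartiles(scores)
```

No labels are used at any point, which is what makes the score usable for flagging cases for clinician review at inference time.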

X-Ray Classification Chest Methodology In Silico Academic Lab Ethics

Machine learning and quantitative computed tomography radiomics prediction of postoperative functional recovery in paraplegic dogs.

Low D, Rutherford S

•papers•Oct 2 2025

To develop a computed tomography (CT)-radiomics-based machine-learning algorithm for prediction of functional recovery in paraplegic dogs with acute intervertebral disc extrusion (IVDE). Multivariable prediction model development. Paraplegic dogs with acute IVDE: 128 deep-pain positive and 86 deep-pain negative (DPN). Radiomics features from noncontrast CT were combined with deep-pain perception in an extreme gradient algorithm using an 80:20 train-test split. Model performance was assessed on the independent test set (Testfull) and on the test set of DPN dogs (TestDPN). Deep-pain perception alone served as the control. Recovery of ambulation was recorded in 165/214 dogs (77.1%) after decompressive surgery. The model had an area under the receiver operating characteristic curve (AUC) of .9118 (95% CI: .8366-.9872), accuracy of 86.1% (95% CI: 74.4%-95.4%), sensitivity of 82.4% (95% CI: 68.6%-93.9%), and specificity of 100.0% (95% CI: 100.0%-100.0%) on Testfull, and an AUC of .7692 (95% CI: .6250-.9000), accuracy of 72.7% (95% CI: 50.0%-90.9%), sensitivity of 53.8% (95% CI: 25.0%-80.0%), and specificity of 100.0% (95% CI: 100.0%-100.0%) on TestDPN. Deep-pain perception had an AUC of .8088 (95% CI: .7273-.8871), accuracy of 69.8% (95% CI: 55.8%-83.7%), sensitivity of 61.8% (95% CI: 45.5%-77.4%), and specificity of 100.0% (95% CI: 100.0%-100.0%), which was different from that of the model (p = .02). Noncontrast CT-based radiomics provided prognostic information in dogs with severe spinal cord injury secondary to acute intervertebral disc extrusion. The model outperformed deep-pain perception alone in identifying dogs that recovered ambulation following decompressive surgery. Radiomics features from noncontrast CT, when integrated into a multimodal machine-learning algorithm, may be useful as an assistive tool for surgical decision making.
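The degenerate specificity interval (100.0%-100.0%) reported above is what a resampling-based interval produces when every negative case in the test set is classified correctly. The abstract does not say how the intervals were computed, so the stratified percentile bootstrap below is an assumption, as are the counts (34 dogs that recovered, 28 of them correctly identified, and 9 that did not, all correctly identified), which were chosen to reproduce the reported 82.4% sensitivity and 100% specificity on Testfull.

```python
import random

def stratified_bootstrap_ci(pos_hits, neg_hits, n_boot=2000, seed=0):
    """Percentile 95% intervals for sensitivity and specificity.

    pos_hits: 1/0 per truly positive case (1 = correctly predicted positive).
    neg_hits: 1/0 per truly negative case (1 = correctly predicted negative).
    Positives and negatives are resampled separately (stratified).
    """
    rng = random.Random(seed)
    sens, spec = [], []
    for _ in range(n_boot):
        p = rng.choices(pos_hits, k=len(pos_hits))
        n = rng.choices(neg_hits, k=len(neg_hits))
        sens.append(sum(p) / len(p))
        spec.append(sum(n) / len(n))

    def percentile_interval(values):
        values = sorted(values)
        return values[int(0.025 * len(values))], values[int(0.975 * len(values)) - 1]

    return percentile_interval(sens), percentile_interval(spec)

# Assumed per-case outcomes (not taken from the paper).
pos_hits = [1] * 28 + [0] * 6   # 28 of 34 recoveries correctly predicted
neg_hits = [1] * 9              # all 9 non-recoveries correctly predicted
sens_ci, spec_ci = stratified_bootstrap_ci(pos_hits, neg_hits)
```

Because every resample of the negatives contains only correct predictions, the specificity interval collapses to a single point. With only a handful of negative cases, this says little about the true uncertainty.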

CT Classification Neurological Retrospective Clinical In Silico Academic Lab

Disrupted network integrity and therapeutic plasticity in drug-naive panic disorders: Insights from network homogeneity.

Han Y, Yan H, Li H, Liu F, Li P, Yuan Y, Guo W

•papers•Oct 2 2025

This study intended to examine network homogeneity (NH) alterations in drug-naive patients with panic disorder (PD) before and after treatment and whether NH could serve as a potential biomarker. Fifty-eight patients and 85 healthy controls (HCs) underwent resting-state functional magnetic resonance imaging. Patients were rescanned following a 4-week course of paroxetine monotherapy. NH was computed to evaluate intra-network functional integration across the Yeo 7-Network. Machine learning (ML) was employed to assess the diagnostic and prognostic potential of NH metrics. Transcriptome-neuroimaging association analyses were conducted to explore the molecular correlates of NH alterations. Compared with HCs, patients showed disrupted intra-network integration in the frontoparietal, default mode, sensorimotor, limbic, and ventral attention networks, with prominent NH alterations in the superior frontal gyrus (SFG), middle temporal gyrus (MTG), superior temporal gyrus (STG), somatosensory cortex, insular, and anterior cingulate cortex. Importantly, the SFG, MTG, and STG demonstrated cross-network abnormalities. After treatment, clinical improvement correlated with normalized NH in the SFG and additional changes in the inferior occipital gyrus and calcarine sulcus within the visual network. ML demonstrated the utility of NH for PD classification and treatment outcome prediction. Transcriptome-neuroimaging analysis identified specific gene profiles related to NH alterations. NH reflects both pathological features and treatment-related changes in PD, providing a measure of network dysfunction and therapeutic response. Cross-network NH disruptions in hub regions and visual processing may reflect core neuropharmacological mechanisms underlying PD. ML findings support the potential of NH as a neuroimaging biomarker for diagnosis and treatment monitoring in PD.
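Network homogeneity is commonly defined as the mean correlation of a voxel's time series with the time series of every other voxel in the same network. The abstract does not spell out the computation, so the sketch below assumes that common definition, and the toy time series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def network_homogeneity(series):
    """NH per voxel: mean correlation with all other voxels in the network.

    series: list of per-voxel time series belonging to one network.
    """
    nh = []
    for i, ts in enumerate(series):
        others = [pearson(ts, other) for j, other in enumerate(series) if j != i]
        nh.append(sum(others) / len(others))
    return nh

# Hypothetical three-voxel "network".
toy_series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first voxel
    [4.0, 3.0, 2.0, 1.0],   # anti-correlated with both
]
nh = network_homogeneity(toy_series)
```

Here the first two voxels each average one correlation of +1 and one of -1, giving NH of about 0, while the third voxel is anti-correlated with both others, giving NH of about -1.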

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Multimodal Foundation Models for Early Disease Detection

Md Talha Mohsin, Ismail Abdulrashid

•preprint•Oct 2 2025

Healthcare generates diverse streams of data, including electronic health records (EHR), medical imaging, genetics, and ongoing monitoring from wearable devices. Traditional diagnostic models frequently analyze these sources in isolation, which constrains their capacity to identify cross-modal correlations essential for early disease diagnosis. Our research presents a multimodal foundation model that consolidates diverse patient data through an attention-based transformer framework. First, dedicated encoders project each modality into a shared latent space; the resulting representations are then combined using multi-head attention and residual normalization. The architecture is designed for pretraining on many tasks, which makes it easy to adapt to new diseases and datasets with little extra work. We provide an experimental strategy that uses benchmark datasets in oncology, cardiology, and neurology, with the goal of testing early detection tasks. Beyond technical performance, the framework includes data governance and model management tools to improve transparency, reliability, and clinical interpretability. The proposed method works toward a single foundation model for precision diagnostics, which could improve the accuracy of predictions and help doctors make decisions.

Mixed Modality Classification Methodology In Silico GenAI

SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification

Jong Bum Won, Wesley De Neve, Joris Vankerschaver, Utku Ozbulak

•preprint•Oct 2 2025

Deep neural networks (DNNs) have demonstrated remarkable success in medical imaging, yet their real-world deployment remains challenging due to spurious correlations, where models can learn non-clinical features instead of meaningful medical patterns. Existing medical imaging datasets are not designed to systematically study this issue, largely due to restrictive licensing and limited supplementary patient data. To address this gap, we introduce SpurBreast, a curated breast MRI dataset that intentionally incorporates spurious correlations to evaluate their impact on model performance. Analyzing over 100 features involving patient, device, and imaging protocol, we identify two dominant spurious signals: magnetic field strength (a global feature influencing the entire image) and image orientation (a local feature affecting spatial alignment). Through controlled dataset splits, we demonstrate that DNNs can exploit these non-clinical signals, achieving high validation accuracy while failing to generalize to unbiased test data. Alongside these two datasets containing spurious correlations, we also provide benchmark datasets without spurious correlations, allowing researchers to systematically investigate clinically relevant and irrelevant features, uncertainty estimation, adversarial robustness, and generalization strategies. Models and datasets are available at https://github.com/utkuozbulak/spurbreast.

MRI Classification Breast Dataset Release In Silico Academic Lab Open Dataset Reproducibility
