Assessing Inter-rater Reliability of ChatGPT-4 and Orthopaedic Clinicians in Radiographic Fracture Classification.

Walker AN, Smith JB, Simister SK, Patel O, Choudhary S, Seidu M, Dallas-Orr D, Tse S, Shahzad H, Wise P, Scott M, Saiz AM, Lum ZC

PubMed · Sep 19, 2025
To compare the inter-rater reliability of ChatGPT-4 with that of orthopaedic surgery attendings and residents in classifying fractures on upper extremity (UE) and lower extremity (LE) radiographs. Eighty-four radiographs of various fracture patterns were collected from publicly available online repositories. These images were presented to ChatGPT-4 with a prompt asking it to identify the view, body location, fracture type, and AO/OTA fracture classification. Two orthopaedic surgery residents and two attending orthopaedic surgeons also independently reviewed the images and identified the same categories. Fleiss' kappa values were calculated to determine inter-rater reliability (IRR) for the following comparisons: All Raters Combined; AI vs. Residents (AIR); AI vs. Attendings (AIA); and Attendings vs. Residents (AR). ChatGPT-4 achieved substantial to almost perfect agreement with clinicians on location (UE: κ = 0.655-0.708, LE: κ = 0.834-0.909) and fracture type (UE: κ = 0.546-0.563, LE: κ = 0.58-0.697). For view, ChatGPT-4 showed consistently fair agreement for both UE (κ = 0.370-0.404) and LE (κ = 0.309-0.390). ChatGPT-4 struggled the most with AO/OTA classification, achieving slight agreement for UE (κ = -0.062 to 0.159) and moderate agreement for LE (κ = 0.418-0.455). IRR for AIR was consistently lower than IRR for AR. For AR comparisons, almost perfect agreement was observed for location (UE: κ = 0.896, LE: κ = 0.912) and fracture type (UE: κ = 0.948, LE: κ = 0.859), while AO/OTA classification showed fair agreement for UE (κ = 0.257) and moderate agreement for LE (κ = 0.517). The p-values for all comparison groups were significant except for LE AO/OTA classification between AI and residents (p = 0.051). Although ChatGPT-4 showed promise in classifying basic fracture features, it was not yet at a level comparable to experts, especially for more nuanced interpretations. These findings suggest that AI is more effective as an adjunct to the judgment of trained clinicians than as a replacement for it.
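For readers unfamiliar with the statistic, the sketch below shows one way to compute Fleiss' kappa for multi-rater categorical data using statsmodels; the rating matrix is illustrative dummy data, not the study's radiograph ratings.

```python
# Minimal sketch: computing Fleiss' kappa for categorical ratings with
# statsmodels. The rating matrix below is illustrative, not the study data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = radiographs, columns = raters (e.g., AI, resident 1, resident 2);
# values = assigned category index (e.g., AO/OTA class).
ratings = np.array([
    [0, 0, 0],
    [1, 1, 2],
    [2, 2, 2],
    [0, 1, 0],
])

# aggregate_raters converts subject-by-rater labels into the
# subject-by-category count table that fleiss_kappa expects.
table, _categories = aggregate_raters(ratings)
kappa = fleiss_kappa(table, method='fleiss')
print(f"Fleiss' kappa: {kappa:.3f}")
```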

Lightweight Transfer Learning Models for Multi-Class Brain Tumor Classification: Glioma, Meningioma, Pituitary Tumors, and No Tumor MRI Screening.

Gorenshtein A, Liba T, Goren A

PubMed · Sep 19, 2025
Glioma, pituitary tumors, and meningiomas constitute the major types of primary brain tumors. The challenge in achieving a definitive diagnosis stems from the brain's complex structure, limited accessibility for precise imaging, and the resemblance between different types of tumors. An alternative and promising solution is the application of artificial intelligence (AI), specifically through deep learning models. We developed multiple lightweight deep learning models, namely ResNet-18 (both pretrained on ImageNet and trained from scratch), ResNet-34, ResNet-50, and a custom CNN, to classify glioma, meningioma, pituitary tumor, and no-tumor MRI scans. A dataset of 7023 images was employed, split into 5712 for training and 1311 for validation. Each model was evaluated via accuracy, area under the curve (AUC), sensitivity, specificity, and confusion matrices. We compared our models to state-of-the-art (SOTA) methods such as SAlexNet and TumorGANet, highlighting computational efficiency and classification performance. The pretrained ResNet models achieved 98.5-99.2% accuracy and near-perfect validation metrics, with an overall AUC of 1.0 and average sensitivity and specificity both exceeding 97% across the four classes. In comparison, ResNet-18 trained from scratch and the custom CNN achieved 91.99% and 87.03% accuracy, respectively, with AUCs ranging from 0.94 to 1.00. Error analysis revealed moderate misclassification of meningiomas as gliomas in non-pretrained models. Learning rate optimization facilitated stable convergence, and loss metrics indicated effective generalization with minimal overfitting. Our findings confirm that a moderately sized, transfer-learned network (ResNet-18) can deliver high diagnostic accuracy and robust performance for four-class brain tumor classification. This approach aligns with the goal of providing efficient, accurate, and easily deployable AI solutions, particularly for smaller clinical centers with limited computational resources. Future studies should incorporate multi-sequence MRI and extended patient cohorts to further validate these promising results.
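As a concrete illustration of the transfer-learning setup the abstract describes, here is a minimal PyTorch sketch that adapts an ImageNet-pretrained ResNet-18 to the four-class problem; the optimizer, learning rate, and dummy batch are assumptions, not the paper's reported configuration.

```python
# Minimal PyTorch sketch: ImageNet-pretrained ResNet-18 with a new
# 4-class head. Hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # glioma, meningioma, pituitary tumor, no tumor

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # replace 1000-class head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of 224x224 MRI slices.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```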

AI-Driven Multimodality Fusion in Cardiac Imaging: Integrating CT, MRI, and Echocardiography for Precision.

Tran HH, Thu A, Twayana AR, Fuertes A, Gonzalez M, Basta M, James M, Mehta KA, Elias D, Figaro YM, Islek D, Frishman WH, Aronow WS

PubMed · Sep 19, 2025
Artificial intelligence (AI)-enabled multimodal cardiovascular imaging holds significant promise for improving diagnostic accuracy, enhancing risk stratification, and supporting clinical decision-making. However, its translation into routine practice remains limited by multiple technical, infrastructural, and clinical barriers. This review synthesizes current challenges, including variability in image quality, alignment, and acquisition protocols; scarcity of large, annotated multimodality datasets; interoperability limitations across vendors and institutions; clinical skepticism due to limited prospective validation; and substantial development and implementation costs. Drawing from recent advances, we outline future research priorities to bridge the gap between technical feasibility and clinical utility. Key strategies include developing unified, vendor-agnostic AI models resilient to inter-institutional variability; integrating diverse data types such as genomics, wearable biosensors, and longitudinal clinical records; leveraging reinforcement learning for adaptive decision-support systems; and employing longitudinal imaging fusion for disease tracking and predictive analytics. We emphasize the need for rigorous prospective clinical trials, harmonized imaging standards, and collaborative data-sharing frameworks to ensure robust, equitable, and scalable deployment. Addressing these challenges through coordinated multidisciplinary efforts will be essential to realize the full potential of AI-driven multimodal cardiovascular imaging in advancing precision cardiovascular care.

MUSCLE: A New Perspective to Multi-scale Fusion for Medical Image Classification based on the Theory of Evidence.

Qiu J, Cao J, Huang Y, Zhu Z, Wang F, Lu C, Li Y, Zheng Y

PubMed · Sep 19, 2025
In the field of medical image analysis, medical image classification is one of the most fundamental and critical tasks. Current research often relies on off-the-shelf backbone networks from computer vision, hoping to achieve satisfactory classification performance on medical images. However, given the characteristics of medical images, such as the scattered distribution and varying sizes of lesions, features extracted at a single scale from existing backbones often fail to support accurate medical image classification. To this end, we propose a novel multi-scale learning paradigm, namely MUlti-SCale Learning with trusted Evidences (MUSCLE), which extracts and integrates features from different scales based on the theory of evidence to generate a more comprehensive feature representation for medical image classification. In particular, the proposed MUSCLE first estimates the uncertainties of features extracted from different scales/stages of the classification backbone as evidence, and accordingly forms opinions on feature trustworthiness via a set of evidential deep neural networks. These opinions on the different feature scales are then ensembled into an aggregated opinion, which adaptively tunes the weights of the multi-scale features for scattered and size-varying lesions and consequently improves the network's capacity for accurate medical image classification. Our MUSCLE paradigm has been evaluated on five publicly available medical image datasets. The experimental results show that the proposed MUSCLE not only improves the accuracy of the original backbone network, but also enhances the reliability and interpretability of model decisions with the trusted evidences (https://github.com/Q4CS/MUSCLE).
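A minimal sketch of the evidential idea, under the usual subjective-logic formulation (softplus evidence, Dirichlet strength, Dempster's combination rule): the two-scale setup, shapes, and dummy logits below are illustrative assumptions, not the MUSCLE implementation.

```python
# Illustrative sketch of evidential multi-scale fusion: each scale's logits
# become Dirichlet evidence, yielding per-scale belief masses and an
# uncertainty mass, which are then combined with Dempster's rule.
import torch
import torch.nn.functional as F

def opinion_from_logits(logits):
    """Map logits to a subjective-logic opinion (belief per class, uncertainty)."""
    evidence = F.softplus(logits)               # non-negative evidence e_k
    alpha = evidence + 1.0                      # Dirichlet parameters
    strength = alpha.sum(dim=-1, keepdim=True)  # S = sum_k alpha_k
    belief = evidence / strength                # b_k = e_k / S
    uncertainty = logits.shape[-1] / strength   # u = K / S
    return belief, uncertainty

def dempster_combine(b1, u1, b2, u2):
    """Combine two opinions with Dempster's rule (subjective-logic form)."""
    # Conflict C: mass the two opinions assign to different classes.
    conflict = (b1.unsqueeze(-1) * b2.unsqueeze(-2)).sum((-2, -1)) \
               - (b1 * b2).sum(-1)
    scale = 1.0 / (1.0 - conflict).unsqueeze(-1)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = (u1 * u2) * scale
    return b, u

# Two scales' logits for a batch of 4 images and 3 classes (dummy values).
logits_s1, logits_s2 = torch.randn(4, 3), torch.randn(4, 3)
b1, u1 = opinion_from_logits(logits_s1)
b2, u2 = opinion_from_logits(logits_s2)
fused_belief, fused_uncertainty = dempster_combine(b1, u1, b2, u2)
print(fused_belief.sum(-1) + fused_uncertainty.squeeze(-1))  # ≈ 1 per sample
```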

MFFC-Net: Multi-feature Fusion Deep Networks for Classifying Pulmonary Edema of a Pilot Study by Using Lung Ultrasound Image with Texture Analysis and Transfer Learning Technique.

Bui NT, Luoma CE, Zhang X

PubMed · Sep 19, 2025
Lung ultrasound (LUS) has been widely used in point-of-care systems in both pediatric and adult populations to support a variety of clinical diagnoses. This research aims to develop an interpretable system that uses a deep fusion network to classify LUS videos/patients based on features extracted with texture analysis and transfer learning techniques, in order to assist physicians. The pulmonary edema dataset includes 56 LUS videos and 4234 LUS frames. The COVID-BLUES dataset includes 294 LUS videos and 15,826 frames. The proposed multi-feature fusion classification network (MFFC-Net) comprises: (1) deep features extracted from Inception-ResNet-v2 and Inception-v3, together with 9 texture features from the gray-level co-occurrence matrix (GLCM) and histogram of the region of interest (ROI); (2) a neural network that classifies LUS images from the fused feature input; and (3) four models (ANN, SVM, XGBoost, and kNN) for classifying COVID/NON-COVID patients. The training process was evaluated with accuracy (0.9969), F1-score (0.9968), sensitivity (0.9967), specificity (0.9990), and precision (0.9970) after the fivefold cross-validation stage. ANOVA analysis of the 9 LUS image features showed a significant difference between pulmonary edema and normal lungs (p < 0.01). At the frame level, the MFFC-Net model achieved 100% accuracy and an ROC-AUC of 1.000 against video-level ground truth across 4 groups of LUS videos. At the patient level, on the COVID-BLUES dataset, the highest accuracy was 81.25% with the kNN model. The proposed MFFC-Net model has 125 times higher information density (ID) than Inception-ResNet-v2 and 53.2 times higher than Inception-v3.
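To make the texture-feature component concrete, here is a hedged scikit-image sketch of GLCM and histogram feature extraction from an ROI; the property list, quantization, and dummy ROI are assumptions rather than the paper's exact 9 features.

```python
# Sketch: GLCM texture features plus a histogram of an LUS ROI, in the
# spirit of the texture descriptors the abstract mentions. Illustrative only.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
roi = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # dummy ROI

# GLCM at distance 1 over four directions, 256 gray levels, symmetric.
glcm = graycomatrix(roi, distances=[1],
                    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                    levels=256, symmetric=True, normed=True)

props = ['contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation']
texture_vector = np.hstack([graycoprops(glcm, p).mean() for p in props])

# Histogram features of the same ROI complement the GLCM statistics.
hist, _ = np.histogram(roi, bins=32, range=(0, 256), density=True)
features = np.concatenate([texture_vector, hist])
print(features.shape)
```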

MDFNet: a multi-dimensional feature fusion model based on structural magnetic resonance imaging representations for brain age estimation.

Zhang C, Nan P, Song L, Wang Y, Su K, Zheng Q

PubMed · Sep 18, 2025
Brain age estimation plays a significant role in understanding the aging process and its relationship with neurodegenerative diseases. The aim of this study is to devise a unified multi-dimensional feature fusion model (MDFNet) that enhances brain age estimation from structural MRI alone by drawing on diverse representations: the whole brain, gray matter volume from tissue segmentation, node message passing on the brain network, edge-based graph path convolution on brain connectivity, and demographic data. The MDFNet was developed by devising and integrating a whole-brain-level Euclidean-convolution channel (WBEC-channel), a tissue-level Euclidean-convolution channel (TEC-channel), a graph-convolution channel based on node message passing (nodeGCN-channel), an edge-based graph path convolution channel on brain connectivity (edgeGCN-channel), and a multilayer perceptron (MLP) channel for demographic data (MLP-channel) to enhance multi-dimensional feature fusion. The MDFNet was validated on 1872 healthy subjects from four public datasets and applied to an independent cohort of Alzheimer's disease (AD) patients. Interpretability analysis and normative modeling of the MDFNet in brain age estimation were also performed. The MDFNet achieved superior performance, with a Mean Absolute Error (MAE) of 4.396 ± 0.244 years, a Pearson Correlation Coefficient (PCC) of 0.912 ± 0.002, and a Spearman's Rank Correlation Coefficient (SRCC) of 0.819 ± 0.015, compared with state-of-the-art deep learning models. The AD group exhibited a significantly greater brain age gap (BAG) than the healthy group (P < 0.05), and normative modeling likewise showed significantly higher mean Z-scores for AD patients than for healthy subjects (P < 0.05). Interpretability was also visualized at both the group and individual levels, enhancing the reliability of the MDFNet. The MDFNet enhanced brain age estimation from structural MRI alone by employing a multi-dimensional feature integration strategy.
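The channel-fusion idea can be sketched as a late-fusion regressor: per-channel embeddings are projected, concatenated, and mapped to a scalar age. The stand-in MLP encoders and dimensions below are assumptions; MDFNet's actual channels are the CNNs and GNNs described above.

```python
# Hedged sketch of late fusion across modality-specific channels, regressing
# a scalar brain age from concatenated per-channel embeddings.
import torch
import torch.nn as nn

class FusionRegressor(nn.Module):
    def __init__(self, channel_dims, hidden=128):
        super().__init__()
        # One small projection per channel (stand-ins for the WBEC/TEC/
        # nodeGCN/edgeGCN/MLP channels described in the abstract).
        self.projections = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in channel_dims
        )
        self.head = nn.Sequential(
            nn.Linear(hidden * len(channel_dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted brain age in years
        )

    def forward(self, channel_features):
        fused = torch.cat(
            [proj(x) for proj, x in zip(self.projections, channel_features)],
            dim=-1)
        return self.head(fused).squeeze(-1)

# Dummy per-channel feature vectors for a batch of 2 subjects.
dims = (512, 256, 128, 128, 8)
features = [torch.randn(2, d) for d in dims]
model = FusionRegressor(dims)
print(model(features).shape)  # torch.Size([2])
```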

Mamba-Enhanced Diffusion Model for Perception-Aware Blind Super-Resolution of Magnetic Resonance Imaging.

Zhao X, Yang X, Song Z

PubMed · Sep 18, 2025
High-resolution magnetic resonance imaging (HR MRI) can provide accurate and rich information that helps doctors detect subtle lesions, delineate tumor boundaries, evaluate small anatomical structures, and assess early-stage pathological changes that might be obscured in lower-resolution images. However, the acquisition of HR MRI images often requires prolonged scanning time, which causes physical and mental discomfort for the patient. Slight patient movement may produce motion artifacts and blur the acquired MRI images, affecting the accuracy of clinical diagnosis. To tackle these problems, we propose a novel method, the Mamba-enhanced Diffusion Model (MDM), for perception-aware blind super-resolution of magnetic resonance imaging, which includes two key components: a kernel noise estimator and an SR reconstructor. Specifically, we propose a Perception-aware Blur Kernel Noise estimator (PBKN estimator), which takes advantage of the diffusion model to estimate the blur kernel from low-resolution (LR) images. Meanwhile, we construct a novel progressive feature reconstructor, which takes the estimated blur kernel and the content information of LR images as prior knowledge to reconstruct more accurate SR MRI images using a diffusion model. Moreover, we design a novel Semantic Information Fusion Mamba (SIF-Mamba) module for the SR reconstruction task. SIF-Mamba is specifically designed within the progressive feature reconstructor to capture the global context of MRI images and improve feature reconstruction. Extensive experiments demonstrate that our proposed MDM achieves better SR reconstruction results than several state-of-the-art methods. Our codes are available at https://github.com/YXDBright/MDM.
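For context, blind-SR methods such as this one typically assume a degradation model in which the LR image is the HR image blurred by an unknown kernel, downsampled, and corrupted by noise; the kernel the PBKN estimator recovers plays the role of k below. This PyTorch sketch of that forward model uses an assumed Gaussian kernel and scale factor, purely for illustration.

```python
# Sketch of the blind-SR degradation model: LR = downsample(HR * k) + n.
import torch
import torch.nn.functional as F

def gaussian_kernel(size=15, sigma=2.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing='ij')
    k = torch.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return (k / k.sum()).view(1, 1, size, size)

def degrade(hr, kernel, scale=4, noise_sigma=0.01):
    """Blur with kernel k, decimate by `scale`, add Gaussian noise."""
    pad = kernel.shape[-1] // 2
    blurred = F.conv2d(F.pad(hr, (pad,) * 4, mode='reflect'),
                       kernel.expand(hr.shape[1], 1, -1, -1),
                       groups=hr.shape[1])
    lr = blurred[..., ::scale, ::scale]  # simple decimation
    return lr + noise_sigma * torch.randn_like(lr)

hr = torch.rand(1, 1, 128, 128)          # dummy HR MRI slice
lr = degrade(hr, gaussian_kernel())
print(hr.shape, '->', lr.shape)          # (1,1,128,128) -> (1,1,32,32)
```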

Optimized deep learning-accelerated single-breath-hold abdominal HASTE with and without fat saturation improves and accelerates abdominal imaging at 3 Tesla.

Tan Q, Kubicka F, Nickel D, Weiland E, Hamm B, Geisel D, Wagner M, Walter-Rittel TC

PubMed · Sep 18, 2025
Deep learning-accelerated single-shot turbo-spin-echo techniques (DL-HASTE) enable single-breath-hold T2-weighted abdominal imaging. However, studies evaluating the image quality of DL-HASTE with and without fat saturation (FS) remain limited. This study aimed to prospectively evaluate the technical feasibility and image quality of abdominal DL-HASTE with and without FS at 3 Tesla. DL-HASTE of the upper abdomen was acquired with variable sequence parameters regarding FS, flip angle (FA), and field of view (FOV) in 10 healthy volunteers and 50 patients. DL-HASTE sequences were compared to clinical sequences (HASTE, HASTE-FS, and T2-TSE-FS BLADE). Two radiologists independently assessed the sequences for overall image quality, delineation of abdominal organs, artifacts, and fat saturation on a 5-point Likert scale. Breath-hold time of DL-HASTE and DL-HASTE-FS was 21 ± 2 s with a fixed FA and 20 ± 2 s with a variable FA (p < 0.001), with no difference in overall image quality (p > 0.05). DL-HASTE required a 10% larger FOV than DL-HASTE-FS to avoid aliasing artifacts from subcutaneous fat. Both DL-HASTE and DL-HASTE-FS had significantly higher overall image quality scores than the standard HASTE acquisitions (DL-HASTE vs. HASTE: 4.8 ± 0.40 vs. 4.1 ± 0.50; DL-HASTE-FS vs. HASTE-FS: 4.6 ± 0.50 vs. 3.6 ± 0.60; p < 0.001). Compared to T2-TSE-FS BLADE, DL-HASTE-FS provided higher overall image quality (4.6 ± 0.50 vs. 4.3 ± 0.63, p = 0.011). DL-HASTE achieved significantly higher image quality (p = 0.006) and higher organ sharpness scores (p < 0.001) than DL-HASTE-FS. Deep learning-accelerated HASTE with and without fat saturation was feasible at 3 Tesla and showed improved image quality compared to conventional sequences.
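The abstract reports paired comparisons of reader scores between sequences without naming the test; for paired ordinal (Likert) data, a Wilcoxon signed-rank test is a common choice, sketched here on dummy scores (not the study's data).

```python
# Illustrative paired nonparametric comparison of reader Likert scores
# between two sequences; the abstract does not state which test was used.
from scipy.stats import wilcoxon

dl_haste    = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5]  # dummy per-case scores
dl_haste_fs = [4, 5, 4, 4, 4, 5, 4, 4, 5, 4]

stat, p = wilcoxon(dl_haste, dl_haste_fs)
print(f"W = {stat}, p = {p:.3f}")
```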

HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation

Weitong Wu, Zhaohu Xing, Jing Gong, Qin Peng, Lei Zhu

arXiv preprint · Sep 18, 2025
In the domain of 3D biomedical image segmentation, Mamba exhibits superior performance because it addresses the limitations of CNNs in modeling long-range dependencies and mitigates the substantial computational overhead associated with Transformer-based frameworks when processing high-resolution medical volumes. However, attaching undue importance to global context modeling may inadvertently compromise critical local structural information, leading to boundary ambiguity and regional distortion in segmentation outputs. We therefore propose HybridMamba, an architecture employing two complementary mechanisms: 1) a feature scanning strategy that progressively integrates representations from both axial-traversal and local-adaptive pathways to harmonize the relationship between local and global representations, and 2) a gated module combining spatial-frequency analysis for comprehensive contextual modeling. In addition, we collect a multi-center CT dataset for lung cancer. Experiments on MRI and CT datasets demonstrate that HybridMamba significantly outperforms state-of-the-art methods in 3D medical image segmentation.
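To illustrate what two complementary scan orders can look like, the sketch below flattens a 3D volume in plain axial raster order versus a window-local order; the window size and the framing of these orders as token sequences for a state-space model are assumptions, not HybridMamba's exact pathways.

```python
# Illustrative scan orderings for a 3D volume: plain axial traversal vs.
# a local, window-wise traversal. A sequence model (e.g., Mamba) would
# consume the resulting token orders.
import torch

def axial_scan(vol):
    """Flatten (D, H, W) voxels in plain axial raster order."""
    return vol.reshape(-1)

def local_window_scan(vol, w=4):
    """Flatten (D, H, W) so voxels within each w^3 window stay contiguous."""
    D, H, W = vol.shape
    v = vol.reshape(D // w, w, H // w, w, W // w, w)
    # Reorder to (window-index dims, within-window dims), then flatten.
    return v.permute(0, 2, 4, 1, 3, 5).reshape(-1)

vol = torch.arange(8 * 8 * 8).reshape(8, 8, 8)
print(axial_scan(vol)[:6])         # consecutive voxels along W
print(local_window_scan(vol)[:6])  # consecutive voxels within one 4^3 window
```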

Radiology Report Conditional 3D CT Generation with Multi Encoder Latent diffusion Model

Sina Amirrajab, Zohaib Salahuddin, Sheng Kuang, Henry C. Woodruff, Philippe Lambin

arXiv preprint · Sep 18, 2025
Text-to-image latent diffusion models have recently advanced medical image synthesis, but applications to 3D CT generation remain limited. Existing approaches rely on simplified prompts, neglecting the rich semantic detail in full radiology reports, which reduces text-image alignment and clinical fidelity. We propose Report2CT, a radiology-report-conditional latent diffusion framework for synthesizing 3D chest CT volumes directly from free-text radiology reports, incorporating both the findings and impression sections using multiple text encoders. Report2CT integrates three pretrained medical text encoders (BiomedVLP-CXR-BERT, MedEmbed, and ClinicalBERT) to capture nuanced clinical context. Radiology reports and voxel spacing information condition a 3D latent diffusion model trained on 20,000 CT volumes from the CT-RATE dataset. Model performance was evaluated using Fréchet Inception Distance (FID) for real-synthetic distributional similarity and CLIP-based metrics for semantic alignment, with additional qualitative and quantitative comparisons against the GenerateCT model. Report2CT generated anatomically consistent CT volumes with excellent visual quality and text-image alignment. Multi-encoder conditioning improved CLIP scores, indicating stronger preservation of fine-grained clinical details from the free-text radiology reports. Classifier-free guidance further enhanced alignment with only a minor trade-off in FID. We ranked first in the VLM3D Challenge at MICCAI 2025 on Text-Conditional CT Generation and achieved state-of-the-art performance across all evaluation metrics. By leveraging complete radiology reports and multi-encoder text conditioning, Report2CT advances 3D CT synthesis, producing clinically faithful and high-quality synthetic data.
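A hedged sketch of the multi-encoder conditioning idea: each pretrained text encoder embeds the report, and the pooled vectors are concatenated into one conditioning vector for the latent diffusion model. The Hugging Face model identifiers below are stand-ins, not necessarily the paper's exact checkpoints.

```python
# Sketch: embed one report with several text encoders and concatenate the
# mean-pooled vectors as a conditioning signal. Model names are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

ENCODERS = [
    "emilyalsentzer/Bio_ClinicalBERT",  # assumed stand-in for ClinicalBERT
    "bert-base-uncased",                # placeholder for the other encoders
]

report = "Findings: bilateral ground-glass opacities. Impression: ..."

pooled = []
for name in ENCODERS:
    tok = AutoTokenizer.from_pretrained(name)
    enc = AutoModel.from_pretrained(name)
    inputs = tok(report, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = enc(**inputs).last_hidden_state  # (1, seq_len, dim)
    pooled.append(hidden.mean(dim=1))             # mean-pool over tokens

condition = torch.cat(pooled, dim=-1)  # conditioning vector for the LDM
print(condition.shape)
```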