
Norris L, Lockwood P

PubMed · Sep 16, 2025
Congenital heart diseases (CHDs) are a significant cause of neonatal mortality and morbidity. Detecting these abnormalities during pregnancy increases survival rates, enhances prognosis, and improves pregnancy management and quality of life for the affected families. Foetal echocardiography can be considered an accurate method for detecting CHDs. However, the detection of CHDs can be limited by factors such as the sonographer's skill, expertise and patient-specific variables. Using artificial intelligence (AI) has the potential to address these challenges, increasing CHD detection during antenatal care. A scoping review was conducted using Google Scholar, PubMed, and ScienceDirect databases, employing keywords, Boolean operators, and inclusion and exclusion criteria to identify peer-reviewed studies. Thematic mapping and synthesis of the identified literature were conducted to review key concepts, research methods and findings. A total of n = 233 articles were identified; after applying the exclusion criteria, the focus was narrowed to n = 7 articles that met the inclusion criteria. Themes in the literature highlighted the potential of AI to assist clinicians and trainees, alongside emerging ethical limitations in ultrasound imaging. AI-based tools in ultrasound imaging offer great potential in assisting sonographers and doctors with decision-making in CHD diagnosis. However, due to the paucity of data and small sample sizes, further research and technological advancements are needed to improve reliability and integrate AI into routine clinical practice. This scoping review identified the reported accuracy and limitations of AI-based tools within foetal cardiac ultrasound imaging. AI has the potential to aid in reducing missed diagnoses, enhance training, and improve pregnancy management. There is a need to understand and address the ethical and legal considerations involved with this new paradigm in imaging.

Kim K, Kim BC

PubMed · Sep 16, 2025
Large language models (LLMs), such as ChatGPT and Gemini, are increasingly being used in medical domains, including dental diagnostics. Despite advancements in image-based deep learning systems, LLM diagnostic capabilities in oral and maxillofacial surgery (OMFS) for processing multi-modal imaging inputs remain underexplored. Radiolucent jaw lesions represent a particularly challenging diagnostic category due to their varied presentations and overlapping radiographic features. This study evaluated the diagnostic performance of ChatGPT 4o and Gemini 2.5 Pro on real-world OMFS radiolucent jaw lesion cases, presented in multiple-choice (MCQ) and short-answer (SAQ) formats across 3 imaging conditions: panoramic radiography only, panoramic + CT, and panoramic + CT + pathology. Data from 100 anonymized patients at Wonkwang University Daejeon Dental Hospital were analyzed, including demographics, panoramic radiographs, CBCT images, histopathology slides, and confirmed diagnoses. Sample size was determined based on institutional case availability and statistical power requirements for comparative analysis. ChatGPT and Gemini diagnosed each case under 6 conditions using 3 imaging modalities (P, P+C, P+C+B) in MCQ and SAQ formats. Model accuracy was scored against expert-confirmed diagnoses by 2 independent evaluators. McNemar's and Cochran's Q tests evaluated statistical differences across models and imaging modalities. For MCQ tasks, ChatGPT achieved 66%, 73%, and 82% accuracies across the P, P+C, and P+C+B conditions, respectively, while Gemini achieved 57%, 62%, and 63%, respectively. In SAQ tasks, ChatGPT achieved 34%, 45%, and 48%; Gemini achieved 15%, 24%, and 28%, respectively. Accuracy improved significantly with additional imaging data for ChatGPT, and ChatGPT consistently outperformed Gemini across all conditions (P < .001 for MCQ; P = .008 to < .001 for SAQ). The MCQ format, which incorporates a human-in-the-loop (HITL) structure, showed higher overall performance than SAQ. ChatGPT demonstrated superior diagnostic performance compared to Gemini in OMFS diagnostic tasks when provided with richer multimodal inputs. Diagnostic accuracy increased with additional imaging data, especially in MCQ formats, suggesting LLMs can effectively synthesize radiographic and pathological data. LLMs have potential as diagnostic support tools for OMFS, especially in settings with limited specialist access. Presenting clinical cases in structured formats using curated imaging data enhances LLM accuracy and underscores the value of HITL integration. Although current LLMs show promising results, further validation using larger datasets and hybrid AI systems is necessary for broader, contextualised clinical adoption.
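For readers unfamiliar with the statistics named above, the sketch below shows how paired accuracy comparisons of this kind are typically computed in Python: McNemar's test for two models scored on the same cases, and Cochran's Q for three related imaging conditions. It uses synthetic placeholder results, not the study's data, and is only an illustration of the general approach.

```python
# Minimal sketch of paired accuracy comparisons (not the study's analysis code);
# the per-case correctness arrays below are hypothetical placeholders.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar, cochrans_q

rng = np.random.default_rng(0)
n_cases = 100
# 1 = correct diagnosis, 0 = incorrect, for the same 100 cases
chatgpt_correct = rng.integers(0, 2, n_cases)
gemini_correct = rng.integers(0, 2, n_cases)

# 2x2 agreement/disagreement table between the two models
table = np.array([
    [np.sum((chatgpt_correct == 1) & (gemini_correct == 1)),
     np.sum((chatgpt_correct == 1) & (gemini_correct == 0))],
    [np.sum((chatgpt_correct == 0) & (gemini_correct == 1)),
     np.sum((chatgpt_correct == 0) & (gemini_correct == 0))],
])
print(mcnemar(table, exact=True))  # paired comparison of two models

# Cochran's Q for >2 related conditions (e.g., P vs P+C vs P+C+B)
p = rng.integers(0, 2, n_cases)
pc = rng.integers(0, 2, n_cases)
pcb = rng.integers(0, 2, n_cases)
print(cochrans_q(np.column_stack([p, pc, pcb])))
```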

Iqbal A, Iqbal K, Shah YA, Ullah F, Khan J, Yaqoob S

PubMed · Sep 16, 2025
Alzheimer's disease, a progressive neurodegenerative disorder, is characterized by a decline in brain volume and neuronal loss, with early symptoms often presenting as short-term memory impairment. Automated classification of Alzheimer's disease remains a significant challenge due to inter-patient variability in brain morphology, aging effects, and overlapping anatomical features across different stages. While traditional machine learning techniques, such as Support Vector Machines (SVMs) and various Deep Neural Network (DNN) models, have been explored, the need for more accurate and efficient classification techniques persists. In this study, we propose a novel approach that integrates Multi-Filter Stacking with the Inception V3 architecture, referred to as CASFI (Classifying Alzheimer's Severity using Filter Integration). This method leverages diverse convolutional filter sizes to capture multiscale spatial features, enhancing the model's ability to detect subtle structural variations associated with different Alzheimer's disease stages. Applied to MRI data, CASFI achieved an accuracy of 97.27%, outperforming baseline deep learning models and traditional classifiers in both accuracy and robustness. This approach supports early diagnosis and informed clinical decision-making, providing a valuable tool to assist healthcare professionals in managing and planning treatment for Alzheimer's patients.
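The abstract does not give implementation details, but the core multi-filter idea (parallel convolutions with different kernel sizes whose outputs are stacked) can be sketched as below. This is a generic PyTorch illustration, not the authors' CASFI code; the filter sizes, channel counts, and input shape are assumptions.

```python
# Generic sketch of multi-filter stacking: parallel convolutions at several
# kernel sizes, concatenated along the channel dimension.
import torch
import torch.nn as nn

class MultiFilterBlock(nn.Module):
    def __init__(self, in_ch: int, branch_ch: int = 32):
        super().__init__()
        # Parallel branches capture features at several spatial scales
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack multiscale feature maps channel-wise
        return self.act(torch.cat([b(x) for b in self.branches], dim=1))

# Example: a batch of single-channel MRI slices of shape (batch, 1, 224, 224)
block = MultiFilterBlock(in_ch=1)
features = block(torch.randn(2, 1, 224, 224))
print(features.shape)  # torch.Size([2, 128, 224, 224])
```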

Wang H, Liu A, Ni Y, Wang J, Du J, Xi L, Qiang Y, Xie B, Ren Y, Wang S, Geng J, Deng Y, Huang S, Zhang R, Liu M, Dai H

PubMed · Sep 16, 2025
Usual interstitial pneumonia (UIP) indicates poor prognosis, and there is significant heterogeneity in the diagnosis of UIP, necessitating an auxiliary diagnostic tool. Can a machine learning (ML) classifier using radiomics features and clinical data accurately identify UIP among patients with interstitial lung disease (ILD)? This dataset from a prospective cohort consists of 5321 sets of high-resolution computed tomography (HRCT) images from 2901 patients with ILD (male: 63.5%, age: 61.7 ± 10.8 years) across three medical centers. Multimodal data, including whole-lung radiomics features on HRCT and demographics, smoking, lung function, and comorbidity data, were extracted. An eXtreme Gradient Boosting (XGBoost) classifier and logistic regression were used to design a nomogram predicting UIP status. The area under the receiver operating characteristic curve (AUC) and Cox regression for all-cause mortality were used to assess the diagnostic performance and prognostic value of the models, respectively. The 5213 HRCT image datasets were divided into the training group (n=3639), the internal testing group (n=785), and the external validation group (n=789). UIP prevalence was 43.7% across the whole dataset, with 42.7% in the internal validation set and 41.3% in the external validation set. The radiomics-based classifier had an AUC of 0.790 in the internal testing set and 0.786 in the external validation dataset. Integrating multimodal data improved the AUCs to 0.802 and 0.794, respectively. The performance of the integrated model was comparable to that of pulmonologists with over 10 years of experience in ILD. Among all patients, 522 died during a median follow-up of 3.37 years; UIP status predicted by the multimodal ML model was associated with a high all-cause mortality risk (hazard ratio: 2.52, p<0.001). The classifier combining radiomics and clinical features showed strong performance across varied UIP prevalence. This multimodal ML model could serve as an adjunct in the diagnosis of UIP.
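As a rough illustration of the modelling pattern described (a gradient-boosted classifier and a logistic regression over radiomics plus clinical features, evaluated by ROC AUC), the sketch below uses synthetic placeholder data. It is not the study's pipeline; the feature dimensions, sample size, and hyperparameters are arbitrary assumptions.

```python
# Hedged sketch of an XGBoost + logistic-regression workflow on radiomics/clinical features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
X_radiomics = rng.normal(size=(1000, 50))   # stand-in for whole-lung radiomics features
X_clinical = rng.normal(size=(1000, 5))     # stand-in for demographics, smoking, lung function
y = rng.integers(0, 2, 1000)                # 1 = UIP, 0 = non-UIP (synthetic labels)

X = np.hstack([X_radiomics, X_clinical])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Gradient-boosted classifier on the combined multimodal feature vector
xgb = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.05)
xgb.fit(X_tr, y_tr)
print("XGBoost AUC:", roc_auc_score(y_te, xgb.predict_proba(X_te)[:, 1]))

# Logistic regression over selected predictors is the usual basis for a nomogram
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("Logistic AUC:", roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1]))
```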

Yankelevitz DF, Yip R, Henschke CI

PubMed · Sep 15, 2025
Chest radiographic (CXR) screening is currently not recommended in the United States by any major guideline organization. Multiple randomized controlled trials done in the United States and in Europe, the largest being the Prostate, Lung, Colorectal and Ovarian (PLCO) trial, all failed to show a benefit and are used as evidence to support the current recommendation. Nevertheless, there is renewed interest in CXR screening, especially in low- and middle-resourced countries around the world. Reasons for this are multi-factorial, including the continued concern that those trials may still have missed a benefit, but perhaps more importantly, it is now established conclusively that finding smaller cancers is better than finding larger ones. This was the key finding in the large randomized controlled trials of CT screening. So, while CT finds smaller cancers than CXR does, both clearly perform better than waiting for cancers to grow larger and be detected by symptom prompting. Without a clear understanding that treating cancers found by CXR in the asymptomatic state is beneficial, there would also be no basis for treating them when found incidentally. In addition, advances in artificial intelligence are allowing nodules to be found earlier and more reliably with CXR than in those prior studies, and in many countries around the world, TB screening is already taking place on a large scale. This presents a major opportunity for integration with lung screening programs.

Liu Y, Feng Y, Cheng J, Zhan H, Zhu Z

PubMed · Sep 15, 2025
Accurate 3D medical image segmentation is crucial for diagnosis and treatment. Diffusion models demonstrate promising performance in medical image segmentation tasks due to the progressive nature of the generation process and the explicit modeling of data distributions. However, the weak guidance of conditional information and insufficient feature extraction in diffusion models lead to the loss of fine-grained features and structural consistency in the segmentation results, thereby affecting the accuracy of medical image segmentation. To address this challenge, we propose a Mamba-enhanced diffusion model for 3D medical image segmentation. We extract multilevel semantic features from the original images using an encoder and tightly integrate them with the denoising process of the diffusion model through a Semantic Hierarchical Embedding (SHE) mechanism to capture the intricate relationship between the noisy label and the image data. Meanwhile, we design a Global-Slice Perception Mamba (GSPM) layer, which integrates multi-dimensional perception mechanisms to endow the model with comprehensive spatial reasoning and feature extraction capabilities. Experimental results show that our proposed MambaDiff achieves more competitive performance than prior methods with substantially fewer parameters on four public medical image segmentation datasets, including BraTS 2021, BraTS 2024, LiTS and MSD Hippocampus. The source code of our method is available at https://github.com/yuliu316316/MambaDiff.
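The authors' implementation is available at the linked repository. Purely for orientation, the toy sketch below illustrates the generic idea behind diffusion-based segmentation that the abstract builds on: noise the label map in the forward process, then train a denoiser conditioned on image-encoder features. The tiny network and shapes here are assumptions and do not reflect MambaDiff's actual SHE or GSPM modules.

```python
# Generic, simplified sketch of diffusion-based segmentation conditioning
# (not the authors' MambaDiff code).
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(label: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Forward process: add Gaussian noise to the (one-hot) label volume at step t."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1, 1)
    return a.sqrt() * label + (1 - a).sqrt() * noise

class ConditionedDenoiser(nn.Module):
    """Toy denoiser: image features are concatenated with the noisy label as conditioning."""
    def __init__(self, label_ch=2, feat_ch=8):
        super().__init__()
        self.image_encoder = nn.Conv3d(1, feat_ch, 3, padding=1)
        self.denoise = nn.Conv3d(label_ch + feat_ch, label_ch, 3, padding=1)

    def forward(self, noisy_label, image):
        feats = self.image_encoder(image)  # multilevel features in a real model
        return self.denoise(torch.cat([noisy_label, feats], dim=1))

model = ConditionedDenoiser()
image = torch.randn(1, 1, 32, 64, 64)
label = torch.randn(1, 2, 64 // 2, 64, 64)[:, :, :32] if False else torch.randn(1, 2, 32, 64, 64)
t = torch.randint(0, T, (1,))
noise = torch.randn_like(label)
pred = model(q_sample(label, t, noise), image)  # trained to predict the added noise
print(pred.shape)
```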

Chen Q, Yao X, Ye H, Hong Y

PubMed · Sep 15, 2025
Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution- and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport-based alignment, demonstrating greater tolerance to noise introduced by potentially inaccurate LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated datasets, and pre-trained models will be available upon acceptance.
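As a hedged illustration of partial optimal transport in general (not Med3DInsight's alignment module), the sketch below matches a set of image embeddings to text embeddings with the POT library, transporting only part of the total mass so that uninformative or noisy tokens can remain unmatched. The embedding sizes and mass fraction are arbitrary assumptions.

```python
# Hedged sketch of partial OT alignment between image and text embeddings
# using POT (Python Optimal Transport); synthetic embeddings only.
import numpy as np
import ot

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(16, 64))  # e.g., 16 slice/patch embeddings
txt_emb = rng.normal(size=(12, 64))  # e.g., 12 token embeddings from a 2D MLLM

# Cost matrix: cosine distance between every image and text embedding
img_n = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
txt_n = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
M = 1.0 - img_n @ txt_n.T

a = np.ones(16) / 16
b = np.ones(12) / 12

# Transport only 80% of the mass; the rest stays unmatched (tolerates noisy tokens)
plan = ot.partial.partial_wasserstein(a, b, M, m=0.8)
alignment_cost = float(np.sum(plan * M))
print(plan.shape, alignment_cost)
```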

Tian Y

PubMed · Sep 15, 2025
Medical image analysis plays a critical role in brain tumor detection, but training deep learning models often requires large, labeled datasets, which can be time-consuming and costly to produce. This study presents a comparative analysis of machine learning and deep learning models for brain tumor classification, focusing on whether deep learning models are necessary for small medical datasets and whether self-supervised learning can reduce annotation costs. The primary goal is to evaluate trade-offs between traditional machine learning and deep learning, including self-supervised models, on small medical image datasets. The secondary goal is to assess model robustness, transferability, and generalization through evaluation on unseen data within and across domains. Four models were compared: (1) a support vector machine (SVM) with histogram of oriented gradients (HOG) features, (2) a convolutional neural network based on ResNet18, (3) a transformer-based model using a vision transformer (ViT-B/16), and (4) a self-supervised learning approach using Simple Contrastive Learning of Visual Representations (SimCLR). These models were selected to represent diverse paradigms: SVM+HOG represents traditional feature engineering with low computational cost, ResNet18 serves as a well-established convolutional neural network with strong baseline performance, ViT-B/16 leverages self-attention to capture long-range spatial features, and SimCLR enables learning from unlabeled data, potentially reducing annotation costs. The primary dataset consisted of 2870 brain magnetic resonance images across 4 classes: glioma, meningioma, pituitary, and nontumor. All models were trained under consistent settings, including data augmentation, early stopping, and 3 independent runs using different random seeds to account for performance variability. Performance metrics included accuracy, precision, recall, F1-score, and convergence. To assess robustness and generalization capability, evaluation was performed on unseen test data from both the primary and cross-domain datasets. No retraining or test-time augmentations were applied to the external data, thereby reflecting realistic deployment conditions. The models demonstrated consistently strong performance in both within-domain and cross-domain evaluations. The results revealed distinct trade-offs: ResNet18 achieved the highest validation accuracy (mean 99.77%, SD 0.00%) and the lowest validation loss, along with a weighted test accuracy of 99% within-domain and 95% cross-domain. SimCLR reached a mean validation accuracy of 97.29% (SD 0.86%) and achieved up to 97% weighted test accuracy within-domain and 91% cross-domain, despite requiring a 2-stage training process of contrastive pretraining followed by linear evaluation. ViT-B/16 reached a mean validation accuracy of 97.36% (SD 0.11%), with a weighted test accuracy of 98% within-domain and 93% cross-domain. SVM+HOG maintained a competitive validation accuracy of 96.51%, with 97% within-domain test accuracy, though its accuracy dropped to 80% cross-domain. The study reveals meaningful trade-offs between model complexity, annotation requirements, and deployment feasibility, which are critical factors for selecting models in real-world medical imaging applications.
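For context, the SVM+HOG baseline described above follows a standard pattern: hand-crafted HOG descriptors fed to an SVM classifier. The sketch below is a generic version with synthetic images, not the study's dataset or its exact HOG/SVM configuration; the image size, HOG parameters, and kernel choice are assumptions.

```python
# Hedged sketch of an SVM+HOG baseline for multi-class image classification.
import numpy as np
from skimage.feature import hog
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
images = rng.random((200, 128, 128))     # stand-in for MRI slices
labels = rng.integers(0, 4, 200)         # glioma / meningioma / pituitary / nontumor

# Hand-crafted gradient-orientation features per image
features = np.array([
    hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    for img in images
])

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```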

Afana AS, Garot J, Duhamel S, Hovasse T, Champagne S, Unterseeh T, Garot P, Akodad M, Chitiboi T, Sharma P, Jacob A, Gonçalves T, Florence J, Unger A, Sanguineti F, Militaru S, Pezel T, Toupin S

PubMed · Sep 15, 2025
Stress perfusion cardiovascular magnetic resonance (CMR) is widely used to detect myocardial ischemia, mostly through visual assessment. Recent studies suggest that strain imaging at rest and during stress can also help in prognostic stratification. However, the additional prognostic value of combining both rest and stress strain imaging has not been fully established. This study examined the incremental benefit of combining these strain measures with traditional risk prognosticators and CMR findings to predict major adverse clinical events (MACE) in a cohort of consecutive patients referred for stress CMR. This retrospective, single-center observational study included all consecutive patients with known or suspected coronary artery disease referred for stress CMR between 2016 and 2018. Fully automated machine learning was used to obtain global longitudinal strain at rest (rest-GLS) and global circumferential strain at stress (stress-GCS). The primary outcome was MACE, including cardiovascular death or hospitalization for heart failure. Cox models were used to assess the incremental prognostic value of combining these strain features with traditional prognosticators. Of 2778 patients (age 65±12 years, 68% male), 96% had feasible, fully automated rest-GLS and stress-GCS measurements. After a median follow-up of 5.2 (4.8-5.5) years, 316 (11.1%) patients experienced MACE. After adjustment for traditional prognosticators, both rest-GLS (hazard ratio, 1.09 [95% CI, 1.05-1.13]; P<0.001) and stress-GCS (hazard ratio, 1.08 [95% CI, 1.03-1.12]; P<0.001) were independently associated with MACE. The best cutoffs for MACE prediction were >-10% for both rest-GLS and stress-GCS, with a C-index improvement of 0.02, a continuous net reclassification improvement of 15.6%, and an integrated discrimination index of 2.2% (all P<0.001). The combination of rest-GLS and stress-GCS, with a cutoff of >-10%, provided incremental prognostic value over and above traditional prognosticators, including CMR parameters, for predicting MACE in patients undergoing stress CMR.
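The Cox modelling pattern described above (MACE regressed on strain features alongside other prognosticators, reported as hazard ratios) can be sketched with the lifelines library as below. The data frame is synthetic and the covariates are illustrative stand-ins, not the study's variables or cohort.

```python
# Hedged sketch of a Cox proportional-hazards model for MACE with strain covariates.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "follow_up_years": rng.uniform(0.5, 5.5, n),
    "mace": rng.integers(0, 2, n),        # 1 = cardiovascular death or HF hospitalization
    "age": rng.normal(65, 12, n),
    "male": rng.integers(0, 2, n),
    "rest_gls": rng.normal(-12, 4, n),    # global longitudinal strain at rest (%)
    "stress_gcs": rng.normal(-14, 4, n),  # global circumferential strain at stress (%)
})

cph = CoxPHFitter()
cph.fit(df, duration_col="follow_up_years", event_col="mace")
# Hazard ratios (exp(coef)) per 1% strain increase; less negative strain = worse function
cph.print_summary()
```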

Tozuka R, Kadoya N, Yasunaga A, Saito M, Komiyama T, Nemoto H, Ando H, Onishi H, Jingu K

PubMed · Sep 15, 2025
To develop and evaluate a novel deep learning strategy for automated early-stage lung cancer gross tumor volume (GTV) segmentation, utilizing pre-training with mathematically generated non-natural fractal images. This retrospective study included 104 patients (36-91 years old; 81 males; 23 females) with peripheral early-stage non-small cell lung cancer who underwent radiotherapy at our institution from December 2017 to March 2025. First, we utilized encoders from a convolutional neural network and a Vision Transformer (ViT), pre-trained with four learning strategies: from scratch, ImageNet-1K (1,000 classes of natural images), FractalDB-1K (1,000 classes of fractal images), and FractalDB-10K (10,000 classes of fractal images), with the latter three utilizing publicly available models. Second, the models were fine-tuned using CT images and physician-created contour data. Model accuracy was then evaluated using the volumetric Dice similarity coefficient (vDSC), surface Dice similarity coefficient (sDSC), and 95th percentile Hausdorff distance (HD95) between the predicted and ground truth GTV contours, averaged across the fourfold cross-validation. Additionally, segmentation accuracy was compared between simple and complex groups, categorized by the surface-to-volume ratio, to assess the impact of GTV shape complexity. Pre-training with FractalDB-10K yielded the best segmentation accuracy across all metrics. For the ViT model, the vDSC, sDSC, and HD95 results were 0.800 ± 0.079, 0.732 ± 0.152, and 2.04 ± 1.59 mm for FractalDB-10K; 0.779 ± 0.093, 0.688 ± 0.156, and 2.72 ± 3.12 mm for FractalDB-1K; and 0.764 ± 0.102, 0.660 ± 0.156, and 3.03 ± 3.47 mm for ImageNet-1K, respectively. Between the FractalDB-1K and ImageNet-1K conditions, there was no significant difference in the simple group, whereas in the complex group FractalDB-1K showed a significantly higher vDSC (0.743 ± 0.095 vs 0.714 ± 0.104, p = 0.006). Pre-training with fractal structures achieved comparable or superior accuracy to ImageNet pre-training for early-stage lung cancer GTV auto-segmentation.
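The reported metrics have standard definitions; below is a hedged reference implementation of volumetric Dice and HD95 in numpy/scipy, using a toy pair of spherical masks rather than the study's GTV contours (surface Dice is omitted, and the voxel spacing is an assumption).

```python
# Hedged reference implementation of vDSC and HD95 for binary 3D masks.
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def volumetric_dice(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_surf = pred & ~binary_erosion(pred)
    gt_surf = gt & ~binary_erosion(gt)
    # Distance from each surface voxel to the nearest surface voxel of the other mask
    d_to_gt = distance_transform_edt(~gt_surf, sampling=spacing)[pred_surf]
    d_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)[gt_surf]
    return float(max(np.percentile(d_to_gt, 95), np.percentile(d_to_pred, 95)))

# Toy example: two overlapping spheres standing in for predicted and ground-truth GTVs
z, y, x = np.ogrid[:64, :64, :64]
gt = (z - 32) ** 2 + (y - 32) ** 2 + (x - 32) ** 2 < 10 ** 2
pred = (z - 30) ** 2 + (y - 32) ** 2 + (x - 32) ** 2 < 10 ** 2
print(volumetric_dice(pred, gt), hd95(pred, gt))
```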