
Sugawara H, Takada A, Kato S

PubMed · Oct 4, 2025
The purpose of this study was to compare the accuracy and reproducibility of lesion-diameter measurements performed by three state-of-the-art LLMs with those obtained by radiologists. In this retrospective study using a public database, 83 patients with solitary colorectal-cancer liver metastases were identified. From each CT series, a radiologist extracted the single axial slice showing the maximal tumor diameter and converted it to a 512 × 512-pixel PNG image (window level 50 HU, window width 400 HU) with the pixel size encoded in the filename. Three LLMs, ChatGPT-o3 (OpenAI), Gemini 2.5 Pro (Google), and Claude 4 Opus (Anthropic), were prompted to estimate the longest lesion diameter twice, ≥ 1 week apart. Two board-certified radiologists (12 years' experience each) independently measured the same single-slice images, and one radiologist repeated the measurements after ≥ 1 week. Agreement was assessed with intraclass correlation coefficients (ICCs); 95% confidence intervals were obtained by bootstrap resampling (5,000 iterations). Radiologist inter-observer agreement was excellent (ICC = 0.95, 95% CI 0.86-0.99); intra-observer agreement was 0.98 (95% CI 0.94-0.99). Gemini achieved good model-to-radiologist agreement (ICC = 0.81, 95% CI 0.68-0.89) and intra-model reproducibility (ICC = 0.78, 95% CI 0.65-0.87). GPT-o3 showed moderate agreement (ICC = 0.52) and poor reproducibility (ICC = 0.25); Claude showed poor agreement (ICC = 0.07) and reproducibility (ICC = 0.47). LLMs do not yet match radiologists in measuring colorectal cancer liver metastases; however, Gemini's good agreement and reproducibility highlight the rapid progress of LLMs' image-interpretation capabilities.
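
For readers who want to reproduce this style of agreement analysis, a minimal Python sketch is shown below; it computes a standard two-way ICC(2,1) and a percentile bootstrap CI. The choice of ICC form, the toy diameter values, and the variable names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings: (n_subjects, n_raters) array of paired diameter measurements."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-subject MS
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-rater MS
    mse = (((x - grand) ** 2).sum() - (n - 1) * msr - (k - 1) * msc) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def bootstrap_ci(ratings, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling subjects (rows) with replacement."""
    rng = np.random.default_rng(seed)
    x = np.asarray(ratings, dtype=float)
    stats = [icc_2_1(x[rng.integers(0, len(x), len(x))]) for _ in range(n_boot)]
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical model-vs-radiologist lesion diameters in mm (one row per lesion).
pairs = np.array([[23.0, 24.1], [41.5, 40.2], [15.8, 17.0], [30.2, 29.5],
                  [55.0, 52.8], [12.4, 13.9], [27.6, 27.0], [48.1, 45.7]])
print(icc_2_1(pairs), bootstrap_ci(pairs, n_boot=1000))
```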

Martín-Noguerol T, Paulano-Godino F, López-Úbeda P, Riascos RF, Luna A

PubMed · Oct 4, 2025
Radiology departments (RDs) face an increasing volume of data, images, and information, leading to a higher workload for radiologists. The integration of artificial intelligence (AI) presents an opportunity to optimize workflows and reduce the burden on radiologists. This review explores the role of advanced imaging analysis units (AIAUs) in enhancing radiological processes and improving overall patient outcomes. A literature review was conducted to assess the impact of AI-driven AIAUs on RD workflows. The study examines the collaboration between radiologists, technicians, and biomedical engineers in the extraction and processing of imaging data. Additionally, the integration of AI algorithms for task automation is analyzed. The implementation of AIAUs in RDs has the potential to enhance workflow efficiency by minimizing radiologists' workload and improving imaging analysis. These units facilitate collaborative work among radiologists, technicians, and engineers, fostering continuous communication, feedback, and training. AI algorithms incorporated into AIAUs support automation, streamlining pre- and postprocessing imaging tasks. AIAUs represent a promising approach to optimizing RD workflows and improving patient outcomes. Their successful implementation requires a multidisciplinary approach, integrating AI technologies with the expertise of radiologists, technicians, and biomedical engineers. Continuous collaboration and education within these units will be essential to maximize the benefits of emerging digital technologies in radiology.

Yang Y, Yang F, Xiao S, Hou K, Chen K, Liu Z, Liang C, Chen X, Wang G

PubMed · Oct 4, 2025
Accurate tumor staging and treatment response evaluation (TRE) are critical for nasopharyngeal carcinoma (NPC) clinical decisions. Conventional methods relying on manual imaging analysis are expertise-dependent, time-consuming, and prone to inter-observer variability and errors. This study assessed the performance of two large language models (LLMs), ChatGPT-4o-latest and DeepSeek-V3-0324, in automating T and N staging and TRE for NPC patients. Study type: retrospective. Population: 307 NPC patients from three centers (mean age: 45.5 ± 11.3 years; 216 men, 91 women). All imaging was conducted on 3.0T or 1.5T scanners; sequences included axial T1-weighted fast spin-echo, T2-weighted fast spin-echo, T2-weighted fat-suppressed spin-echo, and contrast-enhanced T1-weighted fast spin-echo. Two radiologists established the reference standards for TN staging at baseline and for TRE at two time points, post-induction chemotherapy (TRE-1) and post-concurrent chemoradiotherapy (TRE-2), based on the 9th edition of the AJCC/UICC guidelines and the RECIST 1.1 criteria. The LLMs were queried via few-shot chain-of-thought prompting and tested on 277 patients with 831 reports. Additionally, four radiologists independently assessed 68 cases with and without LLM assistance, and performance and efficiency were compared between the two conditions. Statistical tests: McNemar-Bowker test and Wilcoxon signed-rank test; p < 0.05 was considered statistically significant. DeepSeek-V3-0324 significantly outperformed ChatGPT-4o-latest in TRE-1 staging (96.5% vs. 82.9%, p < 0.001). For T staging (95.3% vs. 93.5%, p = 0.24), N staging (93.8% vs. 89.6%, p = 0.265), and TRE-2 (94.9% vs. 93.2%, p = 0.556), accuracy did not differ significantly between the two models. DeepSeek-V3-0324 also showed stronger agreement with expert annotation (κ = 0.85-0.90) than ChatGPT-4o-latest (κ = 0.49-0.86). Significant improvements in time efficiency were observed across all radiologists with LLM assistance (p < 0.001). LLMs, particularly DeepSeek-V3-0324, can automate NPC TN staging and TRE with high accuracy, enhancing clinical efficiency; LLM integration may improve diagnostic consistency, especially for junior clinicians. Technical efficacy: Stage 4.
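
The agreement statistics reported here (accuracy and Cohen's κ against the expert reference) can be illustrated with a short sketch; the staging labels below are hypothetical and the snippet is not tied to the study's data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-patient T-stage calls (expert reference vs. one LLM); the real
# reference comes from the radiologists' AJCC/UICC 9th-edition annotations.
reference = ["T1", "T2", "T2", "T3", "T4", "T2", "T3", "T1"]
llm_calls = ["T1", "T2", "T3", "T3", "T4", "T2", "T3", "T2"]

accuracy = sum(r == p for r, p in zip(reference, llm_calls)) / len(reference)
kappa = cohen_kappa_score(reference, llm_calls)       # chance-corrected agreement
print(f"accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```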

Mao S, Naser MA, Buoy S, Brock KK, Hutcheson KA

PubMed · Oct 4, 2025
Modified barium swallow (MBS) exams are pivotal for assessing swallowing function and include diagnostic video segments imaged in various planes, such as anteroposterior (AP, or coronal plane) and lateral (or mid-sagittal plane), alongside non-diagnostic 'scout' segments used for anatomic reference and image setup that do not include bolus swallows. These variations in imaging files necessitate manual sorting and labeling, complicating the pre-analysis workflow. Our study introduces a deep learning approach to automate the categorization of swallow videos in MBS exams, distinguishing between the different types of diagnostic videos and identifying non-diagnostic scout videos to streamline the MBS review workflow. Our algorithms were developed on a dataset that included 3,740 video segments with a total of 986,808 frames from 285 MBS exams in 216 patients (average age 60 ± 9 years). Our model achieved an accuracy of 99.68% at the frame level and 100% at the video level in differentiating AP from lateral planes. For distinguishing scout from bolus swallowing videos, the model reached an accuracy of 90.26% at the frame level and 93.86% at the video level. Incorporating a multi-task learning approach notably enhanced the video-level accuracy to 96.35% for scout/bolus video differentiation. Our analysis highlighted the importance of leveraging inter-frame connectivity for improving model performance. These findings significantly boost MBS exam processing efficiency, minimizing manual sorting efforts and allowing raters to allocate greater focus to clinical interpretation and patient care.
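
A hedged sketch of how per-frame predictions might be aggregated into a video-level label is given below; mean pooling of frame probabilities is an assumed strategy, not necessarily the authors' exact aggregation rule.

```python
import numpy as np

def video_label(frame_probs, threshold=0.5):
    """Collapse per-frame P(bolus) scores into one video-level call.
    Mean pooling is an assumed rule; majority voting is another common choice."""
    return "bolus" if float(np.mean(frame_probs)) >= threshold else "scout"

# Hypothetical classifier outputs for the frames of one MBS video segment.
print(video_label(np.array([0.91, 0.88, 0.76, 0.95, 0.83])))   # -> "bolus"
```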

Anusha C, Rao KN, Rao TL

PubMed · Oct 4, 2025
Early diagnosis of kidney cancer is crucial for saving lives and enabling better treatment. Medical experts use radiological images, such as CT, MRI, and US, together with histopathological analysis, to identify kidney tumors and cysts, providing valuable information on their size, shape, location, and metabolism, thus aiding diagnosis. In radiological image processing, precise segmentation remains difficult when performed manually, despite numerous noteworthy efforts and encouraging results in this field. Thus, there is an urgent need for automatic methods for renal and renal-mass segmentation. In this regard, this article reviews studies that use deep learning models to detect renal masses early in medical imaging examinations, particularly the various convolutional neural network (CNN) models that have demonstrated excellent outcomes in the segmentation of radiological images. Furthermore, we detail the dataset characteristics the researchers adopted, as well as the accuracy and efficiency metrics obtained under various parameter settings. However, several studies employed datasets with limited numbers of images, whereas only a handful used hundreds of thousands of images, and these studies did not fully establish tumor and cyst diagnosis. The key goals are to describe recent accomplishments, examine the methodological approaches used by researchers, and recommend potential future research directions.

Pathak S, Schlötterer J, Geerdink J, Veltman J, van Keulen M, Strisciuglio N, Seifert C

PubMed · Oct 4, 2025
Breast cancer prediction models for mammography assume that annotations are available for individual images or regions of interest (ROIs) and that there is a fixed number of images per patient. These assumptions do not hold in real hospital settings, where clinicians provide only a final diagnosis for the entire mammography exam (case). Since data in real hospital settings scale with continuous patient intake while manual annotation effort does not, we develop a framework for case-level breast cancer prediction that does not require any manual annotation and can be trained with case labels readily available at the hospital. Specifically, we propose a two-level multi-instance learning (MIL) approach at the patch and image level for case-level breast cancer prediction and evaluate it on two public datasets and one private dataset. We propose a novel domain-specific MIL pooling based on the observation that breast cancer may occur in either or both breasts, while images of both breasts are acquired as a precaution during mammography. We also propose a dynamic training procedure for training our MIL framework on a variable number of images per case. We show that our two-level MIL model can be applied in real hospital settings where only case labels and a variable number of images per case are available, without any loss in performance compared to models trained on image labels. Although trained only with weak (case-level) labels, it can point out the breast side, mammography view, and view region in which the abnormality lies.
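
To make the side-aware pooling idea concrete, the following sketch pools per-image logits within each breast side and then across sides; the max-max operator and the variable names are illustrative assumptions rather than the paper's exact pooling function.

```python
import torch

def case_logit(image_logits, sides):
    """Side-aware MIL pooling sketch: max over images within each breast side,
    then max across sides, since disease in either breast makes the case positive.
    image_logits: (n_images,) per-image malignancy logits for one case (any count).
    sides: list of 'L'/'R' flags of the same length."""
    side_scores = []
    for s in ("L", "R"):
        idx = [i for i, v in enumerate(sides) if v == s]
        if idx:                                   # a side may be missing in practice
            side_scores.append(image_logits[idx].max())
    return torch.stack(side_scores).max()

# Hypothetical 4-image case (two views per breast).
logits = torch.tensor([-1.2, 0.4, -0.3, 2.1])
print(case_logit(logits, ["L", "L", "R", "R"]))   # tensor(2.1000)
```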

Agarwal M, Pelegri AA

PubMed · Oct 4, 2025
Characterizing brain white matter (BWM) using in vivo magnetic resonance elastography (MRE) and diffusion tensor imaging (DTI) is a costly, time-intensive process. Numerical modeling approaches, such as finite element models (FEMs), also face limitations in fidelity, computational resources, and accurately capturing the complex biophysical behavior of brain tissues. To address the scarcity of experimental data, researchers are exploring machine learning (ML) as a surrogate for predicting the mechanical properties of brain tissues. Herein, an ML workflow is proposed for predicting the homogenized viscoelastic properties of BWM using FEM-derived data. The synthetic FE dataset originates from a sensitivity analysis in which a triphasic 2D composite model, consisting of axons, myelin, and glial matrix, was used to simulate transverse mechanical behavior under harmonic shear stress. This dataset is used to train and validate machine learning models aimed at predicting the frequency-dependent mechanical response. The proposed ML pipeline incorporates microstructural features such as fiber volume fraction, intrinsic phase moduli, and axonal geometry to build and train regression models. Feature selection and hyperparameter optimization were applied to improve prediction accuracy. Decision tree-based models outperformed other approaches, and SHAP interpretation revealed that glial moduli and fiber volume fraction significantly influenced the predictions. This framework offers a cost-effective alternative to in vivo characterization and to computationally expensive, physics-based direct numerical simulation (FEM) methods. It also provides a basis for future ML-driven inverse models that explore the impact of various brain matter constituents on neuroimaging characteristics, potentially informing studies on aging, dementia, and traumatic brain injury.
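
A minimal sketch of the kind of tree-based regression plus SHAP interpretation described above is shown below; the synthetic features, the toy target, and the use of a random forest (rather than the authors' specific decision-tree model) are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import shap

rng = np.random.default_rng(0)
# Hypothetical stand-in for the FEM-derived samples: columns ~ fiber volume fraction,
# glial shear modulus (kPa), axon radius (um), loading frequency (Hz).
X = rng.uniform([0.3, 0.5, 0.3, 10.0], [0.7, 3.0, 1.0, 100.0], size=(500, 4))
y = 0.8 * X[:, 0] * X[:, 1] + 0.05 * X[:, 3] + rng.normal(0, 0.05, 500)  # toy modulus

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)             # SHAP values for tree ensembles
shap_values = explainer.shap_values(X[:50])
print(np.abs(shap_values).mean(axis=0))           # mean |SHAP| ~ per-feature influence
```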

Ajo Babu George, Sreehari J R

arXiv preprint · Oct 4, 2025
Aims: Late diagnosis of Oral Squamous Cell Carcinoma (OSCC) contributes significantly to its high global mortality rate, with over 50% of cases detected at advanced stages and a 5-year survival rate below 50% according to WHO statistics. This study aims to improve early detection of OSCC by developing a multimodal deep learning framework that integrates clinical, radiological, and histopathological images using a weighted ensemble of DenseNet-121 convolutional neural networks (CNNs). Material and Methods: A retrospective study was conducted using publicly available datasets representing three distinct medical imaging modalities. Each modality-specific dataset was used to train a DenseNet-121 CNN via transfer learning. Augmentation and modality-specific preprocessing were applied to increase robustness. Predictions were fused using a validation-weighted ensemble strategy. Evaluation was performed using accuracy, precision, recall, and F1-score. Results: High validation accuracy was achieved for the radiological (100%) and histopathological (95.12%) modalities, with clinical images performing lower (63.10%) due to visual heterogeneity. The ensemble model demonstrated improved diagnostic robustness, with an overall accuracy of 84.58% on a multimodal validation dataset of 55 samples. Conclusion: The multimodal ensemble framework bridges gaps in the current diagnostic workflow by offering a non-invasive, AI-assisted triage tool that enhances early identification of high-risk lesions. It supports clinicians in decision-making, aligning with global oncology guidelines to reduce diagnostic delays and improve patient outcomes.
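
The validation-weighted fusion step can be sketched as follows; the two-class setup, the example probabilities, and the exact normalization of the weights are assumptions for illustration only.

```python
import numpy as np

def weighted_ensemble(probs, val_accuracy):
    """Fuse per-modality softmax outputs with weights proportional to each branch's
    validation accuracy (the normalization scheme here is an assumption).
    probs: dict of modality -> (n_classes,) probability vector for one patient."""
    w = np.array([val_accuracy[m] for m in probs])
    w = w / w.sum()
    fused = sum(wi * p for wi, p in zip(w, probs.values()))
    return int(np.argmax(fused)), fused

# Hypothetical two-class (benign vs. OSCC) outputs from the three DenseNet-121 branches.
probs = {
    "clinical":          np.array([0.55, 0.45]),
    "radiological":      np.array([0.20, 0.80]),
    "histopathological": np.array([0.10, 0.90]),
}
val_accuracy = {"clinical": 0.631, "radiological": 1.000, "histopathological": 0.9512}
print(weighted_ensemble(probs, val_accuracy))
```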

Sanhita Basu, Tomas Fröding, Ali Teymur Kahraman, Dimitris Toumpanakis, Tobias Sjöblom

arXiv preprint · Oct 4, 2025
Background: Pleural effusion (PE) is a common finding in many different clinical conditions, but accurately measuring its volume from CT scans is challenging. Purpose: To improve PE segmentation and quantification for enhanced clinical management, we developed and trained a semi-supervised deep learning framework on contrast-enhanced CT volumes. Materials and Methods: This retrospective study collected CT pulmonary angiogram (CTPA) data from internal and external datasets. A subset of 100 cases was manually annotated for model training, while the remaining cases were used for testing and validation. A novel semi-supervised deep learning framework, Teacher-Teaching Assistant-Student (TTAS), was developed and used to enable efficient training on non-segmented examinations. Segmentation performance was compared to that of state-of-the-art models. Results: 100 patients (mean age, 72 years ± 28 [standard deviation]; 55 men) were included in the study. The TTAS model demonstrated superior segmentation performance compared with state-of-the-art models, achieving a mean Dice score of 0.82 (95% CI, 0.79-0.84) versus 0.73 for nnU-Net (p < 0.0001, Student's t test). Additionally, TTAS exhibited a four-fold lower mean absolute volume difference (AbVD) of 6.49 mL (95% CI, 4.80-8.20) compared with nnU-Net's AbVD of 23.16 mL (p < 0.0001). Conclusion: The developed TTAS framework offered superior PE segmentation, aiding accurate volume determination from CT scans.
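
The two reported metrics, Dice overlap and absolute volume difference, can be computed from binary masks as in the sketch below; the toy masks and the assumed voxel volume are illustrative, not the study's data.

```python
import numpy as np

def dice_and_abvd(pred_mask, gt_mask, voxel_volume_ml):
    """Dice overlap and absolute volume difference (AbVD) for binary 3-D masks.
    voxel_volume_ml: volume of a single voxel in millilitres (spacing-dependent)."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    dice = 2 * inter / denom if denom else 1.0
    abvd = abs(int(pred.sum()) - int(gt.sum())) * voxel_volume_ml
    return dice, abvd

# Toy masks; an assumed CTPA voxel of ~0.7 x 0.7 x 1.0 mm is about 0.00049 mL.
pred = np.zeros((4, 4, 4), dtype=bool); pred[1:3, 1:3, 1:3] = True
gt = np.zeros((4, 4, 4), dtype=bool);   gt[1:3, 1:3, 1:4] = True
print(dice_and_abvd(pred, gt, voxel_volume_ml=0.00049))
```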

T-Mai Bui, Fares Bougourzi, Fadi Dornaika, Vinh Truong Hoang

arXiv preprint · Oct 4, 2025
In recent years, deep learning has shown near-expert performance in segmenting complex medical tissues and tumors. However, existing models are often task-specific, with performance varying across modalities and anatomical regions. Balancing model complexity and performance remains challenging, particularly in clinical settings where both accuracy and efficiency are critical. To address these issues, we propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion (MAF) mechanism to capture local, global, and long-range dependencies. A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency. Additionally, a co-attention gate enhances feature selection by emphasizing relevant spatial and semantic information across scales during both encoding and decoding, improving feature interaction and cross-scale communication. Extensive experiments on multiple benchmark datasets show that our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity. By effectively balancing efficiency and effectiveness, our architecture offers a practical and scalable solution for diverse medical imaging tasks. Source code and trained models will be publicly released upon acceptance to support reproducibility and further research.
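
As a rough illustration of fusing three encoder branches, the sketch below applies learned per-pixel branch weights over concatenated CNN, Transformer, and Mamba features; this is a simplified stand-in, not the paper's actual MAF block or co-attention gate.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Toy fusion of three encoder branches (local / global / long-range features)
    via a softmax over learned per-location branch weights; a simplified stand-in
    for a heavier attention-based fusion module."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(3 * channels, 3, kernel_size=1),
                                  nn.Softmax(dim=1))
    def forward(self, f_cnn, f_trans, f_mamba):
        g = self.gate(torch.cat([f_cnn, f_trans, f_mamba], dim=1))   # (B, 3, H, W)
        return g[:, 0:1] * f_cnn + g[:, 1:2] * f_trans + g[:, 2:3] * f_mamba

feats = [torch.randn(1, 64, 32, 32) for _ in range(3)]   # hypothetical branch outputs
print(GatedFusion(64)(*feats).shape)                      # torch.Size([1, 64, 32, 32])
```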