Latest Papers on Radiology AI

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation

Haibo Jin, Haoxuan Che, Sunan He, Hao Chen

•preprint•Aug 13 2025

Despite progress in radiology report generation (RRG), existing works face two challenges: 1) clinical efficacy is unsatisfactory, especially for describing lesion attributes; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of the diagnostic process for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with the QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments to demonstrate the effectiveness of CoD, where it outperforms both specialist and generalist models consistently on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.

Mixed Modality Report Generation Methodology In Silico Academic Lab Open Dataset Benchmark SOTA GenAI

GazeLT: Visual attention-guided long-tailed disease classification in chest radiographs

Moinak Bhattacharya, Gagandeep Singh, Shubham Jain, Prateek Prasanna

•preprint•Aug 13 2025

In this work, we present GazeLT, a human visual attention integration-disintegration approach for long-tailed disease classification. A radiologist's eye gaze has distinct patterns that capture both fine-grained and coarser-level disease-related information. While interpreting an image, a radiologist's attention varies over time; it is critical to incorporate this into a deep learning framework to improve automated image interpretation. Another important aspect of visual attention is that, apart from looking at major/obvious disease patterns, experts also look at minor/incidental findings (some of which constitute long-tailed classes) during the course of image interpretation. GazeLT harnesses the temporal aspect of the visual search process, via an integration and disintegration mechanism, to improve long-tailed disease classification. We show the efficacy of GazeLT on two publicly available datasets for long-tailed disease classification, namely the NIH-CXR-LT (n=89237) and the MIMIC-CXR-LT (n=111898) datasets. GazeLT outperforms the best long-tailed loss by 4.1% and the visual attention-based baseline by 21.7% in average accuracy metrics for these datasets. Our code is available at https://github.com/lordmoinak1/gazelt.

X-Ray Classification Chest Methodology In Silico Open Code

SKOOTS: Skeleton oriented object segmentation for mitochondria

Buswinka, C. J., Osgood, R. T., Nitta, H., Indzhykulian, A. A.

•preprint•Aug 13 2025

Segmenting individual instances of mitochondria from imaging datasets can provide rich quantitative information, but is prohibitively time-consuming when done manually, prompting interest in the development of automated algorithms using deep neural networks. Existing solutions for various segmentation tasks are optimized for either: high-resolution three-dimensional imaging, relying on well-defined object boundaries (e.g., whole neuron segmentation in volumetric electron microscopy datasets); or low-resolution two-dimensional imaging, boundary-invariant but poorly suited to large 3D objects (e.g., whole-cell segmentation of light microscopy images). Mitochondria in whole-cell 3D electron microscopy datasets often lie in the middle ground - large, yet with ambiguous borders, challenging current segmentation tools. To address this, we developed skeleton-oriented object segmentation (SKOOTS) - a novel approach that efficiently segments large, densely packed mitochondria. SKOOTS accurately and efficiently segments mitochondria in previously difficult contexts and can also be applied to segment other objects in 3D light microscopy datasets. This approach bridges a critical gap between existing segmentation approaches, improving the utility of automated analysis of three-dimensional biomedical imaging data. We demonstrate the utility of SKOOTS by applying it to segment over 15,000 cochlear hair cell mitochondria across experimental conditions in under 2 hours on a consumer-grade PC, enabling downstream morphological analysis that revealed subtle structural changes following aminoglycoside exposure - differences not detectable using analysis approaches currently used in the field.

CT Segmentation Methodology In Silico Academic Lab Breakthrough

BSA-Net: Boundary-prioritized spatial adaptive network for efficient left atrial segmentation.

Xu F, Tu W, Feng F, Yang J, Gunawardhana M, Gu Y, Huang J, Zhao J

•papers•Aug 13 2025

Atrial fibrillation, a common cardiac arrhythmia with rapid and irregular atrial electrical activity, requires accurate left atrial segmentation for effective treatment planning. Recently, deep learning methods have achieved encouraging success in left atrial segmentation. However, current methodologies critically depend on the assumption of a consistently complete, centered left atrium as input, which neglects the structural incompleteness and boundary discontinuities arising from random-crop operations during inference. In this paper, we propose BSA-Net, which exploits an adaptive adjustment strategy in both feature position and loss optimization to establish long-range feature relationships and strengthen robust intermediate feature representations in boundary regions. Specifically, we propose a Spatial-adaptive Convolution (SConv) that employs a shuffle operation combined with lightweight convolution to directly establish cross-positional relationships within regions of potential relevance. Moreover, we develop the dual Boundary Prioritized loss, which enhances boundary precision by differentially weighting foreground and background boundaries, thus optimizing complex boundary regions. With these components, the proposed method achieves a better speed-accuracy trade-off than current methods. BSA-Net attains Dice scores of 92.55%, 91.42%, and 84.67% on the LA, Utah, and Waikato datasets, respectively, with a mere 2.16 M parameters, approximately 80% fewer than other contemporary state-of-the-art models. Extensive experimental results on three benchmark datasets demonstrate that BSA-Net consistently and significantly outperforms existing state-of-the-art methods.
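
The Dice scores quoted in this abstract measure the overlap between a predicted mask and a reference mask. As a rough illustration only, the standard Dice coefficient can be sketched in plain Python on flattened binary masks. The function name `dice_score`, the toy masks, and the empty-mask convention are assumptions made for this example and are not taken from the BSA-Net paper or its code.

```python
def dice_score(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # For 0/1 masks, the elementwise product counts voxels set in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 predicted voxels, 2 reference voxels, 2 shared.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 0, 1]
print(dice_score(pred, target))  # 0.8
```

A reported score such as 92.55% corresponds to a value of 0.9255 on this scale, typically averaged over test volumes.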

CT Segmentation Cardiac Methodology In Silico Academic Lab Benchmark SOTA

Comparative evaluation of CAM methods for enhancing explainability in veterinary radiography.

Dusza P, Banzato T, Burti S, Bendazzoli M, Müller H, Wodzinski M

•papers•Aug 13 2025

Explainable Artificial Intelligence (XAI) encompasses a broad spectrum of methods that aim to enhance the transparency of deep learning models, with Class Activation Mapping (CAM) methods widely used for visual interpretability. However, systematic evaluations of these methods in veterinary radiography remain scarce. This study presents a comparative analysis of eleven CAM methods, including GradCAM, XGradCAM, ScoreCAM, and EigenCAM, on a dataset of 7362 canine and feline X-ray images. A ResNet18 model was chosen based on the specificity of the dataset and preliminary results in which it outperformed other models. Quantitative and qualitative evaluations were performed to determine how well each CAM method produced interpretable heatmaps relevant to clinical decision-making. Among the techniques evaluated, EigenGradCAM achieved the highest mean score of 2.571 (standard deviation [SD] = 1.256), closely followed by EigenCAM at 2.519 (SD = 1.228) and GradCAM++ at 2.512 (SD = 1.277), while FullGrad and XGradCAM received the lowest scores of 2.000 (SD = 1.300) and 1.858 (SD = 1.198), respectively. Despite variations in saliency visualization, no single method universally improved veterinarians' diagnostic confidence. While certain CAM methods provided better visual cues for some pathologies, they generally offered limited explainability and did not substantially improve veterinarians' diagnostic confidence.

X-Ray Classification Methodology In Silico Reproducibility

Multimodal ensemble machine learning predicts neurological outcome within three hours after out of hospital cardiac arrest.

Kawai Y, Yamamoto K, Tsuruta K, Miyazaki K, Asai H, Fukushima H

•papers•Aug 13 2025

This study aimed to determine whether an ensemble (stacking) model that integrates three independently developed base models can reliably predict patients' neurological outcomes following out-of-hospital cardiac arrest (OHCA) within 3 h of arrival and outperform each individual model. This retrospective study included patients with OHCA (≥ 18 years) admitted directly to Nara Medical University between April 2015 and March 2024 who remained comatose for ≥ 3 h after arrival and had suitable head computed tomography (CT) images. The area under the receiver operating characteristic curve (AUC) and Brier scores were used to evaluate the performance of four models (resuscitation-related background OHCA score factors, bilateral pupil diameter, single-slice head CT within 3 h of arrival, and an ensemble stacked model combining these three models) in predicting favourable neurological outcomes at hospital discharge or 1 month, as defined by a Cerebral Performance Category scale of 1-2. Among 533 patients, 82 (15%) had favourable outcomes. The OHCA, pupil, and head CT models yielded AUCs of 0.76, 0.65, and 0.68 with Brier scores of 0.11, 0.13, and 0.12, respectively. The ensemble model outperformed the other models (AUC, 0.82; Brier score, 0.10), thereby supporting its application for early clinical decision-making and optimising resource allocation.
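
The Brier score used in this abstract is the mean squared difference between predicted probabilities and observed binary outcomes, so lower is better. The sketch below, in plain Python, shows the calculation on made-up predictions; the function names and toy data are assumptions for illustration and do not come from the study. For scale, it also computes the score of a constant forecast equal to the observed prevalence (82 of 533), which for a prevalence p is p(1 - p).

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def prevalence_baseline(n_positive, n_total):
    """Brier score of always predicting the prevalence p, which is p * (1 - p)."""
    p = n_positive / n_total
    return p * (1.0 - p)


# Toy predictions (hypothetical, not study data).
probs = [0.9, 0.2, 0.6, 0.1]
outcomes = [1, 0, 1, 0]
print(round(brier_score(probs, outcomes), 3))

# Reference point from the cohort size reported in the abstract.
print(round(prevalence_baseline(82, 533), 3))
```

Comparing a model's Brier score against this prevalence baseline is a common sanity check; the abstract itself reports only the raw scores.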

CT Classification Neurological Retrospective Clinical In Silico Academic Lab

In vivo variability of MRI radiomics features in prostate lesions assessed by a test-retest study with repositioning.

Zhang KS, Neelsen CJO, Wennmann M, Hielscher T, Kovacs B, Glemser PA, Görtz M, Stenzinger A, Maier-Hein KH, Huber J, Schlemmer HP, Bonekamp D

•papers•Aug 13 2025

Despite academic success, radiomics-based machine learning algorithms have not reached clinical practice, partially due to limited repeatability/reproducibility. To address this issue, this work aims to identify a stable subset of radiomics features in prostate MRI for radiomics modelling. A prospective study was conducted in 43 patients who received a clinical MRI examination and a research exam with repetition of T2-weighted and two different diffusion-weighted imaging (DWI) sequences with repositioning in between. Radiomics feature (RF) extraction was performed from MRI segmentations accounting for intra-rater and inter-rater effects, and three different image normalization methods were compared. Stability of RFs was assessed using the concordance correlation coefficient (CCC) for different comparisons: rater effects, inter-scan (before and after repositioning) and inter-sequence (between the two diffusion-weighted sequences) variability. In total, only 64 out of 321 (~ 20%) extracted features demonstrated stability, defined as CCC ≥ 0.75 in all settings (5 high-b value, 7 ADC- and 52 T2-derived features). For DWI, primarily intensity-based features proved stable with no shape feature passing the CCC threshold. T2-weighted images possessed the largest number of stable features with multiple shape (7), intensity-based (7) and texture features (28). Z-score normalization for high-b value images and muscle-normalization for T2-weighted images were identified as suitable.
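
The stability criterion above relies on the concordance correlation coefficient (CCC), which penalizes both poor correlation and systematic offsets between repeated measurements. A minimal sketch of Lin's CCC in plain Python follows; the function names and the toy feature values are assumptions for illustration, and only the 0.75 threshold is taken from the abstract.

```python
def concordance_correlation(x, y):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (1/n) moments, as in Lin's original definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


def is_stable(x, y, threshold=0.75):
    """Apply the CCC >= 0.75 stability cut-off quoted in the abstract."""
    return concordance_correlation(x, y) >= threshold


# Hypothetical test and retest values of one feature, with a constant offset of 1.
test_scan = [1.0, 2.0, 3.0]
retest_scan = [2.0, 3.0, 4.0]
print(concordance_correlation(test_scan, retest_scan))  # about 0.571
print(is_stable(test_scan, retest_scan))
```

In this toy case the two series are perfectly correlated, yet the constant offset pulls the CCC below the threshold, which is why CCC is preferred over Pearson correlation for repeatability studies.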

MRI Segmentation Abdominal Prospective In Silico Academic Lab

Machine Learning-Driven Radiomic Profiling of Thalamus-Amygdala Nuclei for Prediction of Postoperative Delirium After STN-DBS in Parkinson's Disease Patients: A Pilot Study.

Radziunas A, Davidavicius G, Reinyte K, Pranckeviciene A, Fedaravicius A, Kucinskas V, Laucius O, Tamasauskas A, Deltuva V, Saudargiene A

•papers•Aug 13 2025

Postoperative delirium is a common complication following subthalamic nucleus deep brain stimulation surgery in Parkinson's disease patients. Postoperative delirium has been shown to prolong hospital stays, harm cognitive function, and negatively impact outcomes. Utilizing radiomics as a predictive tool for identifying patients at risk of delirium is a novel and personalized approach. This pilot study analyzed preoperative T1-weighted and T2-weighted magnetic resonance images from 34 Parkinson's disease patients, which were used to segment the thalamus, amygdala, and hippocampus, resulting in 10,680 extracted radiomic features. Feature selection using the minimum redundancy maximal relevance method identified the 20 most informative features, which were input into eight different machine learning algorithms. A high predictive accuracy of postoperative delirium was achieved by applying regularized binary logistic regression and linear discriminant analysis and using the 10 most informative radiomic features. Regularized logistic regression resulted in 96.97% (±6.20) balanced accuracy, 99.5% (±4.97) sensitivity, 94.43% (±10.70) specificity, and area under the receiver operating characteristic curve of 0.97 (±0.06). Linear discriminant analysis showed 98.42% (±6.57) balanced accuracy, 98.00% (±9.80) sensitivity, 98.83% (±4.63) specificity, and area under the receiver operating characteristic curve of 0.98 (±0.07). The feed-forward neural network also demonstrated strong predictive capacity, achieving 96.17% (±10.40) balanced accuracy, 94.5% (±19.87) sensitivity, 97.83% (±7.87) specificity, and an area under the receiver operating characteristic curve of 0.96 (±0.10).
However, when the feature set was extended to 20 features, both logistic regression and linear discriminant analysis showed reduced performance, while the feed-forward neural network achieved the highest predictive accuracy of 99.28% (±2.71), with 100.0% (±0.00) sensitivity, 98.57% (±5.42) specificity, and an area under the receiver operating characteristic curve of 0.99 (±0.03). Selected radiomic features might indicate network dysfunction between thalamic laterodorsal, reuniens medial ventral, and amygdala basal nuclei with hippocampus cornu ammonis 4 in these patients. This finding expands previous research suggesting the importance of the thalamic-hippocampal-amygdala network for postoperative delirium due to alterations in neuronal activity.
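
Balanced accuracy, as reported in this abstract, is conventionally the mean of sensitivity and specificity. The short Python sketch below shows that definition from confusion-matrix counts and from rates, and checks it against the mean values quoted above. The function names and the toy counts are assumptions for illustration; the study's own evaluation code is not shown in the abstract.

```python
def balanced_accuracy_from_counts(tp, fn, tn, fp):
    """Balanced accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2.0


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy as the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Hypothetical fold: 9 of 10 positives and 20 of 24 negatives classified correctly.
print(balanced_accuracy_from_counts(tp=9, fn=1, tn=20, fp=4))

# Consistency check against the mean percentages quoted in the abstract.
print(balanced_accuracy(99.5, 94.43))   # logistic regression, reported as 96.97
print(balanced_accuracy(98.00, 98.83))  # linear discriminant analysis, reported as 98.42
```

Because averaging is linear, the mean of per-fold balanced accuracies equals the average of the mean sensitivity and mean specificity, so the reported figures are internally consistent up to rounding.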

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

A stacking ensemble framework integrating radiomics and deep learning for prognostic prediction in head and neck cancer.

Wang B, Liu J, Zhang X, Lin J, Li S, Wang Z, Cao Z, Wen D, Liu T, Ramli HRH, Harith HH, Hasan WZW, Dong X

•papers•Aug 13 2025

Radiomics models frequently face challenges related to reproducibility and robustness. To address these issues, we propose a multimodal, multi-model fusion framework utilizing stacking ensemble learning for prognostic prediction in head and neck cancer (HNC). This approach seeks to improve the accuracy and reliability of survival predictions. A total of 806 cases from nine centers were collected; 143 cases from two centers were assigned as the external validation cohort, while the remaining 663 were stratified and randomly split into training (n = 530) and internal validation (n = 133) sets. Radiomics features were extracted according to IBSI standards, and deep learning features were obtained using a 3D DenseNet-121 model. Following feature selection, the selected features were input into Cox, SVM, RSF, DeepCox, and DeepSurv models. A stacking fusion strategy was employed to develop the prognostic model. Model performance was evaluated using Kaplan-Meier survival curves and time-dependent ROC curves. On the external validation set, the model using combined PET and CT radiomics features achieved superior performance compared to single-modality models, with the RSF model obtaining the highest concordance index (C-index) of 0.7302. When using deep features extracted by 3D DenseNet-121, the PET + CT-based models demonstrated significantly improved prognostic accuracy, with DeepSurv and DeepCox achieving C-indices of 0.9217 and 0.9208, respectively. In stacking models, the PET + CT model using only radiomics features reached a C-index of 0.7324, while the deep feature-based stacking model achieved 0.9319. The best performance was obtained by the multi-feature fusion model, which integrated both radiomics and deep learning features from PET and CT, yielding a C-index of 0.9345. Kaplan-Meier survival analysis further confirmed the fusion model's ability to distinguish between high-risk and low-risk groups.
The stacking-based ensemble model demonstrates superior performance compared to individual machine learning models, markedly improving the robustness of prognostic predictions.
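
The concordance index (C-index) reported above summarizes how often a model ranks patients correctly by risk. A simplified sketch of Harrell's C-index in plain Python is given below. The function name, the toy cohort, and the handling of ties (tied risks count as half, tied times are skipped) are assumptions for this example and may differ from the implementation used in the study.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            # Pair is comparable when subject i had the event strictly before time j.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients; the third is censored at t=6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range of roughly 0.73 to 0.93 in context.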

Mixed Modality Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA

ES-UNet: efficient 3D medical image segmentation with enhanced skip connections in 3D UNet.

Park M, Oh S, Park J, Jeong T, Yu S

•papers•Aug 13 2025

Deep learning has significantly advanced medical image analysis, particularly in semantic segmentation, which is essential for clinical decisions. However, existing 3D segmentation models, like the traditional 3D UNet, face challenges in balancing computational efficiency and accuracy when processing volumetric medical data. This study aims to develop an improved architecture for 3D medical image segmentation with enhanced learning strategies to improve accuracy and address challenges related to limited training data. We propose ES-UNet, a 3D segmentation architecture that achieves superior segmentation performance while offering competitive efficiency across multiple computational metrics, including memory usage, inference time, and parameter count. The model builds upon the full-scale skip connection design of UNet3+ by integrating channel attention modules into each encoder-to-decoder path and incorporating full-scale deep supervision to enhance multi-resolution feature learning. We further introduce Region Specific Scaling (RSS), a data augmentation method that adaptively applies geometric transformations to annotated regions, and a Dynamically Weighted Dice (DWD) loss to improve the balance between precision and recall. The model was evaluated on the MICCAI HECKTOR dataset, and additional validation was conducted on selected tasks from the Medical Segmentation Decathlon (MSD). On the HECKTOR dataset, ES-UNet achieved a Dice Similarity Coefficient (DSC) of 76.87%, outperforming baseline models including 3D UNet, 3D UNet3+, nnUNet, and Swin UNETR. Ablation studies showed that RSS and DWD contributed up to 1.22% and 1.06% improvement in DSC, respectively. A sensitivity analysis demonstrated that the chosen scaling range in RSS offered a favorable trade-off between deformation and anatomical plausibility. Cross-dataset evaluation on MSD Heart and Spleen tasks also indicated strong generalization.
Computational analysis revealed that ES-UNet achieves superior segmentation performance with moderate computational demands. Specifically, the enhanced skip connection design with lightweight channel attention modules integrated throughout the network architecture enables this favorable balance between high segmentation accuracy and computational efficiency. ES-UNet integrates architectural and algorithmic improvements to achieve robust 3D medical image segmentation. While the framework incorporates established components, its core contributions lie in the optimized skip connection strategy and supporting techniques like RSS and DWD. Future work will explore adaptive scaling strategies and broader validation across diverse imaging modalities.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA
