Page 366 of 370 (3697 results)

Longitudinal Validation of a Deep Learning Index for Aortic Stenosis Progression

Park, J., Kim, J., Yoon, Y. E., Jeon, J., Lee, S.-A., Choi, H.-M., Hwang, I.-C., Cho, G.-Y., Chang, H.-J., Park, J.-H.

medRxiv preprint, May 19 2025
Aims: Aortic stenosis (AS) is a progressive disease requiring timely monitoring and intervention. While transthoracic echocardiography (TTE) remains the diagnostic standard, deep learning (DL)-based approaches offer potential for improved disease tracking. This study examined the longitudinal changes in a previously developed DL-derived index for the AS continuum (DLi-ASc) and assessed its value in predicting progression to severe AS. Methods and Results: We retrospectively analysed 2,373 patients (7,371 TTEs) from two tertiary hospitals. DLi-ASc (scaled 0-100), derived from parasternal long- and/or short-axis views, was tracked longitudinally. DLi-ASc increased in parallel with worsening AS stages (p for trend <0.001) and showed strong correlations with AV maximal velocity (Vmax) (Pearson correlation coefficient [PCC] = 0.69, p<0.001) and mean pressure gradient (mPG) (PCC = 0.66, p<0.001). Higher baseline DLi-ASc was associated with a faster AS progression rate (p for trend <0.001). Additionally, the annualised change in DLi-ASc, estimated using linear mixed-effect models, correlated strongly with the annualised progression of AV Vmax (PCC = 0.71, p<0.001) and mPG (PCC = 0.68, p<0.001). In Fine-Gray competing risk models, baseline DLi-ASc independently predicted progression to severe AS, even after adjustment for AV Vmax or mPG (hazard ratio per 10-point increase = 2.38 and 2.80, respectively). Conclusion: DLi-ASc increased in parallel with AS progression and independently predicted severe AS progression. These findings support its role as a non-invasive imaging-based digital marker for longitudinal AS monitoring and risk stratification.
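The two summary statistics quoted in the abstract, a Pearson correlation and a hazard ratio rescaled per 10-point increase, are standard computations. A minimal sketch, using made-up toy numbers rather than the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired observations: a DLi-ASc value and AV Vmax (m/s) per study.
dli  = [22, 35, 41, 55, 63, 78, 84]
vmax = [1.8, 2.3, 2.6, 3.1, 3.4, 4.2, 4.5]
r = pearson_r(dli, vmax)

# A Fine-Gray model fits a per-unit coefficient beta; the paper reports the
# hazard ratio per 10-point increase, i.e. exp(10 * beta). Back out beta from
# the reported HR = 2.38 to show the rescaling identity.
beta = math.log(2.38) / 10
hr_per_10 = math.exp(10 * beta)
```

The per-10-point rescaling is purely a reporting convention; the fitted model itself is per-unit.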

MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

Yinghao Zhu, Ziyi He, Haoran Hu, Xiaochen Zheng, Xichen Zhang, Zixiang Wang, Junyi Gao, Liantao Ma, Lequan Yu

arXiv preprint, May 18 2025
The rapid advancement of Large Language Models (LLMs) has stimulated interest in multi-agent collaboration for addressing complex medical tasks. However, the practical advantages of multi-agent collaboration approaches remain insufficiently understood. Existing evaluations often lack generalizability, failing to cover diverse tasks reflective of real-world clinical practice, and frequently omit rigorous comparisons against both single-LLM-based and established conventional methods. To address this critical gap, we introduce MedAgentBoard, a comprehensive benchmark for the systematic evaluation of multi-agent collaboration, single-LLM, and conventional approaches. MedAgentBoard encompasses four diverse medical task categories: (1) medical (visual) question answering, (2) lay summary generation, (3) structured Electronic Health Record (EHR) predictive modeling, and (4) clinical workflow automation, across text, medical images, and structured EHR data. Our extensive experiments reveal a nuanced landscape: while multi-agent collaboration demonstrates benefits in specific scenarios, such as enhancing task completeness in clinical workflow automation, it does not consistently outperform advanced single LLMs (e.g., in textual medical QA) or, critically, specialized conventional methods that generally maintain better performance in tasks like medical VQA and EHR-based prediction. MedAgentBoard offers a vital resource and actionable insights, emphasizing the necessity of a task-specific, evidence-based approach to selecting and developing AI solutions in medicine. It underscores that the inherent complexity and overhead of multi-agent collaboration must be carefully weighed against tangible performance gains. All code, datasets, detailed prompts, and experimental results are open-sourced at https://medagentboard.netlify.app/.

CTLformer: A Hybrid Denoising Model Combining Convolutional Layers and Self-Attention for Enhanced CT Image Reconstruction

Zhiting Zheng, Shuqi Wu, Wen Ding

arXiv preprint, May 18 2025
Low-dose CT (LDCT) images are often accompanied by significant noise, which negatively impacts image quality and subsequent diagnostic accuracy. To address the challenges of multi-scale feature fusion and diverse noise distribution patterns in LDCT denoising, this paper introduces an innovative model, CTLformer, which combines convolutional structures with transformer architecture. Two key innovations are proposed: a multi-scale attention mechanism and a dynamic attention control mechanism. The multi-scale attention mechanism, implemented through the Token2Token mechanism and self-attention interaction modules, effectively captures both fine details and global structures at different scales, enhancing relevant features and suppressing noise. The dynamic attention control mechanism adapts the attention distribution based on the noise characteristics of the input image, focusing on high-noise regions while preserving details in low-noise areas, thereby enhancing robustness and improving denoising performance. Furthermore, CTLformer integrates convolutional layers for efficient feature extraction and uses overlapping inference to mitigate boundary artifacts, further strengthening its denoising capability. Experimental results on the 2016 National Institutes of Health AAPM Mayo Clinic LDCT Challenge dataset demonstrate that CTLformer significantly outperforms existing methods in both denoising performance and model efficiency, greatly improving the quality of LDCT images. The proposed CTLformer not only provides an efficient solution for LDCT denoising but also shows broad potential in medical image analysis, especially for clinical applications dealing with complex noise patterns.
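The overlapping-inference step the CTLformer abstract mentions is easy to illustrate: run the model on overlapping tiles and average the predictions where tiles overlap, so no pixel is reconstructed only at a tile boundary. A minimal 1-D sketch; the window/stride values and the identity "denoiser" are placeholders, not the paper's configuration:

```python
def overlapped_inference(signal, window, stride, denoise):
    """Apply `denoise` to overlapping windows and average the overlaps.
    Shown in 1-D for clarity; the paper operates on 2-D CT tiles."""
    n = len(signal)
    acc = [0.0] * n  # accumulated predictions per position
    cnt = [0] * n    # how many windows covered each position
    start = 0
    while True:
        end = min(start + window, n)
        begin = end - window  # shift the final window back so it still fits
        out = denoise(signal[begin:end])
        for i, v in enumerate(out):
            acc[begin + i] += v
            cnt[begin + i] += 1
        if end == n:
            break
        start += stride
    return [a / c for a, c in zip(acc, cnt)]

# With an identity "denoiser", overlapped averaging must reproduce the input.
x = [float(i) for i in range(10)]
y = overlapped_inference(x, window=4, stride=2, denoise=lambda w: list(w))
```

Averaging in the overlap regions is what suppresses the seam artifacts that non-overlapping tiling produces.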

Harnessing Artificial Intelligence for Accurate Diagnosis and Radiomics Analysis of Combined Pulmonary Fibrosis and Emphysema: Insights from a Multicenter Cohort Study

Zhang, S., Wang, H., Tang, H., Li, X., Wu, N.-W., Lang, Q., Li, B., Zhu, H., Chen, X., Chen, K., Xie, B., Zhou, A., Mo, C.

medRxiv preprint, May 18 2025
Combined Pulmonary Fibrosis and Emphysema (CPFE), formally recognized as a distinct pulmonary syndrome in 2022, is characterized by unique clinical features and pathogenesis that may lead to respiratory failure and death. However, the diagnosis of CPFE presents significant challenges that hinder effective treatment. Here, we assembled three-dimensional (3D) reconstruction data of the chest High-Resolution Computed Tomography (HRCT) of patients from multiple hospitals across different provinces in China, including Xiangya Hospital, West China Hospital, and Fujian Provincial Hospital. Using this dataset, we developed CPFENet, a deep learning-based diagnostic model for CPFE. It accurately differentiates CPFE from COPD, with performance comparable to that of professional radiologists. Additionally, we developed a CPFE score based on radiomic analysis of 3D CT images to quantify disease characteristics. Notably, female patients demonstrated significantly higher CPFE scores than males, suggesting potential sex-specific differences in CPFE. Overall, our study establishes the first diagnostic framework for CPFE, providing a diagnostic model and clinical indicators that enable accurate classification and characterization of the syndrome.

MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

Jingkun Yue, Siqi Zhang, Zinan Jia, Huihuan Xu, Zongbo Han, Xiaohong Liu, Guangyu Wang

arXiv preprint, May 17 2025
Visual grounding is essential for precise perception and reasoning in multimodal large language models (MLLMs), especially in medical imaging domains. While existing medical visual grounding benchmarks primarily focus on single-image scenarios, real-world clinical applications often involve sequential images, where accurate lesion localization across different modalities and temporal tracking of disease progression (e.g., pre- vs. post-treatment comparison) require fine-grained cross-image semantic alignment and context-aware reasoning. To remedy the underrepresentation of image sequences in existing medical visual grounding benchmarks, we propose MedSG-Bench, the first benchmark tailored for Medical Image Sequences Grounding. It comprises eight VQA-style tasks, formulated into two paradigms of the grounding tasks, including 1) Image Difference Grounding, which focuses on detecting change regions across images, and 2) Image Consistency Grounding, which emphasizes detection of consistent or shared semantics across sequential images. MedSG-Bench covers 76 public datasets, 10 medical imaging modalities, and a wide spectrum of anatomical structures and diseases, totaling 9,630 question-answer pairs. We benchmark both general-purpose MLLMs (e.g., Qwen2.5-VL) and medical-domain specialized MLLMs (e.g., HuatuoGPT-vision), observing that even the advanced models exhibit substantial limitations in medical sequential grounding tasks. To advance this field, we construct MedSG-188K, a large-scale instruction-tuning dataset tailored for sequential visual grounding, and further develop MedSeq-Grounder, an MLLM designed to facilitate future research on fine-grained understanding across medical sequential images. The benchmark, dataset, and model are available at https://huggingface.co/MedSG-Bench

Patient-Specific Autoregressive Models for Organ Motion Prediction in Radiotherapy

Yuxiang Lai, Jike Zhong, Vanessa Su, Xiaofeng Yang

arXiv preprint, May 17 2025
Radiotherapy often involves a prolonged treatment period. During this time, patients may experience organ motion due to breathing and other physiological factors. Predicting and modeling this motion before treatment is crucial for ensuring precise radiation delivery. However, existing pre-treatment organ motion prediction methods primarily rely on deformation analysis using principal component analysis (PCA), which is highly dependent on registration quality and struggles to capture periodic temporal dynamics for motion modeling. In this paper, we observe that organ motion prediction closely resembles an autoregressive process, a technique widely used in natural language processing (NLP). Autoregressive models predict the next token based on previous inputs, naturally aligning with our objective of predicting future organ motion phases. Building on this insight, we reformulate organ motion prediction as an autoregressive process to better capture patient-specific motion patterns. Specifically, we acquire 4D CT scans for each patient before treatment, with each sequence comprising multiple 3D CT phases. These phases are fed into the autoregressive model to predict future phases based on prior phase motion patterns. We evaluate our method on a real-world test set of 4D CT scans from 50 patients who underwent radiotherapy at our institution and a public dataset containing 4D CT scans from 20 patients (some with multiple scans), totaling over 1,300 3D CT phases. The performance in predicting the motion of the lung and heart surpasses existing benchmarks, demonstrating its effectiveness in capturing motion dynamics from CT images. These results highlight the potential of our method to improve pre-treatment planning in radiotherapy, enabling more precise and adaptive radiation delivery.
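The autoregressive framing above, predict the next phase from the previous ones, can be shown with a deliberately tiny stand-in. The paper feeds whole 3D CT phases to a learned model; this sketch collapses each phase to a single amplitude and fits a scalar AR(1) model by least squares, purely to show the fit-then-rollout mechanics:

```python
def fit_ar1(phases):
    """Fit x_t ~= a * x_{t-1} + b by least squares over consecutive pairs."""
    xs, ys = phases[:-1], phases[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

def rollout(x0, a, b, steps):
    """Autoregressively predict future phases from the last observed one."""
    out, x = [], x0
    for _ in range(steps):
        x = a * x + b
        out.append(x)
    return out

# Toy amplitude trace generated exactly by x_{t+1} = 0.5 * x_t + 0.5.
phases = [3.0, 2.0, 1.5, 1.25, 1.125]
a, b = fit_ar1(phases)
pred = rollout(phases[-1], a, b, steps=2)
```

The same predict-next-from-history loop is what the paper's model performs, with a neural network in place of the linear map and 3D volumes in place of scalars.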

Foundation versus Domain-Specific Models for Left Ventricular Segmentation on Cardiac Ultrasound

Chao, C.-J., Gu, Y., Kumar, W., Xiang, T., Appari, L., Wu, J., Farina, J. M., Wraith, R., Jeong, J., Arsanjani, R., Garvan, K. C., Oh, J. K., Langlotz, C. P., Banerjee, I., Li, F.-F., Adeli, E.

medRxiv preprint, May 17 2025
The Segment Anything Model (SAM) was fine-tuned on the EchoNet-Dynamic dataset and evaluated on external transthoracic echocardiography (TTE) and Point-of-Care Ultrasound (POCUS) datasets from CAMUS (University Hospital of St Etienne) and Mayo Clinic (99 patients: 58 TTE, 41 POCUS). Fine-tuned SAM was superior or comparable to MedSAM. The fine-tuned SAM also outperformed EchoNet and U-Net models, demonstrating strong generalization, especially on apical 2-chamber (A2C) images (fine-tuned SAM vs. EchoNet: CAMUS-A2C: DSC 0.891 ± 0.040 vs. 0.752 ± 0.196, p<0.0001) and POCUS (DSC 0.857 ± 0.047 vs. 0.667 ± 0.279, p<0.0001). Additionally, SAM-enhanced workflow reduced annotation time by 50% (11.6 ± 4.5 sec vs. 5.7 ± 1.7 sec, p<0.0001) while maintaining segmentation quality. We demonstrated an effective strategy for fine-tuning a vision foundation model for enhancing clinical workflow efficiency and supporting human-AI collaboration.
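The DSC values quoted above are Dice similarity coefficients, 2|A∩B| / (|A| + |B|) between predicted and ground-truth masks. A minimal sketch on toy flattened binary masks (not the study's data):

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 2.0 * inter / size if size else 1.0  # empty-vs-empty counts as perfect

# Prediction overlaps 3 of 4 ground-truth pixels and adds 1 false positive.
gt   = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 1, 0, 1, 0]
d = dice(gt, pred)  # 2*3 / (4+4) = 0.75
```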

Artificial intelligence-guided distal radius fracture detection on plain radiographs in comparison with human raters.

Ramadanov N, John P, Hable R, Schreyer AG, Shabo S, Prill R, Salzmann M

PubMed paper, May 16 2025
The aim of this study was to compare the performance of artificial intelligence (AI) in detecting distal radius fractures (DRFs) on plain radiographs with the performance of human raters. We retrospectively analysed all wrist radiographs taken in our hospital since the introduction of AI-guided fracture detection from 11 September 2023 to 10 September 2024. The ground truth was defined by the radiological report of a board-certified radiologist based solely on conventional radiographs. The following parameters were calculated: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), accuracy (%), Cohen's Kappa coefficient, F1 score, sensitivity (%), specificity (%), and Youden Index (J Statistic). In total, 1145 plain radiographs of the wrist were taken between 11 September 2023 and 10 September 2024. The mean age of the included patients was 46.6 years (± 27.3), ranging from 2 to 99 years, and 59.0% were female. According to the ground truth, of the 556 anteroposterior (AP) radiographs, 225 cases (40.5%) had a DRF, and of the 589 lateral view radiographs, 240 cases (40.7%) had a DRF. The AI system showed the following results on AP radiographs: accuracy (%): 95.90; Cohen's Kappa: 0.913; F1 score: 0.947; sensitivity (%): 92.02; specificity (%): 98.45; Youden Index: 90.47. The orthopedic surgeon achieved a sensitivity of 91.5%, specificity of 97.8%, an overall accuracy of 95.1%, F1 score of 0.943, and Cohen's kappa of 0.901. These results were comparable to those of the AI model. AI-guided detection of DRF demonstrated diagnostic performance nearly identical to that of an experienced orthopedic surgeon across all key metrics. The marginal differences observed in sensitivity and specificity suggest that AI can reliably support clinical fracture assessment based solely on conventional radiographs.
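All of the metrics this study reports derive from the four confusion-matrix counts. A minimal sketch; the counts below are illustrative values chosen to resemble the AP results, not the study's raw data:

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, F1, Youden index, and Cohen's
    kappa from confusion-matrix counts."""
    n = tp + tn + fp + fn
    acc  = (tp + tn) / n
    sens = tp / (tp + fn)            # recall on fracture cases
    spec = tn / (tn + fp)            # recall on non-fracture cases
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    youden = sens + spec - 1
    # Cohen's kappa: agreement beyond chance.
    pe = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (acc - pe) / (1 - pe)
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "f1": f1, "youden": youden, "kappa": kappa}

m = diagnostic_metrics(tp=207, tn=326, fp=5, fn=18)
```

The Youden index reported as 90.47 in the abstract is this same sensitivity + specificity − 1 quantity expressed in percentage points.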

UGoDIT: Unsupervised Group Deep Image Prior Via Transferable Weights

Shijun Liang, Ismail R. Alkhouri, Siddhant Gautam, Qing Qu, Saiprasad Ravishankar

arXiv preprint, May 16 2025
Recent advances in data-centric deep generative models have led to significant progress in solving inverse imaging problems. However, these models (e.g., diffusion models (DMs)) typically require large amounts of fully sampled (clean) training data, which is often impractical in medical and scientific settings such as dynamic imaging. On the other hand, training-data-free approaches like the Deep Image Prior (DIP) do not require clean ground-truth images but suffer from noise overfitting and can be computationally expensive as the network parameters need to be optimized for each measurement set independently. Moreover, DIP-based methods often overlook the potential of learning a prior using a small number of sub-sampled measurements (or degraded images) available during training. In this paper, we propose UGoDIT, an Unsupervised Group DIP via Transferable weights, designed for the low-data regime where only a very small number, M, of sub-sampled measurement vectors are available during training. Our method learns a set of transferable weights by optimizing a shared encoder and M disentangled decoders. At test time, we reconstruct the unseen degraded image using a DIP network, where part of the parameters are fixed to the learned weights, while the remaining are optimized to enforce measurement consistency. We evaluate UGoDIT on both medical (multi-coil MRI) and natural (super resolution and non-linear deblurring) image recovery tasks under various settings. Compared to recent standalone DIP methods, UGoDIT provides accelerated convergence and notable improvement in reconstruction quality. Furthermore, our method achieves performance competitive with SOTA DM-based and supervised approaches, despite not requiring large amounts of clean training data.

Diff-Unfolding: A Model-Based Score Learning Framework for Inverse Problems

Yuanhao Wang, Shirin Shoushtari, Ulugbek S. Kamilov

arXiv preprint, May 16 2025
Diffusion models are extensively used for modeling image priors for inverse problems. We introduce \emph{Diff-Unfolding}, a principled framework for learning posterior score functions of \emph{conditional diffusion models} by explicitly incorporating the physical measurement operator into a modular network architecture. Diff-Unfolding formulates posterior score learning as the training of an unrolled optimization scheme, where the measurement model is decoupled from the learned image prior. This design allows our method to generalize across inverse problems at inference time by simply replacing the forward operator without retraining. We theoretically justify our unrolling approach by showing that the posterior score can be derived from a composite model-based optimization formulation. Extensive experiments on image restoration and accelerated MRI show that Diff-Unfolding achieves state-of-the-art performance, improving PSNR by up to 2 dB and reducing LPIPS by $22.7\%$, while being both compact (47M parameters) and efficient (0.72 seconds per $256 \times 256$ image). An optimized C++/LibTorch implementation further reduces inference time to 0.63 seconds, underscoring the practicality of our approach.
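The unrolled design described above, alternating an explicit data-fidelity step that uses the measurement operator with a learned prior step, can be sketched in miniature. Here a hand-written no-op stands in for the learned score network, and the toy operator is the identity, so the scheme reduces to verifiable gradient descent on ||Ax − y||²; none of this is the paper's actual architecture:

```python
def unrolled_recon(y, forward, adjoint, denoise, x0, steps=10, eta=0.5):
    """One unrolled iteration = gradient step on ||A x - y||^2 using the
    explicit operator A, followed by a prior/denoising step. Swapping
    `forward`/`adjoint` changes the inverse problem without retraining
    the prior, which is the decoupling Diff-Unfolding argues for."""
    x = list(x0)
    for _ in range(steps):
        residual = [f - yi for f, yi in zip(forward(x), y)]
        grad = adjoint(residual)              # A^T (A x - y)
        x = [xi - eta * g for xi, g in zip(x, grad)]
        x = denoise(x)                        # learned prior goes here
    return x

# Toy check: with A = identity and a no-op prior, the iterates converge to y.
y = [1.0, 2.0, 3.0]
x = unrolled_recon(y, forward=lambda v: v, adjoint=lambda v: v,
                   denoise=lambda v: v, x0=[0.0, 0.0, 0.0], steps=25, eta=0.5)
```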
