
Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge results

Meritxell Riera-Marin, Sikha O K, Julia Rodriguez-Comas, Matthias Stefan May, Zhaohong Pan, Xiang Zhou, Xiaokun Liang, Franciskus Xaverius Erick, Andrea Prenner, Cedric Hemon, Valentin Boussot, Jean-Louis Dillenseger, Jean-Claude Nunes, Abdul Qayyum, Moona Mazher, Steven A Niederer, Kaisar Kushibar, Carlos Martin-Isla, Petia Radeva, Karim Lekadir, Theodore Barfoot, Luis C. Garcia Peraza Herrera, Ben Glocker, Tom Vercauteren, Lucas Gago, Justin Englemann, Joy-Marie Kleiss, Anton Aubanell, Andreu Antolin, Javier Garcia-Lopez, Miguel A. Gonzalez Ballester, Adrian Galdran

arXiv preprint · May 13, 2025
Deep learning (DL) has become the dominant approach for medical image segmentation, yet ensuring the reliability and clinical applicability of these models requires addressing key challenges such as annotation variability, calibration, and uncertainty estimation. To this end, we created the Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge, which highlights the critical role of multiple annotators in establishing a more comprehensive ground truth, emphasizing that segmentation is inherently subjective and that leveraging inter-annotator variability is essential for robust model evaluation. Seven teams participated in the challenge, submitting a variety of DL models evaluated using metrics such as the Dice Similarity Coefficient (DSC), Expected Calibration Error (ECE), and Continuous Ranked Probability Score (CRPS). By incorporating consensus and dissensus ground truth, we assessed how DL models handle uncertainty and whether their confidence estimates align with true segmentation performance. Our findings reinforce the importance of well-calibrated models, as better calibration is strongly correlated with result quality. Furthermore, we demonstrate that segmentation models trained on diverse datasets and enriched with pre-trained knowledge exhibit greater robustness, particularly in cases deviating from standard anatomical structures. Notably, the best-performing models achieved both high DSC and well-calibrated uncertainty estimates. This work underscores the need for multi-annotator ground truth, thorough calibration assessment, and uncertainty-aware evaluation to develop trustworthy and clinically reliable DL-based medical image segmentation models.
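For readers implementing the evaluation, here is a minimal Python sketch of two of the challenge's headline metrics, DSC and ECE, for the binary-mask case. The function names, binning scheme, and 0.5 decision threshold are illustrative choices, not the official challenge code.

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def expected_calibration_error(probs: np.ndarray, gt: np.ndarray, n_bins: int = 10) -> float:
    """ECE: bin voxels by confidence, average |confidence - accuracy| weighted by bin size."""
    probs, gt = probs.ravel(), gt.ravel().astype(bool)
    pred = probs >= 0.5                         # hard prediction at an assumed 0.5 threshold
    conf = np.where(pred, probs, 1.0 - probs)   # confidence of the predicted class
    correct = pred == gt
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf >= lo) & (conf < hi) if hi < 1.0 else (conf >= lo)
        if in_bin.any():
            ece += in_bin.mean() * abs(conf[in_bin].mean() - correct[in_bin].mean())
    return ece
```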

Deep Learning-Derived Cardiac Chamber Volumes and Mass From PET/CT Attenuation Scans: Associations With Myocardial Flow Reserve and Heart Failure.

Hijazi W, Shanbhag A, Miller RJH, Kavanagh PB, Killekar A, Lemley M, Wopperer S, Knight S, Le VT, Mason S, Acampa W, Rosamond T, Dey D, Berman DS, Chareonthaitawee P, Di Carli MF, Slomka PJ

PubMed · May 13, 2025
Computed tomography (CT) attenuation correction scans are an intrinsic part of positron emission tomography (PET) myocardial perfusion imaging using PET/CT, but anatomic information is rarely derived from these ultralow-dose CT scans. We aimed to assess the association of deep learning-derived cardiac chamber volumes (right atrial, right ventricular, left ventricular, and left atrial) and left ventricular mass from these scans with myocardial flow reserve and heart failure hospitalization. We included 18,079 patients with consecutive cardiac PET/CT from 6 sites. A deep learning model estimated cardiac chamber volumes and left ventricular mass from CT attenuation correction imaging. Associations between deep learning-derived mass and volumes and heart failure hospitalization or reduced myocardial flow reserve were assessed in multivariable analyses. During a median follow-up of 4.3 years, 1721 (9.5%) patients experienced heart failure hospitalization. Patients with 3 or 4 abnormal chamber volumes were 7× more likely to be hospitalized for heart failure than patients with normal volumes. In adjusted analyses, left atrial volume (hazard ratio [HR], 1.25 [95% CI, 1.19-1.30]), right atrial volume (HR, 1.29 [95% CI, 1.23-1.35]), right ventricular volume (HR, 1.25 [95% CI, 1.20-1.31]), left ventricular volume (HR, 1.27 [95% CI, 1.23-1.35]), and left ventricular mass (HR, 1.25 [95% CI, 1.18-1.32]) were independently associated with heart failure hospitalization. In multivariable analyses, left atrial volume (odds ratio, 1.14 [95% CI, 1.0-1.19]) and ventricular mass (odds ratio, 1.12 [95% CI, 1.06-1.17]) were independent predictors of reduced myocardial flow reserve. Deep learning-derived chamber volumes and left ventricular mass from CT attenuation correction imaging were predictive of heart failure hospitalization and reduced myocardial flow reserve in patients undergoing cardiac PET perfusion imaging. These anatomic data can be routinely reported along with other PET/CT parameters to improve risk prediction.
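As an illustration of how hazard ratios of this kind are obtained, here is a hedged sketch of a multivariable Cox proportional-hazards fit using the lifelines library. The data frame below is synthetic stand-in data with invented column names; it mirrors only the structure of such an analysis, not the study's variables or adjustments.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "la_volume_z": rng.normal(size=n),           # left atrial volume, standardized (hypothetical)
    "lv_mass_z": rng.normal(size=n),             # left ventricular mass, standardized (hypothetical)
    "followup_years": rng.exponential(4.3, n),   # time to event or censoring
    "hf_hospitalization": rng.integers(0, 2, n), # 1 = heart failure hospitalization observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="hf_hospitalization")
# exp(coef) is the hazard ratio reported in abstracts like the one above
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```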

A comparison of performance of DeepSeek-R1 model-generated responses to musculoskeletal radiology queries against ChatGPT-4 and ChatGPT-4o - A feasibility study.

Uldin H, Saran S, Gandikota G, Iyengar KP, Vaishya R, Parmar Y, Rasul F, Botchu R

PubMed · May 12, 2025
Artificial intelligence (AI) has transformed society, and chatbots built on large language models (LLMs) are playing an increasing role in scientific research. This study aims to assess and compare the efficacy of the newer DeepSeek-R1 model against ChatGPT-4 and ChatGPT-4o in answering scientific questions about recent research. We compared output generated by ChatGPT-4, ChatGPT-4o, and DeepSeek-R1 in response to ten standardized questions in the setting of musculoskeletal (MSK) radiology. The responses were independently analyzed by one MSK radiologist and one final-year MSK radiology trainee and graded on a Likert scale from 1 (inaccurate) to 5 (accurate). Five DeepSeek-R1 answers were significantly inaccurate, and the model supplied references, all of them fictitious, only when prompted. All ChatGPT-4 and ChatGPT-4o answers were well written with good content, the latter including useful and comprehensive references. ChatGPT-4o generated structured research answers with useful references in all our cases, enabling reliable usage. DeepSeek-R1, in its current version, generates output that may appear authentic to the unsuspecting eye but contains more fabricated and inaccurate information. Future iterations may improve this accuracy.

[Pulmonary vascular interventions: innovating through adaptation and advancing through differentiation].

Li J, Wan J

PubMed · May 12, 2025
Pulmonary vascular intervention technology, with its minimally invasive and precise advantages, has been a groundbreaking advance in the treatment of pulmonary vascular diseases. Techniques such as balloon pulmonary angioplasty (BPA), pulmonary artery stenting, and percutaneous pulmonary artery denervation (PADN) have significantly improved the prognoses of conditions such as chronic thromboembolic pulmonary hypertension (CTEPH), pulmonary artery stenosis, and pulmonary arterial hypertension (PAH). Although based on percutaneous coronary intervention (PCI) techniques such as guidewire manipulation and balloon dilatation, pulmonary vascular interventions require specific modifications to address the unique characteristics of the pulmonary circulation (low pressure, thin-walled vessels, and complex branching) and to mitigate the risks of perforation and thrombosis. Future directions include the development of dedicated instruments, multi-modality imaging guidance, artificial intelligence-assisted procedures, and molecular interventional therapies. These innovations aim to establish an independent theoretical framework for pulmonary vascular interventions, facilitating their transition from "adjuvant therapies" to "core treatments" in clinical practice.

BodyGPS: Anatomical Positioning System

Halid Ziya Yerebakan, Kritika Iyer, Xueqi Guo, Yoshihisa Shinagawa, Gerardo Hermosillo Valadez

arXiv preprint · May 12, 2025
We introduce a new type of foundational model for parsing human anatomy in medical images that works for different modalities. It supports supervised or unsupervised training and can perform matching, registration, classification, or segmentation with or without user interaction. We achieve this by training a neural network estimator that maps query locations to atlas coordinates via regression. Efficiency is improved by sparsely sampling the input, enabling response times of less than 1 ms without additional accelerator hardware. We demonstrate the utility of the algorithm in both CT and MRI modalities.
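The abstract describes the estimator only at a high level; the core idea — regressing atlas coordinates from a sparse set of sampled intensities so that a query costs almost nothing — might be sketched as below. All layer sizes, the sampling scheme, and the class name are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AtlasRegressor(nn.Module):
    """Toy stand-in: maps sparsely sampled intensities around a query
    location to 3-D atlas coordinates via regression."""
    def __init__(self, n_samples: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_samples, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3),            # regressed (x, y, z) in atlas space
        )

    def forward(self, samples: torch.Tensor) -> torch.Tensor:
        return self.mlp(samples)

# Sparse sampling keeps inference cheap: read a handful of voxels, not the whole volume.
volume = torch.rand(128, 128, 128)                              # placeholder CT/MRI volume
idx = torch.randint(0, 128, (64, 3))                            # 64 random sample locations
samples = volume[idx[:, 0], idx[:, 1], idx[:, 2]].unsqueeze(0)  # shape (1, 64)
atlas_xyz = AtlasRegressor()(samples)                           # shape (1, 3)
```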

ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao

arXiv preprint · May 12, 2025
Accurate multi-modal medical image translation requires harmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for preserving modality-specific edge and texture details, and Mamba's selective state-space modeling for efficient long- and short-range feature dependencies. Structurally, our dual-resolution framework leverages SAM2's image encoder to capture organ-scale semantics from high-resolution inputs, while a parallel CNN branch extracts fine-grained local features. The Robust Feature Fusion Network (RFFN) integrates these representations, and the Bidirectional Mamba Residual Network (BMRN) models spatial dependencies using spiral scanning and bidirectional state-space dynamics. A three-stage skip fusion decoder enhances edge and texture fidelity. We employ Efficient Low-Rank Adaptation (LoRA+) fine-tuning to enable precise domain specialization while maintaining the foundational capabilities of the pre-trained components. Extensive experimental validation on the SynthRAD2023 and BraTS2019 datasets demonstrates that ABS-Mamba outperforms state-of-the-art methods, delivering high-fidelity cross-modal synthesis that preserves anatomical semantics and structural details to enhance diagnostic accuracy in clinical applications. The code is available at https://github.com/gatina-yone/ABS-Mamba
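Of the listed components, the fine-tuning step is the easiest to make concrete. Below is a minimal sketch of a low-rank adapter wrapped around a frozen linear layer, i.e. plain LoRA; LoRA+ differs mainly in assigning the two factors different learning rates. The rank, scaling, and initialization are illustrative assumptions, not values from the ABS-Mamba code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # preserve the pre-trained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: identity at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

adapted = LoRALinear(nn.Linear(768, 768))         # e.g. wrapped around one encoder projection
out = adapted(torch.randn(2, 768))
```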

Automatic Quantification of Ki-67 Labeling Index in Pediatric Brain Tumors Using QuPath

Spyretos, C., Pardo Ladino, J. M., Blomstrand, H., Nyman, P., Snodahl, O., Shamikh, A., Elander, N. O., Haj-Hosseini, N.

medRxiv preprint · May 12, 2025
The quantification of the Ki-67 labeling index (LI) is critical for assessing tumor proliferation and prognosis, yet manual scoring remains common practice. This study presents an automated workflow for Ki-67 scoring in whole slide images (WSIs) using an Apache Groovy script for QuPath, complemented by a Python-based post-processing script that produces cell density maps and summary tables. Tissue and cell segmentation are performed using StarDist, a deep learning model, with adaptive thresholding to classify Ki-67-positive and Ki-67-negative nuclei. The pipeline was applied to a cohort of 632 pediatric brain tumor cases with 734 Ki-67-stained WSIs from the Children's Brain Tumor Network. Medulloblastoma showed the highest Ki-67 LI (median: 19.84), followed by atypical teratoid rhabdoid tumor (median: 19.36). Moderate values were observed in brainstem glioma-diffuse intrinsic pontine glioma (median: 11.50), high-grade glioma (grades 3 and 4) (median: 9.50), and ependymoma (median: 5.88). Lower indices were found in meningioma (median: 1.84), while the lowest were seen in low-grade glioma (grades 1 and 2) (median: 0.85), dysembryoplastic neuroepithelial tumor (median: 0.63), and ganglioglioma (median: 0.50). The results aligned with oncological consensus, demonstrating a significant correlation in Ki-67 LI across most tumor families/types, with high-malignancy tumors showing the highest proliferation indices and lower-malignancy tumors exhibiting lower Ki-67 LI. The automated approach facilitates the assessment of large numbers of Ki-67 WSIs in research settings.
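Once StarDist has detected the nuclei and each carries a mean DAB optical density, the labeling index itself is a simple ratio. A hedged sketch of that last step (the threshold and data here are invented; the actual classification runs inside QuPath via the Groovy script, with adaptive rather than fixed thresholding):

```python
import numpy as np

def ki67_labeling_index(nucleus_dab_od: np.ndarray, threshold: float) -> float:
    """Ki-67 LI = positive nuclei / all detected nuclei * 100."""
    positive = (nucleus_dab_od > threshold).sum()
    return 100.0 * positive / len(nucleus_dab_od)

rng = np.random.default_rng(0)
nucleus_dab_od = rng.random(1000)   # stand-in per-nucleus mean DAB optical densities
print(f"Ki-67 LI: {ki67_labeling_index(nucleus_dab_od, threshold=0.4):.2f}%")
```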

Benchmarking Radiology Report Generation From Noisy Free-Texts.

Yuan Y, Zheng Y, Qu L

PubMed · May 12, 2025
Automatic radiology report generation can enhance diagnostic efficiency and accuracy. However, clean open-source imaging scan-report pairs are limited in scale and variety, and the vast amount of radiological text available online is often too noisy to be used directly. To address this challenge, we introduce a novel task called Noisy Report Refinement (NRR), which generates radiology reports from noisy free-texts. To achieve this, we propose a report refinement pipeline that leverages large language models (LLMs) enhanced with guided self-critique and report selection strategies. Because existing radiology report generation metrics cannot measure cleanliness, radiological usefulness, and factual correctness across the various report modalities in the NRR task, we introduce a new benchmark, NRRBench, for NRR evaluation. This benchmark includes two online-sourced datasets and four clinically explainable LLM-based metrics: two evaluate the matching rate of radiology entities and of modality-specific template attributes, respectively; one assesses report cleanliness; and a combined metric evaluates overall NRR performance. Experiments demonstrate that guided self-critique and report selection strategies significantly improve the quality of refined reports. Additionally, our proposed metrics show a much higher correlation with the noise rate and error count of reports than existing radiology report generation metrics when evaluating NRR.
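Once an LLM has extracted radiology entities from a report, a matching-rate metric of the kind described reduces to set overlap. A minimal sketch of that arithmetic (the function and entity sets are illustrative, not NRRBench's actual implementation, which delegates extraction and matching to an LLM):

```python
def entity_match_rate(refined: set[str], reference: set[str]) -> float:
    """Fraction of reference radiology entities recovered in the refined report."""
    if not reference:
        return 1.0
    return len(refined & reference) / len(reference)

reference = {"pleural effusion", "cardiomegaly", "right lower lobe opacity"}
refined = {"cardiomegaly", "right lower lobe opacity"}
print(entity_match_rate(refined, reference))  # 0.666...
```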

LiteMIL: A Computationally Efficient Transformer-Based MIL for Cancer Subtyping on Whole Slide Images.

Kussaibi, H.

medRxiv preprint · May 12, 2025
Purpose: Accurate cancer subtyping is crucial for effective treatment, yet it remains challenging due to overlapping morphology and variability among pathologists. Although deep learning (DL) methods have shown potential, their application to gigapixel whole slide images (WSIs) is often hindered by high computational demands and the need for efficient, context-aware feature aggregation. This study introduces LiteMIL, a computationally efficient transformer-based multiple instance learning (MIL) network combined with Phikon, a pathology-tuned self-supervised feature extractor, for robust and scalable cancer subtyping on WSIs. Methods: Patches were extracted from the TCGA-THYM dataset (242 WSIs, six subtypes) and fed in real time to Phikon for feature extraction. To train the MIL networks, features were arranged into uniform bags using a chunking strategy that maintains tissue context while increasing training data. LiteMIL uses a learnable query vector within an optimized multi-head attention module for effective feature aggregation. The model's performance was evaluated against established MIL methods on the thymic dataset and three additional TCGA datasets (breast, lung, and kidney cancer). Results: LiteMIL achieved a 0.89 ± 0.01 F1 score and 0.99 AUC on the thymic dataset, outperforming the other MIL methods. LiteMIL demonstrated strong generalizability across the external datasets, scoring best on the breast and kidney cancer datasets. Compared with TransMIL, LiteMIL significantly reduces training time and GPU memory usage. Ablation studies confirmed the critical role of the learnable query and layer normalization in enhancing performance and stability. Conclusion: LiteMIL offers a resource-efficient, robust solution. Its streamlined architecture, combined with compact Phikon features, makes it suitable for integration into routine histopathological workflows, particularly in resource-limited settings.
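The central mechanism, multi-head attention against a single learnable query, fits in a few lines of PyTorch. The sketch below assumes 768-dimensional Phikon (ViT-B) features and six thymic subtypes; everything else (head count, placement of layer normalization) is an illustrative guess, not the released LiteMIL code.

```python
import torch
import torch.nn as nn

class QueryMILPooling(nn.Module):
    """Aggregate a bag of patch features into one slide-level prediction
    via multi-head attention against a learnable query vector."""
    def __init__(self, dim: int = 768, heads: int = 8, n_classes: int = 6):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)          # the ablations flag layer norm as important
        self.head = nn.Linear(dim, n_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:  # bag: (batch, n_patches, dim)
        q = self.query.expand(bag.size(0), -1, -1)
        slide, _ = self.attn(q, bag, bag)                   # (batch, 1, dim)
        return self.head(self.norm(slide.squeeze(1)))

logits = QueryMILPooling()(torch.randn(2, 500, 768))        # two bags of 500 patch features
```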

Deep learning diagnosis of hepatic echinococcosis based on dual-modality plain CT and ultrasound images: a large-scale, multicenter, diagnostic study.

Zhang J, Zhang J, Tang H, Meng Y, Chen X, Chen J, Chen Y

PubMed · May 12, 2025
Given the currently limited accuracy of imaging-based screening for hepatic echinococcosis (HCE) in under-resourced areas, we developed and validated a multimodal imaging system (HEAC) based on plain computed tomography (CT) combined with ultrasound for HCE screening in those areas. We built a multimodal deep learning diagnostic system that integrates ultrasound and plain CT imaging data to differentiate hepatic echinococcosis, liver cysts, liver abscesses, and healthy liver. We collected a dataset of 8979 cases spanning 18 years from eight hospitals in Xinjiang, China, including both retrospective and prospective data. To enhance the robustness and generalization of the diagnostic model, we modeled the CT and ultrasound images using EfficientNet3D and EfficientNet-B0, conducted external and prospective tests, and compared the model's performance with diagnoses made by experienced physicians. Across internal and external test sets, the fused CT-ultrasound model consistently outperformed the individual modality models and physician diagnoses. In the prospective test set from the same center, the fusion model achieved an accuracy of 0.816, sensitivity of 0.849, specificity of 0.942, and an AUC of 0.963, compared with physician performance of 0.900 accuracy, 0.800 sensitivity, and 0.933 specificity. The external test sets across seven other centers showed similar results, with the fusion model achieving an overall accuracy of 0.849, sensitivity of 0.859, specificity of 0.942, and AUC of 0.961. The multimodal deep learning diagnostic system that integrates CT and ultrasound significantly improves the diagnostic accuracy for HCE, liver cysts, and liver abscesses, outperforming standard single-modality approaches and physician diagnosis by lowering misdiagnosis rates and increasing diagnostic reliability. It underlines the promise of multimodal imaging systems for tackling diagnostic challenges in low-resource areas, paving the way for improved accessibility of medical care and better outcomes.
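The abstract does not say how the two backbones are combined; one plausible reading is late fusion of pooled embeddings, sketched below. The 1280-dimensional feature size matches EfficientNet-B0's final pooled output but is an assumption here, as are the hidden size and dropout; the four classes are the study's diagnostic categories.

```python
import torch
import torch.nn as nn

class DualModalityFusion(nn.Module):
    """Hypothetical late-fusion head over CT (EfficientNet3D) and
    ultrasound (EfficientNet-B0) embeddings."""
    def __init__(self, ct_dim: int = 1280, us_dim: int = 1280, n_classes: int = 4):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(ct_dim + us_dim, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, n_classes),  # HCE, liver cyst, liver abscess, healthy liver
        )

    def forward(self, ct_feat: torch.Tensor, us_feat: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.cat([ct_feat, us_feat], dim=1))

logits = DualModalityFusion()(torch.randn(2, 1280), torch.randn(2, 1280))
```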