Page 66 of 99986 results

DeepSeek-assisted LI-RADS classification: AI-driven precision in hepatocellular carcinoma diagnosis.

Zhang J, Liu J, Guo M, Zhang X, Xiao W, Chen F

pubmed · Jun 24 2025
The clinical utility of the DeepSeek-V3 (DSV3) model in enhancing the accuracy of Liver Imaging Reporting and Data System (LI-RADS, LR) classification remains underexplored. This study aimed to evaluate the diagnostic performance of DSV3 in LR classifications compared to radiologists with varying levels of experience and to assess its potential as a decision-support tool in clinical practice. A dual-phase retrospective-prospective study analyzed 426 liver lesions (300 retrospective, 126 prospective) in high-risk HCC patients who underwent magnetic resonance imaging (MRI) or computed tomography (CT). Three radiologists (one junior, two seniors) independently classified lesions using LR v2018 criteria, while DSV3 analyzed unstructured radiology reports to generate corresponding classifications. In the prospective cohort, DSV3 processed inputs in both Chinese and English to evaluate language impact. Performance was compared using the chi-square or Fisher's exact test, with pathology as the gold standard. In the retrospective cohort, DSV3 significantly outperformed junior radiologists in diagnostically challenging categories: LR-3 (17.8% vs. 39.7%, p<0.05), LR-4 (80.4% vs. 46.2%, p<0.05), and LR-5 (86.2% vs. 66.7%, p<0.05), while showing comparable accuracy in LR-1 (90.8% vs. 88.7%), LR-2 (11.9% vs. 25.6%), and LR-M (79.5% vs. 62.1%) classifications (all p>0.05). Prospective validation confirmed these findings, with DSV3 demonstrating superior performance for LR-3 (13.3% vs. 60.0%), LR-4 (93.3% vs. 66.7%), and LR-5 (93.5% vs. 67.7%) compared to junior radiologists (all p<0.05). Notably, DSV3 achieved diagnostic parity with senior radiologists across all categories (p>0.05) and maintained consistent performance between Chinese and English inputs. The DSV3 model effectively improves the diagnostic accuracy of LR-3 to LR-5 classifications among junior radiologists. Its language-independent performance and ability to match senior-level expertise suggest strong potential for clinical implementation to standardize HCC diagnosis and optimize treatment decisions.
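The per-category comparisons above rest on a chi-square (or Fisher's exact) test applied to a 2x2 table of correct versus incorrect classifications per rater. A minimal sketch of that kind of comparison, with hypothetical counts loosely modeled on the reported LR-4 accuracies (not the study's actual data):

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (df=1) on a 2x2 contingency table:
    rows = rater (model vs. junior radiologist), cols = correct/incorrect."""
    n = a + b + c + d
    # Expected counts under independence of rater and correctness
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    observed = [a, b, c, d]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X >= stat) = erfc(sqrt(stat/2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical LR-4 counts: model 80/100 correct, junior radiologist 46/100 correct
stat, p = chi_square_2x2(80, 20, 46, 54)
```

With small expected cell counts, Fisher's exact test would be substituted, as the abstract notes.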

Prompt learning with bounding box constraints for medical image segmentation.

Gaillochet M, Noori M, Dastani S, Desrosiers C, Lombaert H

pubmed · Jun 24 2025
Pixel-wise annotations are notoriously laborious and costly to obtain in the medical domain. To mitigate this burden, weakly supervised approaches based on bounding box annotations, which are much easier to acquire, offer a practical alternative. Vision foundation models have recently shown noteworthy segmentation performance when provided with prompts such as points or bounding boxes. Prompt learning exploits these models by adapting them to downstream tasks and automating segmentation, thereby reducing user intervention. However, existing prompt learning approaches depend on fully annotated segmentation masks. This paper proposes a novel framework that combines the representational power of foundation models with the annotation efficiency of weakly supervised segmentation. More specifically, our approach automates prompt generation for foundation models using only bounding box annotations. Our proposed optimization scheme integrates multiple constraints derived from box annotations with pseudo-labels generated by the prompted foundation model. Extensive experiments across multi-modal datasets reveal that our weakly supervised method achieves an average Dice score of 84.90% in a limited data setting, outperforming existing fully-supervised and weakly-supervised approaches. The code will be available upon acceptance.
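Box annotations constrain a segmentation in two ways: the foreground must stay inside the box, and the box should be tight around it. A toy numpy sketch of two such constraints (the exact formulation is an assumption for illustration, not the authors' losses):

```python
import numpy as np

def box_constraint_losses(prob, box):
    """Toy weakly supervised losses from a bounding box (y0, x0, y1, x1):
    - outside loss: foreground probability must vanish outside the box
    - inside loss: each row/column of the box should contain foreground
      (a crude stand-in for the tightness prior of box supervision)."""
    y0, x0, y1, x1 = box
    inside = np.zeros_like(prob, dtype=bool)
    inside[y0:y1, x0:x1] = True
    # Penalize any foreground mass predicted outside the box
    outside_loss = prob[~inside].mean() if (~inside).any() else 0.0
    # Encourage coverage along both axes of the box
    row_max = prob[y0:y1, x0:x1].max(axis=1)
    col_max = prob[y0:y1, x0:x1].max(axis=0)
    inside_loss = float((1 - row_max).mean() + (1 - col_max).mean())
    return float(outside_loss), inside_loss

prob = np.zeros((8, 8)); prob[2:6, 2:6] = 1.0  # prediction exactly fills the box
out_l, in_l = box_constraint_losses(prob, (2, 2, 6, 6))
```

In the framework described, terms like these would be combined with a pseudo-label loss from the prompted foundation model.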

Refining cardiac segmentation from MRI volumes with CT labels for fine anatomy of the ascending aorta.

Oda H, Wakamori M, Akita T

pubmed · Jun 24 2025
Magnetic resonance imaging (MRI) is time-consuming, which makes it challenging to capture clear images of moving organs such as the heart, including fine structures such as the Valsalva sinus. This study evaluates a computed tomography (CT)-guided refinement approach for cardiac segmentation from MRI volumes, focused on preserving the detailed shape of the Valsalva sinus. Owing to the low spatial contrast around the Valsalva sinus in MRI, labels from separate CT volumes are used to refine the segmentation. Deep learning techniques are employed to obtain an initial segmentation from MRI volumes, followed by detection of the ascending aorta's proximal point. This detected proximal point is then used to select the most similar label from CT volumes of other patients. Non-rigid registration is further applied to refine the segmentation. Experiments conducted on 20 MRI volumes with labels from 20 CT volumes exhibited a slight decrease in quantitative segmentation accuracy. The CT-guided method achieved precision of 0.908, recall of 0.746, and a Dice score of 0.804 for the ascending aorta, compared with 0.903, 0.770, and 0.816, respectively, for nnU-Net alone. Although some outputs showed bulge-like structures near the Valsalva sinus, an improvement in quantitative segmentation accuracy could not be validated.
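The precision, recall, and Dice figures reported above can be computed voxel-wise from binary masks; a minimal sketch with a toy 2D example:

```python
import numpy as np

def precision_recall_dice(pred, gt):
    """Voxel-wise precision, recall, and Dice score for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()          # true-positive voxels
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    dice = 2 * tp / (pred.sum() + gt.sum())      # harmonic mean of the two
    return float(precision), float(recall), float(dice)

gt = np.zeros((10, 10)); gt[2:8, 2:8] = 1      # 36 ground-truth voxels
pred = np.zeros((10, 10)); pred[3:8, 2:8] = 1  # 30 predicted voxels, all inside gt
p, r, d = precision_recall_dice(pred, gt)
```

The toy prediction under-segments, so precision is perfect while recall and Dice drop, the same trade-off visible in the CT-guided results above.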

[Practical artificial intelligence for urology : Technical principles, current application and future implementation of AI in practice].

Rodler S, Hügelmann K, von Knobloch HC, Weiss ML, Buck L, Kohler J, Fabian A, Jarczyk J, Nuhn P

pubmed · Jun 24 2025
Artificial intelligence (AI) is a disruptive technology that is now finding widespread application after having long been confined to the domain of specialists. In urology in particular, new fields of application are continuously emerging and are being studied both in preclinical basic research and in clinical applications. Potential applications include image recognition in the operating room, interpretation of radiology and pathology images, automatic measurement of urinary stones, and radiotherapy. Certain medical devices, particularly in the field of AI-based predictive biomarkers, have already been incorporated into international guidelines. In addition, AI is playing an increasingly important role in administrative tasks and is expected to lead to enormous changes, especially in the outpatient sector. For urologists, it is becoming increasingly important to engage with this technology, to pursue appropriate training, and thereby to implement AI optimally in the treatment of patients and in the management of their practices or hospitals.

SAM2-SGP: Enhancing SAM2 for Medical Image Segmentation via Support-Set Guided Prompting

Yang Xing, Jiong Wu, Yuheng Bu, Kuang Gong

arXiv preprint · Jun 24 2025
Although new vision foundation models such as Segment Anything Model 2 (SAM2) have significantly enhanced zero-shot image segmentation capabilities, reliance on human-provided prompts poses significant challenges in adapting SAM2 to medical image segmentation tasks. Moreover, SAM2's performance in medical image segmentation was limited by the domain shift issue, since it was originally trained on natural images and videos. To address these challenges, we proposed SAM2 with support-set guided prompting (SAM2-SGP), a framework that eliminated the need for manual prompts. The proposed model leveraged the memory mechanism of SAM2 to generate pseudo-masks using image-mask pairs from a support set via a Pseudo-mask Generation (PMG) module. We further introduced a novel Pseudo-mask Attention (PMA) module, which used these pseudo-masks to automatically generate bounding boxes and enhance localized feature extraction by guiding attention to relevant areas. Furthermore, a low-rank adaptation (LoRA) strategy was adopted to mitigate the domain shift issue. The proposed framework was evaluated on both 2D and 3D datasets across multiple medical imaging modalities, including fundus photography, X-ray, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and ultrasound. The results demonstrated a significant performance improvement over state-of-the-art models, such as nnUNet and SwinUNet, as well as foundation models, such as SAM2 and MedSAM2, underscoring the effectiveness of the proposed approach. Our code is publicly available at https://github.com/astlian9/SAM_Support.
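The LoRA strategy the authors adopt for domain shift freezes the pretrained weight and learns only a low-rank additive update. A minimal numpy sketch of the idea (dimensions are illustrative, not SAM2's):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Low-rank adaptation: the frozen weight W is augmented by a trainable
    low-rank update A @ B, so the adapted layer computes x @ (W + alpha * A @ B)."""
    return x @ W + alpha * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 16, 4        # rank << d, so A @ B adds few parameters
W = rng.normal(size=(d_in, d_out))   # frozen pretrained weight
A = rng.normal(size=(d_in, rank)) * 0.01
B = np.zeros((rank, d_out))          # B starts at zero: adaptation begins as identity
x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B)
```

Because B is initialized to zero, training starts exactly at the pretrained model; here only 128 adapter parameters are trained versus 256 in W, and the gap widens rapidly at realistic layer widths.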

Assessing Risk of Stealing Proprietary Models for Medical Imaging Tasks

Ankita Raj, Harsh Swaika, Deepankar Varma, Chetan Arora

arXiv preprint · Jun 24 2025
The success of deep learning in medical imaging applications has led several companies to deploy proprietary models in diagnostic workflows, offering monetized services. Even though model weights are hidden to protect the intellectual property of the service provider, these models are exposed to model stealing (MS) attacks, where adversaries can clone the model's functionality by querying it with a proxy dataset and training a thief model on the acquired predictions. While extensively studied on general vision tasks, the susceptibility of medical imaging models to MS attacks remains inadequately explored. This paper investigates the vulnerability of black-box medical imaging models to MS attacks under realistic conditions where the adversary lacks access to the victim model's training data and operates with limited query budgets. We demonstrate that adversaries can effectively execute MS attacks by using publicly available datasets. To further enhance MS capabilities with limited query budgets, we propose a two-step model stealing approach termed QueryWise. This method capitalizes on unlabeled data obtained from a proxy distribution to train the thief model without incurring additional queries. Evaluation on two medical imaging models for Gallbladder Cancer and COVID-19 classification substantiates the effectiveness of the proposed attack. The source code is available at https://github.com/rajankita/QueryWise.
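The attack setting described, querying a black-box victim on a proxy dataset and training a thief on the returned predictions, can be sketched with a toy victim and a logistic-regression thief (all models and data here are hypothetical stand-ins for illustration, not QueryWise itself):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical black-box victim: the attacker sees only its output probabilities.
w_victim = np.array([2.0, -1.0])
def query_victim(X):
    return 1 / (1 + np.exp(-(X @ w_victim)))

# Step 1: query the victim on a proxy dataset (stands in for a public dataset)
X_proxy = rng.normal(size=(500, 2))
soft_labels = query_victim(X_proxy)

# Step 2: fit a thief model on the acquired predictions (soft-label distillation)
w_thief = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X_proxy @ w_thief)))
    grad = X_proxy.T @ (p - soft_labels) / len(X_proxy)  # cross-entropy gradient
    w_thief -= 0.5 * grad

# Agreement between thief and victim decisions on held-out data
X_test = rng.normal(size=(200, 2))
thief_prob = 1 / (1 + np.exp(-(X_test @ w_thief)))
agreement = np.mean((query_victim(X_test) > 0.5) == (thief_prob > 0.5))
```

Training on soft labels rather than hard decisions extracts more information per query, which is why limited query budgets still suffice in the setting the paper studies.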

From BERT to generative AI - Comparing encoder-only vs. large language models in a cohort of lung cancer patients for named entity recognition in unstructured medical reports.

Arzideh K, Schäfer H, Allende-Cid H, Baldini G, Hilser T, Idrissi-Yaghir A, Laue K, Chakraborty N, Doll N, Antweiler D, Klug K, Beck N, Giesselbach S, Friedrich CM, Nensa F, Schuler M, Hosch R

pubmed · Jun 23 2025
Extracting clinical entities from unstructured medical documents is critical for improving clinical decision support and documentation workflows. This study examines the performance of various encoder and decoder models trained for Named Entity Recognition (NER) of clinical parameters in pathology and radiology reports, highlighting the applicability of Large Language Models (LLMs) for this task. Three NER methods were evaluated: (1) flat NER using transformer-based models, (2) nested NER with a multi-task learning setup, and (3) instruction-based NER utilizing LLMs. A dataset of 2,013 pathology reports and 413 radiology reports, annotated by medical students, was used for training and testing. The performance of encoder-based NER models (flat and nested) was superior to that of LLM-based approaches. The best-performing flat NER models achieved F1-scores of 0.87-0.88 on pathology reports and up to 0.78 on radiology reports, while nested NER models performed slightly lower. In contrast, multiple LLMs, despite achieving high precision, yielded significantly lower F1-scores (ranging from 0.18 to 0.30) due to poor recall. A contributing factor appears to be that these LLMs produce fewer but more accurate entities, suggesting they become overly conservative when generating outputs. LLMs in their current form are unsuitable for comprehensive entity extraction tasks in clinical domains, particularly when faced with a high number of entity types per document, though instructing them to return more entities in subsequent refinements may improve recall. Additionally, their computational overhead does not provide proportional performance gains. Encoder-based NER models, particularly those pre-trained on biomedical data, remain the preferred choice for extracting information from unstructured medical documents.
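The F1-scores above are computed over predicted versus gold entity spans. A minimal exact-match scorer (entity spans and types below are invented for illustration) shows how high precision with poor recall depresses F1, the pattern reported for the LLMs:

```python
def entity_f1(predicted, gold):
    """Micro precision/recall/F1 over sets of (start, end, type) entity
    spans, using exact-match scoring."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 4, "DIAG"), (10, 15, "MED"), (20, 25, "DOSE")]
pred = [(0, 4, "DIAG"), (20, 25, "DOSE"), (30, 33, "MED")]
p, r, f = entity_f1(pred, gold)
```

A conservative model that returned only `(0, 4, "DIAG")` would score precision 1.0 but recall 1/3, giving F1 0.5, which is how "fewer but more accurate entities" yields the low F1-scores described.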

Deep learning-quantified body composition from positron emission tomography/computed tomography and cardiovascular outcomes: a multicentre study.

Miller RJH, Yi J, Shanbhag A, Marcinkiewicz A, Patel KK, Lemley M, Ramirez G, Geers J, Chareonthaitawee P, Wopperer S, Berman DS, Di Carli M, Dey D, Slomka PJ

pubmed · Jun 23 2025
Positron emission tomography (PET)/computed tomography (CT) myocardial perfusion imaging (MPI) is a vital diagnostic tool, especially in patients with cardiometabolic syndrome. Low-dose CT scans are routinely performed with PET for attenuation correction and potentially contain valuable data about body tissue composition. Deep learning and image processing were combined to automatically quantify skeletal muscle (SM), bone and adipose tissue from these scans and then evaluate their associations with death or myocardial infarction (MI). In PET MPI from three sites, deep learning quantified SM, bone, epicardial adipose tissue (EAT), subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and intermuscular adipose tissue (IMAT). Sex-specific thresholds for abnormal values were established. Associations with death or MI were evaluated using unadjusted and multivariable models adjusted for clinical and imaging factors. This study included 10 085 patients, with median age 68 (interquartile range 59-76) and 5767 (57%) male. Body tissue segmentations were completed in 102 ± 4 s. Higher VAT density was associated with an increased risk of death or MI in both unadjusted [hazard ratio (HR) 1.40, 95% confidence interval (CI) 1.37-1.43] and adjusted (HR 1.24, 95% CI 1.19-1.28) analyses, with similar findings for IMAT, SAT, and EAT. Patients with elevated VAT density and reduced myocardial flow reserve had a significantly increased risk of death or MI (adjusted HR 2.49, 95% CI 2.23-2.77). Volumetric body tissue composition can be obtained rapidly and automatically from standard cardiac PET/CT. This new information provides a detailed, quantitative assessment of sarcopenia and cardiometabolic health for physicians.
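One hedged way to derive sex-specific abnormality thresholds of the kind the study establishes is a percentile cutoff over a reference cohort (the simulated data and the 90th-percentile choice below are illustrative assumptions, not the study's method):

```python
import numpy as np

def sex_specific_thresholds(values, sex, q=90):
    """Per-sex abnormality thresholds set at the q-th percentile of a
    reference distribution, computed separately for each sex."""
    return {s: float(np.percentile(values[sex == s], q)) for s in np.unique(sex)}

rng = np.random.default_rng(2)
# Simulated VAT density (HU-like values), with a hypothetical sex difference
vat_density = np.concatenate([rng.normal(-90, 10, 1000), rng.normal(-80, 10, 1000)])
sex = np.array(["F"] * 1000 + ["M"] * 1000)
thresholds = sex_specific_thresholds(vat_density, sex, q=90)
abnormal = vat_density > np.vectorize(thresholds.get)(sex)  # flag per patient
```

By construction about 10% of each sex is flagged, so the "elevated" group is defined relative to its own reference distribution rather than a pooled one.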

MOSCARD -- Causal Reasoning and De-confounding for Multimodal Opportunistic Screening of Cardiovascular Adverse Events

Jialu Pi, Juan Maria Farina, Rimita Lahiri, Jiwoong Jeong, Archana Gurudu, Hyung-Bok Park, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee

arXiv preprint · Jun 23 2025
Major Adverse Cardiovascular Events (MACE) remain the leading cause of mortality globally, as reported in the Global Burden of Disease Study 2021. Opportunistic screening leverages data collected from routine health check-ups, and multimodal data can play a key role in identifying at-risk individuals. Chest X-rays (CXR) provide insights into chronic conditions contributing to MACE, while the 12-lead electrocardiogram (ECG) directly assesses cardiac electrical activity and structural abnormalities. Integrating CXR and ECG could offer a more comprehensive risk assessment than conventional models, which rely on clinical scores, computed tomography (CT) measurements, or biomarkers and may be limited by sampling bias and single-modality constraints. We propose a novel predictive modeling framework, MOSCARD, multimodal causal reasoning with co-attention to align two distinct modalities and simultaneously mitigate bias and confounders in opportunistic risk estimation. Primary technical contributions are: (i) multimodal alignment of CXR with ECG guidance; (ii) integration of causal reasoning; (iii) a dual back-propagation graph for de-confounding. Evaluated on internal data, shift data from the emergency department (ED), and the external MIMIC dataset, our model outperformed single-modality and state-of-the-art foundational models (AUC: 0.75, 0.83, and 0.71, respectively). The proposed cost-effective opportunistic screening enables early intervention, improving patient outcomes and reducing disparities.
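The co-attention alignment of CXR and ECG features can be sketched as bidirectional scaled dot-product attention between the two token sets (a simplified stand-in for the MOSCARD module; token counts and dimensions are assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(cxr_tokens, ecg_tokens):
    """Toy co-attention: each modality attends to the other over a shared
    affinity matrix, yielding cross-modally aligned features."""
    d = cxr_tokens.shape[-1]
    scores = cxr_tokens @ ecg_tokens.T / np.sqrt(d)     # (n_cxr, n_ecg) affinity
    cxr_aligned = softmax(scores, axis=1) @ ecg_tokens  # CXR tokens attend to ECG
    ecg_aligned = softmax(scores.T, axis=1) @ cxr_tokens  # and vice versa
    return cxr_aligned, ecg_aligned

rng = np.random.default_rng(3)
cxr = rng.normal(size=(49, 32))   # e.g. 7x7 CXR patch embeddings
ecg = rng.normal(size=(12, 32))   # e.g. one embedding per ECG lead
cxr_aligned, ecg_aligned = co_attention(cxr, ecg)
```

Each aligned token is a convex combination of the other modality's tokens, which is what lets one modality guide the representation of the other before fusion.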

Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination

Hirano, Y., Miki, S., Yamagishi, Y., Hanaoka, S., Nakao, T., Kikuchi, T., Nakamura, Y., Nomura, Y., Yoshikawa, T., Abe, O.

medRxiv preprint · Jun 23 2025
Purpose: To assess and compare the accuracy and legitimacy of multimodal large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE). Materials and methods: The dataset comprised questions from JDRBE 2021, 2023, and 2024, with ground-truth answers established through consensus among multiple board-certified diagnostic radiologists. Questions without associated images and those lacking unanimous agreement on answers were excluded. Eight LLMs were evaluated: GPT-4 Turbo, GPT-4o, GPT-4.5, GPT-4.1, o3, o4-mini, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Each model was evaluated under two conditions: with image input (vision) and without (text-only). Performance differences between the conditions were assessed using McNemar's exact test. Two diagnostic radiologists (with 2 and 18 years of experience) independently rated the legitimacy of responses from four models (GPT-4 Turbo, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro) using a five-point Likert scale, blinded to model identity. Legitimacy scores were analyzed using Friedman's test, followed by pairwise Wilcoxon signed-rank tests with Holm correction. Results: The dataset included 233 questions. Under the vision condition, o3 achieved the highest accuracy at 72%, followed by o4-mini (70%) and Gemini 2.5 Pro (70%). Under the text-only condition, o3 topped the list with an accuracy of 67%. Adding image input significantly improved the accuracy of two models (Gemini 2.5 Pro and GPT-4.5), but not the others. Both o3 and Gemini 2.5 Pro received significantly higher legitimacy scores than GPT-4 Turbo and Claude 3.7 Sonnet from both raters. Conclusion: Recent multimodal LLMs, particularly o3 and Gemini 2.5 Pro, have demonstrated remarkable progress on JDRBE questions, reflecting their rapid evolution in diagnostic radiology. Secondary abstract: Eight multimodal large language models were evaluated on the Japan Diagnostic Radiology Board Examination. OpenAI's o3 and Google DeepMind's Gemini 2.5 Pro achieved high accuracy rates (72% and 70%) and received good legitimacy scores from human raters, demonstrating steady progress.
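McNemar's exact test, used above to compare the vision and text-only conditions, reduces to a two-sided binomial test on the discordant question pairs; a minimal sketch with hypothetical counts:

```python
import math

def mcnemar_exact(b, c):
    """Exact McNemar test on discordant pairs: b items correct only under
    condition 1, c correct only under condition 2. Returns the two-sided
    binomial p-value under the null that discordance is symmetric (p = 0.5)."""
    n, k = b + c, min(b, c)
    # Double the smaller binomial tail for a two-sided exact p-value
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 15 questions right only with images, 4 only without
p = mcnemar_exact(15, 4)
```

Concordant pairs (questions answered the same way under both conditions) carry no information about the difference and drop out of the test entirely.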
