Hanbin Ko, Gihun Cho, Inhyeok Baek, Donguk Kim, Joonbeom Koo, Changi Kim, Dongheon Lee, Chang Min Park

arXiv preprint · Sep 17, 2025
Vision-language pretraining has advanced image-text alignment, yet progress in radiology remains constrained by the heterogeneity of clinical reports, including abbreviations, impression-only notes, and stylistic variability. Unlike general-domain settings where more data often leads to better performance, naively scaling to large collections of noisy reports can plateau or even degrade model learning. We ask whether large language model (LLM) encoders can provide robust clinical representations that transfer across diverse styles and better guide image-text alignment. We introduce LLM2VEC4CXR, a domain-adapted LLM encoder for chest X-ray reports, and LLM2CLIP4CXR, a dual-tower framework that couples this encoder with a vision backbone. LLM2VEC4CXR improves clinical text understanding over BERT-based baselines, handles abbreviations and style variation, and achieves strong clinical alignment on report-level metrics. LLM2CLIP4CXR leverages these embeddings to boost retrieval accuracy and clinically oriented scores, with stronger cross-dataset generalization than prior medical CLIP variants. Trained on 1.6M CXR studies from public and private sources with heterogeneous and noisy reports, our models demonstrate that robustness -- not scale alone -- is the key to effective multimodal learning. We release models to support further research in medical image-text representation learning.
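
The dual-tower framework pairs the LLM-based report encoder with a vision backbone and aligns the two with a contrastive objective. Below is a minimal sketch of the CLIP-style alignment step; the embedding dimension, batch size, and function name are illustrative assumptions, not the released LLM2CLIP4CXR code.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings."""
    # L2-normalize so the dot product is cosine similarity
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)        # image -> matching report
    loss_t2i = F.cross_entropy(logits.t(), targets)    # report -> matching image
    return (loss_i2t + loss_t2i) / 2

# Illustrative shapes: a batch of 32 studies in a 512-dim shared embedding space
img = torch.randn(32, 512)   # from the vision tower
txt = torch.randn(32, 512)   # from the LLM-based text tower
loss = clip_contrastive_loss(img, txt)
```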

Koga, S., Guda, A., Wang, Y., Sahni, A., Wu, J., Rosen, A., Nield, J., Nandish, N., Patel, K., Goldman, H., Rajapakse, C., Walle, S., Kristen, S., Tondon, R., Alipour, Z.

medRxiv preprint · Sep 17, 2025
Introduction: Accurate intraoperative assessment of macrovesicular steatosis in donor liver biopsies is critical for transplantation decisions but is often limited by inter-observer variability and freezing artifacts that can obscure histological details. Artificial intelligence (AI) offers a potential solution for standardized and reproducible evaluation. We evaluated the diagnostic performance of two self-supervised learning (SSL)-based foundation models, Prov-GigaPath and UNI, for classifying macrovesicular steatosis in frozen liver biopsy sections, compared with assessments by surgical pathologists. Methods: We retrospectively analyzed 131 frozen liver biopsy specimens from 68 donors collected between November 2022 and September 2024. Slides were digitized into whole-slide images, tiled into patches, and used to extract embeddings with Prov-GigaPath and UNI; slide-level classifiers were then trained and tested. Intraoperative diagnoses by on-call surgical pathologists were compared with ground truth determined from independent reviews of permanent sections by two liver pathologists. Accuracy was evaluated for both five-category classification and a clinically significant binary threshold (<30% vs. ≥30%). Results: For binary classification, Prov-GigaPath achieved 96.4% accuracy, UNI 85.7%, and surgical pathologists 84.0% (P = .22). In five-category classification, accuracies were lower: Prov-GigaPath 57.1%, UNI 50.0%, and pathologists 58.7% (P = .70). Misclassification occurred primarily in intermediate categories (5% to <30% steatosis). Conclusions: SSL-based foundation models performed comparably to surgical pathologists in classifying macrovesicular steatosis at the clinically relevant <30% vs. ≥30% threshold. These findings support a potential role for AI in standardizing intraoperative evaluation of donor liver biopsies; however, the small sample size limits generalizability and requires validation in larger, balanced cohorts.
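
The described pipeline, tiling whole-slide images, extracting patch embeddings with a frozen foundation model, and training a slide-level classifier, can be sketched as follows. The mean-pooling aggregation and logistic-regression classifier are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def slide_embedding(tile_embeddings: np.ndarray) -> np.ndarray:
    """Aggregate tile-level foundation-model embeddings (N_tiles, D) into a
    single slide-level vector by mean pooling (an assumed choice; attention
    pooling is another common option)."""
    return tile_embeddings.mean(axis=0)

# Illustrative data: 100 slides, each with a variable number of 1024-dim tile embeddings
rng = np.random.default_rng(0)
slides = [rng.normal(size=(rng.integers(50, 200), 1024)) for _ in range(100)]
X = np.stack([slide_embedding(s) for s in slides])
y = rng.integers(0, 2, size=100)   # e.g., <30% vs. >=30% steatosis label

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```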

Xia Z, Li H, Lan L

PubMed paper · Sep 16, 2025
Medical image recognition is a key aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective across a range of medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, incurring high computational costs, or rely on handcrafted sparse attention, which can lead to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer built on two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computational load of feature maps, which is highly beneficial for performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness that improves computational efficiency and robustness against noise while maintaining high performance. As the core building block of MedFormer, DSSA is designed to explicitly attend to the most relevant content. Theoretical analysis demonstrates that MedFormer outperforms existing medical vision transformers in generality and efficiency, and extensive experiments across various imaging-modality datasets show that it consistently improves performance on all three medical image recognition tasks mentioned above. MedFormer thus provides an efficient and versatile solution for medical image recognition, with strong potential for clinical application. The code is available on GitHub.
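
DSSA's central idea, attending only to the most relevant content rather than to all tokens, can be illustrated with a generic top-k sparse attention step. This is a simplified, single-granularity stand-in for the paper's dual selection, with illustrative shapes.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep=16):
    """Content-aware sparse attention: each query attends only to its
    k_keep highest-scoring keys. A simplified stand-in for Dual Sparse
    Selection Attention, which selects at two granularities."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, N, N) scaled scores
    topv, topi = scores.topk(k_keep, dim=-1)               # keep k_keep keys per query
    mask = torch.full_like(scores, float('-inf'))
    mask.scatter_(-1, topi, topv)                          # non-selected keys stay -inf
    attn = F.softmax(mask, dim=-1)                         # sparse attention weights
    return attn @ v

# Illustrative shapes: batch of 2, 196 tokens (a 14x14 feature map), dim 64
q = torch.randn(2, 196, 64); k = torch.randn(2, 196, 64); v = torch.randn(2, 196, 64)
out = topk_sparse_attention(q, k, v)
```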

Elhaie M, Koozari A, Alshammari QT

PubMed paper · Sep 16, 2025
Liver iron overload, associated with conditions such as hereditary hemochromatosis and β-thalassemia major, requires accurate quantification of liver iron concentration (LIC) to guide timely interventions and prevent complications. Magnetic resonance imaging (MRI) is the gold standard for noninvasive LIC assessment, but challenges in protocol variability and diagnostic consistency persist. Machine learning (ML) and deep learning (DL) offer the potential to enhance MRI-based LIC quantification, yet their efficacy remains underexplored. This systematic review and meta-analysis, conducted in accordance with PRISMA guidelines, evaluates the diagnostic accuracy, algorithmic performance, and clinical applicability of ML and DL techniques for MRI-based LIC quantification in liver iron overload. A comprehensive search across PubMed, Embase, Scopus, Web of Science, Cochrane Library, and IEEE Xplore identified studies applying ML/DL to MRI-based LIC quantification. Eligible studies were assessed for diagnostic accuracy (sensitivity, specificity, AUC), LIC quantification precision (correlation, mean absolute error), and clinical applicability (automation, processing time). Methodological quality was evaluated using the QUADAS-2 tool, with qualitative synthesis and meta-analysis where feasible. Eight studies were included, employing algorithms such as convolutional neural networks (CNNs), radiomics, and fuzzy C-means clustering on T2*-weighted and multiparametric MRI. Pooled diagnostic accuracy from three studies showed a sensitivity of 0.79 (95% CI: 0.66-0.88) and a specificity of 0.77 (95% CI: 0.64-0.86), with an AUC of 0.84. DL methods demonstrated high precision (e.g., Pearson's r = 0.999) and automation, reducing processing times to as low as 0.1 s/slice. Limitations included heterogeneity, limited generalizability, and small external validation sets. Both ML and DL enhance MRI-based LIC quantification, offering high accuracy and efficiency, but standardized protocols and multicenter validation are needed to ensure clinical scalability and equitable access.
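
For readers who want to follow the pooling arithmetic, below is a minimal sketch of inverse-variance pooling of proportions on the logit scale. It is a simplified stand-in for the bivariate model usually used in diagnostic meta-analysis, and the per-study numbers are hypothetical.

```python
import numpy as np

def pooled_logit(props, ns):
    """Fixed-effect inverse-variance pooling of proportions (sensitivities
    or specificities) on the logit scale, with a 95% Wald interval.
    props: per-study proportions; ns: per-study case counts."""
    props = np.asarray(props, dtype=float)
    ns = np.asarray(ns, dtype=float)
    logits = np.log(props / (1 - props))
    var = 1.0 / (ns * props * (1 - props))   # variance of a logit-transformed proportion
    w = 1.0 / var                            # inverse-variance weights
    pooled = (w * logits).sum() / w.sum()
    se = np.sqrt(1.0 / w.sum())
    lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
    inv = lambda x: 1 / (1 + np.exp(-x))     # back-transform to a proportion
    return inv(pooled), (inv(lo), inv(hi))

# Hypothetical per-study sensitivities and positive-case counts
sens, ci = pooled_logit([0.75, 0.82, 0.80], [40, 55, 62])
```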

Koch HW, Bergan MB, Gjesvik J, Larsen M, Bartsch H, Haldorsen IHS, Hofvind S

PubMed paper · Sep 16, 2025
Background: The use of artificial intelligence (AI) in screen-reading of mammograms has shown promising results for cancer detection. However, less attention has been paid to the false positives generated by AI. Purpose: To investigate mammographic features in screening mammograms with high AI scores but a true-negative screening result. Material and Methods: In this retrospective study, 54,662 screening examinations from BreastScreen Norway 2010-2022 were analyzed with a commercially available AI system (Transpara v. 2.0.0). An AI score of 1-10 indicated the suspiciousness of malignancy. We selected examinations with an AI score of 10 and a true-negative screening result, followed by two consecutive true-negative screening examinations. Of the 2,124 examinations matching these criteria, 382 randomly selected examinations underwent blinded consensus review by three experienced breast radiologists. The examinations were classified according to mammographic features, radiologist interpretation score (1-5), and mammographic breast density (BI-RADS 5th ed., a-d). Results: The reviews classified 91.1% (348/382) of the examinations as negative (interpretation score 1). All examinations (26/26) categorized as BI-RADS d were given an interpretation score of 1. Mammographic features were classified as: asymmetry, 30.6% (117/382); calcifications, 30.1% (115/382); asymmetry with calcifications, 29.3% (112/382); mass, 8.9% (34/382); distortion, 0.8% (3/382); spiculated mass, 0.3% (1/382). Of examinations with calcifications, 79.1% (91/115) were classified as having benign morphology. Conclusion: The majority of AI-generated false-positive screening examinations were classified as non-suspicious in a retrospective blinded consensus review and would likely not have been recalled for further assessment in a real screening setting using AI as decision support.
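
The cohort filter (AI score 10, true-negative result, followed by two consecutive true-negative screens) is straightforward to express over a tabular exam log. A sketch under an assumed table layout, with hypothetical column names:

```python
import pandas as pd

def select_high_score_true_negatives(df: pd.DataFrame) -> pd.DataFrame:
    """Keep exams with AI score 10 and a true-negative result that were
    followed by two consecutive true-negative screens. Assumed columns:
    one row per exam, 'woman_id', 'exam_date', 'ai_score' (1-10), and
    a boolean 'true_negative'."""
    df = df.sort_values(["woman_id", "exam_date"])
    g = df.groupby("woman_id")["true_negative"]
    next1 = g.shift(-1)   # result of the following screen, per woman
    next2 = g.shift(-2)   # result of the screen after that
    keep = (
        (df["ai_score"] == 10)
        & df["true_negative"]
        & next1.fillna(False)
        & next2.fillna(False)
    )
    return df[keep]
```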

Xu J, Lan M, Dong X, He K, Zhang W, Bian Q, Ke Y

PubMed paper · Sep 16, 2025
Brain network analysis plays a crucial role in identifying distinctive patterns associated with neurological disorders. Functional magnetic resonance imaging (fMRI) enables the construction of brain networks by analyzing correlations in blood-oxygen-level-dependent (BOLD) signals across different brain regions, known as regions of interest (ROIs). These networks are typically constructed using atlases that parcellate the brain based on various hypotheses of functional and anatomical divisions. However, there is no standard atlas for brain network classification, leading to limitations in detecting abnormalities in disorders. Recent methods leveraging multiple atlases fail to ensure consistency across atlases and lack effective ROI-level information exchange, limiting their efficacy. To address these challenges, we propose the Atlas-Integrated Distillation and Fusion network (AIDFusion), a novel framework designed to enhance brain network classification using fMRI data. AIDFusion introduces a disentangle Transformer to filter out inconsistent atlas-specific information and distill meaningful cross-atlas connections. Additionally, it enforces subject- and population-level consistency constraints to improve cross-atlas coherence. To further enhance feature integration, AIDFusion incorporates an inter-atlas message-passing mechanism that facilitates the fusion of complementary information across brain regions. We evaluate AIDFusion on four resting-state fMRI datasets encompassing different neurological disorders. Experimental results demonstrate its superior classification performance and computational efficiency compared to state-of-the-art methods. Furthermore, a case study highlights AIDFusion's ability to extract interpretable patterns that align with established neuroscience findings, reinforcing its potential as a robust tool for multi-atlas brain network analysis. The code is publicly available at https://github.com/AngusMonroe/AIDFusion.
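
The starting point for any such method is one functional network per atlas, built from ROI-wise correlations of BOLD time series. A minimal sketch (the atlas size is illustrative); AIDFusion would consume one such matrix per atlas:

```python
import numpy as np

def connectivity_matrix(bold: np.ndarray) -> np.ndarray:
    """Build a functional brain network from ROI time series.
    bold: (T, R) array of T time points x R atlas ROIs.
    Returns the (R, R) Pearson correlation matrix whose entries
    serve as edge weights between regions."""
    return np.corrcoef(bold, rowvar=False)

# Illustrative input: 200 time points over 116 ROIs (an AAL-like parcellation)
rng = np.random.default_rng(0)
bold = rng.normal(size=(200, 116))
A = connectivity_matrix(bold)        # one network for this atlas
assert A.shape == (116, 116)
```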

Geng S, Jiang S, Hou T, Yao H, Huang J, Ding W

PubMed paper · Sep 16, 2025
Diffusion models, a class of generative frameworks based on step-wise denoising, have recently attracted significant attention in medical image segmentation. However, existing diffusion-based methods typically rely on static fusion strategies to integrate conditional priors with denoised features, making it difficult to adaptively balance their respective contributions at different denoising stages. Moreover, these methods often lack explicit modeling of pixel-level uncertainty in ambiguous regions, which can cause structural details to be lost during the iterative denoising process, ultimately compromising the accuracy and completeness of the final segmentation. To this end, we propose FEU-Diff, a diffusion-based segmentation framework that integrates fuzzy evidence modeling and uncertainty fusion (UF) mechanisms. Specifically, a fuzzy semantic enhancement (FSE) module models pixel-level uncertainty through Gaussian membership functions and fuzzy logic rules, enhancing the model's ability to identify and represent ambiguous boundaries. An evidence dynamic fusion (EDF) module estimates feature confidence via a Dirichlet-based distribution and adaptively guides the fusion of conditional information and denoised features across denoising stages. Furthermore, the UF module quantifies discrepancies among multisource predictions to compensate for structural detail loss during iterative denoising. Extensive experiments on four public datasets show that FEU-Diff consistently outperforms state-of-the-art methods, achieving an average gain of 1.42% in the Dice similarity coefficient (DSC), 1.47% in intersection over union (IoU), and a 2.26 mm reduction in the 95th percentile Hausdorff distance (HD95). In addition, our method generates uncertainty maps that enhance clinical interpretability.
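
The FSE module's use of Gaussian membership functions to score pixel-level ambiguity can be illustrated with a toy example; the parameters and the boundary-centered heuristic below are assumptions for illustration, not the paper's exact fuzzy rules.

```python
import numpy as np

def gaussian_membership(x, mu, sigma):
    """Gaussian membership function used in fuzzy modeling: the degree in
    [0, 1] to which a value x belongs to a class centered at mu with
    spread sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def fuzzy_boundary_uncertainty(prob_map: np.ndarray) -> np.ndarray:
    """A simple pixel-level ambiguity score: highest where the foreground
    probability sits near 0.5, i.e., near the decision boundary."""
    return gaussian_membership(prob_map, mu=0.5, sigma=0.15)

# Illustrative foreground-probability map from one denoising step
rng = np.random.default_rng(0)
p = rng.uniform(size=(128, 128))
u = fuzzy_boundary_uncertainty(p)   # approaches 1.0 along ambiguous boundaries
```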

Naushad Z, Malik J, Mishra AK, Singh S, Shrivastav D, Sharma CK, Verma VV, Pal RK, Roy B, Sharma VK

PubMed paper · Sep 16, 2025
Cardiovascular diseases (CVDs) remain the leading cause of morbidity and mortality worldwide, with risk factors such as diabetes, hypertension, obesity, and smoking compounding the burden. The COVID-19 pandemic has highlighted the connection between viral infections and cardiovascular health. Current literature indicates that SARS-CoV-2 contributes to myocardial injury, endothelial dysfunction, thrombosis, and systemic inflammation, worsening CVD outcomes. Long COVID has also been associated with persistent cardiovascular complications, including myocarditis, arrhythmias, thromboembolic events, and accelerated atherosclerosis. Addressing these challenges requires continued research and public health strategies to mitigate long-term risks. Artificial intelligence (AI) is transforming cardiovascular medicine and community health through machine learning (ML) and deep learning (DL) applications. AI enhances risk prediction, facilitates biomarker discovery, and improves imaging techniques such as echocardiography, CT, and MRI for the timely detection of coronary artery disease and myocardial injury. AI-powered remote monitoring and wearable devices enable real-time cardiovascular assessment and personalized treatment. In public health, AI optimizes disease surveillance, epidemiological modeling, and healthcare resource allocation, and AI-driven clinical decision support systems improve diagnostic accuracy and health equity by enabling targeted interventions. The integration of AI into cardiovascular medicine and public health offers data-driven, efficient, and patient-centered solutions to mitigate post-COVID cardiovascular complications.

Quinsten AS, Bojahr C, Nassenstein K, Straus J, Holtkamp M, Salhöfer L, Umutlu L, Forsting M, Haubold J, Wen Y, Kohnke J, Borys K, Nensa F, Hosch R

PubMed paper · Sep 16, 2025
Manual field-of-view (FoV) prescription in whole-body magnetic resonance imaging (WB-MRI) is vital for ensuring comprehensive anatomic coverage and minimizing artifacts, thereby enhancing image quality. However, the procedure is time-consuming, subject to operator variability, and adversely impacts both patient comfort and workflow efficiency. To overcome these limitations, an automated system was developed and evaluated that prescribes multiple consecutive FoV stations for WB-MRI using deep-learning (DL)-based three-dimensional anatomic segmentations. A total of 374 patients (mean age: 50.5 ± 18.2 y; 52% female) who underwent WB-MRI, including T2-weighted Half-Fourier acquisition single-shot turbo spin-echo (T2-HASTE) and fast whole-body localizer (FWBL) sequences acquired during continuous table movement on a 3T MRI system, were retrospectively collected between March 2012 and January 2025. An external cohort of 10 patients, acquired on two 1.5T scanners, was used for generalizability testing. Complementary nnUNet-v2 models were fine-tuned to segment tissue compartments, organs, and a whole-body (WB) outline on FWBL images. From these predicted segmentations, 5 consecutive FoVs (head/neck, thorax, liver, pelvis, and spine) were generated. Segmentation accuracy was quantified by the Sørensen-Dice coefficient (DSC), precision (P), recall (R), and specificity (S). Clinical utility was assessed on 30 test cases by 4 blinded experts using Likert scores and a 4-way ranking against 3 radiographer prescriptions. Interrater reliability and statistical comparisons were assessed using the intraclass correlation coefficient (ICC), Kendall's W, and the Friedman and Wilcoxon signed-rank tests. Mean DSCs were 0.98 for torso (P = 0.98, R = 0.98, S = 1.00), 0.96 for head/neck (P = 0.95, R = 0.96, S = 1.00), 0.94 for abdominal cavity (P = 0.95, R = 0.94, S = 1.00), 0.90 for thoracic cavity (P = 0.90, R = 0.91, S = 1.00), 0.86 for liver (P = 0.85, R = 0.87, S = 1.00), and 0.63 for spinal cord (P = 0.64, R = 0.63, S = 1.00). Clinical utility was supported by assessments from 2 expert radiologists and 2 radiographers, with 98.3% and 87.5% of cases rated as clinically acceptable in the internal and external test data sets, respectively. Predicted FoVs received the highest ranking in 60% of cases and placed within the top 2 in 85.8% of cases, outperforming radiographers with 9 and 13 years of experience (P < 0.001) and matching the performance of a radiographer with 20 years of experience. DL-based three-dimensional anatomic segmentations enable accurate and reliable multistation FoV prescription for WB-MRI, achieving expert-level performance while substantially reducing manual workload. Automated FoV planning has the potential to standardize WB-MRI acquisition, reduce interoperator variability, and enhance workflow efficiency, thereby facilitating broader clinical adoption.
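
The geometric core of such a system, turning a predicted 3D mask into a padded FoV box, reduces to a bounding-box computation. A minimal sketch with illustrative margin and voxel-spacing values (not the study's parameters):

```python
import numpy as np

def fov_from_mask(mask: np.ndarray, margin_mm: float, spacing_mm: tuple):
    """Derive a field-of-view box from a binary 3D segmentation mask
    (z, y, x), padded by a safety margin. Returns the lower and upper
    box corners in voxel coordinates."""
    zs, ys, xs = np.nonzero(mask)
    lo = np.array([zs.min(), ys.min(), xs.min()], dtype=float)
    hi = np.array([zs.max(), ys.max(), xs.max()], dtype=float)
    pad = margin_mm / np.asarray(spacing_mm)   # margin converted to voxels per axis
    return tuple(lo - pad), tuple(hi + pad)

# Illustrative: a thorax FoV from a predicted thoracic-cavity mask
mask = np.zeros((200, 256, 256), dtype=bool)
mask[60:120, 40:220, 30:230] = True
fov_lo, fov_hi = fov_from_mask(mask, margin_mm=10.0, spacing_mm=(5.0, 1.6, 1.6))
```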

Qin X, Huang L, Wei Y, Li H, Wu Y, Zhong J, Jian M, Zhang J, Zheng Z, Xu Y, Yan C

PubMed paper · Sep 16, 2025
The Liver Imaging Reporting and Data System (LI-RADS) assessment is subject to inter-reader variability. The present study aimed to evaluate the impact of an artificial intelligence (AI) system on the accuracy and inter-reader agreement of LI-RADS classification based on contrast-enhanced magnetic resonance imaging among radiologists with varying experience levels. This single-center, multi-reader, multi-case retrospective study included 120 patients with 200 focal liver lesions who underwent abdominal contrast-enhanced magnetic resonance imaging between June 2023 and May 2024. Five radiologists with different experience levels independently assessed LI-RADS classification and imaging features with and without AI assistance. The reference standard was established by consensus between two expert radiologists. Accuracy was used to measure the performance of the AI system and the radiologists, and the kappa statistic or intraclass correlation coefficient was used to estimate inter-reader agreement. The LI-RADS categories were 33.5% LR-3 (67/200), 29.0% LR-4 (58/200), 33.5% LR-5 (67/200), and 4.0% LR-M (8/200). The AI system significantly improved the overall accuracy of LI-RADS classification from 69.9% to 80.1% (p < 0.001), with the most notable improvement among junior radiologists, from 65.7% to 79.7% (p < 0.001). Inter-reader agreement for LI-RADS classification was significantly higher with AI assistance than without (weighted Cohen's kappa: 0.655 without vs. 0.812 with AI, p < 0.001). The AI system also enhanced accuracy and inter-reader agreement for imaging features, including non-rim arterial phase hyperenhancement, non-peripheral washout, and restricted diffusion. Additionally, inter-reader agreement for lesion size measurements improved, with the intraclass correlation coefficient increasing from 0.857 to 0.951 (p < 0.001). The AI system significantly increased the accuracy and inter-reader agreement of LI-RADS 3/4/5/M classification, particularly benefiting junior radiologists.
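
Weighted Cohen's kappa, the agreement statistic reported here, penalizes disagreements by how far apart the ordinal categories are. A small illustration with hypothetical reader assignments (LR-M coded as 6, an assumed encoding):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical LI-RADS categories for ten lesions from two readers
# (LR-3, LR-4, LR-5 coded as 3-5; LR-M coded as 6).
reader_a = [3, 4, 5, 5, 3, 4, 6, 5, 3, 4]
reader_b = [3, 5, 5, 4, 3, 4, 6, 5, 4, 4]

# Linear weights penalize disagreements in proportion to category distance.
kappa = cohen_kappa_score(reader_a, reader_b, weights="linear")
print(f"weighted kappa: {kappa:.3f}")
```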