
Data Scaling Laws for Radiology Foundation Models

Maximilian Ilse, Harshita Sharma, Anton Schwaighofer, Sam Bond-Taylor, Fernando Pérez-García, Olesya Melnichenko, Anne-Marie G. Sykes, Kelly K. Horst, Ashish Khandelwal, Maxwell Reynolds, Maria T. Wetscherek, Noel C. F. Codella, Javier Alvarez-Valle, Korfiatis Panagiotis, Valentina Salvatelli

arXiv preprint · Sep 16 2025
Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO, representing the two major encoder paradigms (CLIP and DINOv2), on up to 3.5M chest X-rays from a single institution, holding compute and evaluation protocols constant. We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines-and-tubes tasks to counterbalance this bias and to evaluate a model's ability to extract features that preserve continuity along elongated structures. Our experiments show that MI2 scales more effectively for finding-related tasks, while RAD-DINO is stronger on tube-related tasks. Surprisingly, continually pretraining MI2 with both reports and structured labels using UniCL improves performance, underscoring the value of structured supervision at scale. We further show that for some tasks, as few as 30k in-domain samples are sufficient to surpass open-weights foundation models. These results highlight the utility of center-specific continual pretraining, enabling medical institutions to derive significant performance gains by utilizing in-domain data.
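
The key ingredient here, UniCL-style pretraining with both reports and structured labels, treats any image-text pair that shares a label as a positive. Below is a minimal sketch of such a label-aware contrastive objective; the function name, tensor shapes, and simplifications are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def label_aware_contrastive_loss(img_emb, txt_emb, labels, temperature=0.07):
    """UniCL-style loss sketch: image-text pairs sharing a structured label
    count as positives; with all-unique labels this reduces to vanilla CLIP.

    img_emb, txt_emb: (N, D) L2-normalised embeddings; labels: (N,) int tensor.
    """
    logits = img_emb @ txt_emb.t() / temperature               # (N, N) similarities
    targets = (labels[:, None] == labels[None, :]).float()     # shared-label positives
    targets = targets / targets.sum(dim=1, keepdim=True)       # row-normalised soft targets
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets.t() * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```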

Prediction of cerebrospinal fluid intervention in fetal ventriculomegaly via AI-powered normative modelling.

Zhou M, Rajan SA, Nedelec P, Bayona JB, Glenn O, Gupta N, Gano D, George E, Rauschecker AM

PubMed · Sep 16 2025
Fetal ventriculomegaly (VM) is common and largely benign when isolated. However, it can occasionally progress to hydrocephalus, a more severe condition associated with increased mortality and neurodevelopmental delay that may require surgical postnatal intervention. Accurate differentiation between VM and hydrocephalus is essential but remains challenging, relying on subjective assessment and limited two-dimensional measurements. Deep learning-based segmentation offers a promising solution for objective and reproducible volumetric analysis. This work presents an AI-powered method for segmentation, volume quantification, and classification of the ventricles in fetal brain MRI to predict the need for postnatal intervention. This retrospective study included 222 patients with singleton pregnancies. An nnUNet was trained to segment the fetal ventricles on 20 manually segmented, institutional fetal brain MRIs combined with 80 studies from a publicly available dataset. The validated model was then applied to 138 normal fetal brain MRIs to generate a normative reference range across a range of gestational ages (18-36 weeks). Finally, it was applied to 64 fetal brains with VM (14 of which required postnatal intervention). ROC curves and AUC to predict VM and need for postnatal intervention were calculated. The nnUNet-predicted segmentations of the fetal ventricles in the reference dataset were high quality and accurate (median Dice score 0.96, IQR 0.93-0.99). A normative reference range of ventricular volumes across gestational ages was developed using automated segmentation volumes. The optimal threshold for identifying VM was 2 standard deviations from normal, with sensitivity of 92% and specificity of 93% (AUC 0.97, 95% CI 0.91-0.98). When normalized to intracranial volume, fetal ventricular volume was higher and subarachnoid volume lower among those who required postnatal intervention (p<0.001, p=0.003). The optimal threshold for identifying need for postnatal intervention was 11 standard deviations from normal, with sensitivity of 86% and specificity of 100% (AUC 0.97, 95% CI 0.86-1.00). This work introduces a deep learning-based method for fast and accurate quantification of ventricular volumes in fetal brain MRI. A normative reference standard derived using this method can predict VM and the need for postnatal CSF intervention. Increased ventricular volume is a strong predictor of the need for postnatal intervention. VM = ventriculomegaly, 2D = two-dimensional, 3D = three-dimensional, ROC = receiver operating characteristic, AUC = area under the curve.
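
The normative-modelling step reduces to a gestational-age-matched z-score; the 2-SD and 11-SD cut-offs reported in the abstract are then simple thresholds on that score. A hedged sketch, with hypothetical helper and parameter names, not the authors' code:

```python
import numpy as np

def ventricular_z_score(volume_ml, ga_weeks, norm_ga, norm_volumes, window_weeks=1.0):
    """Z-score of an automated ventricular volume against a gestational-age-matched
    normative reference (illustrative helper only).

    norm_ga: gestational ages (weeks) of the normative cohort.
    norm_volumes: corresponding automated ventricular volumes (mL).
    """
    ref = norm_volumes[np.abs(norm_ga - ga_weeks) <= window_weeks]
    return (volume_ml - ref.mean()) / ref.std(ddof=1)

# Thresholds reported in the abstract:
#   z > 2  -> flag ventriculomegaly
#   z > 11 -> flag likely need for postnatal CSF intervention
```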

Challenges and Limitations of Multimodal Large Language Models in Interpreting Pediatric Panoramic Radiographs.

Mine Y, Iwamoto Y, Okazaki S, Nishimura T, Tabata E, Takeda S, Peng TY, Nomura R, Kakimoto N, Murayama T

PubMed · Sep 16 2025
Multimodal large language models (LLMs) have potential for medical image analysis, yet their reliability for pediatric panoramic radiographs remains uncertain. This study evaluated two multimodal LLMs (OpenAI o1, Claude 3.5 Sonnet) for detecting and counting teeth (including tooth germs) on pediatric panoramic radiographs. Eighty-seven pediatric panoramic radiographs from an open-source data set were analyzed. Two pediatric dentists annotated the presence or absence of each potential tooth position. Each image was processed five times by the LLMs using identical prompts, and the results were compared with the expert annotations. Standard performance metrics and Fleiss' kappa were calculated. Detailed examination revealed that subtle developmental stages and minor tooth loss were consistently misidentified. Claude 3.5 Sonnet had higher sensitivity but significantly lower specificity (29.8% ± 21.5%), resulting in many false positives. OpenAI o1 demonstrated superior specificity compared to Claude 3.5 Sonnet, but still failed to correctly detect subtle defects in certain mixed dentition cases. Both models showed large variability in repeated runs. Both LLMs failed to achieve clinically acceptable performance and cannot reliably identify nuanced discrepancies critical for pediatric dentistry. Further refinements and consistency improvements are essential before routine clinical use.
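
Run-to-run variability across the five repeated LLM queries is summarised with Fleiss' kappa. A small sketch of that statistic, assuming a counts matrix with one row per tooth position:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for agreement across repeated runs.

    counts: (n_items, n_categories) array where each row sums to the number
    of runs (here, 5 repeated LLM queries per tooth position).
    """
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    p_j = counts.sum(axis=0) / (n_items * n_raters)                # category proportions
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar, p_e = p_i.mean(), np.square(p_j).sum()
    return (p_bar - p_e) / (1 - p_e)
```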

Machine and deep learning for MRI-based quantification of liver iron overload: a systematic review and meta-analysis.

Elhaie M, Koozari A, Alshammari QT

PubMed · Sep 16 2025
Liver iron overload, associated with conditions such as hereditary hemochromatosis and β‑thalassemia major, requires accurate quantification of liver iron concentration (LIC) to guide timely interventions and prevent complications. Magnetic resonance imaging (MRI) is the gold standard for noninvasive LIC assessment, but challenges in protocol variability and diagnostic consistency persist. Machine learning (ML) and deep learning (DL) offer potential to enhance MRI-based LIC quantification, yet their efficacy remains underexplored. This systematic review and meta-analysis evaluates the diagnostic accuracy, algorithmic performance, and clinical applicability of ML and DL techniques for MRI-based LIC quantification in liver iron overload, adhering to PRISMA guidelines. A comprehensive search across PubMed, Embase, Scopus, Web of Science, Cochrane Library, and IEEE Xplore identified studies applying ML/DL to MRI-based LIC quantification. Eligible studies were assessed for diagnostic accuracy (sensitivity, specificity, AUC), LIC quantification precision (correlation, mean absolute error), and clinical applicability (automation, processing time). Methodological quality was evaluated using the QUADAS‑2 tool, with qualitative synthesis and meta-analysis where feasible. Eight studies were included, employing algorithms such as convolutional neural networks (CNNs), radiomics, and fuzzy C-means clustering on T2*-weighted and multiparametric MRI. Pooled diagnostic accuracy from three studies showed a sensitivity of 0.79 (95% CI: 0.66-0.88) and specificity of 0.77 (95% CI: 0.64-0.86), with an AUC of 0.84. The DL methods demonstrated high precision (e.g., Pearson's r = 0.999) and automation, reducing processing times to as low as 0.1 s/slice. Limitations included heterogeneity, limited generalizability, and small external validation sets. Both ML and DL enhance MRI-based LIC quantification, offering high accuracy and efficiency. Standardized protocols and multicenter validation are needed to ensure clinical scalability and equitable access.
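
The pooled sensitivity and specificity figures come from combining per-study 2x2 counts. Below is a minimal illustration of fixed-effect inverse-variance pooling on the logit scale; the review's actual bivariate random-effects model is more involved, so treat this as a simplification:

```python
import numpy as np

def pool_proportion_logit(events, totals):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale.

    events: per-study true positives (for sensitivity) or true negatives
    (for specificity); totals: the corresponding denominators.
    """
    p = np.asarray(events, float) / np.asarray(totals, float)
    logit = np.log(p / (1 - p))
    var = 1.0 / (np.asarray(totals) * p * (1 - p))   # delta-method variance of each logit
    w = 1.0 / var
    pooled = (w * logit).sum() / w.sum()
    return 1.0 / (1.0 + np.exp(-pooled))             # back-transform to a proportion
```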

More performant and scalable: Rethinking contrastive vision-language pre-training of radiology in the LLM era

Yingtai Li, Haoran Lai, Xiaoqian Zhou, Shuai Ming, Wenxin Ma, Wei Wei, Shaohua Kevin Zhou

arXiv preprint · Sep 16 2025
The emergence of Large Language Models (LLMs) presents unprecedented opportunities to revolutionize medical contrastive vision-language pre-training. In this paper, we show how LLMs can facilitate large-scale supervised pre-training, thereby advancing vision-language alignment. We begin by demonstrating that modern LLMs can automatically extract diagnostic labels from radiology reports with remarkable precision (>96% AUC in our experiments) without complex prompt engineering, enabling the creation of large-scale "silver-standard" datasets at a minimal cost (~$3 for 50k CT image-report pairs). Further, we find that vision encoders trained on this "silver-standard" dataset achieve performance comparable to those trained on labels extracted by specialized BERT-based models, thereby democratizing access to large-scale supervised pre-training. Building on this foundation, we proceed to reveal that supervised pre-training fundamentally improves contrastive vision-language alignment. Our approach achieves state-of-the-art performance using only a 3D ResNet-18 with vanilla CLIP training, including 83.8% AUC for zero-shot diagnosis on CT-RATE, 77.3% AUC on RAD-ChestCT, and substantial improvements in cross-modal retrieval (MAP@50 = 53.7% for image-image, Recall@100 = 52.2% for report-image). These results demonstrate the potential of utilizing LLMs to facilitate more performant and scalable medical AI systems. Our code is available at https://github.com/SadVoxel/More-performant-and-scalable.
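
The "silver-standard" dataset hinges on prompting an LLM to turn free-text reports into binary diagnostic labels. A hedged sketch of that step follows; the finding list, prompt wording, and the `call_llm` client are placeholders, not the authors' exact setup:

```python
import json

FINDINGS = ["atelectasis", "cardiomegaly", "consolidation", "pleural effusion"]

PROMPT = (
    "Read the CT report below and reply with a JSON object mapping each of "
    + ", ".join(FINDINGS)
    + " to true or false.\n\nReport:\n"
)

def extract_silver_labels(report_text, call_llm):
    """Turn one radiology report into binary labels via an LLM call.

    call_llm: any chat-completion client returning the model's text reply.
    """
    reply = call_llm(PROMPT + report_text)
    parsed = json.loads(reply)
    return {finding: bool(parsed.get(finding, False)) for finding in FINDINGS}
```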

Automated Field of View Prescription for Whole-body Magnetic Resonance Imaging Using Deep Learning Based Body Region Segmentations.

Quinsten AS, Bojahr C, Nassenstein K, Straus J, Holtkamp M, Salhöfer L, Umutlu L, Forsting M, Haubold J, Wen Y, Kohnke J, Borys K, Nensa F, Hosch R

PubMed · Sep 16 2025
Manual field-of-view (FoV) prescription in whole-body magnetic resonance imaging (WB-MRI) is vital for ensuring comprehensive anatomic coverage and minimising artifacts, thereby enhancing image quality. However, this procedure is time-consuming, subject to operator variability, and adversely impacts both patient comfort and workflow efficiency. To overcome these limitations, an automated system was developed and evaluated that prescribes multiple consecutive FoV stations for WB-MRI using deep-learning (DL)-based three-dimensional anatomic segmentations. A total of 374 patients (mean age: 50.5 ± 18.2 y; 52% females) who underwent WB-MRI, including T2-weighted Half-Fourier acquisition single-shot turbo spin-echo (T2-HASTE) and fast whole-body localizer (FWBL) sequences acquired during continuous table movement on a 3T MRI system, were retrospectively collected between March 2012 and January 2025. An external cohort of 10 patients, acquired on two 1.5T scanners, was utilized for generalizability testing. Complementary nnUNet-v2 models were fine-tuned to segment tissue compartments, organs, and a whole-body (WB) outline on FWBL images. From these predicted segmentations, 5 consecutive FoVs (head/neck, thorax, liver, pelvis, and spine) were generated. Segmentation accuracy was quantified by Sørensen-Dice coefficients (DSC), Precision (P), Recall (R), and Specificity (S). Clinical utility was assessed on 30 test cases by 4 blinded experts using Likert scores and a 4-way ranking against 3 radiographer prescriptions. Interrater reliability and statistical comparisons were assessed using the intraclass correlation coefficient (ICC), Kendall's W, and the Friedman and Wilcoxon signed-rank tests. Mean DSCs were 0.98 for torso (P = 0.98, R = 0.98, S = 1.00), 0.96 for head/neck (P = 0.95, R = 0.96, S = 1.00), 0.94 for abdominal cavity (P = 0.95, R = 0.94, S = 1.00), 0.90 for thoracic cavity (P = 0.90, R = 0.91, S = 1.00), 0.86 for liver (P = 0.85, R = 0.87, S = 1.00), and 0.63 for spinal cord (P = 0.64, R = 0.63, S = 1.00). Clinical utility was evidenced by assessments from 2 expert radiologists and 2 radiographers, with 98.3% and 87.5% of cases rated as clinically acceptable in the internal and external test data sets, respectively. Predicted FoVs received the highest ranking in 60% of cases and placed within the top 2 in 85.8% of cases, outperforming radiographers with 9 and 13 years of experience (P < 0.001) and matching the performance of a radiographer with 20 years of experience. DL-based three-dimensional anatomic segmentations enable accurate and reliable multistation FoV prescription for WB-MRI, achieving expert-level performance while significantly reducing manual workload. Automated FoV planning has the potential to standardize WB-MRI acquisition, reduce interoperator variability, and enhance workflow efficiency, thereby facilitating broader clinical adoption.
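
Once the body-region masks are predicted, each FoV station is essentially a padded bounding box around the relevant mask. A minimal sketch of that geometry, with hypothetical margin and spacing values; the actual prescription logic (station overlap, table positions) is more involved:

```python
import numpy as np

def fov_from_mask(mask, spacing_mm=(1.5, 1.5, 1.5), margin_mm=20.0):
    """Padded bounding box of a binary body-region mask, in millimetres.

    mask: 3D boolean array from the nnU-Net segmentation (z, y, x).
    Returns (start_mm, stop_mm) arrays defining one FoV station.
    """
    coords = np.argwhere(mask)
    lo_vox = coords.min(axis=0)
    hi_vox = coords.max(axis=0) + 1
    lo_mm = lo_vox * np.asarray(spacing_mm) - margin_mm
    hi_mm = hi_vox * np.asarray(spacing_mm) + margin_mm
    return lo_mm, hi_mm
```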

MedFormer: hierarchical medical vision transformer with content-aware dual sparse selection attention.

Xia Z, Li H, Lan L

PubMed · Sep 16 2025
Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computation load of feature maps, highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is designed to explicitly attend to the most relevant content. Theoretical analysis demonstrates that MedFormer outperforms existing medical vision transformers in terms of generality and efficiency. Extensive experiments across various imaging modality datasets show that MedFormer consistently enhances performance in all three medical image recognition tasks mentioned above. MedFormer provides an efficient and versatile solution for medical image recognition, with strong potential for clinical application. The code is available on GitHub.
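
The core idea of content-aware sparse attention is that each query attends only to the keys it scores highest against. The sketch below shows a generic top-k sparse attention; the paper's Dual Sparse Selection Attention performs selection in two stages and differs in detail:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=16):
    """Each query attends only to its top_k highest-scoring keys.

    q, k, v: (B, N, D) tensors; returns a (B, N, D) tensor.
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)        # (B, N, N)
    top_scores, top_idx = scores.topk(top_k, dim=-1)               # (B, N, top_k)
    attn = F.softmax(top_scores, dim=-1)
    v_exp = v.unsqueeze(1).expand(-1, q.shape[1], -1, -1)          # (B, N, N, D)
    idx = top_idx.unsqueeze(-1).expand(-1, -1, -1, v.shape[-1])    # (B, N, top_k, D)
    v_sel = torch.gather(v_exp, 2, idx)                            # (B, N, top_k, D)
    return (attn.unsqueeze(-1) * v_sel).sum(dim=2)                 # (B, N, D)
```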

FEU-Diff: A Diffusion Model With Fuzzy Evidence-Driven Dynamic Uncertainty Fusion for Medical Image Segmentation.

Geng S, Jiang S, Hou T, Yao H, Huang J, Ding W

PubMed · Sep 16 2025
Diffusion models, as a class of generative frameworks based on step-wise denoising, have recently attracted significant attention in the field of medical image segmentation. However, existing diffusion-based methods typically rely on static fusion strategies to integrate conditional priors with denoised features, making them difficult to adaptively balance their respective contributions at different denoising stages. Moreover, these methods often lack explicit modeling of pixel-level uncertainty in ambiguous regions, which may lead to the loss of structural details during the iterative denoising process, ultimately compromising the accuracy (Acc) and completeness of the final segmentation results. To this end, we propose FEU-Diff, a diffusion-based segmentation framework that integrates fuzzy evidence modeling and uncertainty fusion (UF) mechanisms. Specifically, a fuzzy semantic enhancement (FSE) module is designed to model pixel-level uncertainty through Gaussian membership functions and fuzzy logic rules, enhancing the model's ability to identify and represent ambiguous boundaries. An evidence dynamic fusion (EDF) module estimates feature confidence via a Dirichlet-based distribution and adaptively guides the fusion of conditional information and denoised features across different denoising stages. Furthermore, the UF module quantifies discrepancies among multisource predictions to compensate for structural detail loss during the iterative denoising process. Extensive experiments on four public datasets show that FEU-Diff consistently outperforms state-of-the-art (SOTA) methods, achieving an average gain of 1.42% in the Dice similarity coefficient (DSC), 1.47% in intersection over union (IoU), and a 2.26 mm reduction in the 95th percentile Hausdorff distance (HD95). In addition, our method generates uncertainty maps that enhance clinical interpretability.
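
The evidence-driven part follows the usual subjective-logic reading of a Dirichlet distribution: per-pixel evidence yields per-class belief masses plus an explicit uncertainty mass. A generic sketch of that mapping, not the exact EDF/UF modules of FEU-Diff:

```python
import torch

def dirichlet_belief_and_uncertainty(evidence):
    """Map non-negative evidence to per-class belief and pixel-wise uncertainty.

    evidence: (B, C, H, W) tensor, e.g. softplus of network logits.
    """
    alpha = evidence + 1.0                          # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True)       # S = sum_c alpha_c
    belief = evidence / strength                    # b_c = e_c / S
    uncertainty = evidence.shape[1] / strength      # u = C / S, in (0, 1]
    return belief, uncertainty
```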

Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction for Sparse-View CT

Haodong Li, Shuo Han, Haiyang Mao, Yu Shi, Changsheng Fang, Jianjia Zhang, Weiwen Wu, Hengyong Yu

arXiv preprint · Sep 16 2025
Sparse-View CT (SVCT) reconstruction enhances temporal resolution and reduces radiation dose, yet its clinical use is hindered by artifacts due to view reduction and domain shifts from scanner, protocol, or anatomical variations, leading to performance degradation in out-of-distribution (OOD) scenarios. In this work, we propose a Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction (CDPIR) framework to tackle the OOD problem in SVCT. CDPIR integrates cross-distribution diffusion priors, derived from a Scalable Interpolant Transformer (SiT), with model-based iterative reconstruction methods. Specifically, we train a SiT backbone, an extension of the Diffusion Transformer (DiT) architecture, to establish a unified stochastic interpolant framework, leveraging Classifier-Free Guidance (CFG) across multiple datasets. By randomly dropping the conditioning with a null embedding during training, the model learns both domain-specific and domain-invariant priors, enhancing generalizability. During sampling, the globally sensitive transformer-based diffusion model exploits the cross-distribution prior within the unified stochastic interpolant framework, enabling flexible and stable control over multi-distribution-to-noise interpolation paths and decoupled sampling strategies, thereby improving adaptation to OOD reconstruction. By alternating between data fidelity and sampling updates, our model achieves state-of-the-art performance with superior detail preservation in SVCT reconstructions. Extensive experiments demonstrate that CDPIR significantly outperforms existing approaches, particularly under OOD conditions, highlighting its robustness and potential clinical value in challenging imaging scenarios.
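
"Alternating between data fidelity and sampling updates" is the standard plug-and-play pattern: a gradient step on the measurement-consistency term interleaved with a diffusion-prior denoising step. A hedged sketch of the data-consistency half, with placeholder projector operators rather than CDPIR's actual components:

```python
def data_fidelity_step(x, y, forward_op, adjoint_op, step_size=1.0):
    """One gradient step on 0.5 * ||A x - y||^2 for sparse-view CT.

    forward_op: sparse-view projector A (image -> sinogram); adjoint_op: A^T.
    x: current image estimate; y: measured sparse-view sinogram.
    """
    residual = forward_op(x) - y
    return x - step_size * adjoint_op(residual)

# A CDPIR-like loop would then alternate, per iteration:
#   x = data_fidelity_step(x, sinogram, A, At)   # measurement consistency
#   x = diffusion_prior_step(x, t)               # SiT-based sampling update (placeholder)
```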

FunKAN: Functional Kolmogorov-Arnold Network for Medical Image Enhancement and Segmentation

Maksim Penkin, Andrey Krylov

arXiv preprint · Sep 16 2025
Medical image enhancement and segmentation are critical yet challenging tasks in modern clinical practice, constrained by artifacts and complex anatomical variations. Traditional deep learning approaches often rely on complex architectures with limited interpretability. While Kolmogorov-Arnold networks offer interpretable solutions, their reliance on flattened feature representations fundamentally disrupts the intrinsic spatial structure of imaging data. To address this issue, we propose a Functional Kolmogorov-Arnold Network (FunKAN) -- a novel interpretable neural framework, designed specifically for image processing, that formally generalizes the Kolmogorov-Arnold representation theorem to functional spaces and learns the inner functions using a Fourier decomposition over the basis of Hermite functions. We explore FunKAN on several medical image processing tasks, including Gibbs ringing suppression in magnetic resonance images, benchmarking on the IXI dataset. We also propose U-FunKAN as a state-of-the-art binary medical segmentation model, with benchmarks on three medical datasets: BUSI (ultrasound images), GlaS (histological structures) and CVC-ClinicDB (colonoscopy videos), detecting breast cancer, glands and polyps, respectively. Experiments on these diverse datasets demonstrate that our approach outperforms other KAN-based backbones in both medical image enhancement (PSNR, TV) and segmentation (IoU, F1). Our work bridges the gap between theoretical function approximation and medical image analysis, offering a robust, interpretable solution for clinical applications.
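
The inner functions are expanded in the orthonormal Hermite-function basis mentioned in the abstract. A small sketch of evaluating one such basis function (a generic construction, not the authors' implementation):

```python
import numpy as np
from math import factorial, pi
from numpy.polynomial.hermite import hermval

def hermite_function(n, x):
    """Orthonormal Hermite function psi_n(x) = H_n(x) * exp(-x^2 / 2) / norm."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                                  # select the n-th Hermite polynomial
    norm = np.sqrt((2.0 ** n) * factorial(n) * np.sqrt(pi))
    return hermval(x, coeffs) * np.exp(-x ** 2 / 2.0) / norm

# A learned inner function can then be written as f(x) ≈ sum_n c_n * psi_n(x),
# with the coefficients c_n trained per feature channel.
```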