Page 39 of 6116106 results

Levita B, Eminovic S, Lüdemann WM, Schnapauff D, Schmidt R, Haack AM, Dell'Orco A, Nawabi J, Penzkofer T

PubMed | Oct 13, 2025
This study evaluates four large language models' (LLMs) ability to answer common patient questions preceding transarterial periarticular embolization (TAPE), computed tomography (CT)-guided high-dose-rate (HDR) brachytherapy, and bleomycin electrosclerotherapy (BEST). The goal is to assess their potential to enhance clinical workflows and patient comprehension, while also weighing the associated risks. Thirty-five TAPE-, 34 CT-HDR brachytherapy-, and 36 BEST-related questions were presented to ChatGPT-4o, DeepSeek-V3, OpenBioLLM-8b, and BioMistral-7b. The LLM-generated responses were independently assessed by two board-certified radiologists, with accuracy rated on a 5-point Likert scale. Statistical analyses compared LLM performance across question categories to gauge suitability for patient education. DeepSeek-V3 attained the highest mean scores for BEST [4.49 (± 0.77)] and CT-HDR [4.24 (± 0.81)] and performed comparably to ChatGPT-4o on TAPE-related questions (DeepSeek-V3 [4.20 (± 0.77)] vs. ChatGPT-4o [4.17 (± 0.64)]; p = 1.000). In contrast, OpenBioLLM-8b (BEST 3.51 (± 1.15), CT-HDR 3.32 (± 1.13), TAPE 3.34 (± 1.16)) and BioMistral-7b (BEST 2.92 (± 1.35), CT-HDR 3.03 (± 1.06), TAPE 3.33 (± 1.28)) performed significantly worse than DeepSeek-V3 and ChatGPT-4o across all procedures. Preparation/Planning was the only category without statistically significant differences across all three procedures. DeepSeek-V3 and ChatGPT-4o excelled on TAPE, BEST, and CT-HDR brachytherapy questions, indicating potential to enhance patient education in interventional radiology, where complex but minimally invasive procedures are often explained in brief consultations. However, OpenBioLLM-8b and BioMistral-7b exhibited more frequent inaccuracies, suggesting that LLMs cannot yet replace comprehensive clinical consultations. These findings should be validated through patient feedback and implementation in clinical workflows.
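The kind of pairwise comparison of mean Likert ratings described above can be sketched with a simple permutation test. The ratings below are invented for illustration, and the paper's actual statistical procedure may differ; this is only a minimal sketch of the idea:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    Returns an approximate p-value for H0: both rating samples come
    from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Random relabeling: does it produce a difference at least as large?
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical 1-5 Likert ratings for two models on the same questions
model_a = [5, 4, 5, 4, 4, 5, 5, 4, 3, 5]
model_b = [3, 3, 4, 2, 3, 4, 3, 2, 3, 3]
p = permutation_test(model_a, model_b)
```

With ordinal Likert data, rank-based tests (e.g., Wilcoxon signed-rank for paired ratings) are the more conventional choice; the permutation test is shown here only because it needs no distributional assumptions.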

Atek S, Mehidi I, Jabri D, Belkhiat DEC

PubMed | Oct 13, 2025
For over two decades, medical imaging modalities have played crucial roles in clinical diagnosis. A single modality often cannot provide the comprehensive information needed to ensure clinical accuracy. Consequently, multi-modal medical image fusion methods integrate images from diverse modalities into a single fused image, enhancing information quality and diagnostic reliability. In recent years, deep learning for multi-modal medical image segmentation has emerged as a vibrant research area, yielding promising outcomes. This paper conducts a thorough survey and comparative analysis of advancements in deep learning techniques for multi-modal medical image segmentation from 2019 to 2025. It aims to provide a comprehensive overview of deep learning-based approaches and fusion strategies for integrating information from different imaging modalities. Additionally, the survey highlights how various deep learning models enhance segmentation accuracy and reliability. Common challenges in medical image segmentation are discussed, alongside current research trends in the field.

Liu X, Chen L, Yang L, Zhu J, Shen W

PubMed | Oct 13, 2025
To develop and rigorously validate radiomics-based predictive models using postoperative intravoxel incoherent motion diffusion-weighted imaging (IVIM-DWI) MRI for the early, noninvasive assessment of impaired renal allograft function (IRF) in kidney transplant recipients. This retrospective study included 97 kidney transplant recipients (mean age, 36.77 ± 10.71 years), categorized into normal or impaired renal function groups based on an estimated glomerular filtration rate (eGFR) cutoff of 60 ml/min/1.73 m<sup>2</sup>. Patients were randomly assigned to training (n = 68) or validation (n = 29) groups. Postoperative IVIM-DWI MRI with 11 b-values was performed on a 3T scanner, generating parametric maps (apparent diffusion coefficient (ADC), slow diffusion coefficient (D<sub>slow</sub>), fast diffusion coefficient (D<sub>fast</sub>), perfusion fraction (PF)). Whole-graft 3D manual segmentation was used to extract 1604 radiomic features per dataset. Feature selection was performed through analysis of variance (ANOVA), Relief, and recursive feature elimination (RFE), followed by classification using ten machine learning algorithms, including auto-encoder (AE) and naïve Bayes (NB). Performance was evaluated using receiver operating characteristic (ROC) analysis, with area under the curve (AUC), accuracy, sensitivity, and specificity as metrics. Radiomics models based on IVIM-derived parametric maps (ADC, D<sub>slow</sub>, D<sub>fast</sub>, PF) achieved superior diagnostic performance, with a validation AUC of 0.790 (95% confidence interval (CI) 0.607-0.937) using ANOVA-based feature selection and AE classification, and a training AUC of 0.770. Integrative models combining multi-b-value DWI and IVIM maps further enhanced predictive power, achieving a validation AUC of 0.790 (95% CI 0.600-0.951) and a training AUC of 0.816, utilizing 16 features selected via ANOVA and classified with the NB algorithm. 
AE and NB classifiers consistently exhibited the strongest discriminative performance across all model configurations. Notably, the median histogram intensity from the D<sub>slow</sub> map was the most influential feature for predicting impaired renal function. This study is the first to comprehensively compare the predictive performance of radiomics models based on IVIM-DWI, including both single b-value DWI and IVIM parametric maps, for early assessment of renal allograft dysfunction. The integrative use of multi-b-value DWI and IVIM imaging markedly improves diagnostic accuracy, demonstrating a robust noninvasive framework for early detection of renal allograft dysfunction.
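The ANOVA-based feature selection used above ranks candidate features by their one-way ANOVA F-score between outcome groups. A minimal sketch with made-up toy data follows; the study's actual pipeline also used Relief and RFE and operated on 1604 radiomic features:

```python
def anova_f_score(values_by_group):
    """One-way ANOVA F-statistic for a single feature across groups."""
    groups = [g for g in values_by_group if g]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variability (weighted by group size)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def rank_features(feature_matrix, labels):
    """Rank feature indices by F-score, most discriminative first."""
    scores = []
    for j in range(len(feature_matrix[0])):
        groups = {}
        for row, y in zip(feature_matrix, labels):
            groups.setdefault(y, []).append(row[j])
        scores.append((anova_f_score(list(groups.values())), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Toy data: feature 0 separates the two classes, feature 1 is noise
X = [[0.1, 5.0], [0.2, 3.0], [0.15, 4.0], [2.0, 4.0], [2.1, 3.0], [1.9, 5.0]]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_features(X, y)
```

In practice the top-ranked features would then be passed to a classifier (here, auto-encoder or naïve Bayes models), with selection and training performed inside cross-validation to avoid leakage.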

Peng X, Zhu B, Shao J

PubMed | Oct 13, 2025
Magnetic resonance imaging (MRI) plays an important role in the diagnosis and treatment of hippocampal sclerosis. However, this exam is challenging in pediatric patients because of long scan times and variable image quality. This study aims to compare conventionally reconstructed MRI and accelerated sequences with and without deep learning-based reconstruction (DLR) with regard to image quality and diagnostic performance in pediatric hippocampal sclerosis patients. A total of 68 pediatric patients with proven or suspected temporal lobe epilepsy with hippocampal sclerosis who underwent the recommended epilepsy structural MRI protocol were included in this study. The MRI examination included standard sequences and accelerated sequences with and without DLR. Standard sequences were reconstructed using the conventional pipeline, while accelerated sequences were reconstructed using both the conventional and DLR pipelines. Two experienced pediatric radiologists independently evaluated the following parameters of the three reconstructed image sets on a 5-point scale: image quality, anatomic structure visibility, motion artifact, truncation artifact, image noise, and detectability of hippocampal abnormalities. Signal-to-noise ratio (SNR) measurements of the hippocampus were performed for all sequences and compared between the three sets of images. Inter-reader agreement and agreement between image sets for detecting hippocampal abnormalities were assessed using Cohen's kappa. Images reconstructed with DLR received significantly higher scores for overall image quality, lesion detectability, and image noise than conventional or original accelerated reconstructions (all P<0.05), while there was no statistically significant difference in artifacts among the three groups (all P>0.05). The SNR for all sequences with DLR was significantly higher than for conventional or original reconstructions without DLR (all P<0.001).
Inter-reader agreement was almost perfect (κ=0.803-0.963) for the imaging manifestations, while agreement between image sets ranged from substantial to almost perfect (κ=0.778-0.965). Accelerated sequences with DLR provide a 44% scan time reduction with subjective image quality, artifacts, and diagnostic performance similar to conventional reconstruction sequences.
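Cohen's kappa, used above for inter-reader and inter-set agreement, compares observed agreement against the agreement expected by chance. A minimal sketch (toy labels, not the study's data):

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' categorical labels."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Proportion of items the raters agree on
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected if both raters labeled independently at their
    # own marginal rates
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Toy binary ratings: the raters disagree on one of four cases
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Under the usual convention, κ above 0.80 is "almost perfect" and 0.61-0.80 is "substantial", which is how the ranges quoted in the abstract are read.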

Ardila CM, Vivares-Builes AM, Pineda-Vélez E

PubMed | Oct 13, 2025
This systematic review and meta-analysis aimed to synthesize diagnostic and prognostic performance metrics of machine learning (ML)-based biomarker models in oral squamous cell carcinoma (OSCC) and to integrate biological insights through a functional metasynthesis. Following PRISMA 2020 guidelines, a comprehensive search was conducted up to July 2025. Eligible studies applied ML algorithms to molecular or imaging biomarkers from OSCC patients. Data synthesis incorporated meta-analysis when endpoints and designs were sufficiently comparable; otherwise, study-level results were summarized narratively. Twenty-five studies encompassing 4408 patients were included. Diagnostic performance was strongest for salivary DNA methylation (AUC up to 1.00), metabolomics (AUC ≈0.92), and FTIR imaging (AUC ≈0.91), while autoantibody and microbiome models showed more variable accuracy. Prognostic models based on immune-feature signatures outperformed conventional scores, while multimodal approaches integrating imaging and metabolomics retained strong performance under external validation. Models based on pathomics and MRI radiomics also achieved clinically meaningful accuracy across independent cohorts. Functional metasynthesis revealed convergent biological processes (metabolic reprogramming, immune-inflammatory remodeling, microbiome dysbiosis, and epithelial/extracellular matrix disruption) that underpin predictive accuracy. ML models leveraging molecular and imaging biomarkers show strong potential to improve OSCC diagnosis, risk stratification, and prognosis, particularly through multimodal integration.

Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen, Chongwen Lyu, Yuqing Song, Zhe Liu

arXiv preprint | Oct 13, 2025
Accurate segmentation of tumors and adjacent normal tissues in medical images is essential for surgical planning and tumor staging. Although foundation models generally perform well in segmentation tasks, they often struggle to focus on foreground areas in complex, low-contrast backgrounds, where some malignant tumors closely resemble normal organs, complicating contextual differentiation. To address these challenges, we propose the Foreground-Aware Spectrum Segmentation (FASS) framework. First, we introduce a foreground-aware module to amplify the distinction between the target foreground and the rest of the volume, allowing the model to concentrate more effectively on target areas. Next, a feature-level frequency enhancement module, based on the wavelet transform, extracts discriminative high-frequency features to enhance boundary recognition and detail perception. Finally, we introduce an edge constraint module to preserve geometric continuity in segmentation boundaries. Extensive experiments on multiple medical datasets demonstrate superior performance across all metrics, validating the effectiveness of our framework, particularly its robustness under complex conditions and its fine-structure recognition. Our framework significantly enhances segmentation of low-contrast images, paving the way for applications in more diverse and complex medical imaging scenarios.
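The frequency enhancement module above is wavelet-based. A one-level 1-D Haar transform illustrates the core mechanism: detail coefficients isolate high-frequency content such as edges, which is what boundary-sensitive features exploit. The paper presumably operates on 2-D or 3-D feature maps; this is only a minimal 1-D sketch:

```python
import math

def haar_decompose(signal):
    """One-level Haar wavelet transform of a 1-D signal.

    Returns (approximation, detail): the approximation carries the
    low-frequency content, the detail coefficients the high-frequency
    content (edges, fine structure).
    """
    assert len(signal) % 2 == 0
    approx, detail = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail

# A flat region followed by an abrupt edge: the edge shows up only in
# the detail (high-frequency) coefficients
approx, detail = haar_decompose([0.0, 0.0, 0.0, 4.0])
```

A flat signal yields zero detail everywhere, so thresholding or reweighting the detail band is one simple way to emphasize boundaries while leaving homogeneous regions untouched.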

Su Z, Cai B, Li L, Huang Z, Fu Y

PubMed | Oct 13, 2025
This investigation focused on developing a predictive clinical tool that combines biparametric MRI-derived PI-RADS v2.1 assessments with patient-specific biomarkers. The model was designed to optimize the reliability of prostate cancer detection in individuals with prostate-specific antigen concentrations below 20 ng/mL, particularly targeting the diagnostic challenges presented by this intermediate PSA range. By systematically integrating imaging characteristics with laboratory parameters, the research sought to establish a practical decision-making framework for clinicians managing suspected prostate malignancies. A total of 218 patients with confirmed pathological diagnoses between January 2020 and December 2023 underwent retrospective review. The cohort was divided into two distinct groups: a training cohort comprising 153 cases and a validation cohort containing 65 cases. For nomogram predictor selection, statistical modeling incorporated machine learning approaches including LASSO regression with ten-fold cross-validation, supplemented by both univariate and multivariate logistic regression analyses to identify independent prognostic factors. The nomogram's predictive performance was evaluated by determining the area under the receiver operating characteristic curve (AUC), developing calibration plots, and implementing decision curve analysis (DCA). The study findings revealed that among patients with prostate-specific antigen (PSA) concentrations ≤ 20 ng/mL, four parameters (PI-RADS v2.1 classification, free PSA ratio (%fPSA), diffusion-weighted imaging-derived ADC values, and serum hemoglobin concentrations) emerged as independent predictive factors for prostate carcinoma detection. The composite predictive model demonstrated superior diagnostic performance compared to individual parameters, achieving a receiver operating characteristic curve area of 0.922.
Notably, the PI-RADS v2.1 scoring system alone showed an AUC of 0.848 (P < 0.05) in this patient cohort. The area under the curve (AUC) for free PSA percentage reached 0.760 (P < 0.001), while apparent diffusion coefficient (ADC) values showed superior discriminative ability with an AUC of 0.825 (P < 0.001). Hemoglobin levels exhibited moderate predictive value (AUC = 0.622, P = 0.006). The developed predictive model exhibited outstanding diagnostic accuracy, achieving AUC scores of 0.922 in the training dataset and 0.898 in the validation cohort, complemented by precise calibration metrics. Integrating PI-RADS v2.1 scores with clinical parameters enhanced diagnostic performance, yielding 81.2% sensitivity and 89.3% specificity in lesion characterization. This marked improvement becomes evident when compared to the standalone application of PI-RADS v2.1, which yielded sensitivity and specificity values of 73.2% and 86.8%, respectively. The PI-RADS v2.1 assessment derived from biparametric MRI demonstrates standalone prognostic value for detecting prostate malignancies in patients with serum PSA concentrations below 20 ng/mL. This imaging-based scoring system, when integrated with additional clinical parameters, significantly enhances the diagnostic reliability of clinical assessments. The methodology provides clinicians with a non-invasive evaluation tool featuring intuitive visualization capabilities, potentially reducing the necessity for invasive biopsy procedures while maintaining diagnostic precision. This integrated methodology demonstrates considerable promise as an effective framework for improving diagnostic accuracy in PCa identification and supporting therapeutic choices in clinical practice.
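The AUC values quoted throughout these abstracts equal the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal sketch via the Mann-Whitney statistic, with toy scores (not the study's data):

```python
def roc_auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a random positive
    case outscores a random negative case; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores: the model ranks one negative above one positive
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The O(n·m) pairwise loop is fine for small cohorts; production code would sort once and use ranks, and comparisons of correlated AUCs (as in the nomogram study) additionally need a variance estimate such as DeLong's.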

Deng Y, Zheng L, Zhang M, Xu L, Li Q, Zhou L, Wang Q, Gong Y, Li S

PubMed | Oct 13, 2025
The preoperative identification of cervical lymph node metastasis in papillary thyroid carcinoma is essential for tailoring surgical treatment. We aimed to develop an ultrasound-based handcrafted radiomics model, a deep learning radiomics model, and a combined model to improve prediction of cervical lymph node metastasis in papillary thyroid carcinoma patients. A retrospective cohort of 441 patients was included (308 in the training set, 133 in the testing set). Handcrafted radiomics features, manually selected by physicians, were extracted using Pyradiomics software, whereas deep learning radiomics features were extracted from a pretrained DenseNet121 network, a fully automatic process that eliminates the need for manual selection. A combined model integrating the radiomics signatures of the two models was developed. ROC analysis was used to evaluate the performance of the three models, and DeLong's tests were conducted to compare their AUC values in the training and testing sets. In the training set, the AUC value of the combined model (0.790) was significantly higher than that of the handcrafted radiomics (0.743, p = 0.021) and deep learning radiomics (0.730, p = 0.003) models. In the testing set, although the AUC value of the combined model (0.761) was higher than that of the handcrafted radiomics model (0.734, p = 0.368) and the deep learning radiomics model (0.719, p = 0.228), statistical significance was not reached. The handcrafted radiomics model exhibited high accuracy in both the training and testing sets (0.714 and 0.707), while the deep learning radiomics model showed accuracy below 0.7 in both (0.698 and 0.662). The combined model based on conventional ultrasound images enhances predictive performance compared to either radiomics model alone.

Nikolay Nechaev, Evgenia Przhezdzetskaya, Viktor Gombolevskiy, Dmitry Umerenkov, Dmitry Dylov

arXiv preprint | Oct 13, 2025
Chest X-ray classification is vital yet resource-intensive, typically demanding extensive annotated data for accurate diagnosis. Foundation models mitigate this reliance, but how many labeled samples are required remains unclear. We systematically evaluate the use of power-law fits to predict the training size necessary for specific ROC-AUC thresholds. Testing multiple pathologies and foundation models, we find XrayCLIP and XraySigLIP achieve strong performance with significantly fewer labeled examples than a ResNet-50 baseline. Importantly, learning curve slopes from just 50 labeled cases accurately forecast final performance plateaus. Our results enable practitioners to minimize annotation costs by labeling only the essential samples for targeted performance.
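Power-law learning-curve extrapolation of the kind described can be sketched by fitting error(n) = a·n^(-b) in log-log space, where it becomes a straight line. Here error is treated as 1 − ROC-AUC; the constants and sample sizes below are synthetic, not the paper's:

```python
import math

def fit_power_law(sizes, errors):
    """Fit error(n) = a * n**(-b) by least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - slope * mx)
    return a, -slope  # so that error(n) = a * n**(-b)

def samples_for_error(a, b, target_error):
    """Invert the fitted curve: smallest n with predicted error <= target."""
    return math.ceil((a / target_error) ** (1.0 / b))

# Synthetic learning curve generated from a known power law
sizes = [50, 100, 200, 400]
errors = [0.5 * n ** -0.4 for n in sizes]
a_hat, b_hat = fit_power_law(sizes, errors)
```

The practical payoff matches the abstract's claim: a fit from a few small pilot runs (e.g., around 50 labeled cases) predicts how many labels a target AUC would require, before committing to full annotation.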

Neilansh Chauhan, Piyush Kumar Gupta, Faraz Doja

arXiv preprint | Oct 13, 2025
Effective pneumonia diagnosis is often challenged by the difficulty of deploying large, computationally expensive deep learning models in resource-limited settings. This study introduces LightPneumoNet, an efficient, lightweight convolutional neural network (CNN) built from scratch to provide an accessible and accurate diagnostic solution for pneumonia detection from chest X-rays. Our model was trained on a public dataset of 5,856 chest X-ray images. Preprocessing included image resizing to 224x224, grayscale conversion, and pixel normalization, with data augmentation (rotation, zoom, shear) to prevent overfitting. The custom architecture features four blocks of stacked convolutional layers and contains only 388,082 trainable parameters, resulting in a minimal 1.48 MB memory footprint. On the independent test set, our model delivered exceptional performance, achieving an overall accuracy of 0.942, precision of 0.92, and an F1-Score of 0.96. Critically, it obtained a sensitivity (recall) of 0.99, demonstrating a near-perfect ability to identify true pneumonia cases and minimize clinically significant false negatives. Notably, LightPneumoNet achieves this high recall on the same dataset where existing approaches typically require significantly heavier architectures or fail to reach comparable sensitivity levels. The model's efficiency enables deployment on low-cost hardware, making advanced computer-aided diagnosis accessible in underserved clinics and serving as a reliable second-opinion tool to improve patient outcomes.
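Trainable-parameter counts like the 388,082 quoted above follow from standard layer formulas (k·k·C_in·C_out weights plus C_out biases per convolution). The abstract does not specify the layer shapes, so the four-block stack below is hypothetical and is not expected to reproduce the paper's total:

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Trainable parameters of a 2-D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Hypothetical four-block stack on 224x224 grayscale input (NOT the
# paper's actual architecture, which the abstract does not give)
stack = [
    conv2d_params(1, 16, 3),
    conv2d_params(16, 32, 3),
    conv2d_params(32, 64, 3),
    conv2d_params(64, 64, 3),
]
total_conv = sum(stack)
```

Counting parameters this way (convolutions plus the final dense head) is how a sub-400k model translates into the small memory footprint and low-cost-hardware deployment the abstract emphasizes.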
