Latest Papers on Radiology AI. Sources: medrxiv, Order: Best Match, Limit: 10.

Large Language Model-Based Entity Extraction Reliably Classifies Pancreatic Cysts and Reveals Predictors of Malignancy: A Cross-Sectional and Retrospective Cohort Study

Papale, A. J., Flattau, R., Vithlani, N., Mahajan, D., Ziemba, Y., Zavadsky, T., Carvino, A., King, D., Nadella, S.

•preprint•Jul 17 2025

Pancreatic cystic lesions (PCLs) are often discovered incidentally on imaging and may progress to pancreatic ductal adenocarcinoma (PDAC). PCLs have a high incidence in the general population, and adherence to screening guidelines can be variable. With the advent of technologies that enable automated text classification, we sought to evaluate various natural language processing (NLP) tools including large language models (LLMs) for identifying and classifying PCLs from radiology reports. We correlated our classification of PCLs to clinical features to identify risk factors for a positive PDAC biopsy. We contrasted a previously described NLP classifier to LLMs for prospective identification of PCLs in radiology. We evaluated various LLMs for PCL classification into low-risk or high-risk categories based on published guidelines. We compared prompt-based PCL classification to specific entity-guided PCL classification. To this end, we developed tools to deidentify radiology and track patients longitudinally based on their radiology reports. Additionally, we used our newly developed tools to evaluate a retrospective database of patients who underwent pancreas biopsy to determine associated factors including those in their radiology reports and clinical features using multivariable logistic regression modelling. Of 14,574 prospective radiology reports, 665 (4.6%) described a pancreatic cyst, including 175 (1.2%) high-risk lesions. Our Entity-Extraction Large Language Model tool achieved recall 0.992 (95% confidence interval [CI], 0.985-0.998), precision 0.988 (0.979-0.996), and F1-score 0.990 (0.985-0.995) for detecting cysts; F1-scores were 0.993 (0.987-0.998) for low-risk and 0.977 (0.952-0.995) for high-risk classification. Among 4,285 biopsy patients, 330 had pancreatic cysts documented ≥6 months before biopsy. 
In the final multivariable model (AUC = 0.877), independent predictors of adenocarcinoma were change in duct caliber with upstream atrophy (adjusted odds ratio [AOR], 4.94; 95% CI, 1.30-18.79), mural nodules (AOR, 11.02; 1.81-67.26), older age (AOR, 1.10; 1.05-1.16), lower body mass index (AOR, 0.86; 0.76-0.96), and total bilirubin (AOR, 1.81; 1.18-2.77). Automated NLP-based analysis of radiology reports using LLM-driven entity extraction can accurately identify and risk-stratify PCLs and, when retrospectively applied, reveal factors predicting malignant progression. Widespread implementation may improve surveillance and enable earlier intervention.
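
The detection metrics quoted in this abstract (recall, precision, F1-score) follow the standard definitions, which can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the confusion-matrix counts in the usage lines are made up because the abstract does not report them.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall and F1 from confusion-matrix counts.

    tp, fp, fn are true positives, false positives and false negatives.
    Returns 0.0 for any metric whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_rates(precision, recall)
    return precision, recall, f1


def f1_from_rates(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from the study).
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

    # Consistency check on the reported point estimates for cyst detection.
    print(round(f1_from_rates(0.988, 0.992), 3))
```

Applied to the reported precision (0.988) and recall (0.992), the harmonic mean comes out at about 0.990, which agrees with the F1-score stated in the abstract.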

Mixed Modality Classification Abdominal Retrospective Clinical In Silico Academic Lab GenAI

Myocardial Native T1 Mapping in the German National Cohort (NAKO): Associations with Age, Sex, and Cardiometabolic Risk Factors

Ammann, C., Gröschel, J., Saad, H., Rospleszcz, S., Schuppert, C., Hadler, T., Hickstein, R., Niendorf, T., Nolde, J. M., Schulze, M. B., Greiser, K. H., Decker, J. A., Kröncke, T., Küstner, T., Nikolaou, K., Willich, S. N., Keil, T., Dörr, M., Bülow, R., Bamberg, F., Pischon, T., Schlett, C. L., Schulz-Menger, J.

•preprint•Jul 17 2025

Background and Aims: In cardiovascular magnetic resonance (CMR), myocardial native T1 mapping enables quantitative, non-invasive tissue characterization and is sensitive to subclinical changes in myocardial structure and composition. We investigated how age, sex, and cardiometabolic risk factors are associated with myocardial T1 in a population-based analysis within the German National Cohort (NAKO). Methods: This cross-sectional study included 29,573 prospectively enrolled participants who underwent CMR-based midventricular T1 mapping at 3.0 T, alongside clinical phenotyping. After artificial intelligence-assisted myocardial segmentation, a subset of 9,162 outliers was subjected to manual quality control according to clinical evaluation standards. Associations with cardiometabolic risk factors, identified through self-reported medical history, clinical chemistry, and blood pressure measurements, were evaluated using adjusted linear regression models. Results: Women had higher T1 values than men, with sex differences progressively declining with age. T1 was significantly elevated in individuals with diabetes (β=3.91 ms; p<0.001), kidney disease (β=3.44 ms; p<0.001), and current smoking (β=6.67 ms; p<0.001). Conversely, hyperlipidaemia was significantly associated with lower T1 (β=-4.41 ms; p<0.001). Associations with hypertension showed a sex-specific pattern: T1 was lower in women but increased with hypertension severity in men. Conclusions: Myocardial native T1 varies by sex and age and shows associations with major cardiometabolic risk factors. Notably, lower T1 times in participants with hyperlipidaemia may indicate a direct effect of blood lipids on the heart. Our findings support the utility of T1 mapping as a sensitive marker of early myocardial changes and highlight the sex-specific interplay between cardiometabolic health and myocardial tissue composition. 
Graphical Abstract: [Figure 1, image not reproduced] Key Question: How are age, sex, and cardiometabolic risk factors associated with myocardial native T1, a quantitative magnetic resonance imaging marker of myocardial tissue composition, in a large-scale population-based evaluation within the German National Cohort (NAKO)? Key Finding: T1 relaxation times were higher in women and gradually converged between sexes with age. Diabetes, kidney disease, smoking, and hypertension in men were associated with prolonged T1 times. Unexpectedly, hyperlipidaemia and hypertension in women showed a negative association with T1. Take-Home Message: Native T1 mapping is sensitive to subclinical myocardial changes and reflects a close interplay between metabolic and myocardial health. It reveals marked age-dependent sex differences and sex-specific responses in myocardial tissue composition to cardiometabolic risk factors.

MRI Segmentation Cardiac Retrospective Clinical In Silico Consortium

Cardiac Function Assessment with Deep-Learning-Based Automatic Segmentation of Free-Running 4D Whole-Heart CMR

Ogier, A. C., Baup, S., Ilanjian, G., Touray, A., Rocca, A., Banus Cobo, J., Monton Quesada, I., Nicoletti, M., Ledoux, J.-B., Richiardi, J., Holtackers, R. J., Yerly, J., Stuber, M., Hullin, R., Rotzinger, D., van Heeswijk, R. B.

•preprint•Jul 17 2025

Background: Free-running (FR) cardiac MRI enables free-breathing ECG-free fully dynamic 5D (3D spatial+cardiac+respiration dimensions) imaging but poses significant challenges for clinical integration due to the volume and complexity of image analysis. Existing segmentation methods are tailored to 2D cine or static 3D acquisitions and cannot leverage the unique spatial-temporal wealth of FR data. Purpose: To develop and validate a deep learning (DL)-based segmentation framework for isotropic 3D+cardiac cycle FR cardiac MRI that enables accurate, fast, and clinically meaningful anatomical and functional analysis. Methods: Free-running, contrast-free bSSFP acquisitions at 1.5T and contrast-enhanced GRE acquisitions at 3T were used to reconstruct motion-resolved 5D datasets. From these, the end-expiratory respiratory phase was retained to yield fully isotropic 4D datasets. Automatic propagation of a limited set of manual segmentations was used to segment the left and right ventricular blood pool (LVB, RVB) and left ventricular myocardium (LVM) on reformatted short-axis (SAX) end-systolic (ES) and end-diastolic (ED) images. These were used to train a 3D nnU-Net model. Validation was performed using geometric metrics (Dice similarity coefficient [DSC], relative volume difference [RVD]), clinical metrics (ED and ES volumes, ejection fraction [EF]), and physiological consistency metrics (systole-diastole LVM volume mismatch and LV-RV stroke volume agreement). To assess the robustness and flexibility of the approach, we evaluated multiple additional DL training configurations such as using 4D propagation-based data augmentation to incorporate all cardiac phases into training. Results: The main proposed method achieved automatic segmentation within a minute, delivering high geometric accuracy and consistency (DSC: 0.94 ± 0.01 [LVB], 0.86 ± 0.02 [LVM], 0.92 ± 0.01 [RVB]; RVD: 2.7%, 5.8%, 4.5%). 
Clinical LV metrics showed excellent agreement (ICC > 0.98 for EDV/ESV/EF, bias < 2 mL for EDV/ESV, < 1% for EF), while RV metrics remained clinically reliable (ICC > 0.93 for EDV/ESV/EF, bias < 1 mL for EDV/ESV, < 1% for EF) but exhibited wider limits of agreement. Training on all cardiac phases improved temporal coherence, reducing LVM volume mismatch from 4.0% to 2.6%. Conclusion: This study validates a DL-based method for fast and accurate segmentation of whole-heart free-running 4D cardiac MRI. Robust performance across diverse protocols and evaluation with complementary metrics that match state-of-the-art benchmarks supports its integration into clinical and research workflows, helping to overcome a key barrier to the broader adoption of free-running imaging.
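
The geometric and clinical metrics named in this abstract (DSC, RVD, EF) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' pipeline: masks are flattened binary lists, RVD is taken as the absolute volume difference relative to the reference (the abstract does not say whether it is signed), and all function names and example numbers are hypothetical.

```python
def dice(pred: list[int], ref: list[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks are treated as a perfect match (an assumption).
    return 2 * intersection / total if total else 1.0


def relative_volume_difference(pred_volume: float, ref_volume: float) -> float:
    """Absolute volume difference as a fraction of the reference volume."""
    return abs(pred_volume - ref_volume) / ref_volume


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    return (edv_ml - esv_ml) / edv_ml * 100.0


if __name__ == "__main__":
    # Toy masks and volumes, for illustration only.
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
    print(relative_volume_difference(pred_volume=102.0, ref_volume=100.0))
    print(ejection_fraction(edv_ml=120.0, esv_ml=50.0))
```

Because EF is a ratio of two segmented volumes, small volume biases in EDV and ESV can partly cancel, which is one reason the abstract reports bias separately for volumes (in mL) and for EF (in percent).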

MRI Segmentation Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA

AI-Powered Segmentation and Prognosis with Missing MRI in Pediatric Brain Tumors

Chrysochoou, D., Gandhi, D., Adib, S., Familiar, A., Khalili, N., Khalili, N., Ware, J. B., Tu, W., Jain, P., Anderson, H., Haldar, S., Storm, P. B., Franson, A., Prados, M., Kline, C., Mueller, S., Resnick, A., Vossough, A., Davatzikos, C., Nabavizadeh, A., Fathi Kazerooni, A.

•preprint•Jul 16 2025

Importance: Brain MRI is the main imaging modality for pediatric brain tumors (PBTs); however, incomplete MRI exams are common in pediatric neuro-oncology settings and pose a barrier to the development and application of deep learning (DL) models, such as tumor segmentation and prognostic risk estimation. Objective: To evaluate DL-based strategies (image-dropout training and generative image synthesis) and heuristic imputation approaches for handling missing MRI sequences in PBT imaging from clinical acquisition protocols, and to determine their impact on segmentation accuracy and prognostic risk estimation. Design: This cohort study included 715 patients from the Children's Brain Tumor Network (CBTN) and BraTS-PEDs, and 43 patients with longitudinal MRI (157 timepoints) from PNOC003/007 clinical trials. We developed a dropout-trained nnU-Net tumor segmentation model that randomly omitted FLAIR and/or T1w (no contrast) sequences during training to simulate missing inputs. We compared this against three imputation approaches: a generative model for image synthesis, copy-substitution heuristics, and zeroed missing inputs. Model-generated tumor volumes from each segmentation method were compared and evaluated against ground truth (expert manual segmentations) and incorporated into time-varying Cox regression models for survival analysis. Setting: Multi-institutional PBT datasets and longitudinal clinical trial cohorts. Participants: All patients had multi-parametric MRI and expert manual segmentations. The PNOC cohort had a median of three imaging timepoints and associated clinical data. Main Outcomes and Measures: Segmentation accuracy (Dice scores), image quality metrics for synthesized scans (SSIM, PSNR, MSE), and survival discrimination (C-index, hazard ratios). Results: The dropout model achieved robust segmentation under missing MRI, with ≤0.04 Dice drop and a stable C-index of 0.65 compared to complete-input performance. 
DL-based MRI synthesis achieved high image quality (SSIM > 0.90) and removed artifacts, benefiting visual interpretability. Performance was consistent across cohorts and missing data scenarios. Conclusion and Relevance: Modality-dropout training yields robust segmentation and risk-stratification on incomplete pediatric MRI without the computational and clinical complexity of synthesis approaches. Image synthesis, though less effective for these tasks, provides complementary benefits for artifact removal and qualitative assessment of missing or corrupted MRI scans. Together, these approaches can facilitate broader deployment of AI tools in real-world pediatric neuro-oncology settings.
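
The dropout-training idea in this abstract (randomly omitting FLAIR and/or non-contrast T1w during training) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not state how an omitted sequence is encoded or what drop probability was used, so this sketch zero-fills dropped channels, and the names `DROPPABLE`, `apply_modality_dropout`, and `p_drop` are hypothetical.

```python
import random

# Sequences that may be omitted, per the abstract; the key names are hypothetical.
DROPPABLE = ("flair", "t1w")


def apply_modality_dropout(
    sample: dict[str, list[float]],
    p_drop: float,
    rng: random.Random,
) -> dict[str, list[float]]:
    """Return a copy of `sample` with droppable sequences randomly zero-filled.

    Each droppable sequence is dropped independently with probability p_drop.
    Zero-filling is an assumption; the study may encode missing inputs differently.
    """
    out = dict(sample)
    for name in DROPPABLE:
        if name in out and rng.random() < p_drop:
            out[name] = [0.0] * len(out[name])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "images" flattened to three voxels each, for illustration only.
    sample = {
        "t1w": [0.2, 0.4, 0.6],
        "t1ce": [0.3, 0.5, 0.7],
        "t2w": [0.1, 0.1, 0.9],
        "flair": [0.8, 0.2, 0.4],
    }
    for _ in range(3):
        print(apply_modality_dropout(sample, p_drop=0.5, rng=rng))
```

Applying the transform per training sample means the network sees complete and incomplete inputs for the same anatomy, which is the mechanism the abstract credits for robustness at inference time.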

MRI Segmentation Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA Open Dataset

SLOTMFound: Foundation-Based Diagnosis of Multiple Sclerosis Using Retinal SLO Imaging and OCT Thickness-maps

Esmailizadeh, R., Aghababaei, A., Mirzaei, S., Arian, R., Kafieh, R.

•preprint•Jul 15 2025

Multiple Sclerosis (MS) is a chronic autoimmune disorder of the central nervous system that can lead to significant neurological disability. Retinal imaging--particularly Scanning Laser Ophthalmoscopy (SLO) and Optical Coherence Tomography (OCT)--provides valuable biomarkers for early MS diagnosis through non-invasive visualization of neurodegenerative changes. This study proposes a foundation-based bi-modal classification framework that integrates SLO images and OCT-derived retinal thickness maps for MS diagnosis. To facilitate this, we introduce two modality-specific foundation models--SLOFound and TMFound--fine-tuned from the RETFound-Fundus backbone using an independent dataset of 203 healthy eyes, acquired at Noor Ophthalmology Hospital with the Heidelberg Spectralis HRA+OCT system. This dataset, which contains only normal cases, was used exclusively for encoder adaptation and is entirely disjoint from the classification dataset. For the classification stage, we use a separate dataset comprising IR-SLO images from 32 MS patients and 70 healthy controls, collected at the Kashani Comprehensive MS Center in Isfahan, Iran. We first assess OCT-derived maps layer-wise and identify the Ganglion Cell-Inner Plexiform Layer (GCIPL) as the most informative for MS detection. All subsequent analyses utilize GCIPL thickness maps in conjunction with SLO images. Experimental evaluations on the MS classification dataset demonstrate that our foundation-based bi-modal model outperforms unimodal variants and a prior ResNet-based state-of-the-art model, achieving a classification accuracy of 97.37%, with perfect sensitivity (100%). These results highlight the effectiveness of leveraging pre-trained foundation models, even when fine-tuned on limited data, to build robust, efficient, and generalizable diagnostic tools for MS in medical imaging contexts where labeled datasets are often scarce.

OCT Classification Methodology In Silico Academic Lab Benchmark SOTA

A Clinically-Informed Framework for Evaluating Vision-Language Models in Radiology Report Generation: Taxonomy of Errors and Risk-Aware Metric

Guan, H., Hou, P. C., Hong, P., Wang, L., Zhang, W., Du, X., Zhou, Z., Zhou, L.

•preprint•Jul 14 2025

Recent advances in vision-language models (VLMs) have enabled automatic radiology report generation, yet current evaluation methods remain limited to general-purpose NLP metrics or coarse classification-based clinical scores. In this study, we propose a clinically informed evaluation framework for VLM-generated radiology reports that goes beyond traditional performance measures. We define a taxonomy of 12 radiology-specific error types, each annotated with clinical risk levels (low, medium, high) in collaboration with physicians. Using this framework, we conduct a comprehensive error analysis of three representative VLMs, i.e., DeepSeek VL2, CXR-LLaVA, and CheXagent, on 685 gold-standard, expert-annotated MIMIC-CXR cases. We further introduce a risk-aware evaluation metric, the Clinical Risk-weighted Error Score for Text-generation (CREST), to quantify safety impact. Our findings reveal critical model vulnerabilities, common error patterns, and condition-specific risk profiles, offering actionable insights for model development and deployment. This work establishes a safety-centric foundation for evaluating and improving medical report generation models. The source code of our evaluation framework, including CREST computation and error taxonomy analysis, is available at https://github.com/guanharry/VLM-CREST.

X-Ray LLM Radiology Report Chest Methodology In Silico Open Code GenAI

The Potential of ChatGPT as an Aiding Tool for the Neuroradiologist

Nikola, S., Paz, D.

•preprint•Jul 14 2025

Purpose: This study aims to explore whether ChatGPT can serve as an assistive tool for neuroradiologists in establishing a reasonable differential diagnosis in central nervous system tumors based on MRI image characteristics. Methods: This retrospective study included 50 patients aged 18-90 who underwent imaging and surgery at the Western Galilee Medical Center. ChatGPT was provided with demographic and radiological information of the patients to generate differential diagnoses. We compared ChatGPT's performance to that of an experienced neuroradiologist, using pathological reports as the gold standard. Quantitative data were described using means and standard deviations, median and range. Qualitative data were described using frequencies and percentages. The level of agreement between examiners (neuroradiologist versus ChatGPT) was assessed using Fleiss' kappa coefficient. A p-value below 0.05 was considered statistically significant. Statistical analysis was performed using IBM SPSS Statistics, version 27. Results: The results showed that while ChatGPT demonstrated good performance, particularly in identifying common tumors such as glioblastoma and meningioma, its overall accuracy (48%) was lower than that of the neuroradiologist (70%). The AI tool showed moderate agreement with the neuroradiologist (kappa = 0.445) and with pathology results (kappa = 0.419). ChatGPT's performance varied across tumor types, performing better with common tumors but struggling with rarer ones. Conclusion: This study suggests that ChatGPT has the potential to serve as an assistive tool in neuroradiology for establishing a reasonable differential diagnosis in central nervous system tumors based on MRI image characteristics. However, its limitations and potential risks must be considered, and it should therefore be used with caution.
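
Chance-corrected agreement between two raters, as used in this abstract, can be illustrated with a short sketch. Note the swap: the study reports Fleiss' kappa (computed in SPSS), whereas this sketch implements Cohen's kappa, the usual two-rater statistic, which estimates chance agreement from each rater's own label frequencies. The function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label for every case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up diagnoses for four cases, for illustration only.
    radiologist = ["glioblastoma", "glioblastoma", "meningioma", "meningioma"]
    model = ["glioblastoma", "glioblastoma", "meningioma", "glioblastoma"]
    print(cohen_kappa(radiologist, model))  # 0.5
```

In the toy example the raters agree on 3 of 4 cases (p_o = 0.75) while chance agreement is 0.5, giving kappa = 0.5; raw accuracy and kappa can therefore diverge noticeably when a few tumor types dominate the sample.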

MRI LLM Radiology Report Neurological Retrospective Clinical In Silico Academic Lab GenAI

Three-dimensional high-content imaging of unstained soft tissue with subcellular resolution using a laboratory-based multi-modal X-ray microscope

Esposito, M., Astolfo, A., Zhou, Y., Buchanan, I., Teplov, A., Endrizzi, M., Egido Vinogradova, A., Makarova, O., Divan, R., Tang, C.-M., Yagi, Y., Lee, P. D., Walsh, C. L., Ferrara, J. D., Olivo, A.

•preprint•Jul 14 2025

With increasing interest in studying biological systems across spatial scales--from centimetres down to nanometres--histology continues to be the gold standard for tissue imaging at cellular resolution, providing an essential bridge between macroscopic and nanoscopic analysis. However, its inherently destructive and two-dimensional nature limits its ability to capture the full three-dimensional complexity of tissue architecture. Here we show that phase-contrast X-ray microscopy can enable three-dimensional virtual histology with subcellular resolution. This technique provides direct quantification of electron density without restrictive assumptions, allowing for direct characterisation of cellular nuclei in a standard laboratory setting. By combining high spatial resolution and soft tissue contrast, with automated segmentation of cell nuclei, we demonstrated virtual H&E staining using machine learning-based style transfer, yielding volumetric datasets compatible with existing histopathological analysis tools. Furthermore, by integrating electron density and the sensitivity to nanometric features of the dark field contrast channel, we achieve stain-free, high-content imaging capable of distinguishing nuclei and extracellular matrix.

X-Ray Segmentation Methodology Prototype GenAI

Explainable AI for Precision Oncology: A Task-Specific Approach Using Imaging, Multi-omics, and Clinical Data

Park, Y., Park, S., Bae, E.

•preprint•Jul 14 2025

Despite continued advances in oncology, cancer remains a leading cause of global mortality, highlighting the need for diagnostic and prognostic tools that are both accurate and interpretable. Unimodal approaches often fail to capture the biological and clinical complexity of tumors. In this study, we present a suite of task-specific AI models that leverage CT imaging, multi-omics profiles, and structured clinical data to address distinct challenges in segmentation, classification, and prognosis. We developed three independent models across large public datasets. Task 1 applied a 3D U-Net to segment pancreatic tumors from CT scans, achieving a Dice Similarity Coefficient (DSC) of 0.7062. Task 2 employed a hierarchical ensemble of omics-based classifiers to distinguish tumor from normal tissue and classify six major cancer types with 98.67% accuracy. Task 3 benchmarked classical machine learning models on clinical data for prognosis prediction across three cancers (LIHC, KIRC, STAD), achieving strong performance (e.g., C-index of 0.820 in KIRC, AUC of 0.978 in LIHC). Across all tasks, explainable AI methods such as SHAP and attention-based visualization enabled transparent interpretation of model outputs. These results demonstrate the value of tailored, modality-aware models and underscore the clinical potential of applying such tailored AI systems for precision oncology.
Technical Foundations:
Segmentation (Task 1): A custom 3D U-Net was trained using the Task07_Pancreas dataset from the Medical Segmentation Decathlon (MSD). CT images were preprocessed with MONAI-based pipelines, resampled to (64, 96, 96) voxels, and intensity-windowed to HU ranges of -100 to 240.
Classification (Task 2): Multi-omics data from TCGA--including gene expression, methylation, miRNA, CNV, and mutation profiles--were log-transformed and normalized. Five modality-specific LightGBM classifiers generated meta-features for a late-fusion ensemble. Stratified 5-fold cross-validation was used for evaluation.
Prognosis (Task 3): Clinical variables from TCGA were curated and imputed (median/mode), with high-missing-rate columns removed. Survival models (e.g., Cox-PH, Random Forest, XGBoost) were trained with early stopping. No omics or imaging data were used in this task.
Interpretability: SHAP values were computed for all tree-based models, and attention-based overlays were used in imaging tasks to visualize salient regions.
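
The CT intensity windowing described for Task 1 (HU range of -100 to 240) can be sketched in a few lines. This is a plain-Python illustration, not the MONAI pipeline the authors used: the function name is hypothetical, and rescaling the clipped values to [0, 1] is an assumption, since the abstract only states the window bounds.

```python
def window_hu(values: list[float], lo: float = -100.0, hi: float = 240.0) -> list[float]:
    """Clip Hounsfield units to [lo, hi] and rescale linearly to [0, 1].

    The default bounds come from the abstract; the [0, 1] rescaling is an
    assumption made for this sketch.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    span = hi - lo
    return [(min(max(v, lo), hi) - lo) / span for v in values]


if __name__ == "__main__":
    # Air, lower bound, mid-window soft tissue, upper bound, dense bone.
    print(window_hu([-1000.0, -100.0, 70.0, 240.0, 1000.0]))
    # [0.0, 0.0, 0.5, 1.0, 1.0]
```

Clipping to a soft-tissue window discards contrast in air and bone, so the network's input range is spent on the intensities where pancreas and tumor actually differ.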

CT Segmentation Abdominal Methodology In Silico Academic Lab Benchmark SOTA

A Multi-Modal Deep Learning Framework for Predicting PSA Progression-Free Survival in Metastatic Prostate Cancer Using PSMA PET/CT Imaging

Ghaderi, H., Shen, C., Issa, W., Pomper, M. G., Oz, O. K., Zhang, T., Wang, J., Yang, D. X.

•preprint•Jul 14 2025

PSMA PET/CT imaging has been increasingly utilized in the management of patients with metastatic prostate cancer (mPCa). Imaging biomarkers derived from PSMA PET may provide improved prognostication and prediction of treatment response for mPCa patients. This study investigates a novel deep learning-derived imaging biomarker framework for outcome prediction using multi-modal PSMA PET/CT and clinical features. A single-institution cohort of 99 mPCa patients with 396 lesions was evaluated. Imaging features were extracted from cropped lesion areas and combined with clinical variables including body mass index, ECOG performance status, prostate-specific antigen (PSA) level, Gleason score, and treatments received. The PSA progression-free survival (PFS) model was trained using a ResNet architecture with a Cox proportional hazards loss function using five-fold cross-validation. Performance was assessed using concordance index (C-index) and Kaplan-Meier survival analysis. Among evaluated model architectures, the ResNet-18 backbone offered the best performance. The multi-modal deep learning framework achieved a 5-fold cross-validation C-index ranging from 0.75 to 0.94, outperforming models incorporating imaging only (0.70-0.89) and clinical features only (0.53-0.65). Kaplan-Meier survival analysis performed on the deep learning-derived predictions demonstrated clear risk stratification, with a median PSA PFS of 19.7 months in the high-risk group and 26 months in the low-risk group (P < 0.001). A deep learning-derived imaging biomarker based on PSMA PET/CT can effectively predict PSA PFS for mPCa patients. Further clinical validation in prospective cohorts is warranted.
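
The concordance index used to score the survival models in this abstract can be sketched as below. This is a simplified version of Harrell's C-index written for illustration, not the authors' evaluation code: pairs with tied event times are skipped, tied risk scores count as half-concordant, and the function name and toy data are hypothetical.

```python
def concordance_index(times: list[float], events: list[bool], risks: list[float]) -> float:
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    model assigns i the higher risk. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy follow-up data (months, event observed, predicted risk); illustration only.
    times = [5.0, 10.0, 15.0]
    events = [True, True, False]
    risks = [0.9, 0.5, 0.1]
    print(concordance_index(times, events, risks))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported ranges (0.75 to 0.94 for the multi-modal model versus 0.53 to 0.65 for clinical features alone).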

PET Classification Abdominal Retrospective Clinical In Silico
