Page 169 of 3593587 results

Deep learning models for deriving optimised measures of fat and muscle mass from MRI.

Thomas B, Ali MA, Ali FMH, Chung A, Joshi M, Maiguma-Wilson S, Reiff G, Said H, Zalmay P, Berks M, Blackledge MD, O'Connor JPB

pubmed | Jul 17 2025
Fat and muscle mass are potential biomarkers of wellbeing and disease in oncology, but clinical measurement methods vary considerably. Here we evaluate the accuracy, precision and ability to track change of multiple deep learning (DL) models that quantify fat and muscle mass from abdominal MRI. Specifically, subcutaneous fat (SF), intra-abdominal fat (VF), external muscle (EM) and psoas muscle (PM) were evaluated using 15 convolutional neural network (CNN)-based and 4 transformer-based deep learning model architectures. There was negligible difference in accuracy between human observers and all deep learning models in delineating SF or EM, and delineation of both tissues had excellent repeatability. VF was measured most accurately by the human observers, followed by the CNN-based models, which outperformed the transformer-based models. In contrast, PM delineation accuracy and repeatability were poor across all assessments. Repeatability limits of agreement were used to determine when changes measured in individual patients reflected real change rather than test-retest variation. In summary, DL accuracy and precision in delineating fat and muscle volumes vary between CNN-based and transformer-based models, between tissues and, in some cases, with gender. These factors should be considered when investigators deploy deep learning methods to estimate biomarkers of fat and muscle mass.
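The repeatability analysis the abstract describes can be sketched with standard Bland-Altman 95% limits of agreement; the paired test-retest volumes below are synthetic stand-ins, not the study's data:

```python
import numpy as np

def repeatability_limits(test: np.ndarray, retest: np.ndarray) -> tuple[float, float]:
    """95% limits of agreement from paired test-retest measurements.

    A change in an individual patient larger than these limits is unlikely
    (at the 5% level) to be explained by measurement variation alone.
    """
    diffs = test - retest
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    return bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical test-retest fat volumes (mL), for illustration only.
rng = np.random.default_rng(0)
truth = rng.uniform(100.0, 300.0, size=50)
test = truth + rng.normal(0.0, 5.0, size=50)
retest = truth + rng.normal(0.0, 5.0, size=50)
lo, hi = repeatability_limits(test, retest)
```

Any measured per-patient change outside [lo, hi] would then be read as real change rather than test-retest noise.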

Multi-modal Risk Stratification in Heart Failure with Preserved Ejection Fraction Using Clinical and CMR-derived Features: An Approach Incorporating Model Explainability.

Zhang S, Lin Y, Han D, Pan Y, Geng T, Ge H, Zhao J

pubmed | Jul 17 2025
Heart failure with preserved ejection fraction (HFpEF) poses significant diagnostic and prognostic challenges due to its clinical heterogeneity. This study proposes a multi-modal, explainable machine learning framework that integrates clinical variables and cardiac magnetic resonance (CMR)-derived features, particularly epicardial adipose tissue (EAT) volume, to improve risk stratification and outcome prediction in patients with HFpEF. A retrospective cohort of 301 participants (171 in the HFpEF group and 130 in the control group) was analyzed. Baseline characteristics, CMR-derived EAT volume, and laboratory biomarkers were integrated into machine learning models. Model performance was evaluated using accuracy, precision, recall, and F1-score. Additionally, receiver operating characteristic area under the curve (ROC-AUC) and precision-recall area under the curve (PR-AUC) were employed to assess discriminative power across varying decision thresholds. Hyperparameter optimization and ensemble techniques were applied to enhance predictive performance. HFpEF patients exhibited significantly higher EAT volume (70.9±27.3 vs. 41.9±18.3 mL, p<0.001) and NT-proBNP levels (1574 [963,2722] vs. 33 [10,100] pg/mL, p<0.001), along with a greater prevalence of comorbidities. The voting classifier demonstrated the highest accuracy for HFpEF diagnosis (0.94), with a precision of 0.96, recall of 0.94, and an F1-score of 0.95. For prognostic tasks, AdaBoost, XGBoost and Random Forest yielded superior performance in predicting adverse clinical outcomes, including rehospitalization and all-cause mortality (accuracy: 0.95). Key predictive features identified included EAT volume, right atrioventricular groove (Right AVG), tricuspid regurgitation velocity (TRV), and metabolic syndrome. Explainable models combining clinical and CMR-derived features, especially EAT volume, improve support for HFpEF diagnosis and outcome prediction. These findings highlight the value of a data-driven, interpretable approach to characterizing HFpEF phenotypes and may facilitate individualized risk assessment in selected populations.
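The soft-voting ensemble and F1 evaluation described above can be sketched as follows; the base-model probabilities are invented for illustration and do not come from the study:

```python
import numpy as np

def f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """F1-score from binary labels, via precision and recall."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def soft_vote(prob_lists) -> np.ndarray:
    """Average predicted probabilities from several base models, threshold at 0.5."""
    avg = np.mean(prob_lists, axis=0)
    return (avg >= 0.5).astype(int)

y_true = np.array([1, 1, 0, 0, 1, 0])
# Hypothetical per-model P(HFpEF) outputs from three base classifiers.
probs = [np.array([0.9, 0.7, 0.2, 0.4, 0.6, 0.1]),
         np.array([0.8, 0.6, 0.3, 0.2, 0.7, 0.2]),
         np.array([0.7, 0.8, 0.1, 0.3, 0.5, 0.3])]
y_pred = soft_vote(probs)
score = f1(y_true, y_pred)
```

Soft voting averages probabilities before thresholding, so a confident model can outvote two marginal ones; hard voting would instead count thresholded votes.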

Exploring ChatGPT's potential in diagnosing oral and maxillofacial pathologies: a study of 123 challenging cases.

Tassoker M

pubmed | Jul 17 2025
This study aimed to evaluate the diagnostic performance of ChatGPT-4o, a large language model developed by OpenAI, in challenging cases of oral and maxillofacial diseases presented in the Clinicopathologic Conference section of the journal Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology. A total of 123 diagnostically challenging oral and maxillofacial cases published in that journal were retrospectively collected. The case presentations, which included detailed clinical, radiographic, and sometimes histopathologic descriptions, were input into ChatGPT-4o. The model was prompted to provide a single most likely diagnosis for each case, and these outputs were compared with the final diagnoses established by expert consensus in each original case report. Accuracy was calculated on the basis of exact diagnostic matches. ChatGPT-4o correctly diagnosed 96 of 123 cases, an overall diagnostic accuracy of 78%. Even where the exact diagnosis was not given, the model often suggested one of the clinically reasonable differential diagnoses. ChatGPT-4o therefore demonstrates a promising ability to assist in the diagnostic process of complex maxillofacial conditions, with a relatively high accuracy rate in challenging cases. While it is not a replacement for expert clinical judgment, large language models may offer valuable decision support in oral and maxillofacial radiology, particularly in educational or consultative contexts.
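As a quick arithmetic check, 96/123 gives the reported 78%; a Wilson score interval (one standard choice for a binomial proportion, not something the paper reports) bounds the uncertainty of that estimate:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

accuracy = 96 / 123          # ~0.78, the reported exact-match rate
lo, hi = wilson_ci(96, 123)  # roughly (0.70, 0.84) for this sample size
```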

Precision Diagnosis and Treatment Monitoring of Glioma via PET Radiomics.

Zhou C, Ji P, Gong B, Kou Y, Fan Z, Wang L

pubmed | Jul 17 2025
Glioma, the most common primary intracranial tumor, poses significant challenges to precision diagnosis and treatment due to its heterogeneity and invasiveness. With the introduction of the 2021 WHO classification standard based on molecular biomarkers, the role of imaging in non-invasive subtyping and therapeutic monitoring of gliomas has become increasingly crucial. While conventional MRI shows limitations in assessing metabolic status and differentiating tumor recurrence, positron emission tomography (PET) combined with radiomics and artificial intelligence technologies offers a novel paradigm for precise diagnosis and treatment monitoring through quantitative extraction of multimodal imaging features (e.g., intensity, texture, dynamic parameters). This review systematically summarizes the technical workflow of PET radiomics (including tracer selection, image segmentation, feature extraction, and model construction) and its applications in predicting molecular subtypes (such as IDH mutation and MGMT methylation), distinguishing recurrence from treatment-related changes, and prognostic stratification. Studies demonstrate that amino acid tracers (e.g., ¹⁸F-FET, ¹¹C-MET) combined with multimodal radiomics models significantly outperform traditional parametric analysis in diagnostic efficacy. Nevertheless, current research still faces challenges including data heterogeneity, insufficient model interpretability, and lack of clinical validation. Future advancements require multicenter standardized protocols, open-source algorithm frameworks, and multi-omics integration to facilitate the transformative clinical translation of PET radiomics from research to practice.
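The feature-extraction step of the radiomics workflow can be sketched as below: a few first-order features computed from voxels inside a segmentation mask. This is illustrative only; real pipelines (e.g. pyradiomics) implement the full IBSI-standardised feature set, and the volume and mask here are synthetic:

```python
import numpy as np

def first_order_features(volume: np.ndarray, mask: np.ndarray, bins: int = 32) -> dict:
    """A handful of first-order radiomics features from voxels inside a mask."""
    voxels = volume[mask.astype(bool)]
    hist, _ = np.histogram(voxels, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before the log
    return {
        "mean": float(voxels.mean()),
        "std": float(voxels.std()),
        "skew_proxy": float(((voxels - voxels.mean()) ** 3).mean() / voxels.std() ** 3),
        "entropy": float(-(p * np.log2(p)).sum()),  # intensity-histogram entropy
    }

rng = np.random.default_rng(1)
vol = rng.normal(2.0, 0.5, size=(16, 16, 16))   # hypothetical uptake-like values
roi = np.zeros(vol.shape, dtype=bool)
roi[4:12, 4:12, 4:12] = True                    # hypothetical tumour mask
feats = first_order_features(vol, roi)
```

Downstream, vectors of such features (plus texture and dynamic parameters) feed the model-construction stage of the pipeline.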

Insights into a radiology-specialised multimodal large language model with sparse autoencoders

Kenza Bouzid, Shruthi Bannur, Felix Meissen, Daniel Coelho de Castro, Anton Schwaighofer, Javier Alvarez-Valle, Stephanie L. Hyland

arxiv preprint | Jul 17 2025
Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on model behaviour through steering, demonstrating directional control over generations with mixed success. Our results reveal practical and methodological challenges, yet they offer initial insights into the internal concepts learned by MAIRA-2 - marking a step toward deeper mechanistic understanding and interpretability of a radiology-adapted multimodal large language model, and paving the way for improved model transparency. We release the trained SAEs and interpretations: https://huggingface.co/microsoft/maira-2-sae.
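A plain sparse autoencoder (not the Matryoshka variant used in the paper) can be sketched as follows; the dimensions and random weights are arbitrary stand-ins for MAIRA-2 activations and a trained dictionary:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32       # hypothetical sizes; the real model's dims differ
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x: np.ndarray):
    """Encode an activation vector into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU -> non-negative sparse codes
    x_hat = f @ W_dec + b_dec
    recon_loss = float(((x - x_hat) ** 2).sum())
    sparsity = float(np.abs(f).sum())        # L1 term pushes most features to zero
    return f, x_hat, recon_loss, sparsity

x = rng.normal(size=d_model)                 # stand-in for one model activation
f, x_hat, recon, l1 = sae_forward(x)
```

Training minimises recon_loss plus a weighted sparsity penalty, so each surviving feature direction in W_dec can be inspected (and steered) as a candidate human-interpretable concept.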

AortaDiff: Volume-Guided Conditional Diffusion Models for Multi-Branch Aortic Surface Generation

Delin An, Pan Du, Jian-Xun Wang, Chaoli Wang

arxiv preprint | Jul 17 2025
Accurate 3D aortic construction is crucial for clinical diagnosis, preoperative planning, and computational fluid dynamics (CFD) simulations, as it enables the estimation of critical hemodynamic parameters such as blood flow velocity, pressure distribution, and wall shear stress. Existing construction methods often rely on large annotated training datasets and extensive manual intervention. While the resulting meshes can serve for visualization purposes, they struggle to produce geometrically consistent, well-constructed surfaces suitable for downstream CFD analysis. To address these challenges, we introduce AortaDiff, a diffusion-based framework that generates smooth aortic surfaces directly from CT/MRI volumes. AortaDiff first employs a volume-guided conditional diffusion model (CDM) to iteratively generate aortic centerlines conditioned on volumetric medical images. Each centerline point is then automatically used as a prompt to extract the corresponding vessel contour, ensuring accurate boundary delineation. Finally, the extracted contours are fitted into a smooth 3D surface, yielding a continuous, CFD-compatible mesh representation. AortaDiff offers distinct advantages over existing methods, including an end-to-end workflow, minimal dependency on large labeled datasets, and the ability to generate CFD-compatible aorta meshes with high geometric fidelity. Experimental results demonstrate that AortaDiff performs effectively even with limited training data, successfully constructing both normal and pathologically altered aorta meshes, including cases with aneurysms or coarctation. This capability enables the generation of high-quality visualizations and positions AortaDiff as a practical solution for cardiovascular research.
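The reverse-diffusion sampling a conditional diffusion model performs can be sketched generically as below; the noise predictor here is a toy function, whereas AortaDiff's is a trained network conditioned on the CT/MRI volume, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)       # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t, cond):
    """Stand-in for the learned noise predictor eps_theta(x_t, t, volume)."""
    return (x - cond) * 0.1              # toy rule: pull points toward the condition

def sample_centerline(cond, n_points=64):
    """Generic DDPM reverse process over a set of 3D centerline points."""
    x = rng.normal(size=(n_points, 3))   # start from pure noise
    for t in reversed(range(T)):
        z = rng.normal(size=x.shape) if t > 0 else 0.0
        eps = eps_model(x, t, cond)
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        x = x + np.sqrt(betas[t]) * z    # stochastic term, skipped at t = 0
    return x

cond = np.zeros(3)                       # hypothetical image-derived conditioning
pts = sample_centerline(cond)
```

In the paper's pipeline, each sampled centerline point then prompts contour extraction, and the contours are fitted into the final smooth surface.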

A conversational artificial intelligence based web application for medical conversations: a prototype for a chatbot

Pires, J. G.

medrxiv preprint | Jul 17 2025
Background: Artificial Intelligence (AI) has evolved through various trends, with different subfields gaining prominence over time. Currently, Conversational Artificial Intelligence (CAI), particularly Generative AI, is at the forefront. CAI models are primarily focused on text-based tasks and are commonly deployed as chatbots. Recent advancements by OpenAI have enabled the integration of external, independently developed models, allowing chatbots to perform specialized, task-oriented functions beyond general language processing. Objective: This study aims to develop a smart chatbot that integrates large language models (LLMs) from OpenAI with specialized domain-specific models, such as those used in medical image diagnostics. The system leverages transfer learning via Google's Teachable Machine to construct image-based classifiers and incorporates a diabetes detection model developed in TensorFlow.js. A key innovation is the chatbot's ability to extract relevant parameters from user input, trigger the appropriate diagnostic model, interpret the output, and deliver responses in natural language. The overarching goal is to demonstrate the potential of combining LLMs with external models to build multimodal, task-oriented conversational agents. Methods: Two image-based models were developed and integrated into the chatbot system. The first analyzes chest X-rays to detect viral and bacterial pneumonia. The second uses optical coherence tomography (OCT) images to identify ocular conditions such as drusen, choroidal neovascularization (CNV), and diabetic macular edema (DME). Both models were incorporated into the chatbot to enable image-based medical query handling. In addition, a text-based model was constructed to process physiological measurements for diabetes prediction using TensorFlow.js. The architecture is modular: new diagnostic models can be added without redesigning the chatbot, enabling straightforward functional expansion. Results: The findings demonstrate effective integration between the chatbot and the diagnostic models, with only minor deviations from expected behavior. Additionally, a stub function was implemented within the chatbot to schedule medical appointments based on the severity of a patient's condition, and it was specifically tested with the OCT and X-ray models. Conclusions: This study demonstrates the feasibility of developing advanced AI systems, including image-based diagnostic models and chatbot integration, by leveraging Artificial Intelligence as a Service (AIaaS). It also underscores the potential of AI to enhance user experiences in bioinformatics, paving the way for more intuitive and accessible interfaces in the field. Looking ahead, the modular nature of the chatbot allows for the integration of additional diagnostic models as the system evolves.
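The modular dispatch the authors describe, where the chatbot extracts an intent plus parameters and routes them to a registered diagnostic model, can be sketched as a registry pattern; the intent name and the threshold rule below are hypothetical stand-ins for the study's TensorFlow.js and image models:

```python
from typing import Callable, Dict

# Registry mapping intent names to diagnostic model callables.
MODEL_REGISTRY: Dict[str, Callable[[dict], str]] = {}

def register(name: str):
    """Decorator that adds a model to the registry under a given intent name."""
    def deco(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        MODEL_REGISTRY[name] = fn
        return fn
    return deco

@register("diabetes")
def diabetes_model(params: dict) -> str:
    # Toy threshold rule standing in for the real diabetes classifier.
    risk = "high" if params.get("glucose", 0) > 140 else "low"
    return f"Diabetes risk: {risk}"

def handle_message(intent: str, params: dict) -> str:
    """The chatbot extracts intent + parameters, then dispatches to a model."""
    if intent not in MODEL_REGISTRY:
        return "No model registered for this request."
    return MODEL_REGISTRY[intent](params)

reply = handle_message("diabetes", {"glucose": 160})
```

Adding an X-ray or OCT model is then just another `@register(...)` function, which is the "expansion without redesigning the chatbot" property the abstract highlights.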

Predicting ADC map quality from T2-weighted MRI: A deep learning approach for early quality assessment to assist point-of-care.

Brender JR, Ota M, Nguyen N, Ford JW, Kishimoto S, Harmon SA, Wood BJ, Pinto PA, Krishna MC, Choyke PL, Turkbey B

pubmed | Jul 17 2025
Poor quality prostate MRI images compromise diagnostic accuracy, with diffusion-weighted imaging and the resulting apparent diffusion coefficient (ADC) maps being particularly vulnerable. These maps are critical for prostate cancer diagnosis, yet current methods relying on standardizing technical parameters fail to consistently ensure image quality. We propose a novel deep learning approach to predict low-quality ADC maps from T2-weighted (T2W) images, enabling real-time corrective interventions during imaging. A multi-site dataset of T2W images and ADC maps from 486 patients, spanning 62 external clinics and in-house imaging, was retrospectively analyzed. A neural network was trained to classify ADC map quality as "diagnostic" or "non-diagnostic" based solely on T2W images. Rectal cross-sectional area measurements were evaluated as an interpretable metric for susceptibility-induced distortions. Analysis revealed limited correlation between individual acquisition parameters and image quality, with horizontal phase encoding significant for T2 imaging (p < 0.001, AUC = 0.6735) and vertical resolution for ADC maps (p = 0.006, AUC = 0.6348). By contrast, the neural network achieved robust performance for ADC map quality prediction from T2 images, with 83% sensitivity and 90% negative predictive value in multicenter validation, comparable to single-site models using ADC maps directly. Remarkably, it generalized well to unseen in-house data (94 ± 2% accuracy). Rectal cross-sectional area correlated with ADC quality (AUC = 0.65), offering a simple, interpretable metric. The probability of low-quality, uninterpretable ADC maps can therefore be inferred early in the imaging process by a neural network, allowing corrective action to be taken.
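Sensitivity and negative predictive value, the two figures reported above, come straight from a confusion matrix; the counts below are hypothetical, chosen only to land near the reported 83% and 90%:

```python
def sens_npv(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Sensitivity = TP/(TP+FN); NPV = TN/(TN+FN).

    Here the 'positive' class is a non-diagnostic ADC map, so a high NPV
    means a 'diagnostic' call from the network is rarely wrong.
    """
    sensitivity = tp / (tp + fn)
    npv = tn / (tn + fn)
    return sensitivity, npv

# Hypothetical validation counts for illustration only.
sens, npv = sens_npv(tp=25, fp=10, fn=5, tn=45)   # ~0.83 and 0.90
```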

Patient-Specific and Interpretable Deep Brain Stimulation Optimisation Using MRI and Clinical Review Data

Mikroulis, A., Lasica, A., Filip, P., Bakstein, E., Novak, D.

medrxiv preprint | Jul 17 2025
Background: Optimisation of Deep Brain Stimulation (DBS) settings is a key aspect of achieving clinical efficacy in movement disorders such as Parkinson's disease. Modern techniques attempt to solve the problem through data-intensive statistical and machine learning approaches, adding significant overhead to existing clinical workflows. Here, we present an optimisation approach for DBS electrode contact and current selection, grounded in routinely collected MRI data, well-established tools (Lead-DBS) and, optionally, clinical review records. Methods: The pipeline, packaged in a cross-platform tool, uses lead reconstruction data and simulation of the volume of tissue activated to estimate the contacts in optimal position relative to the target structure, and to suggest the optimal stimulation current. The tool then allows further interactive user optimisation of the current settings. Existing electrode contact evaluations can optionally be included in the calculation process for further fine-tuning and adverse effect avoidance. Results: Based on a sample of 177 implanted electrode reconstructions from 89 Parkinson's disease patients, we demonstrate that DBS parameter setting by our algorithm is more effective in covering the target structure (Wilcoxon p<6e-12, Hedges' g>0.34) and minimising electric field leakage to neighbouring regions (p<2e-15, g>0.84) than expert parameter settings. Conclusion: The proposed automated method for optimisation of DBS electrode contact and current selection shows promising results and is readily applicable to existing clinical workflows. We demonstrate that the algorithmically selected contacts perform better than manual selections according to electric field calculations, allowing for a comparable clinical outcome without the iterative optimisation procedure.
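Hedges' g, the effect size reported above, is the pooled-SD standardised mean difference with a small-sample correction; the coverage values below are simulated stand-ins, not the study's data:

```python
import numpy as np

def hedges_g(a: np.ndarray, b: np.ndarray) -> float:
    """Hedges' g: Cohen's d with the usual small-sample correction factor."""
    na, nb = len(a), len(b)
    # Pooled standard deviation across both groups.
    sp = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    d = (a.mean() - b.mean()) / sp
    correction = 1 - 3 / (4 * (na + nb) - 9)
    return float(d * correction)

rng = np.random.default_rng(0)
# Hypothetical target-coverage fractions: algorithmic vs expert settings.
algo = rng.normal(0.80, 0.10, 177)
expert = rng.normal(0.72, 0.10, 177)
g = hedges_g(algo, expert)
```

A g above roughly 0.3, as reported for target coverage, conventionally indicates a small-to-medium standardised difference between the two settings.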

Large Language Model-Based Entity Extraction Reliably Classifies Pancreatic Cysts and Reveals Predictors of Malignancy: A Cross-Sectional and Retrospective Cohort Study

Papale, A. J., Flattau, R., Vithlani, N., Mahajan, D., Ziemba, Y., Zavadsky, T., Carvino, A., King, D., Nadella, S.

medrxiv preprint | Jul 17 2025
Pancreatic cystic lesions (PCLs) are often discovered incidentally on imaging and may progress to pancreatic ductal adenocarcinoma (PDAC). PCLs have a high incidence in the general population, and adherence to screening guidelines can be variable. With the advent of technologies that enable automated text classification, we sought to evaluate various natural language processing (NLP) tools, including large language models (LLMs), for identifying and classifying PCLs from radiology reports. We correlated our classification of PCLs with clinical features to identify risk factors for a positive PDAC biopsy. We contrasted a previously described NLP classifier with LLMs for prospective identification of PCLs in radiology. We evaluated various LLMs for PCL classification into low-risk or high-risk categories based on published guidelines. We compared prompt-based PCL classification to specific entity-guided PCL classification. To this end, we developed tools to deidentify radiology reports and track patients longitudinally based on those reports. Additionally, we used our newly developed tools to evaluate a retrospective database of patients who underwent pancreas biopsy to determine associated factors, including those in their radiology reports and clinical features, using multivariable logistic regression modelling. Of 14,574 prospective radiology reports, 665 (4.6%) described a pancreatic cyst, including 175 (1.2%) high-risk lesions. Our Entity-Extraction Large Language Model tool achieved recall 0.992 (95% confidence interval [CI], 0.985-0.998), precision 0.988 (0.979-0.996), and F1-score 0.990 (0.985-0.995) for detecting cysts; F1-scores were 0.993 (0.987-0.998) for low-risk and 0.977 (0.952-0.995) for high-risk classification. Among 4,285 biopsy patients, 330 had pancreatic cysts documented ≥6 months before biopsy.
In the final multivariable model (AUC = 0.877), independent predictors of adenocarcinoma were change in duct caliber with upstream atrophy (adjusted odds ratio [AOR], 4.94; 95% CI, 1.30-18.79), mural nodules (AOR, 11.02; 1.81-67.26), older age (AOR, 1.10; 1.05-1.16), lower body mass index (AOR, 0.86; 0.76-0.96), and total bilirubin (AOR, 1.81; 1.18-2.77). Automated NLP-based analysis of radiology reports using LLM-driven entity extraction can accurately identify and risk-stratify PCLs and, when retrospectively applied, reveal factors predicting malignant progression. Widespread implementation may improve surveillance and enable earlier intervention.
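The adjusted odds ratios above are exponentiated logistic-regression coefficients; the sketch below shows that conversion, with a hypothetical log-odds estimate and CI chosen only to reproduce the reported mural-nodule AOR:

```python
import math

def adjusted_or(coef: float, ci_low: float, ci_high: float) -> tuple[float, float, float]:
    """Convert a logistic-regression coefficient and its 95% CI (log-odds scale)
    into an adjusted odds ratio with CI, the form reported in the abstract."""
    return math.exp(coef), math.exp(ci_low), math.exp(ci_high)

# Hypothetical log-odds values; exp() recovers an AOR of the reported
# magnitude for mural nodules (11.02 [1.81-67.26]).
aor, lo, hi = adjusted_or(2.40, 0.593, 4.209)
```

The same transformation underlies every AOR in the model, which is why a CI crossing 1.0 on the odds-ratio scale corresponds to a coefficient CI crossing 0 on the log-odds scale.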
