Latest Papers on Radiology AI

Deep learning models for deriving optimised measures of fat and muscle mass from MRI.

Thomas B, Ali MA, Ali FMH, Chung A, Joshi M, Maiguma-Wilson S, Reiff G, Said H, Zalmay P, Berks M, Blackledge MD, O'Connor JPB

•papers•Jul 17 2025

Fat and muscle mass are potential biomarkers of wellbeing and disease in oncology, but clinical measurement methods vary considerably. Here we evaluate the accuracy, precision and ability to track change for multiple deep learning (DL) models that quantify fat and muscle mass from abdominal MRI. Specifically, subcutaneous fat (SF), intra-abdominal fat (VF), external muscle (EM) and psoas muscle (PM) were evaluated using 15 convolutional neural network (CNN)-based and 4 transformer-based deep learning model architectures. There was negligible difference between human observers and all deep learning models in the accuracy of delineating SF or EM, and delineation of both tissues had excellent repeatability. VF was measured most accurately by the human observers, then by CNN-based models, which outperformed transformer-based models. By contrast, PM delineation accuracy and repeatability were poor for all assessments. Repeatability limits of agreement determined when changes measured in individual patients were due to real change rather than test-retest variation. In summary, the accuracy and precision of DL models in delineating fat and muscle volumes vary between CNN-based and transformer-based models, between different tissues and, in some cases, with gender. These factors should be considered when investigators deploy deep learning methods to estimate biomarkers of fat and muscle mass.
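The repeatability criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes Bland-Altman-style 95% limits of agreement computed from paired test-retest measurements; the abstract does not state the exact formula used, and the function names and sample volumes below are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(test, retest):
    """Return (lower, upper) 95% limits of agreement for paired repeat scans."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)   # systematic offset between the two sessions
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias - 1.96 * sd, bias + 1.96 * sd

def is_real_change(baseline, follow_up, loa):
    """A change outside the limits is unlikely to be test-retest variation alone."""
    lower, upper = loa
    change = follow_up - baseline
    return change < lower or change > upper

# Hypothetical test-retest volumes (arbitrary units) for four subjects.
test_scan = [100.0, 102.0, 98.0, 101.0]
retest_scan = [101.0, 101.0, 99.0, 100.0]
loa = limits_of_agreement(test_scan, retest_scan)

print(is_real_change(100.0, 105.0, loa))  # change of +5 lies outside the limits
print(is_real_change(100.0, 101.0, loa))  # change of +1 lies inside the limits
```

Under this assumption, a tissue with poor repeatability (such as PM in the abstract) would have wide limits, so only large changes in an individual patient could be called real.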

MRI Segmentation Abdominal Methodology In Silico Academic Lab

A conversational artificial intelligence based web application for medical conversations: a prototype for a chatbot

Pires, J. G.

•preprint•Jul 17 2025

Background: Artificial Intelligence (AI) has evolved through various trends, with different subfields gaining prominence over time. Currently, Conversational Artificial Intelligence (CAI), particularly Generative AI, is at the forefront. CAI models are primarily focused on text-based tasks and are commonly deployed as chatbots. Recent advancements by OpenAI have enabled the integration of external, independently developed models, allowing chatbots to perform specialized, task-oriented functions beyond general language processing. Objective: This study aims to develop a smart chatbot that integrates large language models (LLMs) from OpenAI with specialized domain-specific models, such as those used in medical image diagnostics. The system leverages transfer learning via Google's Teachable Machine to construct image-based classifiers and incorporates a diabetes detection model developed in TensorFlow.js. A key innovation is the chatbot's ability to extract relevant parameters from user input, trigger the appropriate diagnostic model, interpret the output, and deliver responses in natural language. The overarching goal is to demonstrate the potential of combining LLMs with external models to build multimodal, task-oriented conversational agents. Methods: Two image-based models were developed and integrated into the chatbot system. The first analyzes chest X-rays to detect viral and bacterial pneumonia. The second uses optical coherence tomography (OCT) images to identify ocular conditions such as drusen, choroidal neovascularization (CNV), and diabetic macular edema (DME). Both models were incorporated into the chatbot to enable image-based medical query handling. In addition, a text-based model was constructed to process physiological measurements for diabetes prediction using TensorFlow.js. The architecture is modular: new diagnostic models can be added without redesigning the chatbot, enabling straightforward functional expansion. Results: The findings demonstrate effective integration between the chatbot and the diagnostic models, with only minor deviations from expected behavior. Additionally, a stub function was implemented within the chatbot to schedule medical appointments based on the severity of a patient's condition, and it was specifically tested with the OCT and X-ray models. Conclusions: This study demonstrates the feasibility of developing advanced AI systems, including image-based diagnostic models and chatbot integration, by leveraging Artificial Intelligence as a Service (AIaaS). It also underscores the potential of AI to enhance user experiences in bioinformatics, paving the way for more intuitive and accessible interfaces in the field. Looking ahead, the modular nature of the chatbot allows for the integration of additional diagnostic models as the system evolves.
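The modular design described in this abstract (extract parameters, trigger the matching model, phrase the output) could be sketched as a simple registry. This is only an illustration, not the author's implementation: the registry, the `register` decorator, the `handle` function, and the stub rule standing in for the diabetes model are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: task name -> function that takes parameters, returns a finding.
MODEL_REGISTRY: Dict[str, Callable[[dict], str]] = {}

def register(task: str):
    """Decorator that adds a diagnostic model without touching the dispatcher."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        MODEL_REGISTRY[task] = fn
        return fn
    return wrap

@register("diabetes")
def diabetes_stub(params: dict) -> str:
    # Placeholder rule for illustration only; NOT a trained or validated model.
    return "elevated risk" if params.get("glucose", 0) >= 140 else "no elevated risk"

def handle(task: str, params: dict) -> str:
    """Trigger the registered model for a task and phrase its output as a sentence."""
    model = MODEL_REGISTRY.get(task)
    if model is None:
        return f"No model registered for '{task}'."
    return f"The {task} model reports: {model(params)}."

print(handle("diabetes", {"glucose": 150}))
print(handle("oct", {}))
```

In this sketch, adding a new diagnostic model only requires another `@register(...)` function; the dispatcher itself stays unchanged.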

Mixed Modality Classification Chest Methodology Prototype Academic Lab GenAI

Predicting ADC map quality from T2-weighted MRI: A deep learning approach for early quality assessment to assist point-of-care.

Brender JR, Ota M, Nguyen N, Ford JW, Kishimoto S, Harmon SA, Wood BJ, Pinto PA, Krishna MC, Choyke PL, Turkbey B

•papers•Jul 17 2025

Poor-quality prostate MRI images compromise diagnostic accuracy, with diffusion-weighted imaging and the resulting apparent diffusion coefficient (ADC) maps being particularly vulnerable. These maps are critical for prostate cancer diagnosis, yet current methods relying on standardizing technical parameters fail to consistently ensure image quality. We propose a novel deep learning approach to predict low-quality ADC maps using T2-weighted (T2W) images, enabling real-time corrective interventions during imaging. A multi-site dataset of T2W images and ADC maps from 486 patients, spanning 62 external clinics and in-house imaging, was retrospectively analyzed. A neural network was trained to classify ADC map quality as "diagnostic" or "non-diagnostic" based solely on T2W images. Rectal cross-sectional area measurements were evaluated as an interpretable metric for susceptibility-induced distortions. Analysis revealed limited correlation between individual acquisition parameters and image quality, with horizontal phase encoding significant for T2 imaging (p < 0.001, AUC = 0.6735) and vertical resolution for ADC maps (p = 0.006, AUC = 0.6348). By contrast, the neural network achieved robust performance for ADC map quality prediction from T2 images, with 83% sensitivity and 90% negative predictive value in multicenter validation, comparable to single-site models using ADC maps directly. Remarkably, it generalized well to unseen in-house data (94 ± 2% accuracy). Rectal cross-sectional area correlated with ADC quality (AUC = 0.65), offering a simple, interpretable metric. The probability of low-quality, uninterpretable ADC maps can be inferred early in the imaging process by a neural network approach, allowing corrective action to be taken.
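For readers less familiar with the two metrics reported in this abstract, a minimal sketch of how sensitivity and negative predictive value follow from a confusion matrix is given here. It assumes "non-diagnostic" is treated as the positive class, which the abstract does not state explicitly, and the counts are hypothetical, not the study's data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly non-diagnostic maps that the model flags."""
    return tp / (tp + fn)

def negative_predictive_value(tn: int, fn: int) -> float:
    """Fraction of maps predicted diagnostic that really are diagnostic."""
    return tn / (tn + fn)

# Hypothetical counts, chosen only to illustrate the formulas.
tp, fn, tn, fp = 40, 10, 90, 20

sens = sensitivity(tp, fn)
npv = negative_predictive_value(tn, fn)
print(f"sensitivity = {sens:.2f}, NPV = {npv:.2f}")
```

A high negative predictive value matters for this use case because a "diagnostic" prediction is what would let the scan proceed without corrective action.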

MRI Classification Abdominal Retrospective Clinical In Silico Academic Lab

Patient-Specific and Interpretable Deep Brain Stimulation Optimisation Using MRI and Clinical Review Data

Mikroulis, A., Lasica, A., Filip, P., Bakstein, E., Novak, D.

•preprint•Jul 17 2025

Background: Optimisation of Deep Brain Stimulation (DBS) settings is a key aspect of achieving clinical efficacy in movement disorders such as Parkinson's disease. Modern techniques attempt to solve the problem through data-intensive statistical and machine learning approaches, adding significant overhead to existing clinical workflows. Here, we present an optimisation approach for DBS electrode contact and current selection, grounded in routinely collected MRI data, well-established tools (Lead-DBS) and, optionally, clinical review records. Methods: The pipeline, packaged in a cross-platform tool, uses lead reconstruction data and simulation of the volume of tissue activated to estimate which contacts are in the optimal position relative to the target structure and to suggest the optimal stimulation current. The tool then allows further interactive user optimisation of the current settings. Existing electrode contact evaluations can optionally be included in the calculation for further fine-tuning and avoidance of adverse effects. Results: Based on a sample of 177 implanted electrode reconstructions from 89 Parkinson's disease patients, we demonstrate that DBS parameter setting by our algorithm is more effective than expert parameter settings in covering the target structure (Wilcoxon p<6e-12, Hedges' g>0.34) and minimising electric field leakage to neighbouring regions (p<2e-15, g>0.84). Conclusion: The proposed automated method for optimisation of DBS electrode contact and current selection shows promising results and is readily applicable to existing clinical workflows. We demonstrate that the algorithmically selected contacts perform better than manual selections according to electric field calculations, allowing for a comparable clinical outcome without the iterative optimisation procedure.

MRI Segmentation Neurological Retrospective Clinical In Silico Academic Lab

Integrative radiomics of intra- and peri-tumoral features for enhanced risk prediction in thymic tumors: a multimodal analysis of tumor microenvironment contributions.

Zhu L, Li J, Wang X, He Y, Li S, He S, Deng B

•papers•Jul 17 2025

This study aims to explore the role of intra- and peri-tumoral radiomics features in tumor risk prediction, with a particular focus on the impact of peri-tumoral characteristics on the tumor microenvironment. A total of 133 patients, including 128 with thymomas and 5 with thymic carcinomas, were ultimately enrolled in this study. Based on the high- and low-risk classification, the cohort was divided into a training set (n = 93) and a testing set (n = 40) for subsequent analysis. Based on imaging data from these 133 patients, multiple radiomics prediction models integrating intra-tumoral and peri-tumoral features were developed. The data were sourced from patients treated at the Affiliated Hospital of Guangdong Medical University between 2015 and 2023, with all imaging obtained through preoperative CT scans. Radiomics feature extraction involved three primary categories: first-order features, shape features, and high-order features. Initially, the tumor's region of interest (ROI) was manually delineated using ITK-SNAP software. A custom Python algorithm was then used to automatically expand the peri-tumoral area, extracting features within 1 mm, 2 mm, and 3 mm zones surrounding the tumor. Additionally, considering the multimodal nature of the imaging data, image fusion techniques were incorporated to further enhance the model's ability to capture the tumor microenvironment. To build the radiomics models, selected features were first standardized using z-scores. Initial feature selection was performed using a t-test (p < 0.05), followed by Spearman correlation analysis to remove redundancy by retaining only one feature from each pair with a correlation coefficient ≥ 0.90. Subsequently, hierarchical clustering and the LASSO algorithm were applied to identify the most predictive features. These selected features were then used to train machine learning models, which were optimized on the training dataset and assessed for predictive performance. 
To further evaluate the effectiveness of these models, various statistical methods were applied, including DeLong's test, NRI, and IDI, to compare predictive differences among models. Decision curve analysis (DCA) was also conducted to assess the clinical applicability of the models. The results indicate that the IntraPeri1mm model performed the best, achieving an AUC of 0.837, with sensitivity and specificity of 0.846 and 0.84, respectively, significantly outperforming other models. SHAP value analysis identified several key features, such as peri_log_sigma_2_0_mm 3D_firstorder RootMeanSquared and intra_wavelet_LLL_firstorder Skewness, which made substantial contributions to the model's predictive accuracy. NRI and IDI analyses further confirmed the model's superior clinical applicability, and the DCA curve demonstrated robust performance across different thresholds. DeLong's test highlighted the statistical significance of the IntraPeri1mm model, underscoring its potential utility in radiomics research. Overall, this study provides a new perspective on tumor risk assessment, highlighting the importance of peri-tumoral features in the analysis of the tumor microenvironment. It aims to offer valuable insights for the development of personalized treatment plans.
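The redundancy-removal step in this abstract (keep one feature from each pair with Spearman correlation ≥ 0.90) can be sketched in plain Python. This is not the authors' code: the greedy keep-first order, the use of the absolute correlation, and the toy feature names are assumptions. The sketch also assumes no feature is constant, since a constant feature has no defined correlation.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a feature to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def drop_redundant(features, threshold=0.90):
    """Greedily keep a feature only if it is not highly correlated with a kept one."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy feature table (hypothetical names and values), standardized first.
raw = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [5, 1, 4, 2, 3]}
features = {name: zscore(vals) for name, vals in raw.items()}
selected = drop_redundant(features)
print(selected)
```

Because Spearman correlation depends only on ranks, the z-score step does not change which features are dropped; it is included only to mirror the order of operations in the abstract.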

CT Classification Chest Retrospective Clinical In Silico Academic Lab

Myocardial Native T1 Mapping in the German National Cohort (NAKO): Associations with Age, Sex, and Cardiometabolic Risk Factors

Ammann, C., Gröschel, J., Saad, H., Rospleszcz, S., Schuppert, C., Hadler, T., Hickstein, R., Niendorf, T., Nolde, J. M., Schulze, M. B., Greiser, K. H., Decker, J. A., Kröncke, T., Küstner, T., Nikolaou, K., Willich, S. N., Keil, T., Dörr, M., Bülow, R., Bamberg, F., Pischon, T., Schlett, C. L., Schulz-Menger, J.

•preprint•Jul 17 2025

Background and Aims: In cardiovascular magnetic resonance (CMR), myocardial native T1 mapping enables quantitative, non-invasive tissue characterization and is sensitive to subclinical changes in myocardial structure and composition. We investigated how age, sex, and cardiometabolic risk factors are associated with myocardial T1 in a population-based analysis within the German National Cohort (NAKO). Methods: This cross-sectional study included 29,573 prospectively enrolled participants who underwent CMR-based midventricular T1 mapping at 3.0 T, alongside clinical phenotyping. After artificial intelligence-assisted myocardial segmentation, a subset of 9,162 outliers was subjected to manual quality control according to clinical evaluation standards. Associations with cardiometabolic risk factors, identified through self-reported medical history, clinical chemistry, and blood pressure measurements, were evaluated using adjusted linear regression models. Results: Women had higher T1 values than men, with sex differences progressively declining with age. T1 was significantly elevated in individuals with diabetes (β=3.91 ms; p<0.001), kidney disease (β=3.44 ms; p<0.001), and current smoking (β=6.67 ms; p<0.001). Conversely, hyperlipidaemia was significantly associated with lower T1 (β=-4.41 ms; p<0.001). Associations with hypertension showed a sex-specific pattern: T1 was lower in women but increased with hypertension severity in men. Conclusions: Myocardial native T1 varies by sex and age and shows associations with major cardiometabolic risk factors. Notably, lower T1 times in participants with hyperlipidaemia may indicate a direct effect of blood lipids on the heart. Our findings support the utility of T1 mapping as a sensitive marker of early myocardial changes and highlight the sex-specific interplay between cardiometabolic health and myocardial tissue composition. 
Key Question: How are age, sex, and cardiometabolic risk factors associated with myocardial native T1, a quantitative magnetic resonance imaging marker of myocardial tissue composition, in a large-scale population-based evaluation within the German National Cohort (NAKO)? Key Finding: T1 relaxation times were higher in women and gradually converged between sexes with age. Diabetes, kidney disease, smoking, and hypertension in men were associated with prolonged T1 times. Unexpectedly, hyperlipidaemia and hypertension in women showed a negative association with T1. Take-Home Message: Native T1 mapping is sensitive to subclinical myocardial changes and reflects a close interplay between metabolic and myocardial health. It reveals marked age-dependent sex differences and sex-specific responses in myocardial tissue composition to cardiometabolic risk factors.

MRI Segmentation Cardiac Retrospective Clinical In Silico Consortium

Exploring ChatGPT's potential in diagnosing oral and maxillofacial pathologies: a study of 123 challenging cases.

Tassoker M

•papers•Jul 17 2025

This study aimed to evaluate the diagnostic performance of ChatGPT-4o, a large language model developed by OpenAI, in challenging cases of oral and maxillofacial diseases presented in the Clinicopathologic Conference section of the journal Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology. A total of 123 diagnostically challenging oral and maxillofacial cases published in the aforementioned journal were retrospectively collected. The case presentations, which included detailed clinical, radiographic, and sometimes histopathologic descriptions, were input into ChatGPT-4o. The model was prompted to provide a single most likely diagnosis for each case. These outputs were then compared to the final diagnoses established by expert consensus in each original case report. The accuracy of ChatGPT-4o was calculated based on exact diagnostic matches. ChatGPT-4o correctly diagnosed 96 out of 123 cases, achieving an overall diagnostic accuracy of 78%. Nevertheless, even in cases where the exact diagnosis was not provided, the model often suggested one of the clinically reasonable differential diagnoses. ChatGPT-4o demonstrates a promising ability to assist in the diagnostic process of complex maxillofacial conditions, with a relatively high accuracy rate in challenging cases. While it is not a replacement for expert clinical judgment, large language models may offer valuable decision support in oral and maxillofacial radiology, particularly in educational or consultative contexts.

X-Ray Classification Retrospective Clinical In Silico Big Tech GenAI

Multi-modal Risk Stratification in Heart Failure with Preserved Ejection Fraction Using Clinical and CMR-derived Features: An Approach Incorporating Model Explainability.

Zhang S, Lin Y, Han D, Pan Y, Geng T, Ge H, Zhao J

•papers•Jul 17 2025

Heart failure with preserved ejection fraction (HFpEF) poses significant diagnostic and prognostic challenges due to its clinical heterogeneity. This study proposes a multi-modal, explainable machine learning framework that integrates clinical variables and cardiac magnetic resonance (CMR)-derived features, particularly epicardial adipose tissue (EAT) volume, to improve risk stratification and outcome prediction in patients with HFpEF. A retrospective cohort of 301 participants (171 in the HFpEF group and 130 in the control group) was analyzed. Baseline characteristics, CMR-derived EAT volume, and laboratory biomarkers were integrated into machine learning models. Model performance was evaluated using accuracy, precision, recall, and F1-score. Additionally, receiver operating characteristic area under the curve (ROC-AUC) and precision-recall area under the curve (PR-AUC) were employed to assess discriminative power across varying decision thresholds. Hyperparameter optimization and ensemble techniques were applied to enhance predictive performance. HFpEF patients exhibited significantly higher EAT volume (70.9±27.3 vs. 41.9±18.3 mL, p<0.001) and NT-proBNP levels (1574 [963,2722] vs. 33 [10,100] pg/mL, p<0.001), along with a greater prevalence of comorbidities. The voting classifier demonstrated the highest accuracy for HFpEF diagnosis (0.94), with a precision of 0.96, recall of 0.94, and an F1-score of 0.95. For prognostic tasks, AdaBoost, XGBoost and Random Forest yielded superior performance in predicting adverse clinical outcomes, including rehospitalization and all-cause mortality (accuracy: 0.95). Key predictive features identified included EAT volume, right atrioventricular groove (Right AVG), tricuspid regurgitation velocity (TRV), and metabolic syndrome. Explainable models combining clinical and CMR-derived features, especially EAT volume, improve support for HFpEF diagnosis and outcome prediction. 
These findings highlight the value of a data-driven, interpretable approach to characterizing HFpEF phenotypes and may facilitate individualized risk assessment in selected populations.
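The abstract does not say whether its voting classifier used hard or soft voting, or which base models fed it. As a minimal sketch under the assumption of hard (majority) voting, with hypothetical base-model outputs:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most base models for one patient.

    With an even split, the label that appears first in `predictions` wins,
    so an odd number of base models avoids ties for two-class problems.
    """
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(per_model_predictions):
    """Combine per-model prediction lists (one list per model) patient by patient."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]

# Hypothetical outputs of three base models for four patients.
model_a = ["HFpEF", "control", "HFpEF", "control"]
model_b = ["HFpEF", "HFpEF", "control", "control"]
model_c = ["control", "HFpEF", "HFpEF", "control"]

combined = ensemble_predict([model_a, model_b, model_c])
print(combined)
```

Soft voting would instead average predicted class probabilities across the base models before choosing a label.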

MRI Classification Cardiac Retrospective Clinical In Silico GenAI

Large Language Model-Based Entity Extraction Reliably Classifies Pancreatic Cysts and Reveals Predictors of Malignancy: A Cross-Sectional and Retrospective Cohort Study

Papale, A. J., Flattau, R., Vithlani, N., Mahajan, D., Ziemba, Y., Zavadsky, T., Carvino, A., King, D., Nadella, S.

•preprint•Jul 17 2025

Pancreatic cystic lesions (PCLs) are often discovered incidentally on imaging and may progress to pancreatic ductal adenocarcinoma (PDAC). PCLs have a high incidence in the general population, and adherence to screening guidelines can be variable. With the advent of technologies that enable automated text classification, we sought to evaluate various natural language processing (NLP) tools, including large language models (LLMs), for identifying and classifying PCLs from radiology reports. We correlated our classification of PCLs with clinical features to identify risk factors for a positive PDAC biopsy. We contrasted a previously described NLP classifier with LLMs for prospective identification of PCLs in radiology reports. We evaluated various LLMs for PCL classification into low-risk or high-risk categories based on published guidelines. We compared prompt-based PCL classification with specific entity-guided PCL classification. To this end, we developed tools to de-identify radiology reports and track patients longitudinally based on those reports. Additionally, we used our newly developed tools to evaluate a retrospective database of patients who underwent pancreas biopsy, using multivariable logistic regression modelling to determine associated factors, including those in their radiology reports and clinical features. Of 14,574 prospective radiology reports, 665 (4.6%) described a pancreatic cyst, including 175 (1.2%) high-risk lesions. Our Entity-Extraction Large Language Model tool achieved recall 0.992 (95% confidence interval [CI], 0.985-0.998), precision 0.988 (0.979-0.996), and F1-score 0.990 (0.985-0.995) for detecting cysts; F1-scores were 0.993 (0.987-0.998) for low-risk and 0.977 (0.952-0.995) for high-risk classification. Among 4,285 biopsy patients, 330 had pancreatic cysts documented ≥6 months before biopsy. 
In the final multivariable model (AUC = 0.877), independent predictors of adenocarcinoma were change in duct caliber with upstream atrophy (adjusted odds ratio [AOR], 4.94; 95% CI, 1.30-18.79), mural nodules (AOR, 11.02; 1.81-67.26), older age (AOR, 1.10; 1.05-1.16), lower body mass index (AOR, 0.86; 0.76-0.96), and total bilirubin (AOR, 1.81; 1.18-2.77). Automated NLP-based analysis of radiology reports using LLM-driven entity extraction can accurately identify and risk-stratify PCLs and, when retrospectively applied, reveal factors predicting malignant progression. Widespread implementation may improve surveillance and enable earlier intervention.
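The detection metrics and report proportions quoted in this abstract are internally consistent, which a short check makes explicit. The sketch uses only the rounded figures given in the abstract; the function name is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures quoted in the abstract for cyst detection.
precision, recall = 0.988, 0.992
f1 = f1_score(precision, recall)

# Share of the 14,574 prospective reports describing any cyst / a high-risk cyst.
cyst_share = 665 / 14574
high_risk_share = 175 / 14574

print(f"F1 = {f1:.3f}")
print(f"cysts: {cyst_share:.1%}, high-risk: {high_risk_share:.1%}")
```

The recomputed F1 agrees with the reported 0.990 to three decimal places, and the shares round to the reported 4.6% and 1.2%.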

Mixed Modality Classification Abdominal Retrospective Clinical In Silico Academic Lab GenAI

Opportunistic computed tomography (CT) assessment of osteoporosis in patients undergoing transcatheter aortic valve replacement (TAVR).

Paukovitsch M, Fechner T, Felbel D, Moerike J, Rottbauer W, Klömpken S, Brunner H, Kloth C, Beer M, Sekuboyina A, Buckert D, Kirschke JS, Sollmann N

•papers•Jul 17 2025

CT-based opportunistic screening using artificial intelligence finds a high prevalence (43%) of osteoporosis in CT scans obtained for planning of transcatheter aortic valve replacement. Thus, opportunistic screening may be a cost-effective way to assess osteoporosis in high-risk populations. Osteoporosis is an underdiagnosed condition associated with fractures and frailty, but may be detected in routine computed tomography (CT) scans. Volumetric bone mineral density (vBMD) was measured with an artificial intelligence (AI)-based algorithm in routine clinical thoraco-abdominal CT scans of 207 patients obtained for planning of transcatheter aortic valve replacement (TAVR). 43% of patients had osteoporosis (vBMD < 80 mg/cm³ at L1-L3); they were older (83.0 [interquartile range (IQR): 78.0-85.5] vs. 79.0 [IQR: 71.8-84.0] years, p < 0.001), more often female (55.1 vs. 28.8%, p < 0.001), and had a higher Society of Thoracic Surgeons score for mortality (3.0 [IQR: 1.8-4.6] vs. 2.1 [IQR: 1.4-3.2]%, p < 0.001). In addition to lumbar vBMD (58.2 ± 14.7 vs. 106 ± 21.4 mg/cm³, p < 0.001), thoracic vBMD (79.5 ± 17.9 vs. 127.4 ± 26.0 mg/cm³, p < 0.001) was also significantly reduced in these patients and showed high diagnostic accuracy for osteoporosis assessment (area under the curve: 0.96, p < 0.001). Osteoporotic patients were significantly more often at risk for falls (40.4 vs. 22.9%, p = 0.007) and more frequently required help with activities of daily living (ADL) (48.3 vs. 33.1%, p = 0.026), while direct-to-home discharges were fewer (88.8 vs. 96.6%, p = 0.026). In-hospital bleeding complications (3.4 vs. 5.1%), stroke (1.1 vs. 2.5%), and death (1.1 vs. 0.8%) were equally low, while in-hospital device success was equally high (94.4 vs. 94.9%, p > 0.05 for all comparisons). However, one-year probability of survival was significantly lower (84.0 vs. 98.2%, log-rank p < 0.01). Applying an AI-based algorithm to TAVR planning CT scans can reveal a high prevalence of osteoporosis (43% of patients). 
Osteoporosis may represent a marker related to frailty and worsened outcome in TAVR patients.
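The classification rule quoted in this abstract (vBMD < 80 mg/cm³ at L1-L3) can be written as a short sketch. It assumes the per-vertebra values are averaged before thresholding, which the abstract does not state, and the function name and sample values are hypothetical.

```python
from statistics import mean

# Threshold quoted in the abstract, in mg/cm^3.
OSTEOPOROSIS_THRESHOLD = 80.0

def has_osteoporosis(l1: float, l2: float, l3: float,
                     threshold: float = OSTEOPOROSIS_THRESHOLD) -> bool:
    """Classify from the mean lumbar vBMD (averaging L1-L3 is an assumption)."""
    return mean([l1, l2, l3]) < threshold

# Hypothetical per-vertebra vBMD values in mg/cm^3.
print(has_osteoporosis(60.0, 58.0, 57.0))     # mean well below the threshold
print(has_osteoporosis(110.0, 105.0, 100.0))  # mean well above the threshold
```

The comparison is strict, so a mean of exactly 80 mg/cm³ is not classified as osteoporosis under this reading of the rule.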

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA
