Page 86 of 2262251 results

Significance of Papillary and Trabecular Muscular Volume in Right Ventricular Volumetry with Cardiac MR Imaging.

Shibagaki Y, Oka H, Imanishi R, Shimada S, Nakau K, Takahashi S

PubMed · Jun 20 2025
Pulmonary valve regurgitation after repaired Tetralogy of Fallot (TOF) or double-outlet right ventricle (DORV) causes hypertrophy and papillary muscle enlargement. Cardiac magnetic resonance imaging (CMR) can evaluate right ventricular (RV) dilatation, but the effect of trabecular and papillary muscle (TPM) exclusion on RV volume for TOF or DORV reoperation decisions is unclear. Twenty-three patients with repaired TOF or DORV, and 19 healthy controls aged ≥15 years, underwent CMR from 2012 to 2022. TPM volume was measured by artificial intelligence. Reoperation was considered when the RV end-diastolic volume index (RVEDVI) was >150 mL/m<sup>2</sup> or the RV end-systolic volume index (RVESVI) was >80 mL/m<sup>2</sup>. RV volumes were higher in the disease group than in controls (P < 0.001), as were RV mass and TPM volumes (P < 0.001). The reduction in RV volume due to exclusion of TPM volume was 6.3% (2.1-10.5), 11.7% (6.9-13.8), and 13.9% (9.5-19.4) in the control, volume load, and volume + pressure load groups, respectively. The TPM/RV volume ratio was highest in the volume + pressure load group (control: 0.07 g/mL; volume: 0.14 g/mL; volume + pressure: 0.17 g/mL) and correlated with QRS duration (R = 0.77). In three patients in the volume + pressure load group, RV volume including TPM met the criteria for reoperation, but after TPM removal, reoperation was no longer indicated. RV volume measurements including TPM in the volume + pressure load group may help determine appropriate volume thresholds for reoperation.
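The reduction rates reported above are simply the TPM fraction of total RV volume expressed as a percentage, and the reoperation criteria are threshold cutoffs. A minimal sketch of both calculations; the example volumes are hypothetical, not from the study:

```python
def tpm_reduction_rate(rv_volume_with_tpm: float, tpm_volume: float) -> float:
    """Percent reduction in measured RV volume when TPM volume is excluded."""
    return 100.0 * tpm_volume / rv_volume_with_tpm

def reoperation_indicated(rvedvi: float, rvesvi: float) -> bool:
    """Thresholds from the abstract: RVEDVI > 150 mL/m^2 or RVESVI > 80 mL/m^2."""
    return rvedvi > 150.0 or rvesvi > 80.0

# Hypothetical patient: 160 mL RV volume of which 22 mL is TPM
rate = tpm_reduction_rate(160.0, 22.0)  # 13.75%
```

A reduction of this size can move a borderline patient below the reoperation threshold, which is the abstract's central point.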

Research hotspots and development trends in molecular imaging of glioma (2014-2024): A bibliometric review.

Zhou H, Luo Y, Li S, Zhang G, Zeng X

PubMed · Jun 20 2025
This study aims to explore research hotspots and development trends in molecular imaging of glioma from 2014 to 2024. A total of 2957 publications indexed in the Web of Science Core Collection (WoSCC) were analyzed using bibliometric techniques. To visualize the research landscape, co-citation clustering, keyword analysis, and technological trend mapping were performed using CiteSpace and Excel. Publication output peaked in 2021. Emerging research trends included the integration of radiomics and artificial intelligence and the application of novel imaging modalities such as positron emission tomography and magnetic resonance spectroscopy. Significant progress was observed in blood-brain barrier disruption techniques and the development of molecular probes, especially those targeting IDH and MGMT mutations. Molecular imaging has been pivotal in advancing glioma research, contributing to improved diagnostic accuracy and personalized treatment strategies. However, challenges such as clinical translation and standardization remain. Future studies should focus on integrating advanced technologies into routine clinical practice to enhance patient care.

Evaluating ChatGPT's performance across radiology subspecialties: A meta-analysis of board-style examination accuracy and variability.

Nguyen D, Kim GHJ, Bedayat A

PubMed · Jun 20 2025
Large language models (LLMs) like ChatGPT are increasingly used in medicine due to their ability to synthesize information and support clinical decision-making. While prior research has evaluated ChatGPT's performance on medical board exams, limited data exist on radiology-specific exams, particularly regarding prompt strategies and input modalities. This meta-analysis reviews ChatGPT's performance on radiology board-style questions, assessing accuracy across radiology subspecialties, prompt engineering methods, GPT model versions, and input modalities. Searches in PubMed and SCOPUS identified 163 articles, of which 16 met inclusion criteria after excluding irrelevant topics and non-board exam evaluations. Data extracted included subspecialty topics, accuracy, question count, GPT model, input modality, prompting strategies, and access dates. Statistical analyses included two-proportion z-tests, a binomial generalized linear model (GLM), and meta-regression with random effects (Stata v18.0, R v4.3.1). Across 7024 questions, overall accuracy was 58.83% (95% CI, 55.53-62.13). Performance varied widely by subspecialty, highest in emergency radiology (73.00%) and lowest in musculoskeletal radiology (49.24%). GPT-4 and GPT-4o significantly outperformed GPT-3.5 (p < .001), but visual inputs yielded lower accuracy (46.52%) compared to textual inputs (67.10%, p < .001). Prompting strategies showed significant improvement (p < .01) with basic prompts (66.23%) compared to no prompts (59.70%). A modest but significant decline in performance over time was also observed (p < .001). ChatGPT demonstrates promising but inconsistent performance in radiology board-style questions. Limitations in visual reasoning, heterogeneity across studies, and prompt engineering variability highlight areas requiring targeted optimization.
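The two-proportion z-test used here compares two accuracy rates under a pooled standard error. A minimal sketch of the test; the question counts below are hypothetical placeholders, since the abstract does not break down the 7024 questions by modality:

```python
import math

def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Two-proportion z-test with pooled standard error; returns (z, two-sided p)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts mirroring the reported rates: 67.0% textual vs 46.5% visual accuracy
z, p = two_proportion_z(670, 1000, 465, 1000)
```

With samples of this size, a ~20-point accuracy gap yields a very large z statistic, consistent with the p < .001 reported for the textual-versus-visual comparison.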

Concordance between single-slice abdominal computed tomography-based and bioelectrical impedance-based analysis of body composition in a prospective study.

Fehrenbach U, Hosse C, Wienbrandt W, Walter-Rittel T, Kolck J, Auer TA, Blüthner E, Tacke F, Beetz NL, Geisel D

PubMed · Jun 19 2025
Body composition analysis (BCA) is a recognized indicator of patient frailty. Apart from the established bioelectrical impedance analysis (BIA), computed tomography (CT)-derived BCA is being increasingly explored. The aim of this prospective study was to directly compare BCA obtained from BIA and CT. A total of 210 consecutive patients scheduled for CT, including a high proportion of cancer patients, were prospectively enrolled. Immediately prior to the CT scan, all patients underwent BIA. CT-based BCA was performed using a single-slice AI tool for automated detection and segmentation at the level of the third lumbar vertebra (L3). BIA-based parameters, body fat mass (BFM<sub>BIA</sub>) and skeletal muscle mass (SMM<sub>BIA</sub>), CT-based parameters, subcutaneous and visceral adipose tissue area (SATA<sub>CT</sub> and VATA<sub>CT</sub>) and total abdominal muscle area (TAMA<sub>CT</sub>) were determined. Indices were calculated by normalizing the BIA and CT parameters to patient's weight (body fat percentage (BFP<sub>BIA</sub>) and body fat index (BFI<sub>CT</sub>)) or height (skeletal muscle index (SMI<sub>BIA</sub>) and lumbar skeletal muscle index (LSMI<sub>CT</sub>)). Parameters representing fat, BFM<sub>BIA</sub> and SATA<sub>CT</sub> + VATA<sub>CT</sub>, and parameters representing muscle tissue, SMM<sub>BIA</sub> and TAMA<sub>CT</sub>, showed strong correlations in female (fat: r = 0.95; muscle: r = 0.72; p < 0.001) and male (fat: r = 0.91; muscle: r = 0.71; p < 0.001) patients. Linear regression analysis was statistically significant (fat: R<sup>2</sup> = 0.73 (female) and 0.74 (male); muscle: R<sup>2</sup> = 0.56 (female) and 0.56 (male); p < 0.001), showing that BFI<sub>CT</sub> and LSMI<sub>CT</sub> allowed prediction of BFP<sub>BIA</sub> and SMI<sub>BIA</sub> for both sexes. CT-based BCA strongly correlates with BIA results and yields quantitative results for BFP and SMI comparable to the existing gold standard. 
Question CT-based body composition analysis (BCA) is moving increasingly into clinical focus, but validation against established methods is lacking.
Findings Fully automated CT-based BCA correlates very strongly with guideline-accepted bioelectrical impedance analysis (BIA).
Clinical relevance BCA is moving further into clinical focus to improve assessment of patient frailty and individualize therapies accordingly. Comparability with established BIA strengthens the value of CT-based BCA and supports its translation into clinical routine.
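The concordance analysis above rests on Pearson correlation and ordinary least squares regression between paired BIA and CT measurements. A minimal sketch of both; the paired values are hypothetical illustrations of BFI_CT versus BFP_BIA, not study data:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ols(xs, ys):
    """Ordinary least squares fit: returns (slope, intercept) for y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical paired measurements: CT-derived body fat index (x) vs BIA body fat percentage (y)
bfi_ct = [3.1, 4.5, 5.0, 6.2, 7.8]
bfp_bia = [18.0, 24.5, 26.0, 33.1, 40.2]
r = pearson_r(bfi_ct, bfp_bia)
slope, intercept = ols(bfi_ct, bfp_bia)
```

The fitted slope and intercept are what allow one modality's index to predict the other's, which is the sense in which BFI_CT "predicts" BFP_BIA in the regression analysis.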

Optimized YOLOv8 for enhanced breast tumor segmentation in ultrasound imaging.

Mostafa AM, Alaerjan AS, Aldughayfiq B, Allahem H, Mahmoud AA, Said W, Shabana H, Ezz M

PubMed · Jun 19 2025
Breast cancer significantly affects people's health globally, making early and accurate diagnosis vital. While ultrasound imaging is safe and non-invasive, its manual interpretation is subjective. This study explores machine learning (ML) techniques to improve breast ultrasound image segmentation, comparing models trained on combined versus separate classes of benign and malignant tumors. The YOLOv8 object detection algorithm is applied to the image segmentation task, aiming to capitalize on its robust feature detection capabilities. We utilized a dataset of 780 ultrasound images categorized into benign and malignant classes to train several deep learning (DL) models: UNet, UNet with DenseNet-121, VGG16, VGG19, and an adapted YOLOv8. These models were evaluated in two experimental setups-training on a combined dataset and training on separate datasets for benign and malignant classes. Performance metrics such as Dice Coefficient, Intersection over Union (IoU), and mean Average Precision (mAP) were used to assess model effectiveness. The study demonstrated substantial improvements in model performance when trained on separate classes, with the UNet model's F1-score increasing from 77.80 to 84.09% and Dice Coefficient from 75.58 to 81.17%, and the adapted YOLOv8 model achieving an F1-score improvement from 93.44 to 95.29% and Dice Coefficient from 82.10 to 84.40%. These results highlight the advantage of specialized model training and the potential of using advanced object detection algorithms for segmentation tasks. This research underscores the significant potential of using specialized training strategies and innovative model adaptations in medical imaging segmentation, ultimately contributing to better patient outcomes.
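The Dice coefficient and IoU reported above are overlap measures between a predicted segmentation mask and the ground-truth mask. A minimal sketch for flat binary masks; the example masks are illustrative only:

```python
def dice_iou(pred, truth):
    """Dice coefficient and IoU for flat binary masks (sequences of 0/1 ints)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    p_sum, t_sum = sum(pred), sum(truth)
    union = p_sum + t_sum - inter
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

# Toy 2x2 masks flattened: prediction overlaps ground truth in one of three marked pixels
d, i = dice_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models identically; Dice weights the overlap more generously, which is why the Dice values in the abstract sit above what the IoU for the same masks would be.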

Artificial Intelligence Language Models to Translate Professional Radiology Mammography Reports Into Plain Language - Impact on Interpretability and Perception by Patients.

Pisarcik D, Kissling M, Heimer J, Farkas M, Leo C, Kubik-Huch RA, Euler A

PubMed · Jun 19 2025
This study aimed to evaluate the interpretability and patient perception of AI-translated mammography and sonography reports, focusing on comprehensibility, follow-up recommendations, and conveyed empathy, using a survey. In this observational study, three fictional mammography and sonography reports with BI-RADS categories 3, 4, and 5 were created. These reports were repeatedly translated into plain language by three different large language models (LLMs: ChatGPT-4, ChatGPT-4o, Google Gemini). In a first step, the best of these repeatedly translated reports for each BI-RADS category and LLM was selected by two experts in breast imaging, considering factual correctness, completeness, and quality. In a second step, female participants compared and rated the translated reports regarding comprehensibility, follow-up recommendations, conveyed empathy, and additional value using a survey with Likert scales. Statistical analysis included cumulative link mixed models and the Plackett-Luce model for ranking preferences. Forty women participated in the survey. GPT-4 and GPT-4o were rated significantly higher than Gemini across all categories (P<.001). Participants >50 years of age rated the reports significantly higher than participants aged 18-29 years (P<.05). Higher education predicted lower ratings (P=.02). No prior mammography increased scores (P=.03), and AI experience had no effect (P=.88). Ranking analysis showed GPT-4o as the most preferred (P=.48), followed by GPT-4 (P=.37), with Gemini ranked last (P=.15). Patient preference differed among AI-translated radiology reports. Compared to a traditional report using radiological language, AI-translated reports add value for patients and enhance comprehensibility and empathy, and therefore hold the potential to improve patient communication in breast imaging.

A fusion-based deep-learning algorithm predicts PDAC metastasis based on primary tumour CT images: a multinational study.

Xue N, Sabroso-Lasa S, Merino X, Munzo-Beltran M, Schuurmans M, Olano M, Estudillo L, Ledesma-Carbayo MJ, Liu J, Fan R, Hermans JJ, van Eijck C, Malats N

PubMed · Jun 19 2025
Diagnosing the presence of metastasis in pancreatic cancer is pivotal for patient management and treatment, with contrast-enhanced CT scans (CECT) as the cornerstone of diagnostic evaluation. However, this diagnostic modality requires a multifaceted approach. We aimed to develop a convolutional neural network (CNN)-based model (PMPD, Pancreatic cancer Metastasis Prediction Deep-learning algorithm) to predict the presence of metastases based on CECT images of the primary tumour. CECT images in the portal venous phase of 335 patients with pancreatic ductal adenocarcinoma (PDAC) from the PanGenEU study and The First Affiliated Hospital of Zhengzhou University (ZZU) were randomly divided into training and internal validation sets using fivefold cross-validation. Two independent external validation datasets, 143 patients from the Radboud University Medical Center (RUMC) included in the PANCAIM study (RUMC-PANCAIM) and 183 patients from the PREOPANC trial of the Dutch Pancreatic Cancer Group (PREOPANC-DPCG), were used to evaluate the results. The area under the receiver operating characteristic curve (AUROC) for the internally tested model was 0.895 (0.853-0.937) and 0.779 (0.741-0.817) in the PanGenEU and ZZU sets, respectively. In the external validation sets, the mean AUROC was 0.806 (0.787-0.826) for RUMC-PANCAIM and 0.761 (0.717-0.804) for PREOPANC-DPCG. When stratified by metastasis site, the PMPD model achieved average AUROCs of 0.901-0.927 in the PanGenEU, 0.782-0.807 in the ZZU, and 0.761-0.820 in the PREOPANC-DPCG sets. A PMPD-derived Metastasis Risk Score (MRS) (HR: 2.77, 95% CI 1.99 to 3.86, p=1.59e-09) outperformed resectability status from the National Comprehensive Cancer Network guideline and the CA19-9 biomarker in predicting overall survival. Moreover, the MRS could potentially predict the development of metastasis (AUROC: 0.716 within 3 months, 0.645 within 6 months).
This study represents a pioneering utilisation of a high-performance deep-learning model to predict extrapancreatic organ metastasis in patients with PDAC.
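The AUROC values central to this and the surrounding abstracts can be computed directly from model scores and binary labels via the rank-sum (Mann-Whitney U) formulation, with ties resolved by mid-ranks. A minimal sketch, with toy scores and labels for illustration:

```python
def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) formulation; ties receive mid-ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # find the block of tied scores starting at i
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += mid_rank
        i = j
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy example: two positives scored above/between two negatives
auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUROC of 0.895, as reported for the internal PanGenEU set, means a randomly chosen metastatic case receives a higher score than a randomly chosen non-metastatic case about 89.5% of the time.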

Deep learning detects retropharyngeal edema on MRI in patients with acute neck infections.

Rainio O, Huhtanen H, Vierula JP, Nurminen J, Heikkinen J, Nyman M, Klén R, Hirvonen J

PubMed · Jun 19 2025
In acute neck infections, magnetic resonance imaging (MRI) shows retropharyngeal edema (RPE), which is a prognostic imaging biomarker for a severe course of illness. This study aimed to develop a deep learning-based algorithm for the automated detection of RPE. We developed a deep neural network consisting of two parts using axial T2-weighted water-only Dixon MRI images from 479 patients with acute neck infections annotated by radiologists at both slice and patient levels. First, a convolutional neural network (CNN) classified individual slices; second, an algorithm classified patients based on a stack of slices. Model performance was compared with the radiologists' assessment as a reference standard. Accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) were calculated. The proposed CNN was compared with InceptionV3, and the patient-level classification algorithm was compared with traditional machine learning models. Of the 479 patients, 244 (51%) were positive and 235 (49%) negative for RPE. Our model achieved accuracy, sensitivity, specificity, and AUROC of 94.6%, 83.3%, 96.2%, and 94.1% at the slice level, and 87.4%, 86.5%, 88.2%, and 94.8% at the patient level, respectively. The proposed CNN was faster than InceptionV3 but equally accurate. Our patient classification algorithm outperformed traditional machine learning models. A deep learning model, based on weakly annotated data and computationally manageable training, achieved high accuracy for automatically detecting RPE on MRI in patients with acute neck infections. Our automated method for detecting relevant MRI findings was efficiently trained and might be easily deployed in practice to study clinical applicability. This approach might improve early detection of patients at high risk for a severe course of acute neck infections. Deep learning automatically detected retropharyngeal edema on MRI in acute neck infections.
Areas under the receiver operating characteristic curve were 94.1% at the slice level and 94.8% at the patient level. The proposed convolutional neural network was lightweight and required only weakly annotated data.
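The two-stage design above turns per-slice CNN probabilities into a patient-level decision. The paper does not specify its aggregation rule, so the following is a hypothetical sketch of one common choice: flag a patient when enough adjacent slices exceed the slice-level threshold.

```python
def patient_positive(slice_probs, slice_threshold=0.5, min_consecutive=2):
    """Hypothetical aggregation rule: a patient is positive when at least
    `min_consecutive` adjacent slices exceed the slice-level probability threshold."""
    run = 0
    for p in slice_probs:
        run = run + 1 if p > slice_threshold else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical slice probabilities for one patient, ordered cranio-caudally
flagged = patient_positive([0.2, 0.7, 0.8, 0.3])
```

Requiring consecutive positive slices suppresses isolated false-positive slices, one plausible reason a stacked patient-level classifier can outperform thresholding slices independently.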

Development and validation of an AI-driven radiomics model using non-enhanced CT for automated severity grading in chronic pancreatitis.

Chen C, Zhou J, Mo S, Li J, Fang X, Liu F, Wang T, Wang L, Lu J, Shao C, Bian Y

PubMed · Jun 19 2025
To develop and validate the chronic pancreatitis CT severity model (CATS), an artificial intelligence (AI)-based tool leveraging automated 3D segmentation and radiomics analysis of non-enhanced CT scans for objective severity stratification in chronic pancreatitis (CP). This retrospective study encompassed patients with recurrent acute pancreatitis (RAP) and CP from June 2016 to May 2020. A 3D convolutional neural network segmented non-enhanced CT scans, extracting 1843 radiomic features to calculate the radiomics score (Rad-score). The CATS was formulated using multivariable logistic regression and validated in a subsequent cohort from June 2020 to April 2023. Overall, 2054 patients with RAP and CP were included in the training (n = 927), internal validation (n = 616), and external test (n = 511) sets. CP grade I and II patients accounted for 300 (14.61%) and 1754 (85.39%), respectively. The Rad-score significantly correlated with the acinus-to-stroma ratio (p = 0.023; OR, -2.44). The CATS model demonstrated high discriminatory performance in differentiating CP severity grades, achieving areas under the curve (AUC) of 0.96 (95% CI: 0.94-0.98) and 0.88 (95% CI: 0.81-0.90) in the validation and test cohorts, respectively. CATS-predicted grades correlated with exocrine insufficiency (all p < 0.05) and showed significant prognostic differences (all p < 0.05). CATS outperformed radiologists in detecting calcifications, identifying all minute calcifications missed by radiologists. The CATS, developed using non-enhanced CT and AI, accurately predicts CP severity, reflects disease morphology, and forecasts short- to medium-term prognosis, offering a significant advancement in CP management.
Question Existing CP severity assessments rely on semi-quantitative CT evaluations and multi-modality imaging, leading to inconsistency and inaccuracy in early diagnosis and prognosis prediction.
Findings The AI-driven CATS model, using non-enhanced CT, achieved high accuracy in grading CP severity and correlated with histopathological fibrosis markers.
Clinical relevance CATS provides a cost-effective, widely accessible tool for precise CP severity stratification, enabling early intervention, personalized management, and improved outcomes without contrast agents or invasive biopsies.
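A Rad-score built by multivariable logistic regression, as described above, is a weighted sum of radiomic features passed through the logistic link. A minimal sketch of the scoring step only; the three features and coefficients are hypothetical stand-ins for the 1843-feature model actually fitted:

```python
import math

def rad_score(features, weights, intercept):
    """Radiomics score: weighted feature sum mapped to (0, 1) by the logistic link."""
    z = intercept + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 3-feature patient vector with hypothetical fitted coefficients
score = rad_score([0.8, -1.2, 0.5], [0.9, 0.4, -0.3], intercept=0.1)
```

In practice the score would then enter the severity model alongside any clinical covariates, with a cutoff chosen on the validation set to separate grade I from grade II.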

Data extraction from free-text stroke CT reports using GPT-4o and Llama-3.3-70B: the impact of annotation guidelines.

Wihl J, Rosenkranz E, Schramm S, Berberich C, Griessmair M, Woźnicki P, Pinto F, Ziegelmayer S, Adams LC, Bressem KK, Kirschke JS, Zimmer C, Wiestler B, Hedderich D, Kim SH

PubMed · Jun 19 2025
To evaluate the impact of an annotation guideline on the performance of large language models (LLMs) in extracting data from stroke computed tomography (CT) reports. The performance of GPT-4o and Llama-3.3-70B in extracting ten imaging findings from stroke CT reports was assessed in two datasets from a single academic stroke center. Dataset A (n = 200) was a stratified cohort including various pathological findings, whereas dataset B (n = 100) was a consecutive cohort. Initially, an annotation guideline providing clear data extraction instructions was designed based on a review of cases with inter-annotator disagreements in dataset A. For each LLM, data extraction was performed under two conditions: with the annotation guideline included in the prompt and without it. GPT-4o consistently demonstrated superior performance over Llama-3.3-70B under identical conditions, with micro-averaged precision ranging from 0.83 to 0.95 for GPT-4o and from 0.65 to 0.86 for Llama-3.3-70B. Across both models and both datasets, incorporating the annotation guideline into the LLM input resulted in higher precision, while recall remained largely stable. In dataset B, the precision of GPT-4o and Llama-3.3-70B improved from 0.83 to 0.95 and from 0.87 to 0.94, respectively. Overall classification performance with and without the annotation guideline differed significantly in five out of six conditions. GPT-4o and Llama-3.3-70B show promising performance in extracting imaging findings from stroke CT reports, although GPT-4o steadily outperformed Llama-3.3-70B. We also provide evidence that well-defined annotation guidelines can enhance LLM data extraction accuracy. Annotation guidelines can improve the accuracy of LLMs in extracting findings from radiological reports, potentially optimizing data extraction for specific downstream applications. LLMs have utility in data extraction from radiology reports, but the role of annotation guidelines remains underexplored.
Data extraction accuracy from stroke CT reports by GPT-4o and Llama-3.3-70B improved when well-defined annotation guidelines were incorporated into the model prompt. Well-defined annotation guidelines can improve the accuracy of LLMs in extracting imaging findings from radiological reports.
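Micro-averaged precision, as reported above, pools true-positive, false-positive, and false-negative counts across all ten findings before taking the ratio, so frequent findings weigh more than rare ones. A minimal sketch; the per-finding counts are hypothetical:

```python
def micro_precision_recall(per_finding_counts):
    """Micro-averaged precision and recall over (tp, fp, fn) tuples, one per imaging finding."""
    tp = sum(c[0] for c in per_finding_counts)
    fp = sum(c[1] for c in per_finding_counts)
    fn = sum(c[2] for c in per_finding_counts)
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts for two findings, e.g. hemorrhage and vessel occlusion
precision, recall = micro_precision_recall([(8, 2, 0), (2, 0, 2)])
```

Macro-averaging (mean of per-finding ratios) would instead weight all ten findings equally; the choice matters when finding prevalence is skewed, as in a stratified cohort like dataset A.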
