Performance of GPT-4 for automated prostate biopsy decision-making based on mpMRI: a multi-center evidence study.

Shi MJ, Wang ZX, Wang SK, Li XH, Zhang YL, Yan Y, An R, Dong LN, Qiu L, Tian T, Liu JX, Song HC, Wang YF, Deng C, Cao ZB, Wang HY, Wang Z, Wei W, Song J, Lu J, Wei X, Wang ZC

pubmed · Jul 7 2025
Multiparametric magnetic resonance imaging (mpMRI) has significantly advanced prostate cancer (PCa) detection, yet decisions on invasive biopsy for moderate Prostate Imaging Reporting and Data System (PI-RADS) scores remain ambiguous. To explore the decision-making capacity of Generative Pretrained Transformer-4 (GPT-4) for automated prostate biopsy recommendations, we included 2299 individuals who underwent prostate biopsy from 2018 to 2023 at 3 large medical centers, all with mpMRI available before biopsy and documented clinical-histopathological records. GPT-4 generated structured reports from given prompts. Its performance was quantified using confusion matrices, and sensitivity, specificity, and area under the curve were calculated. Multiple manual evaluation procedures were conducted. Wilcoxon's rank sum test, Fisher's exact test, and Kruskal-Wallis tests were used for comparisons. In this largest such cohort in a Chinese population, patients with moderate PI-RADS scores (scores 3 and 4) accounted for 39.7% (912/2299) and were defined as the subset-of-interest (SOI). The detection rates of clinically significant PCa for PI-RADS scores 2-5 were 9.4%, 27.3%, 49.2%, and 80.1%, respectively. Nearly 47.5% (433/912) of SOI patients were histopathologically proven to have undergone unnecessary prostate biopsies. With the assistance of GPT-4, 20.8% (190/912) of the SOI population could have avoided unnecessary biopsies, and it performed even better [28.8% (118/410)] in the most heterogeneous subgroup, PI-RADS score 3. More than 90.0% of GPT-4-generated reports were rated comprehensive and easy to understand, although satisfaction with accuracy was lower (82.8%). GPT-4 also demonstrated cognitive potential for handling complex problems. Additionally, the Chain-of-Thought method enabled us to better understand the decision-making logic behind GPT-4. Finally, we developed the ProstAIGuide platform to make the tool accessible to both doctors and patients.
This multi-center study highlights the clinical utility of GPT-4 for prostate biopsy decision-making and advances our understanding of the latest artificial intelligence implementation in various medical scenarios.
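The study above quantifies performance with confusion matrices, reporting sensitivity, specificity, and AUC. As a minimal illustration of that step (a sketch with made-up counts, not the study's code or data), the two headline rates fall directly out of the four matrix cells:

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    from confusion-matrix cell counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts for biopsy-recommendation decisions (not study data):
sens, spec = sens_spec(tp=180, fn=20, fp=30, tn=170)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.90, specificity=0.85
```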

External Validation on a Japanese Cohort of a Computer-Aided Diagnosis System Aimed at Characterizing ISUP ≥ 2 Prostate Cancers at Multiparametric MRI.

Escande R, Jaouen T, Gonindard-Melodelima C, Crouzet S, Kuroda S, Souchon R, Rouvière O, Shoji S

pubmed · Jul 7 2025
To evaluate the generalizability of a computer-aided diagnosis (CADx) system based on the apparent diffusion coefficient (ADC) and wash-in rate, trained on a French population, for diagnosing International Society of Urological Pathology grade ≥ 2 prostate cancer on multiparametric MRI. Sixty-eight consecutive patients who underwent radical prostatectomy at a single Japanese institution were retrospectively included. Pre-prostatectomy MRIs were reviewed by an experienced radiologist, who assigned a Prostate Imaging-Reporting and Data System version 2.1 (PI-RADSv2.1) score to suspicious lesions and delineated them. The CADx score was computed from these regions of interest. Using prostatectomy whole-mounts as reference, the CADx and PI-RADSv2.1 scores were compared at the lesion level using areas under the receiver operating characteristic curve (AUC), along with sensitivities and specificities obtained with predefined thresholds. In the peripheral zone (PZ), AUCs were 80% (95% confidence interval [95% CI]: 71-90) for the CADx score and 80% (95% CI: 71-89; p = 0.886) for the PI-RADSv2.1 score; in the transition zone (TZ), AUCs were 79% (95% CI: 66-90) for the CADx score and 93% (95% CI: 82-96; p = 0.051) for the PI-RADSv2.1 score. The CADx diagnostic thresholds that provided sensitivities of 86%-91% and specificities of 64%-75% in French test cohorts yielded sensitivities of 60% (95% CI: 38-83) in PZ and 42% (95% CI: 20-71) in TZ, with specificities of 95% (95% CI: 86-100) and 92% (95% CI: 73-100), respectively. This shift may be attributed to higher ADC values and lower dynamic contrast-enhanced temporal resolution in the test cohort. The CADx system obtained good overall results in this external cohort. However, the predefined diagnostic thresholds provided lower sensitivities and higher specificities than expected.

Impact of a computed tomography-based artificial intelligence software on radiologists' workflow for detecting acute intracranial hemorrhage.

Kim J, Jang J, Oh SW, Lee HY, Min EJ, Choi JW, Ahn KJ

pubmed · Jul 7 2025
To assess the impact of a commercially available computed tomography (CT)-based artificial intelligence (AI) software for detecting acute intracranial hemorrhage (AIH) on radiologists' diagnostic performance and workflow in a real-world clinical setting. This retrospective study included a total of 956 non-contrast brain CT scans obtained over a 70-day period, interpreted independently by 2 board-certified general radiologists. Of these, 541 scans were interpreted during the initial 35 days before the implementation of AI software, and the remaining 415 scans were interpreted during the subsequent 35 days, with reference to AIH probability scores generated by the software. To assess the software's impact on radiologists' performance in detecting AIH, performance before and after implementation was compared. Additionally, to evaluate the software's effect on radiologists' workflow, Kendall's Tau was used to assess the correlation between the daily chronological order of CT scans and the radiologists' reading order before and after implementation. The early diagnosis rate for AIH (defined as the proportion of AIH cases read within the first quartile by radiologists) and the median reading order of AIH cases were also compared before and after implementation. A total of 956 initial CT scans from 956 patients [mean age: 63.14 ± 18.41 years; male patients: 447 (47%)] were included. There were no significant differences in accuracy [from 0.99 (95% confidence interval: 0.99-1.00) to 0.99 (0.98-1.00), <i>P</i> = 0.343], sensitivity [from 1.00 (0.99-1.00) to 1.00 (0.99-1.00), <i>P</i> = 0.859], or specificity [from 1.00 (0.99-1.00) to 0.99 (0.97-1.00), <i>P</i> = 0.252] following the implementation of the AI software. However, the daily correlation between the chronological order of CT scans and the radiologists' reading order significantly decreased [Kendall's Tau, from 0.61 (0.48-0.73) to 0.01 (0.00-0.26), <i>P</i> < 0.001]. 
Additionally, the early diagnosis rate significantly increased [from 0.49 (0.34-0.63) to 0.76 (0.60-0.93), <i>P</i> = 0.013], and the daily median reading order of AIH cases significantly decreased [from 7.25 (Q1-Q3: 3-10.75) to 1.5 (1-3), <i>P</i> < 0.001] after the implementation. After the implementation of CT-based AI software for detecting AIH, the radiologists' daily reading order was considerably reprioritized to allow more rapid interpretation of AIH cases without compromising diagnostic performance in a real-world clinical setting. With the increasing number of CT scans and the growing burden on radiologists, optimizing the workflow for diagnosing AIH through CT-based AI software integration may enhance the prompt and efficient treatment of patients with AIH.
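The Kendall's Tau statistic used above to compare the daily chronological order of scans with the radiologists' reading order can be sketched in a few lines. This is a pure-Python tau-a (no tie correction) for illustration only; the study presumably used a standard statistics package:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs.
    x and y are equal-length rankings, e.g. scan arrival order vs. reading order."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Reading strictly in arrival order gives tau = 1; AI-driven reprioritization
# (hypothetical orders, not study data) pushes tau toward 0:
print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [3, 1, 4, 2]))  # 0.0
```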

Evaluation of AI-based detection of incidental pulmonary emboli in cardiac CT angiography scans.

Brin D, Gilat EK, Raskin D, Goitein O

pubmed · Jul 7 2025
Incidental pulmonary embolism (PE) is detected in 1% of cardiac CT angiography (CCTA) scans, despite the targeted aortic opacification and limited field of view. While artificial intelligence (AI) algorithms have proven effective in detecting PE in CT pulmonary angiography (CTPA), their use in CCTA remains unexplored. This study aimed to evaluate the feasibility of an AI algorithm for detecting incidental PE in CCTA scans. A dedicated AI algorithm was retrospectively applied to CCTA scans to detect PE. Radiology reports were reviewed using a natural language processing (NLP) tool to detect mentions of PE. Discrepancies between the AI and radiology reports triggered a blinded review by a cardiothoracic radiologist. All scans identified as positive for PE were thoroughly assessed for radiographic features, including the location of emboli and right ventricular (RV) strain. The performance of the AI algorithm for PE detection was compared to the original radiology report. Between 2021 and 2023, 1534 CCTA scans were analyzed. The AI algorithm identified 27 positive PE scans, with a subsequent review confirming PE in 22/27 cases. Of these, 10 (45.5%) were missed in the initial radiology report, all involving segmental or subsegmental arteries (P < 0.05) with no evidence of RV strain. This study demonstrates the feasibility of using an AI algorithm to detect incidental PE in CCTA scans. A notable radiology report miss rate (45.5%) of segmental and subsegmental emboli was documented. While these findings emphasize the potential value of AI for PE detection in the daily radiology workflow, further research is needed to fully determine its clinical impact.

Potential Time and Recall Benefits for Adaptive AI-Based Breast Cancer MRI Screening.

Balkenende L, Ferm J, van Veldhuizen V, Brunekreef J, Teuwen J, Mann RM

pubmed · Jul 7 2025
Abbreviated breast MRI protocols are advocated for breast screening as they limit acquisition duration and increase resource availability. However, radiologists' specificity may be slightly lower when only such short protocols are evaluated. An adaptive approach, where a full protocol is performed only when abnormalities are detected by artificial intelligence (AI)-based models in the abbreviated protocol, might improve and speed up MRI screening. This study explores the potential benefits of such an approach. Purpose: To assess the potential impact of adaptive breast MRI scanning based on AI detection of malignancies. Study Type: Mathematical model. Population: Breast cancer screening protocols. Assessment: Theoretical upper and lower limits on expected protocol duration and recall rate were determined for the adaptive approach, and the influence of the AI model's and the radiologists' performance metrics on these limits was assessed, under the assumption that any finding on the abbreviated protocol would, in an ideal follow-up scenario, prompt a second MRI with the full protocol. A most-likely scenario was also estimated. Results: Theoretical limits for the proposed adaptive AI-based MRI breast cancer screening showed that the recall rates of the abbreviated and full screening protocols always constrained the adaptive protocol's recall rate. These protocols did not fully constrain the expected protocol duration, so an adaptive protocol's expected duration could be shorter than the abbreviated protocol's. Specificity, whether of the AI models or the radiologists, had the largest effect on the theoretical limits. In the most likely scenario, the adaptive protocol achieved an expected protocol-duration reduction of ~47%-60% compared with the full protocol. Data Conclusion: The proposed adaptive approach may reduce expected protocol duration compared with the full protocol alone, while achieving a lower recall rate than an abbreviated-only approach. Optimal performance was observed when AI models emulated radiologists' decision-making behavior rather than focusing solely on near-perfect malignancy detection. Statistical Tests: Not applicable. Technical Efficacy: Stage 6.
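One simple way to see why an adaptive protocol can undercut the full protocol's duration: the full sequence is appended only with the probability that the AI flags the abbreviated scan, which depends on prevalence and the AI model's sensitivity and specificity. The sketch below is a hedged toy model, not the paper's actual derivation; all numbers are assumptions:

```python
def expected_duration(t_abbrev, t_full, prevalence, ai_sensitivity, ai_specificity):
    """Expected per-patient scan time when the full protocol is appended
    only after an AI flag on the abbreviated protocol."""
    # P(flag) = true positives among cancers + false positives among non-cancers
    p_flag = prevalence * ai_sensitivity + (1 - prevalence) * (1 - ai_specificity)
    return t_abbrev + p_flag * t_full

# Illustrative values (assumptions, not the paper's): 5-min abbreviated scan,
# 20-min full scan, 2% prevalence, AI sensitivity 0.95 / specificity 0.90.
t = expected_duration(5.0, 20.0, 0.02, 0.95, 0.90)
print(f"{t:.2f} min expected per screen")  # 7.34 min expected per screen
```

With these assumed numbers the expected duration stays far below the 20-minute full protocol because most screens are never escalated; specificity dominates, consistent with the abstract's observation.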

Multi-Stage Cascaded Deep Learning-Based Model for Acute Aortic Syndrome Detection: A Multisite Validation Study.

Chang J, Lee KJ, Wang TH, Chen CM

pubmed · Jul 7 2025
<b>Background</b>: Acute Aortic Syndrome (AAS), encompassing aortic dissection (AD), intramural hematoma (IMH), and penetrating atherosclerotic ulcer (PAU), presents diagnostic challenges due to its varied manifestations and the critical need for rapid assessment. <b>Methods</b>: We developed a multi-stage deep learning model trained on chest computed tomography angiography (CTA) scans. The model utilizes a U-Net architecture for aortic segmentation, followed by a cascaded classification approach for detecting AD and IMH, and a multiscale CNN for identifying PAU. External validation was conducted on 260 anonymized CTA scans from 14 U.S. clinical sites, encompassing data from four different CT manufacturers. Performance metrics, including sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), were calculated with 95% confidence intervals (CIs) using Wilson's method. Model performance was compared against predefined benchmarks. <b>Results</b>: The model achieved a sensitivity of 0.94 (95% CI: 0.88-0.97), specificity of 0.93 (95% CI: 0.89-0.97), and an AUC of 0.96 (95% CI: 0.94-0.98) for overall AAS detection, with <i>p</i>-values < 0.001 when compared to the 0.80 benchmark. Subgroup analyses demonstrated consistent performance across different patient demographics, CT manufacturers, slice thicknesses, and anatomical locations. <b>Conclusions</b>: This deep learning model effectively detects the full spectrum of AAS across diverse populations and imaging platforms, suggesting its potential utility in clinical settings to enable faster triage and expedite patient management.
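The confidence intervals above are computed with Wilson's method. A minimal sketch of the Wilson score interval for a binomial proportion (an illustration, not the authors' code):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_ci(8, 10)
print(f"8/10 -> 95% CI ({lo:.3f}, {hi:.3f})")  # 8/10 -> 95% CI (0.490, 0.943)
```

Unlike the naive Wald interval, the Wilson interval never escapes [0, 1] and behaves well at the extreme proportions seen in subgroup analyses.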

Automated Deep Learning-Based 3D-to-2D Segmentation of Geographic Atrophy in Optical Coherence Tomography Data

Al-khersan, H., Oakley, J. D., Russakoff, D. B., Cao, J. A., Saju, S. M., Zhou, A., Sodhi, S. K., Pattathil, N., Choudhry, N., Boyer, D. S., Wykoff, C. C.

medrxiv preprint · Jul 7 2025
Purpose: We report on a deep learning-based approach to the segmentation of geographic atrophy (GA) in patients with advanced age-related macular degeneration (AMD). Methods: Three-dimensional (3D) optical coherence tomography (OCT) data were collected from two instruments at two different retina practices, totaling 367 and 348 volumes, respectively, of routinely collected clinical data. For all data, the accuracy of a 3D-to-2D segmentation model was assessed relative to ground-truth manual labeling. Results: Dice Similarity Scores (DSC) averaged 0.824 and 0.826 for the two data sets. Correlations (r2) between manual and automated areas were 0.883 and 0.906, respectively. The inclusion of near-infrared imagery as an additional input channel did not notably improve performance. Conclusion: Accurate assessment of GA in real-world clinical OCT data can be achieved using deep learning. With the advent of therapeutics that slow the rate of GA progression, reliable automated assessment is a clinical objective, and this work validates one such method.
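The Dice Similarity Score reported above measures overlap between manual and automated segmentations: twice the intersection divided by the summed mask sizes. A toy sketch on 1-D binary masks (illustrative values, not study data):

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient for flat binary masks (sequences of 0/1).
    Returns 1.0 when both masks are empty (conventional choice)."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * intersection / total if total else 1.0

manual    = [0, 1, 1, 1, 0, 0]
automated = [0, 1, 1, 0, 0, 1]
print(dice(manual, automated))  # 2*2 / (3+3) ≈ 0.667
```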

Development and International Validation of a Deep Learning Model for Predicting Acute Pancreatitis Severity from CT Scans

Xu, Y., Teutsch, B., Zeng, W., Hu, Y., Rastogi, S., Hu, E. Y., DeGregorio, I. M., Fung, C. W., Richter, B. I., Cummings, R., Goldberg, J. E., Mathieu, E., Appiah Asare, B., Hegedus, P., Gurza, K.-B., Szabo, I. V., Tarjan, H., Szentesi, A., Borbely, R., Molnar, D., Faluhelyi, N., Vincze, A., Marta, K., Hegyi, P., Lei, Q., Gonda, T., Huang, C., Shen, Y.

medrxiv preprint · Jul 7 2025
Background and aims: Acute pancreatitis (AP) is a common gastrointestinal disease with rising global incidence. While most cases are mild, severe AP (SAP) carries high mortality. Early and accurate severity prediction is crucial for optimal management. However, existing severity prediction models, such as BISAP and mCTSI, have modest accuracy and often rely on data unavailable at admission. This study proposes a deep learning (DL) model to predict AP severity using abdominal contrast-enhanced CT (CECT) scans acquired within 24 hours of admission. Methods: We collected 10,130 studies from 8,335 patients across a multi-site U.S. health system. The model was trained in two stages: (1) self-supervised pretraining on large-scale unlabeled CT studies and (2) fine-tuning on 550 labeled studies. Performance was evaluated against mCTSI and BISAP on a hold-out internal test set (n = 100 patients) and externally validated on a Hungarian AP registry (n = 518 patients). Results: On the internal test set, the model achieved AUROCs of 0.888 (95% CI: 0.800-0.960) for SAP and 0.888 (95% CI: 0.819-0.946) for mild AP (MAP), outperforming mCTSI (p = 0.002). External validation showed robust AUROCs of 0.887 (95% CI: 0.825-0.941) for SAP and 0.858 (95% CI: 0.826-0.888) for MAP, surpassing mCTSI (p = 0.024) and BISAP (p = 0.002). A retrospective simulation suggested the model's potential to support admission triage and to serve as a second reader during CECT interpretation. Conclusions: The proposed DL model outperformed standard scoring systems for AP severity prediction, generalized well to external data, and shows promise for early clinical decision support and improved resource allocation.
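The AUROC figures above have a useful rank interpretation (the Mann-Whitney formulation): the probability that a randomly chosen severe case receives a higher model score than a randomly chosen non-severe case, with ties counted as half. A small sketch using hypothetical scores, not study data:

```python
def auroc(pos_scores, neg_scores):
    """AUROC via pairwise comparison: fraction of (positive, negative) pairs
    where the positive outscores the negative, ties counted as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical severity scores for SAP vs. non-SAP studies:
print(auroc([0.9, 0.8, 0.6], [0.7, 0.4, 0.3, 0.2]))  # 11/12 ≈ 0.917
```

The O(n²) pair loop is fine for a sketch; production code would use a rank-based formula or a library routine.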

RADAI: A Deep Learning-Based Classification of Lung Abnormalities in Chest X-Rays.

Aljuaid H, Albalahad H, Alshuaibi W, Almutairi S, Aljohani TH, Hussain N, Mohammad F

pubmed · Jul 7 2025
<b>Background:</b> Chest X-rays are rapidly gaining prominence as a prevalent diagnostic tool, as recognized by the World Health Organization (WHO). However, interpreting chest X-rays can be demanding and time-consuming, even for experienced radiologists, leading to potential misinterpretations and delays in treatment. <b>Method:</b> The purpose of this research is the development of the RadAI model, which can accurately detect four types of lung abnormalities in chest X-rays and generate a report on each identified abnormality. Deep learning algorithms, particularly convolutional neural networks (CNNs), have demonstrated remarkable potential in automating medical image analysis, including chest X-rays. This work addresses the challenge of chest X-ray interpretation by fine-tuning three advanced deep learning models: the Feature-selective and Spatial Receptive Fields Network (FSRFNet50), ResNext50, and ResNet50. These models are compared on accuracy, precision, recall, and F1-score. <b>Results:</b> The strong performance of RadAI shows its potential to assist radiologists in interpreting detected chest abnormalities accurately. <b>Conclusions:</b> RadAI enhances the accuracy and efficiency of chest X-ray interpretation, supporting the timely and reliable diagnosis of lung abnormalities.

Deep-Learning-Assisted Highly-Accurate COVID-19 Diagnosis on Lung Computed Tomography Images

Yinuo Wang, Juhyun Bae, Ka Ho Chow, Shenyang Chen, Shreyash Gupta

arxiv preprint · Jul 6 2025
COVID-19 is a severe, acute viral disease that can cause symptoms consistent with pneumonia, in which inflammation in the alveolar regions of the lungs leads to fluid build-up and breathing difficulties. Diagnosis of COVID-19 from CT scans has therefore been effective in assisting RT-PCR diagnosis and severity classification. In this paper, we propose a new data quality control pipeline, based on GANs and sliding windows, to refine the quality of CT images. We also use class-sensitive cost functions, including the Label-Distribution-Aware Margin loss (LDAM loss) and the Class-Balanced (CB) loss, to address the long-tail class imbalance in the datasets. Our model achieves an MCC above 0.983 on the benchmark test dataset.
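The Class-Balanced (CB) loss mentioned above reweights each class by the inverse of its "effective number of samples", (1 − β^n)/(1 − β), following Cui et al. (CVPR 2019). Below is a hedged sketch of just the weight computation with hypothetical class counts, not the paper's code:

```python
def class_balanced_weights(counts, beta=0.999):
    """Per-class weights from the effective number of samples:
    E_n = (1 - beta**n) / (1 - beta); weight ∝ 1 / E_n,
    normalized so the weights sum to the number of classes."""
    effective = [(1 - beta**n) / (1 - beta) for n in counts]
    raw = [1 / e for e in effective]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]

# Hypothetical long-tailed counts (e.g., many non-COVID vs. few COVID slices):
print(class_balanced_weights([1000, 50], beta=0.99))
```

The resulting weights would typically be passed as the `weight` argument of a per-class loss (e.g., weighted cross-entropy) so that rare classes contribute proportionally more to the gradient.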