Page 1 of 211 results

Agentic AI in radiology: Emerging Potential and Unresolved Challenges.

Dietrich N

PubMed, Jul 24 2025
This commentary introduces agentic artificial intelligence (AI) as an emerging paradigm in radiology, marking a shift from passive, user-triggered tools to systems capable of autonomous workflow management, task planning, and clinical decision support. Agentic AI models may dynamically prioritize imaging studies, tailor recommendations based on patient history and scan context, and automate administrative follow-up tasks, offering potential gains in efficiency, triage accuracy, and cognitive support. While not yet widely implemented, early pilot studies and proof-of-concept applications highlight promising utility across high-volume and high-acuity settings. Key barriers, including limited clinical validation, evolving regulatory frameworks, and integration challenges, must be addressed to ensure safe, scalable deployment. Agentic AI represents a forward-looking evolution in radiology that warrants careful development and clinician-guided implementation.

Population-scale cross-sectional observational study for AI-powered TB screening on one million CXRs.

Munjal P, Mahrooqi AA, Rajan R, Jeremijenko A, Ahmad I, Akhtar MI, Pimentel MAF, Khan S

PubMed, Jul 9 2025
Traditional tuberculosis (TB) screening involves radiologists manually reviewing chest X-rays (CXR), which is time-consuming, error-prone, and limited by workforce shortages. Our AI model, AIRIS-TB (AI Radiology In Screening TB), aims to address these challenges by automating the reporting of all X-rays without any findings. AIRIS-TB was evaluated on over one million CXRs, achieving an AUC of 98.51% and an overall false negative rate (FNR) of 1.57%, outperforming radiologists (1.85%) while maintaining a 0% TB-FNR. By deferring only cases with findings to radiologists, the model has the potential to automate up to 80% of routine CXR reporting. Subgroup analysis revealed no significant performance disparities across age, sex, HIV status, and region of origin, and sputum tests for suspected TB correlated strongly with model predictions. This large-scale validation demonstrates AIRIS-TB's safety and efficiency in high-volume TB screening programs, reducing radiologist workload without compromising diagnostic accuracy.
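The deferral strategy above amounts to a single operating threshold on a per-study score: studies below it are auto-reported as having no findings, and everything else goes to a radiologist. The sketch below is not AIRIS-TB's implementation; the `Cxr` record, the threshold values, and the toy data are hypothetical and only show how the automation rate and the false negative rate trade off as the threshold moves.

```python
from dataclasses import dataclass


@dataclass
class Cxr:
    score: float       # hypothetical model output: probability the film shows any finding
    has_finding: bool  # reference-standard label (any abnormality present)


def triage(studies, threshold):
    """Auto-report studies scoring below `threshold` as "no findings"; defer the rest.

    Returns (automation_rate, false_negative_rate, deferred_studies), where the
    false negative rate is the share of studies with findings that were auto-reported.
    """
    auto = [s for s in studies if s.score < threshold]
    deferred = [s for s in studies if s.score >= threshold]
    n_positive = sum(s.has_finding for s in studies)
    n_missed = sum(s.has_finding for s in auto)
    automation_rate = len(auto) / len(studies)
    fnr = n_missed / n_positive if n_positive else 0.0
    return automation_rate, fnr, deferred


# Toy data, made up for illustration (not from the study).
studies = [
    Cxr(0.02, False), Cxr(0.05, False), Cxr(0.10, False), Cxr(0.08, False),
    Cxr(0.15, True), Cxr(0.20, False), Cxr(0.03, False), Cxr(0.60, True),
    Cxr(0.90, True), Cxr(0.40, False),
]

for t in (0.12, 0.25):
    rate, fnr, deferred = triage(studies, t)
    print(f"threshold={t}: automated={rate:.0%}, FNR={fnr:.1%}, deferred={len(deferred)}")
```

On this toy set, raising the threshold from 0.12 to 0.25 automates more studies (50% to 70%) but lets one study with a finding through unread, which is why a screening deployment would tune the threshold against a false-negative budget rather than against automation alone.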

An Institutional Large Language Model for Musculoskeletal MRI Improves Protocol Adherence and Accuracy.

Patrick Decourcy Hallinan JT, Leow NW, Low YX, Lee A, Ong W, Zhou Chan MD, Devi GK, He SS, De-Liang Loh D, Wei Lim DS, Low XZ, Teo EC, Furqan SM, Yang Tham WW, Tan JH, Kumar N, Makmur A, Yonghan T

PubMed, Jul 8 2025
Privacy-preserving large language models (PP-LLMs) hold potential for assisting clinicians with documentation. We evaluated a PP-LLM to improve the clinical information on radiology request forms for musculoskeletal magnetic resonance imaging (MRI) and to automate protocoling, which ensures that the most appropriate imaging is performed. The present retrospective study included musculoskeletal MRI radiology request forms that had been randomly collected from June to December 2023. Studies without electronic medical record (EMR) entries were excluded. An institutional PP-LLM (Claude 3.5 Sonnet) augmented the original radiology request forms by mining EMRs and, in combination with rule-based processing of the LLM outputs, suggested appropriate protocols using institutional guidelines. The clinical information on the original and PP-LLM radiology request forms was graded independently with RI-RADS (Reason for exam Imaging Reporting and Data System) by 2 musculoskeletal (MSK) radiologists (MSK1, with 13 years of experience, and MSK2, with 11 years of experience). These radiologists established a consensus reference standard for protocoling, against which the performance of the PP-LLM and of 2 second-year board-certified radiologists (RAD1 and RAD2) was compared. Inter-rater reliability was assessed with the Gwet AC1, and the percentage agreement with the reference standard was calculated. Overall, 500 musculoskeletal MRI radiology request forms were analyzed for 407 patients (202 women and 205 men with a mean age [and standard deviation] of 50.3 ± 19.5 years) across a range of anatomical regions, including the spine/pelvis (143 MRI scans; 28.6%), upper extremity (169 scans; 33.8%), and lower extremity (188 scans; 37.6%). Two hundred and twenty-two (44.4%) of the 500 MRI scans required contrast. The clinical information provided in the PP-LLM-augmented radiology request forms was rated as superior to that in the original requests. Only 0.4% to 0.6% of PP-LLM radiology request forms were rated as limited/deficient, compared with 12.4% to 22.6% of the original requests (p < 0.001). Almost-perfect inter-rater reliability was observed for LLM-enhanced requests (AC1 = 0.99; 95% confidence interval [CI], 0.99 to 1.0), compared with substantial agreement for the original forms (AC1 = 0.62; 95% CI, 0.56 to 0.67). For protocoling, MSK1 and MSK2 showed almost-perfect agreement on the region/coverage (AC1 = 0.96; 95% CI, 0.95 to 0.98) and contrast requirement (AC1 = 0.98; 95% CI, 0.97 to 0.99). Compared with the consensus reference standard, protocoling accuracy for the PP-LLM was 95.8% (95% CI, 94.0% to 97.6%), which was significantly higher than that for both RAD1 (88.6%; 95% CI, 85.8% to 91.4%) and RAD2 (88.2%; 95% CI, 85.4% to 91.0%) (p < 0.001 for both). Musculoskeletal MRI request form augmentation with an institutional LLM provided superior clinical information and improved protocoling accuracy compared with clinician requests and non-MSK-trained radiologists. Institutional adoption of such LLMs could enhance the appropriateness of MRI utilization and patient care. Diagnostic Level III. See Instructions for Authors for a complete description of levels of evidence.
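The agreement statistic reported above, Gwet's AC1, corrects observed agreement p_a for chance agreement p_e = Σ_k π_k(1 − π_k)/(Q − 1), where π_k is the mean share of ratings in category k across both raters and Q is the number of categories; AC1 = (p_a − p_e)/(1 − p_e). The sketch below is a minimal two-rater version for nominal categories; the "adequate"/"limited" labels and the ratings are hypothetical, not the study's RI-RADS data.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters assigning nominal categories to the same items.

    AC1 = (p_a - p_e) / (1 - p_e), where p_a is observed agreement and
    p_e = sum_k pi_k * (1 - pi_k) / (Q - 1), with pi_k the mean share of
    ratings in category k across both raters and Q the number of categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need the same non-empty set of items for both raters")
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    n = len(rater_a)
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts = Counter(rater_a) + Counter(rater_b)   # pooled ratings from both raters
    pi = [counts[k] / (2 * n) for k in categories]
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings of 10 request forms by two readers.
a = ["adequate"] * 4 + ["limited"] + ["adequate"] * 4 + ["limited"]
b = ["adequate"] * 4 + ["limited"] + ["adequate"] * 3 + ["limited", "limited"]
print(round(gwet_ac1(a, b, ["adequate", "limited"]), 3))
```

Here the readers agree on 9 of 10 forms and AC1 is 0.84, whereas Cohen's kappa on the same ratings would be about 0.74. AC1's chance term shrinks when one category dominates, which is a common reason it is chosen when most ratings fall into a single grade.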

Potential Time and Recall Benefits for Adaptive AI-Based Breast Cancer MRI Screening.

Balkenende L, Ferm J, van Veldhuizen V, Brunekreef J, Teuwen J, Mann RM

PubMed, Jul 7 2025
Abbreviated breast MRI protocols are advocated for breast screening as they limit acquisition duration and increase resource availability. However, radiologists' specificity may be slightly lower when only such short protocols are evaluated. An adaptive approach, in which a full protocol is performed only when artificial intelligence (AI)-based models detect abnormalities on the abbreviated protocol, might improve and speed up MRI screening. This study explores the potential benefits of such an approach. To assess the potential impact of adaptive breast MRI scanning based on AI detection of malignancies. Mathematical model. Breast cancer screening protocols. Theoretical upper and lower limits on expected protocol duration and recall rate were determined for the adaptive approach, and the influence of the AI model's and radiologists' performance metrics on these limits was assessed, under the assumption that any finding on the abbreviated protocol would, in an ideal follow-up scenario, prompt a second MRI with the full protocol. Estimated most likely scenario. Theoretical limits for the proposed adaptive AI-based MRI breast cancer screening showed that the adaptive protocol's recall rate was always bounded by the recall rates of the abbreviated and full screening protocols. The abbreviated and full protocols did not, however, fully bound the expected protocol duration, so an adaptive protocol's expected duration could be shorter than the abbreviated protocol duration. Specificity, whether of the AI models or of the radiologists, had the largest effect on the theoretical limits. In the most likely scenario, the adaptive protocol achieved an expected protocol duration reduction of ~47%-60% compared with the full protocol. The proposed adaptive approach may reduce expected protocol duration compared with the full protocol alone and could achieve a lower recall rate than an abbreviated-only approach. Optimal performance was observed when AI models emulated radiologists' decision-making behavior, rather than focusing solely on near-perfect malignancy detection. Not applicable. Stage 6.
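One way to see why specificity dominates the expected duration is a simplified model: every exam starts with the abbreviated protocol, and the scan is extended to the full protocol only when the AI flags it. The flag rate is prevalence × sensitivity + (1 − prevalence) × (1 − specificity). This is a hypothetical sketch, not the paper's mathematical model: it assumes flagged scans are extended in the same session, ignores recall MRIs (so it cannot reproduce the paper's finding that adaptive duration can undercut the abbreviated protocol), and uses made-up protocol durations and prevalence.

```python
def flag_rate(prevalence, sensitivity, specificity):
    """Share of abbreviated scans the AI flags: true positives plus false positives."""
    return prevalence * sensitivity + (1 - prevalence) * (1 - specificity)


def expected_adaptive_duration(t_abbr, t_full, p_flag):
    """Expected minutes per exam if flagged scans are extended to the full protocol
    in the same session (hypothetical simplification: no separate recall MRI)."""
    return t_abbr + p_flag * (t_full - t_abbr)


# Hypothetical inputs: 10-min abbreviated and 25-min full protocols, 1% prevalence.
T_ABBR, T_FULL, PREV = 10.0, 25.0, 0.01
for sens, spec in [(0.95, 0.90), (0.85, 0.90), (0.95, 0.80)]:
    f = flag_rate(PREV, sens, spec)
    e = expected_adaptive_duration(T_ABBR, T_FULL, f)
    print(f"sens={sens:.2f} spec={spec:.2f}: flagged={f:.1%}, "
          f"E[T]={e:.2f} min, reduction vs full={1 - e / T_FULL:.0%}")
```

Because cancers are rare in a screening population, a 10-point drop in sensitivity barely moves the flag rate in this model, while a 10-point drop in specificity nearly doubles it. This is consistent in direction with the abstract's observation that specificity has the largest effect.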

Appropriateness of acute breast symptom recommendations provided by ChatGPT.

Byrd C, Kingsbury C, Niell B, Funaro K, Bhatt A, Weinfurtner RJ, Ataya D

PubMed, Jun 16 2025
We evaluated the accuracy of ChatGPT-3.5's responses to common questions regarding acute breast symptoms and explored whether using lay language, as opposed to medical language, affected the accuracy of the responses. Twenty questions addressing acute breast conditions were formulated, informed by the American College of Radiology (ACR) Appropriateness Criteria (AC) and our clinical experience at a tertiary referral breast center. Of these, seven addressed the most common acute breast symptoms, nine addressed pregnancy-associated breast symptoms, and four addressed specific management and imaging recommendations for a palpable breast abnormality. Each question was submitted three times to ChatGPT-3.5, and all responses were assessed by five fellowship-trained breast radiologists. Evaluation criteria included clinical judgment and adherence to the ACR guidelines, with responses scored as: 1) "appropriate," 2) "inappropriate" if any response contained inappropriate information, or 3) "unreliable" if responses were inconsistent. A majority vote determined the appropriateness rating for each question. ChatGPT-3.5-generated responses were appropriate for 7/7 (100%) questions regarding common acute breast symptoms, whether phrased colloquially or in standard medical terminology. In contrast, ChatGPT-3.5-generated responses were appropriate for 3/9 (33%) questions about pregnancy-associated breast symptoms and 3/4 (75%) questions about management and imaging recommendations for a palpable breast abnormality. ChatGPT-3.5 can provide appropriate information on the management of common acute breast symptoms when prompted with either standard medical terminology or lay phrasing. However, physician oversight remains critical given the presence of inappropriate recommendations for pregnancy-associated breast symptoms and the management of palpable abnormalities.
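The per-question rating step above (five readers, one label each, majority vote) can be sketched as follows. The abstract does not say how a split without a majority (for example 2-2-1 across the three labels) was resolved; this sketch returns `None` in that case, which is an assumption, and `majority_label` / `share_appropriate` are hypothetical helper names.

```python
from collections import Counter

LABELS = ("appropriate", "inappropriate", "unreliable")


def majority_label(votes):
    """Return the label given by more than half of the readers, else None (assumed tie rule)."""
    if not votes:
        raise ValueError("no votes")
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def share_appropriate(question_votes):
    """Fraction of questions whose majority label is 'appropriate'."""
    labels = [majority_label(v) for v in question_votes]
    return sum(label == "appropriate" for label in labels) / len(labels)


# Hypothetical votes from five readers on two questions.
votes = [
    ["appropriate"] * 4 + ["unreliable"],
    ["inappropriate"] * 3 + ["appropriate"] * 2,
]
print([majority_label(v) for v in votes], share_appropriate(votes))
```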

Adaptive Breast MRI Scanning Using AI.

Eskreis-Winkler S, Bhowmik A, Kelly LH, Lo Gullo R, D'Alessio D, Belen K, Hogan MP, Saphier NB, Sevilimedu V, Sung JS, Comstock CE, Sutton EJ, Pinker K

PubMed, Jun 1 2025
Background MRI protocols typically involve many imaging sequences and often require too much time. Purpose To simulate artificial intelligence (AI)-directed stratified scanning for screening breast MRI with various triage thresholds and evaluate its diagnostic performance against that of the full breast MRI protocol. Materials and Methods This retrospective reader study included consecutive contrast-enhanced screening breast MRI examinations performed between January 2013 and January 2019 at three regional cancer sites. In this simulation study, an in-house AI tool generated a suspicion score for subtraction maximum intensity projection images during a given MRI examination, and the score was used to determine whether to proceed with the full MRI protocol or end the examination early (abbreviated breast MRI [AB-MRI] protocol). Examinations with suspicion scores under the 50th percentile were read using both the AB-MRI protocol (ie, dynamic contrast-enhanced MRI scans only) and the full MRI protocol. Diagnostic performance metrics for screening with various AI triage thresholds were compared with those for screening without AI triage. Results Of 863 women (mean age, 52 years ± 10 [SD]; 1423 MRI examinations), 51 received a cancer diagnosis within 12 months of screening. 
The diagnostic performance metrics for AI-directed stratified scanning that triaged 50% of examinations to AB-MRI versus full MRI protocol scanning were as follows: sensitivity, 88.2% (45 of 51; 95% CI: 79.4, 97.1) versus 86.3% (44 of 51; 95% CI: 76.8, 95.7); specificity, 80.8% (1108 of 1372; 95% CI: 78.7, 82.8) versus 81.4% (1117 of 1372; 95% CI: 79.4, 83.5); positive predictive value 3 (ie, percent of biopsies yielding cancer), 23.6% (43 of 182; 95% CI: 17.5, 29.8) versus 24.7% (42 of 170; 95% CI: 18.2, 31.2); cancer detection rate (per 1000 examinations), 31.6 (95% CI: 22.5, 40.7) versus 30.9 (95% CI: 21.9, 39.9); and interval cancer rate (per 1000 examinations), 4.2 (95% CI: 0.9, 7.6) versus 4.9 (95% CI: 1.3, 8.6). Specificity decreased by no more than 2.7 percentage points with AI triage. There were no AI-triaged examinations for which conducting the full MRI protocol would have resulted in additional cancer detection. Conclusion AI-directed stratified MRI decreased simulated scan times while maintaining diagnostic performance. © RSNA, 2025 Supplemental material is available for this article. See also the editorial by Strand in this issue.
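The reported point estimates can be recomputed from the counts in the abstract. False positives are derived as 1372 minus the true negatives. The sketch assumes that every cancer missed at screening counts as an interval cancer, which matches the reported 4.2 and 4.9 per 1000 but is an assumption about the definitions used. The normal-approximation (Wald) interval is included because it appears to reproduce the reported sensitivity CI. That is an observation, not a statement of the authors' method.

```python
import math


def wald_ci(k, n, z=1.96):
    """Point estimate and normal-approximation (Wald) 95% CI for k successes out of n."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


def screening_metrics(tp, fn, tn, fp):
    """Exam-level screening metrics from a 2x2 table; rates are per 1000 exams."""
    exams = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "cancer_detection_per_1000": 1000 * tp / exams,
        # Assumption: every cancer missed at screening surfaces as an interval cancer.
        "interval_cancer_per_1000": 1000 * fn / exams,
    }


# Counts from the abstract: 1423 exams, 51 cancers, 1372 cancer-free exams.
ai_triage = screening_metrics(tp=45, fn=6, tn=1108, fp=1372 - 1108)
full_protocol = screening_metrics(tp=44, fn=7, tn=1117, fp=1372 - 1117)
for name, m in [("AI triage", ai_triage), ("full protocol", full_protocol)]:
    print(f"{name}: sens {m['sensitivity']:.1%}, spec {m['specificity']:.1%}, "
          f"CDR {m['cancer_detection_per_1000']:.1f}, ICR {m['interval_cancer_per_1000']:.1f}")

p, lo, hi = wald_ci(45, 51)
print(f"sensitivity {p:.1%} (95% CI {lo:.1%}, {hi:.1%})")
```

PPV3 uses a different denominator (biopsies performed), so it is computed separately, for example `wald_ci(43, 182)` for the AI-triage arm.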

Using AI to triage patients without clinically significant prostate cancer using biparametric MRI and PSA.

Grabke EP, Heming CAM, Hadari A, Finelli A, Ghai S, Lajkosz K, Taati B, Haider MA

PubMed, May 30 2025
To train and evaluate the performance of a machine learning triaging tool that identifies MRIs negative for clinically significant prostate cancer and to compare it against non-MRI models. In this retrospective study, 2895 MRIs were collected from two sources (1630 internal, 1265 public). The risk models compared were the Prostate Cancer Prevention Trial Risk Calculator 2.0, the Prostate Biopsy Collaborative Group Calculator, PSA density, U-Net segmentation, and U-Net combined with clinical parameters. The reference standard was histopathology or negative follow-up. Performance metrics were calculated on a test set of 465 patients by simulating a triaging workflow and comparing it with a radiologist interpreting all examinations. Sensitivity and specificity differences were assessed using the McNemar test. Differences in PPV and NPV were assessed using the Leisenring, Alonzo and Pepe generalized score statistic. Equivalence test p-values were adjusted within each measure using the Benjamini-Hochberg correction. Triaging using U-Net with clinical parameters reduced radiologist workload by 12.5%, with a sensitivity decrease from 93% to 90% (p = 0.023) and a specificity increase from 39% to 47% (p < 0.001). This simulated workload reduction was greater than with triaging by risk calculators (3.2% and 1.3%, p < 0.001) and comparable to PSA density (8.4%, p = 0.071) and U-Net alone (11.6%, p = 0.762). Both U-Net triaging strategies increased PPV (+2.8%, p = 0.005 with clinical parameters; +2.2%, p = 0.020 without), unlike non-U-Net strategies (p > 0.05). NPV remained equivalent for all scenarios (p > 0.05). Clinically informed U-Net triaging correctly ruled out 20 (13.4%) radiologist false positives (12 PI-RADS = 3, 8 PI-RADS = 4). Of the eight (3.6%) false negatives, two were misclassified by the radiologist. No misclassified case was interpreted as PI-RADS 5. Prostate MRI triaging using machine learning could reduce radiologist workload by 12.5% with a 3-percentage-point sensitivity decrease and an 8-percentage-point specificity increase, outperforming triaging using non-imaging-based risk models. Further prospective validation is required.
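The simulated triaging workflow can be sketched as follows: exams whose AI score falls below a threshold are ruled out without a read, and all other exams keep the radiologist's call. Workload reduction is the share of exams removed from the reading list. The `Case` fields, the threshold, and the ten toy cases are hypothetical; the study's U-Net scores and clinical-parameter model are not reproduced here.

```python
from typing import NamedTuple


class Case(NamedTuple):
    ai_score: float        # hypothetical model risk score for clinically significant cancer
    radiologist_pos: bool  # radiologist's call when reading the MRI
    cancer: bool           # reference standard (histopathology or negative follow-up)


def simulate_triage(cases, threshold):
    """Rule out cases scoring below `threshold` without a read; others keep the
    radiologist's call. Returns (workload_reduction, sensitivity, specificity)."""
    tp = fn = tn = fp = skipped = 0
    for c in cases:
        if c.ai_score < threshold:
            skipped += 1
            call = False           # triaged out: reported negative, never read
        else:
            call = c.radiologist_pos
        if c.cancer and call:
            tp += 1
        elif c.cancer:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1
    return skipped / len(cases), tp / (tp + fn), tn / (tn + fp)


# Toy cohort (made up): includes a radiologist false positive the AI rules out
# (score 0.10) and a cancer the AI triages away (score 0.20).
cases = [
    Case(0.05, False, False), Case(0.10, True, False), Case(0.15, False, False),
    Case(0.20, True, True), Case(0.60, True, True), Case(0.70, True, True),
    Case(0.80, True, False), Case(0.40, False, False), Case(0.90, True, True),
    Case(0.55, False, True),
]
print("radiologist reads all:", simulate_triage(cases, 0.0))
print("AI triage at 0.25:   ", simulate_triage(cases, 0.25))
```

As in the abstract, triage trades some sensitivity for higher specificity and fewer reads. In the study the paired differences were then tested with McNemar's test, which this sketch does not attempt.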

Phantom-Based Ultrasound-ECG Deep Learning Framework for Prospective Cardiac Computed Tomography.

Ganesh S, Lindsey BD, Tridandapani S, Bhatti PT

PubMed, May 30 2025
We present the first multimodal deep learning framework combining ultrasound (US) and electrocardiography (ECG) data to predict cardiac quiescent periods (QPs) for optimized computed tomography angiography (CTA) gating. The framework integrates a 3D convolutional neural network (CNN) for US data and an artificial neural network (ANN) for ECG data. A dynamic heart motion phantom, replicating diverse cardiac conditions including arrhythmias, was used to validate the framework. Performance was assessed across varying QP lengths, cardiac segments, and motions to simulate real-world conditions. The multimodal US-ECG 3D CNN-ANN framework demonstrated improved QP prediction accuracy compared with single-modality ECG-only gating, achieving 96.87% versus 85.56% accuracy, including in scenarios involving arrhythmic conditions. Notably, the framework shows higher accuracy for longer QP durations (100-200 ms) than for shorter durations (<100 ms), while still outperforming single-modality methods, which often fail to detect shorter quiescent phases, especially in arrhythmic cases. Consistently outperforming single-modality approaches, it achieves reliable QP prediction across cardiac regions, including the whole phantom, interventricular septum, and cardiac wall regions. Analysis of QP prediction accuracy across cardiac segments demonstrated an average accuracy of 92% in clinically relevant echocardiographic views, highlighting the framework's robustness. Combining US and ECG data in a multimodal framework improves QP prediction accuracy under variable cardiac motion, particularly in arrhythmic conditions. Since even small errors in cardiac CTA can result in non-diagnostic scans, multimodal gating may improve diagnostic scan rates in patients with high and variable heart rates and arrhythmias.

Coronary Computed Tomographic Angiography to Optimize the Diagnostic Yield of Invasive Angiography for Low-Risk Patients Screened With Artificial Intelligence: Protocol for the CarDIA-AI Randomized Controlled Trial.

Petch J, Tabja Bortesi JP, Sheth T, Natarajan M, Pinilla-Echeverri N, Di S, Bangdiwala SI, Mosleh K, Ibrahim O, Bainey KR, Dobranowski J, Becerra MP, Sonier K, Schwalm JD

PubMed, May 21 2025
Invasive coronary angiography (ICA) is the gold standard in the diagnosis of coronary artery disease (CAD). As an invasive procedure, it carries rare but serious risks including myocardial infarction, stroke, major bleeding, and death. A large proportion of elective outpatients undergoing ICA have nonobstructive CAD, highlighting the suboptimal use of this test. Coronary computed tomographic angiography (CCTA) is a noninvasive option that provides similar information with less risk and is recommended as a first-line test for patients with low-to-intermediate risk of CAD. Leveraging artificial intelligence (AI) to appropriately direct patients to ICA or CCTA based on the predicted probability of disease may improve the efficiency and safety of diagnostic pathways. The CarDIA-AI (Coronary computed tomographic angiography to optimize the Diagnostic yield of Invasive Angiography for low-risk patients screened with Artificial Intelligence) study aims to evaluate whether AI-based risk assessment for obstructive CAD, implemented within a centralized triage process, can optimize the use of ICA in outpatients referred for nonurgent ICA. CarDIA-AI is a pragmatic, open-label, superiority randomized controlled trial involving 2 Canadian cardiac centers. A total of 252 adults referred for elective outpatient ICA will be randomized 1:1 to usual care (directly proceeding to ICA) or to triage using an AI-based decision support tool. The AI-based decision support tool was developed using referral information from over 37,000 patients and uses a light gradient boosting machine model to predict the probability of obstructive CAD based on 42 clinically relevant predictors, including patient referral information, demographic characteristics, risk factors, and medical history. Participants in the intervention arm will have their ICA referral forms and medical charts reviewed, and selected details will be entered into the decision support tool, which recommends CCTA or ICA based on the patient's predicted probability of obstructive CAD. All patients will receive the selected imaging modality within 6 weeks of referral and will subsequently be followed for 90 days. The primary outcome is the proportion of patients referred for cardiac investigation who are diagnosed with normal or nonobstructive CAD via ICA; it will be compared between the intervention and control groups using a 2-sided z test. Secondary outcomes include the number of angiograms avoided and the diagnostic yield of ICA. Recruitment began on January 9, 2025, and is expected to conclude in mid to late 2025. As of April 14, 2025, we have enrolled 81 participants. Data analysis will begin once data collection is completed. We expect to submit the results for publication in 2026. CarDIA-AI will be the first randomized controlled trial using AI to optimize patient selection for CCTA versus ICA, potentially improving diagnostic efficiency, avoiding unnecessary complications of ICA, and improving health care resource usage. ClinicalTrials.gov NCT06648239; https://clinicaltrials.gov/study/NCT06648239/. DERR1-10.2196/71726.
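The planned primary analysis compares two proportions with a 2-sided z test. The sketch below uses the common pooled-variance form. The protocol abstract does not specify pooled versus unpooled variance, so the pooled choice is an assumption. The counts are invented for illustration (126 per arm, matching the planned 1:1 split of 252) and are not trial results.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test for H0: p1 == p2 using the pooled-variance standard error.

    Raises ZeroDivisionError if the pooled proportion is exactly 0 or 1.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical outcome counts: patients with normal/nonobstructive CAD found at ICA.
z, p = two_proportion_z_test(x1=60, n1=126, x2=40, n2=126)  # control vs. intervention
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```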

Systematic review on the impact of deep learning-driven worklist triage on radiology workflow and clinical outcomes.

Momin E, Cook T, Gershon G, Barr J, De Cecco CN, van Assen M

PubMed, May 21 2025
To perform a systematic review of the impact of deep learning (DL)-based triage on reducing diagnostic delays and improving patient outcomes in peer-reviewed and preprint publications. Multiple databases were searched for primary research studies on DL-based worklist optimization for diagnostic imaging triage published from January 2018 to July 2024. Extracted data included study design, dataset characteristics, workflow metrics including report turnaround time and time-to-treatment, and patient outcome differences. Differences by clinical setting and integration modality were further analyzed using nonparametric statistics. Risk of bias was assessed with the Risk Of Bias In Non-randomized Studies - of Interventions (ROBINS-I) tool. A total of 38 studies from 20 publications, involving 138,423 images, were analyzed. Workflow interventions concerned pulmonary embolism (n = 8), stroke (n = 3), intracranial hemorrhage (n = 12), and chest conditions (n = 15). Patients in the post-DL-triage group had shorter median report turnaround times: a mean difference of 12.3 min (IQR: -25.7, -7.6) for pulmonary embolism, 20.5 min (IQR: -32.1, -9.3) for stroke, 4.3 min (IQR: -8.6, 1.3) for intracranial hemorrhage, and 29.7 min (IQR: -2947.7, -18.3) for chest diseases. Subgroup analysis revealed that reductions varied by clinical environment and relative prevalence rates but were highest when algorithms actively stratified and reordered the radiological worklist, with a 43.7% reduction in report turnaround time compared with 7.6% for widget-based systems (p < 0.01). DL-based triage systems had comparable report turnaround time improvements, especially in outpatient and high-prevalence settings, suggesting that AI-based triage holds promise in alleviating radiology workloads. Question Can DL-based triage address lengthening imaging report turnaround times and improve patient outcomes across distinct clinical environments? Findings DL-based triage improved report turnaround time across disease groups, with larger reductions reported in high-prevalence or lower-acuity settings. Clinical relevance DL-based workflow prioritization is a reliable tool for reducing diagnostic imaging delay for time-sensitive disease across clinical settings. However, further research and reliable metrics are needed to provide specific recommendations with regard to false-negative examinations and multi-condition prioritization.
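The review distinguishes systems that actively reorder the worklist from widget-based systems that only surface an alert. The reordering variant can be sketched as a priority queue in which AI-flagged studies form a higher tier and arrival order breaks ties within a tier. `TriagedWorklist`, `WorklistEntry`, and the two-tier scheme are hypothetical illustrations, not any vendor's implementation.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class WorklistEntry:
    tier: int      # 0 = AI-flagged as suspected positive, 1 = everything else
    arrival: int   # monotonically increasing counter: first-in, first-out within a tier
    accession: str = field(compare=False)


class TriagedWorklist:
    """Hypothetical reading worklist that moves AI-flagged studies ahead of unflagged ones."""

    def __init__(self):
        self._heap = []
        self._arrivals = itertools.count()

    def add(self, accession, ai_flagged):
        tier = 0 if ai_flagged else 1
        heapq.heappush(self._heap, WorklistEntry(tier, next(self._arrivals), accession))

    def next_study(self):
        return heapq.heappop(self._heap).accession

    def __len__(self):
        return len(self._heap)


wl = TriagedWorklist()
for acc, flagged in [("A1", False), ("A2", True), ("A3", False), ("A4", True)]:
    wl.add(acc, flagged)
print([wl.next_study() for _ in range(len(wl))])  # ['A2', 'A4', 'A1', 'A3']
```

A widget-based system, by contrast, would leave the plain arrival order (A1, A2, A3, A4) untouched and rely on the radiologist acting on the alert. This is one plausible reason the review found larger turnaround reductions with active reordering.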