
An economic scenario analysis of implementing artificial intelligence in BreastScreen Norway-Impact on radiologist person-years, costs and effects.

Moger TA, Nardin SB, Holen ÅS, Moshina N, Hofvind S

PubMed · Sep 9 2025
Objective: To study the implications of implementing artificial intelligence (AI) as a decision support tool in the Norwegian breast cancer screening program with respect to cost-effectiveness and time savings for radiologists. Methods: In a decision tree model using recent data from AI vendors and the Cancer Registry of Norway, and assuming equal effectiveness of radiologists plus AI compared to standard practice, we simulated costs, effects, and radiologist person-years over the next 20 years under different scenarios: 1) assuming a €1 additional running cost of AI instead of the €3 assumed in the base case, 2) varying the AI-score thresholds for single vs. double readings, 3) varying the consensus and recall rates, and 4) reductions in the interval cancer rate compared to standard practice. Results: AI was unlikely to be cost-effective, even when only one radiologist was used alongside AI for all screening exams. This also applied when assuming a 10% reduction in the consensus and recall rates. However, there was a 30-50% reduction in the radiologists' screen-reading volume. Assuming an additional running cost of €1 for AI, the costs were comparable, with similar probabilities of cost-effectiveness for AI and standard practice. Assuming a 5% reduction in the interval cancer rate, AI proved to be cost-effective across all willingness-to-pay values. Conclusions: AI may be cost-effective if the interval cancer rate is reduced by 5% or more, or if its additional cost is €1 per screening exam. Despite the substantial reduction in screen-reading volume, the time saved remains modest relative to the total radiologist person-years available within breast centers, accounting for only 3-4% of person-years.
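The kind of scenario comparison described here can be illustrated with a minimal, hypothetical sketch; every parameter below (exam volume, reading cost, AI fee) is an invented placeholder, not a figure from the study.

```python
# Hypothetical sketch of a per-scenario cost comparison between double reading
# and single reading plus AI. All numbers are illustrative assumptions.

def total_cost(n_exams, readings_per_exam, cost_per_reading, ai_cost_per_exam):
    """Screening cost for one strategy over the simulated horizon."""
    return n_exams * (readings_per_exam * cost_per_reading + ai_cost_per_exam)

n_exams = 230_000 * 20        # assumed exams over a 20-year horizon
cost_per_reading = 6.0        # assumed cost per radiologist reading (EUR)

standard = total_cost(n_exams, readings_per_exam=2,
                      cost_per_reading=cost_per_reading, ai_cost_per_exam=0.0)

# Scenario: AI replaces the second reader for all exams, at 1 or 3 EUR per exam.
for ai_fee in (1.0, 3.0):
    ai = total_cost(n_exams, readings_per_exam=1,
                    cost_per_reading=cost_per_reading, ai_cost_per_exam=ai_fee)
    print(f"AI running cost EUR {ai_fee:.0f}: "
          f"incremental cost vs standard = EUR {ai - standard:,.0f}")
```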

Army Medic Performance in Trauma Sonography: The Impact of Artificial Intelligence Assistance in Focused Assessments With Sonography in Trauma-A Prospective Randomized Controlled Trial.

Hartline CPTAD, Hartvickson MAJS, Perdue CPTMJ, Sandoval CPTC, Walker LTCJD, Soules CPTA, Mitchell COLCA

PubMed · Aug 31 2025
Noncompressible truncal hemorrhage is a leading cause of preventable death in military prehospital settings, particularly in combat environments where advanced imaging is unavailable. The Focused Assessment with Sonography in Trauma (FAST) exam is critical for diagnosing intra-abdominal bleeding. However, Army medics typically lack formal ultrasound training. This study examines whether artificial intelligence (AI) assistance can enhance medics' proficiency in performing FAST exams, thereby improving the speed and accuracy of trauma triage in austere conditions. This prospective, randomized controlled trial involved 60 Army medics who performed 3-view abdominal FAST exams, both with and without AI assistance, using the EchoNous Kosmos device. Investigators randomized participants into 2 groups and evaluated them on time to completion, adequacy of imaging, and confidence in using the device. Two trained investigators assessed image adequacy, and participants reported confidence in the device on a 5-point Likert scale. Data were analyzed using the t-test for parametric data, the Wilcoxon rank-sum test, and Cohen's kappa for interrater reliability. The AI-assisted group completed the FAST exam in an average of 142.57 seconds compared to 143.87 seconds (P = .9) for the non-AI-assisted group, demonstrating no statistically significant difference in time. However, the AI-assisted group demonstrated significantly higher adequacy in the left upper quadrant and pelvic views (P = .008 and P = .004, respectively). Participants reported significantly higher confidence in the AI-assisted group, with a median score of 4.00 versus 2.50 (P = .006). Interrater agreement was moderate to substantial, with Cohen's kappa values indicating significant reliability. AI assistance did not significantly reduce the time required to complete a FAST exam but improved image adequacy and user confidence. These findings suggest that AI tools can enhance the quality of FAST exams conducted by minimally trained medics in combat settings. Further research is needed to explore integrating AI-assisted ultrasound training in military medic curricula to optimize trauma care in austere environments.
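The statistical comparisons named in the abstract (two-sample t-test, Wilcoxon rank-sum, Cohen's kappa) can be sketched with standard libraries; the data below are synthetic and do not reproduce the study's measurements.

```python
# Illustrative sketch of the reported analyses on made-up data.
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical completion times in seconds for each group.
time_ai = rng.normal(142.6, 30, 30)
time_no_ai = rng.normal(143.9, 30, 30)
t_stat, p_time = stats.ttest_ind(time_ai, time_no_ai)   # parametric comparison

# Hypothetical 5-point Likert confidence scores, compared non-parametrically.
conf_ai = rng.integers(3, 6, 30)
conf_no_ai = rng.integers(1, 5, 30)
u_stat, p_conf = stats.ranksums(conf_ai, conf_no_ai)

# Hypothetical adequate/inadequate calls from two raters for interrater reliability.
rater1 = rng.integers(0, 2, 60)
rater2 = rater1.copy()
rater2[:8] = 1 - rater2[:8]        # introduce some disagreement
kappa = cohen_kappa_score(rater1, rater2)

print(f"time p={p_time:.3f}, confidence p={p_conf:.3f}, kappa={kappa:.2f}")
```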

AT-CXR: Uncertainty-Aware Agentic Triage for Chest X-rays

Xueyang Li, Mingze Jiang, Gelei Xu, Jun Xia, Mengzhao Jia, Danny Chen, Yiyu Shi

arXiv preprint · Aug 26 2025
Agentic AI is advancing rapidly, yet truly autonomous medical-imaging triage, where a system decides when to stop, escalate, or defer under real constraints, remains relatively underexplored. To address this gap, we introduce AT-CXR, an uncertainty-aware agent for chest X-rays. The system estimates per-case confidence and distributional fit, then follows a stepwise policy to issue an automated decision or abstain with a suggested label for human intervention. We evaluate two router designs that share the same inputs and actions: a deterministic rule-based router and an LLM-decided router. Across a five-fold evaluation on a balanced subset of the NIH ChestX-ray14 dataset, both variants outperform strong zero-shot vision-language models and state-of-the-art supervised classifiers, achieving higher full-coverage accuracy and superior selective-prediction performance, evidenced by a lower area under the risk-coverage curve (AURC) and a lower error rate at high coverage, while operating with lower latency that meets practical clinical constraints. The two routers provide complementary operating points, enabling deployments to prioritize maximal throughput or maximal accuracy. Our code is available at https://github.com/XLIAaron/uncertainty-aware-cxr-agent.
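The risk-coverage evaluation mentioned above can be illustrated with a short sketch; this is not the authors' implementation, and the confidence scores and labels below are synthetic.

```python
# Sketch of a risk-coverage curve and its area (AURC) for selective prediction.
import numpy as np

def risk_coverage_curve(confidence, correct):
    """Sort cases by descending confidence; return coverage and selective risk."""
    order = np.argsort(-confidence)
    errors = 1 - correct[order].astype(float)
    coverage = np.arange(1, len(errors) + 1) / len(errors)
    selective_risk = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return coverage, selective_risk

def aurc(confidence, correct):
    coverage, risk = risk_coverage_curve(confidence, correct)
    return np.trapz(risk, coverage)   # lower is better

rng = np.random.default_rng(0)
conf = rng.random(1000)                            # hypothetical per-case confidence
correct = rng.random(1000) < (0.5 + 0.4 * conf)    # accuracy improves with confidence
print(f"AURC = {aurc(conf, correct):.4f}")
```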

Combined use of two artificial intelligence-based algorithms for mammography triaging: a retrospective simulation study.

Kim HJ, Kim HH, Eom HJ, Choi WJ, Chae EY, Shin HJ, Cha JH

PubMed · Aug 21 2025
To evaluate triaging scenarios involving two commercial AI algorithms to enhance mammography interpretation and reduce workload. A total of 3012 screening or diagnostic mammograms, including 213 cancer cases, were analyzed using two AI algorithms (AI-1, AI-2) and categorized as "high-risk" (top 10%), "minimal-risk" (bottom 20%), or "indeterminate" based on malignancy likelihood. Five triaging scenarios of combined AI use (Sensitive, Specific, Conservative, Sequential Modes A and B) determined whether cases would be autonomously recalled, classified as negative, or referred for radiologist interpretation. Sensitivity, specificity, number of mammograms requiring review, and abnormal interpretation rate (AIR) were compared against single AI and manual reading using McNemar's test. Sensitive Mode achieved 84% sensitivity, outperforming single AI (p = 0.03 [AI-1], 0.01 [AI-2]) and manual reading (p = 0.03), with an 18.3% reduction in mammograms requiring review (AIR, 23.3%). Specific Mode achieved 87.7% specificity, exceeding single AI (p < 0.001 [AI-1, AI-2]) and comparable to manual reading (p = 0.37), with a 41.7% reduction in mammograms requiring review (AIR, 17%). Conservative and Sequential Modes A and B achieved sensitivities of 82.2%, 80.8%, and 80.3%, respectively, comparable to single AI or manual reading (p > 0.05, all), with reductions of 9.8%, 49.8%, and 49.8% in mammograms requiring review (AIRs, 18.6%, 21.6%, 21.7%). Combining two AI algorithms improved sensitivity or specificity in mammography interpretation while reducing mammograms requiring radiologist review in this cancer-enriched dataset from a tertiary center. Scenario selection should consider clinical needs and requires validation in a screening population. Question: AI algorithms have the potential to improve workflow efficiency by triaging mammograms. Combining algorithms trained under different conditions may offer synergistic benefits. Findings: The combined use of two commercial AI algorithms for triaging mammograms improved sensitivity or specificity, depending on the scenario, while also reducing mammograms requiring radiologist review. Clinical relevance: Integrating two commercial AI algorithms could enhance mammography interpretation over using a single AI for triaging or manual reading.
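A hypothetical sketch of how two AI scores might be combined into a triage decision, in the spirit of the "Sensitive"-style scenario; the thresholds and the combination rule below are assumptions for illustration, not the study's exact definitions.

```python
# Illustrative dual-AI triage rule: recall if either AI flags high risk,
# auto-clear only if both call the exam minimal risk, otherwise refer to a radiologist.

def triage(score_ai1, score_ai2, high=0.90, low=0.20):
    """Route one mammogram given two AI malignancy-score percentiles in [0, 1]."""
    def category(score):
        if score >= high:
            return "high"
        if score <= low:
            return "minimal"
        return "indeterminate"

    cat1, cat2 = category(score_ai1), category(score_ai2)
    if "high" in (cat1, cat2):
        return "autonomous recall"
    if cat1 == cat2 == "minimal":
        return "autonomous negative"
    return "radiologist review"

print(triage(0.95, 0.40))   # -> autonomous recall
print(triage(0.10, 0.15))   # -> autonomous negative
print(triage(0.10, 0.55))   # -> radiologist review
```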

The use of artificial intelligence (AI) to safely reduce the workload of breast cancer screening: a retrospective simulation study.

Gialias P, Wiberg MK, Brehl AK, Bjerner T, Gustafsson H

PubMed · Aug 17 2025
Background: Artificial intelligence (AI)-based systems have the potential to increase the efficiency and effectiveness of breast cancer screening programs but need to be carefully validated before clinical implementation. Purpose: To retrospectively evaluate an AI system to safely reduce the workload of a double-reading breast cancer screening program. Material and Methods: All digital mammography (DM) screening examinations of women aged 40-74 years between August 2021 and January 2022 in Östergötland, Sweden were included. Analysis of the interval cancers (ICs) was performed in 2024. Each examination was double-read by two breast radiologists and processed by the AI system, which assigned a score of 1-10 to each examination based on increasing likelihood of cancer. In a retrospective simulation, the AI system was used for triaging; low-risk examinations (score 1-7) were selected for single reading and high-risk examinations (score 8-10) for double reading. Results: A total of 15,468 DMs were included. Using an AI triaging strategy, 10,473 (67.7%) examinations received scores of 1-7, resulting in a 34% workload reduction. Overall, 52/53 screen-detected cancers were assigned a score of 8-10 by the AI system. One cancer was missed by the AI system (score 4) but was detected by the radiologists. In total, 11 ICs were found in the 2024 analysis. Conclusion: Replacing one reader in breast cancer screening with an AI system for low-risk cases could safely reduce workload by 34%. Of the 11 ICs found in the 2024 analysis, three were identified correctly by the AI system at the 2021-2022 examination.
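The routing rule (scores 1-7 single reading, 8-10 double reading) translates directly into a small workload simulation. The score distribution below is synthetic, so the printed reduction will not match the study's 34%; only the rule itself is taken from the abstract.

```python
# Sketch of the screen-reading workload saved by single-reading low-risk exams.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(1, 11, 15_468)      # hypothetical AI scores 1-10, one per exam

low_risk = scores <= 7                    # single reading
high_risk = ~low_risk                     # double reading

readings_standard = 2 * len(scores)                       # every exam double-read
readings_triaged = low_risk.sum() + 2 * high_risk.sum()   # triaged strategy

reduction = 1 - readings_triaged / readings_standard
print(f"Reading workload reduction: {reduction:.1%}")
```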

GPT-4 for automated sequence-level determination of MRI protocols based on radiology request forms from clinical routine.

Terzis R, Kaya K, Schömig T, Janssen JP, Iuga AI, Kottlors J, Lennartz S, Gietzen C, Gözdas C, Müller L, Hahnfeldt R, Maintz D, Dratsch T, Pennig L

PubMed · Aug 8 2025
This study evaluated GPT-4's accuracy in MRI sequence selection based on radiology request forms (RRFs), comparing its performance to radiology residents. This retrospective study included 100 RRFs across four subspecialties (cardiac imaging, neuroradiology, musculoskeletal, and oncology). GPT-4 and two radiology residents (R1: 2 years, R2: 5 years MRI experience) selected sequences based on each patient's medical history and clinical questions. Considering imaging society guidelines, five board-certified specialized radiologists assessed protocols based on completeness, quality, and utility in consensus, using 5-point Likert scales. Clinical applicability was rated binarily by the institution's lead radiographer. GPT-4 achieved median scores of 3 (1-5) for completeness, 4 (1-5) for quality, and 4 (1-5) for utility, comparable to R1 (3 (1-5), 4 (1-5), 4 (1-5); each p > 0.05) but inferior to R2 (4 (1-5), 5 (1-5); p < 0.01, respectively, and 5 (1-5); p < 0.001). Subspecialty protocol quality varied: GPT-4 matched R1 (4 (2-4) vs. 4 (2-5), p = 0.20) and R2 (4 (2-5); p = 0.47) in cardiac imaging; showed no differences in neuroradiology (all 5 (1-5), p > 0.05); scored lower than R1 and R2 in musculoskeletal imaging (3 (2-5) vs. 4 (3-5); p < 0.01, and 5 (3-5); p < 0.001); and matched R1 (4 (1-5) vs. 2 (1-4), p = 0.12) as well as R2 (5 (2-5); p = 0.20) in oncology. GPT-4-based protocols were clinically applicable in 95% of cases, comparable to R1 (95%) and R2 (96%). GPT-4 generated MRI protocols with notable completeness, quality, utility, and clinical applicability, excelling in standardized subspecialties like cardiac imaging and neuroradiology while yielding lower accuracy in musculoskeletal examinations. Question: Long MRI acquisition times limit patient access, making accurate protocol selection crucial for efficient diagnostics, though it is time-consuming and error-prone, especially for inexperienced residents. Findings: GPT-4 generated MRI protocols of remarkable yet inconsistent quality, performing on par with an experienced resident in standardized fields, but moderately in musculoskeletal examinations. Clinical relevance: The large language model can assist less experienced radiologists in determining detailed MRI protocols and counteract increasing workloads. The model could function as a semi-automatic tool, generating MRI protocols for radiologists' confirmation, optimizing resource allocation, and improving diagnostics and cost-effectiveness.

A Multimodal Deep Learning Ensemble Framework for Building a Spine Surgery Triage System.

Siavashpour M, McCabe E, Nataraj A, Pareek N, Zaiane O, Gross D

PubMed · Aug 7 2025
Spinal radiology reports and physician-completed questionnaires serve as crucial resources for medical decision-making for patients experiencing low back and neck pain. However, because this process is time-consuming, individuals with severe conditions may experience a deterioration in their health before receiving professional care. In this work, we propose an ensemble framework built on top of pre-trained BERT-based models that classifies patients by their need for surgery using different data modalities, including radiology reports and questionnaires. Our results demonstrate that our approach outperforms previous studies, effectively integrating information from multiple data modalities and serving as a valuable tool to assist physicians in decision-making.
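A runnable sketch of the general late-fusion idea: one classifier per modality with predicted probabilities averaged. A TF-IDF pipeline stands in for the pre-trained BERT encoder, and the reports, questionnaire features, fusion weight, and labels are invented; this is not the authors' architecture.

```python
# Late-fusion ensemble sketch: text classifier + questionnaire classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: radiology report snippets and questionnaire
# features (e.g. pain score, leg weakness), with surgery labels.
reports = ["severe L4-L5 stenosis with radiculopathy",
           "mild degenerative changes, no nerve compression",
           "large disc extrusion compressing the thecal sac",
           "normal alignment, no significant findings"]
questionnaires = np.array([[8, 1], [2, 0], [9, 1], [1, 0]])
needs_surgery = np.array([1, 0, 1, 0])

text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(reports, needs_surgery)
tab_clf = LogisticRegression().fit(questionnaires, needs_surgery)

# Weighted average of per-modality surgery probabilities (weight is arbitrary here).
w = 0.6
p = (w * text_clf.predict_proba(reports)[:, 1]
     + (1 - w) * tab_clf.predict_proba(questionnaires)[:, 1])
print(np.round(p, 2))
```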

Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines.

Yao MS, Chae A, Saraiya P, Kahn CE, Witschey WR, Gee JC, Sagreiya H, Bastani O

PubMed · Aug 4 2025
Diagnostic imaging studies are increasingly important in the management of acutely presenting patients. However, ordering appropriate imaging studies in the emergency department is a challenging task with a high degree of variability among healthcare providers. To address this issue, recent work has investigated whether generative AI and large language models can be leveraged to recommend diagnostic imaging studies in accordance with evidence-based medical guidelines. However, it remains challenging to ensure that these tools can provide recommendations that correctly align with medical guidelines, especially given the limited diagnostic information available in acute care settings. In this study, we introduce a framework to intelligently leverage language models by recommending imaging studies for patient cases that align with the American College of Radiology's Appropriateness Criteria, a set of evidence-based guidelines. To power our experiments, we introduce RadCases, a dataset of over 1500 annotated case summaries reflecting common patient presentations, and apply our framework to enable state-of-the-art language models to reason about appropriate imaging choices. Using our framework, state-of-the-art language models achieve accuracy comparable to clinicians in ordering imaging studies. Furthermore, we demonstrate that our language model-based pipeline can be used as an intelligent assistant by clinicians to support image ordering workflows and improve the accuracy of acute image ordering according to the American College of Radiology's Appropriateness Criteria. Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision-making in alignment with expert evidence-based guidelines.

Agentic AI in radiology: Emerging Potential and Unresolved Challenges.

Dietrich N

PubMed · Jul 24 2025
This commentary introduces agentic artificial intelligence (AI) as an emerging paradigm in radiology, marking a shift from passive, user-triggered tools to systems capable of autonomous workflow management, task planning, and clinical decision support. Agentic AI models may dynamically prioritize imaging studies, tailor recommendations based on patient history and scan context, and automate administrative follow-up tasks, offering potential gains in efficiency, triage accuracy, and cognitive support. While not yet widely implemented, early pilot studies and proof-of-concept applications highlight promising utility across high-volume and high-acuity settings. Key barriers, including limited clinical validation, evolving regulatory frameworks, and integration challenges, must be addressed to ensure safe, scalable deployment. Agentic AI represents a forward-looking evolution in radiology that warrants careful development and clinician-guided implementation.

Population-scale cross-sectional observational study for AI-powered TB screening on one million CXRs.

Munjal P, Mahrooqi AA, Rajan R, Jeremijenko A, Ahmad I, Akhtar MI, Pimentel MAF, Khan S

PubMed · Jul 9 2025
Traditional tuberculosis (TB) screening involves radiologists manually reviewing chest X-rays (CXR), which is time-consuming, error-prone, and limited by workforce shortages. Our AI model, AIRIS-TB (AI Radiology In Screening TB), aims to address these challenges by automating the reporting of all X-rays without any findings. AIRIS-TB was evaluated on over one million CXRs, achieving an AUC of 98.51% and overall false negative rate (FNR) of 1.57%, outperforming radiologists (1.85%) while maintaining a 0% TB-FNR. By selectively deferring only cases with findings to radiologists, the model has the potential to automate up to 80% of routine CXR reporting. Subgroup analysis revealed insignificant performance disparities across age, sex, HIV status, and region of origin, with sputum tests for suspected TB showing a strong correlation with model predictions. This large-scale validation demonstrates AIRIS-TB's safety and efficiency in high-volume TB screening programs, reducing radiologist workload without compromising diagnostic accuracy.
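The deferral strategy described above, automating only exams the model is confident show no findings and referring the rest, can be sketched with a simple threshold rule; the scores, prevalence, and threshold below are synthetic assumptions, not AIRIS-TB's operating point.

```python
# Sketch of threshold-based deferral for CXR screening automation.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
has_findings = rng.random(n) < 0.2                                     # hypothetical prevalence
score = np.clip(has_findings * 0.6 + rng.normal(0.3, 0.15, n), 0, 1)   # abnormality score

threshold = 0.35                     # exams scoring above this are deferred to radiologists
automated = score < threshold        # exams auto-reported as having no findings

automation_rate = automated.mean()
fnr = (automated & has_findings).sum() / has_findings.sum()   # findings missed by automation
print(f"Automated: {automation_rate:.1%}, FNR among cases with findings: {fnr:.1%}")
```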
