Improving Emergency Department Efficiency with Large Language Model-Guided Orthopaedic Triage for Proximal Humerus Fractures.

April 7, 2026


DOI: 10.1097/BOT.0000000000003175 PMID: 41944613

Authors

Zhao L, Bott E, Rao AS, Borgida JS, Brown S, Wagner RK, Harris MB, Ly TV, Succi MD

Affiliations (3)

Harvard Medical School, Boston, MA, USA.
Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Mass General Brigham, Boston, MA, USA.
Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA.

Abstract

Objectives: To evaluate whether large language models (LLMs) can reduce consults for proximal humerus fractures that do not meet institutional consult criteria.

Design: Retrospective review.

Setting: Single-center Level 1 trauma center.

Patient Selection Criteria: Adults presenting to the emergency department (ED) with isolated proximal humerus fractures over a two-year period were included. Exclusion criteria were polytrauma, concomitant orthopaedic injuries, pathologic fractures, lack of in-house ED imaging, and fractures missed in the ED.

Outcome Measures and Comparisons: Generative Pre-trained Transformer-4o (GPT-4o) and o4-mini were given each patient's history of present illness, physical exam, and X-ray report and asked whether orthopaedic consultation was indicated based on institutional criteria (open fracture, skin tenting, neurovascular compromise, or humeral head dislocation). A gold standard was determined by two independent authors who retrospectively reviewed each case and reached consensus on consult necessity based on these criteria. LLM alignment with this standard was compared with the performance of real-world providers using generalized linear models. Consult wait time and work relative value unit (wRVU) savings were estimated using the cohort's average wait time and Current Procedural Terminology-based wRVUs for a 30-minute low-to-moderate complexity outpatient consult.

Results: Three hundred fifteen patients (99 males and 216 females) were included (average age: 65.1 years, range: 20-100 years). Alignment with consult criteria was 92.4% (95% confidence interval (CI) [88.9%, 94.8%]) for GPT-4o, 94.9% (95% CI [91.9%, 96.9%]) for o4-mini, and 32.7% (95% CI [27.7%, 38.1%]) for ED providers. From a baseline of 240 consults, 327.3 wait hours, and 432 wRVUs, GPT-4o could have saved 179 consults, 295.3 wait hours, and 322.2 wRVUs over two years. o4-mini could have saved 183 consults, 302.0 wait hours, and 329.4 wRVUs.

Conclusions: Large language models accurately identified uncomplicated proximal humerus fractures, potentially avoiding unnecessary ED orthopaedic consults.

Level of Evidence: III.
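
The wRVU savings reported in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes every consult carries the same wRVU value, so the per-consult figure is back-calculated from the reported baseline (432 wRVUs over 240 consults). The constant and function names are hypothetical. Wait-hour savings are not reproduced, because the abstract does not state the cohort's average wait time.

```python
# Figures reported in the abstract (two-year cohort).
BASELINE_CONSULTS = 240
BASELINE_WRVUS = 432.0
CONSULTS_SAVED = {"GPT-4o": 179, "o4-mini": 183}

# Assumption: each consult is credited the same wRVU value, so the
# per-consult figure can be derived from the baseline totals.
WRVU_PER_CONSULT = BASELINE_WRVUS / BASELINE_CONSULTS


def wrvus_saved(consults_avoided: int) -> float:
    """Estimated wRVUs saved for a given number of avoided consults."""
    return round(consults_avoided * WRVU_PER_CONSULT, 1)


def share_of_consults_avoided(consults_avoided: int) -> float:
    """Fraction of the baseline consults that would have been avoided."""
    return consults_avoided / BASELINE_CONSULTS


if __name__ == "__main__":
    for model, avoided in CONSULTS_SAVED.items():
        print(
            f"{model}: {wrvus_saved(avoided)} wRVUs saved, "
            f"{share_of_consults_avoided(avoided):.1%} of baseline consults avoided"
        )
```

Under this assumption the sketch gives 322.2 wRVUs for GPT-4o and 329.4 wRVUs for o4-mini, matching the reported values.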


Topics

Journal Article
