LLMs such as ChatGPT-4o and AmbossGPT classified bone fractures in CT radiology reports with roughly 75% overall accuracy, suggesting they could assist radiologists.
Key Details
- Study assessed four LLMs (ChatGPT-4o, AmbossGPT, Claude 3.5 Sonnet, Gemini 2.0 Flash) on 292 artificial CT reports representing 310 fractures.
- ChatGPT-4o and AmbossGPT showed the highest overall classification accuracy (74.6% and 74.3%, respectively).
- Bone recognition rates were high for all models (90%-99%), but fracture subtype classification accuracy was lower (71%-77%).
- Statistically significant accuracy differences were noted between LLMs by fracture type and anatomical location.
- Validation on real-world reports (145 fractures) using LLaMA 3.3-70B yielded results similar to those on the artificial dataset (~70% performance).
- Authors note the need for further validation on large, multi-center real-world datasets.
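The gap between bone recognition and subtype classification is easier to see with a small worked example. The Python sketch below is illustrative only: the `Prediction` record, the function names, and the toy labels are hypothetical, and the study's actual scoring method is not described here. It assumes bone recognition counts a fracture as correct when the bone matches, while full classification requires both bone and subtype to match.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One fracture: reference labels and the labels a model returned (hypothetical schema)."""
    true_bone: str
    true_subtype: str
    pred_bone: str
    pred_subtype: str


def bone_recognition_rate(preds):
    """Fraction of fractures where the model named the correct bone."""
    return sum(p.pred_bone == p.true_bone for p in preds) / len(preds)


def full_classification_accuracy(preds):
    """Fraction of fractures where both bone and subtype are correct."""
    hits = sum(
        p.pred_bone == p.true_bone and p.pred_subtype == p.true_subtype
        for p in preds
    )
    return hits / len(preds)


# Made-up toy data, not taken from the study.
toy = [
    Prediction("femur", "A", "femur", "A"),     # bone and subtype correct
    Prediction("femur", "B", "femur", "A"),     # bone correct, subtype wrong
    Prediction("radius", "C", "radius", "C"),   # bone and subtype correct
    Prediction("tibia", "A", "fibula", "A"),    # bone wrong
]

print(f"bone recognition: {bone_recognition_rate(toy):.0%}")
print(f"full classification: {full_classification_accuracy(toy):.0%}")
```

Under this assumed scoring, a model can never score higher on full classification than on bone recognition, which matches the direction of the gap reported above.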
Why It Matters
Radiology practices rely heavily on textual reporting, and automating fracture classification could streamline radiological workflows, reduce variability, and improve efficiency. While current LLMs show promise, further validation is necessary before widespread adoption.

Source
AuntMinnie