LLMs Demonstrate Potential for Fracture Classification in CT Radiology Reports

July 9, 2025

LLMs such as ChatGPT-4o and AmbossGPT classified bone fractures in CT radiology reports with roughly 74% overall accuracy, suggesting they could assist radiologists.

Key Details

  • Study assessed four LLMs (ChatGPT-4o, AmbossGPT, Claude 3.5 Sonnet, Gemini 2.0 Flash) on 292 artificial CT reports representing 310 fractures.
  • ChatGPT-4o and AmbossGPT showed the highest overall classification accuracy (74.6% and 74.3%, respectively).
  • Bone recognition rates were high for all models (90%-99%), but fracture subtype classification was lower (71%-77%).
  • Statistically significant accuracy differences were noted between LLMs by fracture type and anatomical location.
  • Validation on real-world reports (145 fractures) using LLaMA 3.3-70B yielded results similar to those on the artificial reports (~70% performance).
  • Authors note need for further validation on large, multi-center real-world datasets.

Why It Matters

Radiology practices rely heavily on textual reporting, and automating fracture classification could streamline radiological workflows, reduce variability, and improve efficiency. While current LLMs show promise, further validation is necessary before widespread adoption.
