LLMs such as ChatGPT-4o and AmbossGPT can classify bone fractures described in CT radiology reports with reasonable accuracy, potentially aiding radiologists.
Key Details
- The study assessed four LLMs (ChatGPT-4o, AmbossGPT, Claude 3.5 Sonnet, Gemini 2.0 Flash) on 292 artificial CT reports describing 310 fractures.
- ChatGPT-4o and AmbossGPT achieved the highest overall classification accuracy (74.6% and 74.3%, respectively).
- All models recognized the affected bone at high rates (90%-99%), but accuracy for fracture subtype classification was lower (71%-77%).
- Accuracy differed significantly between the LLMs depending on fracture type and anatomical location.
- Validation on real-world reports (145 fractures) using LLaMA 3.3-70B yielded results similar to those on the artificial dataset (about 70%).
- The authors note the need for further validation on large, multi-center, real-world datasets.
Source
AuntMinnie