Study: Large Language Models Outperform on Chest CT Report Analysis

July 30, 2025

Advanced large language models such as GPT-4 can identify thoracic diseases from chest CT reports with up to 75% accuracy, which could support pre-operative surgical planning.

Key Details

  • Five LLMs (GPT-4, Claude-3.5, Qwen-Max, GPT-3.5-Turbo, Gemini-Pro) compared using 13,489 real-world chest CT reports.
  • GPT-4 achieved up to 75% accuracy in identifying 13 common chest diseases with multiple-choice prompts.
  • Multiple-choice prompts significantly improved model accuracy compared to open-ended questions.
  • Fine-tuning GPT-3.5-Turbo increased its accuracy from 42% to 65% in challenging cases.
  • No single LLM was best for all diseases, suggesting a tailored approach may be optimal.
  • Future research will use explainable AI tools to increase transparency and reliability.
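
To make the prompt-design finding concrete, here is a minimal Python sketch of the difference between an open-ended and a multiple-choice prompt, plus a simple way to score answers. Everything in it is an assumption for illustration: the label list is a hypothetical subset (the 13 diseases are not listed in this summary), the prompt wording and function names are invented, and exact-match scoring is only one plausible metric, not necessarily the one the study used.

```python
from typing import List, Sequence, Set

# Hypothetical labels for illustration only; not the study's 13-disease list.
EXAMPLE_LABELS = ["pneumonia", "pulmonary nodule", "pleural effusion", "emphysema"]


def build_open_ended_prompt(report: str) -> str:
    """Ask the model to name diseases freely, with no options given."""
    return (
        f"Chest CT report:\n{report}\n\n"
        "Which thoracic diseases does this report describe?"
    )


def build_multiple_choice_prompt(report: str, labels: Sequence[str]) -> str:
    """Constrain the model to a fixed, lettered set of candidate diseases."""
    options = "\n".join(
        f"{chr(ord('A') + i)}. {label}" for i, label in enumerate(labels)
    )
    return (
        f"Chest CT report:\n{report}\n\n"
        "Which of the following diseases does this report describe? "
        "Answer with the letters of all that apply.\n"
        f"{options}"
    )


def parse_choice_answer(answer: str, labels: Sequence[str]) -> Set[str]:
    """Map an answer such as 'A, C' back to the corresponding labels."""
    chosen = set()
    for token in answer.replace(",", " ").split():
        letter = token.strip().upper().rstrip(".")
        if len(letter) == 1 and 0 <= ord(letter) - ord("A") < len(labels):
            chosen.add(labels[ord(letter) - ord("A")])
    return chosen


def exact_match_accuracy(
    predictions: List[Set[str]], references: List[Set[str]]
) -> float:
    """Fraction of reports whose predicted label set equals the reference set."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    hits = sum(pred == ref for pred, ref in zip(predictions, references))
    return hits / len(references)
```

A multiple-choice answer is easy to parse and score automatically, whereas an open-ended answer needs free-text matching against disease names. That may be part of why constrained prompts score higher, though the summary does not state the mechanism.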

Why It Matters

The study demonstrates that modern LLMs can act as accurate 'second readers' for radiology reports, possibly reducing diagnostic errors and alleviating radiologist workload. Fine-tuning and prompt design further boost performance, potentially making AI support accessible even in resource-limited settings.
