
Reinforcement learning improves LLM accuracy and reasoning in disease classification from radiology reports.

April 30, 2026 (PubMed)

Authors

Wei Y, Lin Y, Flanders A, Shih G, Peng Y

Affiliations (5)

  • Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
  • Department of Radiology, Weill Cornell Medicine, New York, NY, USA.
  • Department of Radiology, Thomas Jefferson University, Philadelphia, PA, USA.
  • Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA. [email protected].
  • Department of Radiology, Weill Cornell Medicine, New York, NY, USA. [email protected].

Abstract

Accurate disease classification from radiology reports is essential for many applications. While supervised fine-tuning (SFT) of lightweight LLMs improves accuracy, it can degrade reasoning. We propose a two-stage approach: SFT on disease labels followed by Group Relative Policy Optimization (GRPO) to refine predictions by optimizing accuracy and format without reasoning supervision. Across three radiologist-annotated datasets, SFT outperformed baselines and GRPO further improved classification and enhanced reasoning recall and comprehensiveness.
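The GRPO stage described in the abstract optimizes for accuracy and output format without any supervision of the reasoning text. A minimal sketch of how such a reward and the group-relative advantage might be computed is shown below. The abstract does not specify the output template, the reward weights, or the label set, so the `<think>`/`<answer>` template, the equal weighting of the two rewards, the exact-match accuracy criterion, and all function names are assumptions made for illustration only.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template (an assumption, not taken from the paper):
#   <think>free-text reasoning</think><answer>label1, label2</answer>
_TEMPLATE = re.compile(r"<think>(.+?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion):
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if _TEMPLATE.fullmatch(completion.strip()) else 0.0


def accuracy_reward(completion, gold_labels):
    """Return 1.0 if the predicted label set equals the gold set, else 0.0.

    Only the <answer> span is scored; the reasoning text is never graded,
    which mirrors the "no reasoning supervision" setting in the abstract.
    """
    match = _TEMPLATE.fullmatch(completion.strip())
    if match is None:
        return 0.0
    predicted = {p.strip().lower() for p in match.group(2).split(",") if p.strip()}
    gold = {g.strip().lower() for g in gold_labels}
    return 1.0 if predicted == gold else 0.0


def total_reward(completion, gold_labels):
    """Sum of the two rewards with equal weights (the weighting is an assumption)."""
    return accuracy_reward(completion, gold_labels) + format_reward(completion)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    Each reward is compared against the other completions sampled for the
    same report, so no separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of sampled completions for one report (made-up text).
group = [
    "<think>Right lower lobe consolidation.</think><answer>pneumonia</answer>",
    "<think>No acute findings.</think><answer>normal</answer>",
    "pneumonia",  # correct label but wrong format, so both rewards are 0
]
rewards = [total_reward(c, {"pneumonia"}) for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch, a completion that is both well-formed and correct receives a positive advantage, while one that ignores the template is pushed down even if it names the right disease. In a real training loop these advantages would weight the policy-gradient update for each sampled completion.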

Topics

Journal Article
