Reinforcement Learning for Clinical Reasoning: Aligning LLMs with ACR Imaging Appropriateness Criteria
Authors
Abstract
Medical imaging has revolutionized diagnosis, yet unnecessary procedures are rising, exposing patients to radiation and stress, limiting equitable access, and straining healthcare systems. The American College of Radiology (ACR) Appropriateness Criteria, developed through extensive multidisciplinary review, provide evidence-based guidance but remain underutilized. Leveraging advances in large language model (LLM) reasoning, we introduce a Reasoning Agent trained with Reinforcement Learning (RL), specifically Group Relative Policy Optimization (GRPO), to replicate the expert clinical reasoning encoded in the ACR Criteria. We present a novel RL approach for structured medical reasoning and systematically compare reasoning-focused reward functions and evidence integration strategies. Our lightweight 8B model, MedReason-Embed, improves macro F1 by 18% over the baseline, exhibits stronger reasoning alignment, and outperforms both larger and alternatively trained models, indicating that reasoning-based supervision enables efficient, trustworthy clinical AI. Building on this, we design a modular end-to-end agentic architecture that automates imaging referrals by mapping diagnoses to ICD codes, retrieving PubMed evidence, and recommending optimal procedures. Crucially, the agent's ability to generalize beyond the static ACR guidelines not only helps clinicians handle out-of-distribution cases but also supports scaling the guideline development process itself, potentially reducing the significant effort required to create and update the guidelines. This work demonstrates the potential of reasoning-focused RL within agentic architectures to deliver transparent, scalable, and reliable clinical decision support. Our code is available at: https://anonymous.4open.science/r/agentic-imaging-recommender-iclr-877D
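For readers unfamiliar with GRPO, its central step is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The following is a minimal sketch of that group-relative normalization only, not the training pipeline used in this work. The function name `group_relative_advantages`, the example reward values, the `eps` stabilizer, and the use of the population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    rewards: scalar rewards for G responses sampled for the SAME prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one imaging-referral prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group mean receive a positive advantage and are reinforced, while those below the mean are discouraged. A group with identical rewards yields zero advantages and therefore contributes no learning signal.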