Intelligent documentation in medical education: can AI replace manual case logging?
Affiliations (2)
- Department of Computer Science, University of California, Davis, CA, United States.
- Department of Radiology, University of California, Davis, CA, United States.
Abstract
This study investigates the feasibility of using large language models (LLMs) to automate procedural case log documentation in radiology training. We evaluate whether AI can replace manual logging, identify which procedure types are most challenging for extraction, and assess integration into clinical workflows.

We retrospectively analyzed 414 curated radiology reports authored by nine interventional radiology residents between 2018 and 2024. Candidate models, including local (Qwen-2.5) and commercial (Claude-3.5) models, were tested under instruction prompting and chain-of-thought prompting. Performance was measured by sensitivity, specificity, and F1-score, and inference time and token efficiency were recorded to estimate operational cost.

Both local and commercial LLMs outperformed the standard benchmark. Qwen-2.5 achieved an F1-score of 86.66% with chain-of-thought prompting, while Claude-3.5-Haiku reached an F1-score of 86.89%. Commercial inference delivered sub-2-second latency and concise outputs, whereas local deployment traded speed for lower recurring cost. Automation could save over 35 hours of manual annotation per resident annually.

LLMs can provide a scalable and accurate solution for radiology case log documentation. Optimizing for procedure-specific challenges and ensuring seamless integration with existing systems will be essential. Future work should validate these findings on larger, multi-institution datasets and explore additional prompting strategies.

LLMs show promise for automating radiology case log documentation, potentially reducing residents' clerical burden. However, this single-institution feasibility study underscores the need for broader validation across diverse institutions, assessment of real-world workflow integration, and safeguards against misclassification before clinical adoption.
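The abstract reports sensitivity, specificity, and F1-score but does not state how they were aggregated. The sketch below shows the standard definitions under one plausible framing, which is an assumption and not necessarily the authors' method: each report is a multi-label extraction task (a set of procedure categories per report), and confusion counts are pooled across all labels and reports (micro-averaging). The helper names `micro_counts` and `metrics` and the example labels are hypothetical. F1 is returned as a fraction; multiply by 100 to compare with the percentages quoted above.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Pooled confusion counts (hypothetical container)."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def micro_counts(gold, pred, labels):
    """Pool per-label confusion counts over all reports (micro-averaging).

    gold, pred: lists of sets, one set of procedure labels per report.
    labels: every procedure category that could be logged.
    """
    c = Counts()
    for g, p in zip(gold, pred):
        for label in labels:
            in_gold, in_pred = label in g, label in p
            if in_gold and in_pred:
                c.tp += 1  # procedure logged and correctly extracted
            elif in_pred:
                c.fp += 1  # extracted but not actually performed
            elif in_gold:
                c.fn += 1  # performed but missed by the model
            else:
                c.tn += 1  # correctly left out
    return c


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c):
    """Standard definitions; sensitivity is the same quantity as recall."""
    sensitivity = safe_div(c.tp, c.tp + c.fn)
    specificity = safe_div(c.tn, c.tn + c.fp)
    precision = safe_div(c.tp, c.tp + c.fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy data with hypothetical labels, not drawn from the study.
    labels = {"biopsy", "drain", "port"}
    gold = [{"biopsy"}, {"drain", "port"}]
    pred = [{"biopsy"}, {"drain"}]
    print(metrics(micro_counts(gold, pred, labels)))
```

In the toy run, the missed "port" placement counts as a false negative, which lowers sensitivity and F1 while leaving specificity untouched. Macro-averaging (computing metrics per procedure type and then averaging) would weight rare procedures more heavily and may be a better fit for identifying which procedure types are hardest to extract.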