Readability versus accuracy in LLM-transformed radiology reports: stakeholder preferences across reading grade levels.

Authors

Lee HS, Kim S, Kim S, Seo J, Kim WH, Kim J, Han K, Hwang SH, Lee YH

Affiliations (12)

  • Department of Radiology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea.
  • Department of Medical Device Engineering and Management, The Graduate School, Yonsei University College of Medicine, Seoul, Republic of Korea.
  • Department of Integrative Medicine, The Graduate School, Yonsei University College of Medicine, Seoul, Republic of Korea.
  • Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea.
  • Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.
  • School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea.
  • BeamWorks Inc., Daegu, Republic of Korea.
  • Department of Radiology, School of Medicine, Kyungpook National University, Kyungpook National University Chilgok Hospital, Daegu, Republic of Korea.
  • Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea.
  • Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea.
  • Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea.
  • Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea.

Abstract

This study examined how reading grade level affects stakeholder preferences, given the trade-off between accuracy and readability. A retrospective study was conducted on 500 radiology reports from academic and community hospitals, covering five imaging modalities. Reports were transformed into 11 reading grade levels (7-17) using Gemini. Radiologists, physicians, and laypersons rated accuracy, readability, and preference on a 5-point scale. Errors (generalizations, omissions, hallucinations) and potential changes in patient management (PCPM) were identified. Predictors of preference were analyzed with ordinal logistic regression, and interobserver reliability was measured with weighted kappa. Preferences across reading grade levels varied with stakeholder group, modality, and clinical setting. Overall, preferences peaked at grade 16 but declined at grade 17, particularly among laypersons. Lower reading grades improved readability but increased errors, while higher grades improved accuracy but reduced readability. In multivariable analysis, accuracy was the strongest predictor of preference for all groups (OR: 30.29, 33.05, and 2.16; p < 0.001), followed by readability (OR: 2.73, 1.70, and 2.01; p < 0.001). Higher grade levels were generally preferred because of their better accuracy, with preferred levels ranging from grade 12 to 17. Increasing the grade level further reduced readability sharply, limiting preference. These findings highlight the limitations of unsupervised LLM transformations and suggest the need for hybrid approaches that keep the original report intact while adding explanatory content to balance accuracy and readability.
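As an illustration of the interobserver reliability measure named in the abstract, the following is a minimal sketch of Cohen's weighted kappa for two raters on a 5-point scale. The abstract does not state which weighting scheme or software the authors used, so the function name `weighted_kappa`, the linear/quadratic weighting option, and the sample ratings are assumptions made for illustration only.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5), weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    Hypothetical helper: the weighting scheme used in the study is not
    given in the abstract, so both common choices are offered here.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of items
    # that rater A placed in category i and rater B in category j.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between categories.
            w = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_disagreement / expected_disagreement


# Made-up ratings, not data from the study.
radiologist_1 = [5, 4, 4, 3, 5, 2, 4, 5]
radiologist_2 = [5, 4, 3, 3, 4, 2, 4, 5]
kappa = weighted_kappa(radiologist_1, radiologist_2, weighting="quadratic")
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Weighting matters on an ordinal scale because a 4-versus-5 disagreement is penalized less than a 1-versus-5 disagreement.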

Topics

Journal Article
