
A High-Accuracy Rule-Based Algorithm for Automated Extraction of Coronary Artery Calcium Scores from Mixed-Language Radiology Reports.

June 11, 2026 (PubMed)

Authors

Hung WC,Chen CC,Lee HC,Huang SJ,Hong WW,Liao JY,Wu CH

Affiliations (12)

  • Department of Family Medicine and Community Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan.
  • School of Medicine for International Students, College of Medicine, I-Shou University School, Kaohsiung, Taiwan.
  • School of Medicine, College of Medicine, I-Shou University School, Kaohsiung, Taiwan.
  • Department of Information Engineering, I-Shou University, Kaohsiung, Taiwan.
  • Department of Family Medicine and Community Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan.
  • School of Medicine, College of Medicine, I-Shou University School, Kaohsiung, Taiwan.
  • Department of Obstetrics and Gynecology, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan.
  • E-Da Dachang Hospital, I-Shou University School, Kaohsiung, Taiwan.
  • Department of Obstetrics and Gynecology, Morsani College of Medicine, University of South Florida, Tampa, USA.
  • Department of Nursing, I-Shou University, E-Da Hospital, Kaohsiung, Taiwan.
  • Institute of Gerontology, College of Medicine, National Cheng Kung University, Tainan, Taiwan.
  • Department of Family Medicine, College of Medicine, National Cheng Kung University Hospital, National Cheng Kung University, Tainan, Taiwan.

Abstract

Coronary artery calcium (CAC) score is a vital biomarker for cardiovascular risk stratification, yet this value is often locked within unstructured radiology reports. Manual extraction is impractical for large-scale research, while automated extraction is hindered in multilingual settings where reports feature complex, mixed-language text. This study aimed to develop a rule-based algorithm to extract CAC scores from mixed Chinese-English radiology reports. This retrospective study analyzed 21,874 free-text cardiac CT reports (2004-2024) from a Taiwanese center. Authored by 23 radiologists, the reports showed high stylistic diversity and a 96.7% prevalence of mixed text. An iterative rule-based algorithm was developed and validated against a gold standard established by the manual review of two independent physicians, with discrepancies adjudicated by a third expert. The algorithm achieved 99.98% accuracy, correctly identifying scores in 21,870 reports, with 100% coverage. It processed the dataset in 17.59 seconds, contrasting with an estimated 21 hours for manual review. Error analysis revealed that the rare misclassifications (n=4) were exclusively attributable to linguistic ambiguity involving multiple unseparated numeric values, rather than algorithmic failure in pattern recognition. This rule-based algorithm accurately and efficiently extracts structured data from complex multilingual radiology reports. This automated approach provides a practical solution to unlock valuable historical clinical data for large-scale cardiovascular research, overcoming a major barrier in multilingual clinical informatics.
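The abstract does not list the algorithm's actual rules, so the following is only a minimal sketch of what keyword-anchored, rule-based extraction from mixed Chinese-English text might look like. The keyword list, the 20-character gap limit, and the function name `extract_cac_score` are all assumptions made for illustration, not the authors' method.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical keywords (English and Chinese); the paper's real rule set is not published here.
_KEYWORDS = r"(?:calcium\s+score|agatston\s+score|CAC\s+score|CACS|鈣化指數|鈣化分數|鈣化積分)"

# A number with optional thousands separators or a decimal part, e.g. 1,234 or 123.5 or 0.
_NUMBER = r"(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"

# Keyword, then up to 20 non-digit characters on the same line
# (such as ':', '=', 'is', '為'), then the number.
_PATTERN = re.compile(_KEYWORDS + r"[^\d\n]{0,20}?" + _NUMBER, re.IGNORECASE)


def extract_cac_score(report: str) -> Optional[float]:
    """Return the first number that follows a CAC keyword, or None if none is found."""
    # NFKC folds full-width digits and punctuation (common in Chinese text) to ASCII.
    text = unicodedata.normalize("NFKC", report)
    match = _PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


if __name__ == "__main__":
    samples = [
        "Total calcium score: 123.5 (Agatston)",
        "冠狀動脈鈣化指數為 0，無明顯狹窄",
        "CAC score＝４５",
        "No significant coronary stenosis.",
    ]
    for s in samples:
        print(repr(s), "->", extract_cac_score(s))
```

A sketch like this simply takes the first number after a keyword, so it would likely stumble on the failure mode the abstract reports: several numeric values written together without separators.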

Topics

Journal Article
