
Neuro-symbolic AI for auditable cognitive information extraction from medical reports.

November 21, 2025

Authors

Prenosil GA, Weitzel TK, Bello SC, Mingels C, Manzini G, Meier LP, Shi KY, Rominger A, Afshar-Oromieh A

Affiliations (5)

  • Department of Nuclear Medicine, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland. [email protected].
  • Zentit GmbH, Muri bei Bern, Switzerland. [email protected].
  • SCCE - Scientific Consulting, Computing and Engineering GmbH, Kirchlindach, Bern, Switzerland.
  • Zentit GmbH, Muri bei Bern, Switzerland.
  • Department of Nuclear Medicine, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.

Abstract

Large language models (LLMs) such as GPT-4 can interpret free text, but unreliable answers, opaque reasoning, and privacy risks limit their use in healthcare. In contrast, rule-based artificial intelligence (AI) provides transparent and reproducible results but struggles with free text. We aimed to combine the strengths of both approaches and to test whether such a hybrid system can autonomously and reliably extract clinical data from diagnostic imaging reports. We developed a neuro-symbolic AI that connects GPT-4 with a rule-based expert system through a semantic integration platform. GPT-4 extracted candidate facts from free-text reports, while the expert system verified them against medical rules, producing traceable, deterministic labels. We evaluated the system on 206 consecutive prostate cancer PET/CT scan reports, requiring extraction of 26 clinical parameters per report (5356 data points in total) and answers to three study questions: study inclusion, recurrent cancer identification, and prostate-specific antigen (PSA) level retrieval. Outputs were compared against physician-derived references, and discrepancies were reviewed by a blinded adjudicator. Here we show that neuro-symbolic AI outperforms GPT-4 alone and matches physicians in structuring and analysing reports. GPT-4 alone achieves F1 scores of 0.63 for study inclusion and 0.95 for recurrence detection, with 96.6% correct PSA values. Physicians reach F1 scores of 1.00 and 0.99, with 98.1% PSA accuracy. The neuro-symbolic AI reaches F1 scores of 1.00 on both tasks with 100% PSA accuracy and always delivers an auditable chain of reasoning. It also intercepts two intentionally introduced reports with residual identifiers, preventing unintended transfer of sensitive data. Unlike standalone LLMs, neuro-symbolic AI can safely automate data extraction for clinical research and may provide a path toward trustworthy AI in healthcare practice.
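
The division of labour described in the abstract (an LLM proposes candidate facts, a rule layer accepts or rejects them and records why) can be illustrated with a minimal Python sketch. This is not the authors' system: the function `verify_psa`, the `Verdict` type, the field names `psa_ng_ml` and `evidence`, and the three rules are hypothetical, chosen only to show how a deterministic check can yield a label together with an audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    """Deterministic label plus the ordered list of rule outcomes behind it."""
    label: str
    trace: list = field(default_factory=list)


def verify_psa(candidate: dict, report_text: str) -> Verdict:
    """Check a hypothetical LLM-extracted PSA fact against illustrative rules.

    `candidate` stands in for the LLM output; its keys are assumptions.
    """
    value = candidate.get("psa_ng_ml")
    evidence = candidate.get("evidence", "")

    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    rules = [
        ("R1 value is numeric", is_number),
        ("R2 value is non-negative", is_number and value >= 0),
        ("R3 evidence occurs verbatim in report",
         bool(evidence) and evidence in report_text),
    ]

    # Record every rule outcome so the decision can be audited afterwards.
    trace = [f"{name}: {'pass' if passed else 'fail'}" for name, passed in rules]
    label = "accepted" if all(passed for _, passed in rules) else "rejected"
    return Verdict(label, trace)


# Usage with an invented one-line report (not taken from the study data).
report = "PSA at time of imaging: 2.4 ng/ml."
ok = verify_psa(
    {"psa_ng_ml": 2.4, "evidence": "PSA at time of imaging: 2.4 ng/ml"}, report
)
bad = verify_psa({"psa_ng_ml": 24.0, "evidence": "PSA 24 ng/ml"}, report)
```

In this sketch the second candidate is rejected because its evidence quote does not occur in the report, and the trace names the rule that failed. The same inputs always produce the same label and trace, which is the property the abstract attributes to the rule-based component.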

Topics

Journal Article
