Empirical evaluation of artificial intelligence distillation techniques for ascertaining cancer outcomes from electronic health records.

June 10, 2025


DOI: 10.1038/s41746-025-01646-7 PMID: 40494945

Authors

Riaz IB, Naqvi SAA, Ashraf N, Harris GJ, Kehl KL

Affiliations (5)

Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
Mayo Clinic, Phoenix, AZ, USA.
Mayo Clinic, Phoenix, AZ, USA.
Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
Massachusetts General Hospital, Boston, MA, USA.

Abstract

Phenotypic information for cancer research is embedded in unstructured electronic health records (EHRs) and requires effort to extract. Deep learning models can automate this extraction, but privacy concerns limit their scalability. We evaluated techniques for applying a teacher-student framework to extract longitudinal clinical outcomes from EHRs. We focused on the challenging task of ascertaining two cancer outcomes, overall response and progression according to Response Evaluation Criteria in Solid Tumors (RECIST), from free-text radiology reports. Teacher models with a hierarchical Transformer architecture were trained on data from Dana-Farber Cancer Institute (DFCI). These models labeled public datasets (MIMIC-IV, Wiki-text) and GPT-4-generated synthetic data. Student models were then trained to mimic the teachers' predictions. The DFCI teacher models achieved high performance, and student models trained on MIMIC-IV data showed comparable results, demonstrating effective knowledge transfer. However, student models trained on Wiki-text and synthetic data performed worse, emphasizing the need for in-domain public datasets for model distillation.
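
The teacher-student workflow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the study used hierarchical Transformer models, whereas the example below swaps in a keyword-weighted logistic scorer as a hypothetical teacher and a bag-of-words logistic regression as the student. The vocabulary, the weights, and the example report snippets are all invented for illustration. The point it shows is only the data flow: the teacher never shares its private training data, it only labels public text, and the student learns from those soft labels.

```python
import math

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of the positive class (here: 'progression')."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def teacher_predict(text):
    """Hypothetical stand-in for a teacher trained on private data.

    The keyword weights are made up; a real teacher would be a trained model.
    """
    vocab = ["progression", "enlarging", "new", "stable", "decreased"]
    weights = [2.0, 1.5, 1.0, -2.0, -1.5]
    return predict(weights, 0.0, featurize(text, vocab))

def distill(public_texts, vocab, epochs=500, lr=0.5):
    """Train a logistic-regression student on the teacher's soft labels."""
    soft_labels = [teacher_predict(t) for t in public_texts]
    features = [featurize(t, vocab) for t in public_texts]
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for x, target in zip(features, soft_labels):
            # For cross-entropy with a soft target, the gradient with
            # respect to the logit is (prediction - target).
            error = predict(weights, bias, x) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Invented "public" snippets standing in for an in-domain public corpus.
public_texts = [
    "new enlarging mass consistent with progression",
    "stable disease with decreased nodule size",
    "enlarging lymph nodes",
    "stable appearance of the lesion",
    "new lesion in the liver",
    "decreased size of the mass",
    "no change",
]
student_vocab = ["progression", "enlarging", "new", "stable", "decreased"]
weights, bias = distill(public_texts, student_vocab)

held_out = "new enlarging nodule"
student_p = predict(weights, bias, featurize(held_out, student_vocab))
print(f"teacher={teacher_predict(held_out):.3f} student={student_p:.3f}")
```

In this sketch the public snippets contain the same vocabulary the teacher responds to, which is why the student can recover the teacher's behavior. If the public text rarely contained those terms, the soft labels would carry little signal, which is consistent with the abstract's finding that out-of-domain text transferred less well.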


Topics

Journal Article
