
Adaptive, Privacy-Preserving Small Language Models for Multi-Task Clinical Assistance.

March 13, 2026

Authors

Zheng G, Kamel P, Pillai JJ, Akhbardeh A, Braverman V, Jacobs MA, Parekh VS

Affiliations (10)

  • Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA.
  • Department of Computer Science, Rice University, Houston, TX, USA.
  • Department of Neuroradiology, Division of Diagnostic Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
  • Department of Neurosurgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
  • Division of Neuroradiology, Department of Radiology, Mayo Clinic, Rochester, MN, USA.
  • Department of Diagnostic and Interventional Imaging, McGovern Medical School, UTHealth Houston, Houston, TX, USA.
  • GSBS, MD Anderson Cancer Center, The University of Texas, Houston, TX, USA.
  • The Russell H. Morgan Department of Radiology and Radiological Science and Sidney Kimmel Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
  • Department of Neurosurgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA. [email protected].
  • Department of Diagnostic and Interventional Imaging, McGovern Medical School, UTHealth Houston, Houston, TX, USA. [email protected].

Abstract

The purpose of this study is to evaluate whether a single, fine-tuned SLM can match or exceed the performance of LLMs across diverse clinical tasks, enabling hospitals to build tailored, privacy-preserving, efficient, and deployable language models that do not require managing multiple task-specific systems. We used SLMs of varying sizes and applied low-rank adaptation (LoRA) for fine-tuning across three clinical tasks: (1) medical report labeling, (2) DICOM series description harmonization, and (3) impression generation from findings. These tasks were constructed using two datasets: the public Open-i Indiana University Chest X-ray Dataset and an in-house brain MRI DICOM metadata dataset. We compared single-task SLMs, a multi-task SLM (representing our proposed configuration), and GPT-4o using zero-shot and few-shot prompting. We found OPT-350M to be the optimal SLM. In medical report labeling, the multi-task SLM achieved an F1 score of 0.894, compared to prompt-engineered GPT-4o's 0.728. In DICOM series description harmonization, the multi-task SLM achieved an accuracy of 0.975, compared to prompt-engineered GPT-4o's 0.878. In impression generation from findings, the multi-task SLM achieved an average Likert scale score of 4.39 ± 1.00, compared to GPT-4o's 3.65 ± 1.00 (p = 0.0008). This study demonstrates that a single fine-tuned SLM can serve as a general-purpose clinical assistant, offering performance on par with or better than larger models. With lower resource requirements, greater customizability, privacy protection, and strong task generalization, fine-tuning one SLM to support multiple clinical tasks meets the practical demands of clinical AI deployment in both high-resource and resource-limited healthcare settings.
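The LoRA fine-tuning used in the abstract freezes the pretrained weights and trains only a pair of low-rank matrices whose product is added to each adapted layer. A minimal NumPy sketch of that mechanism is below; the dimensions, rank, and scaling are illustrative choices, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 8, 2, 16  # illustrative sizes, not the paper's settings

W0 = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection (rank r)
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized

def lora_forward(x):
    # frozen base path plus the scaled low-rank update (alpha / r) * B @ A;
    # during fine-tuning only A and B receive gradients
    return x @ W0.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d_in))
# because B starts at zero, the adapted layer initially matches the frozen layer
assert np.allclose(lora_forward(x), x @ W0.T)
```

Zero-initializing B means training starts from the unmodified pretrained model, and only r * (d_in + d_out) parameters per layer are updated, which is why a single small model can be cheaply re-adapted to several clinical tasks.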

Topics

Journal Article
