A Pretraining Approach for Small-sample Training Employing Radiographs (PASTER): a Multimodal Transformer Trained by Chest Radiography and Free-text Reports.
Affiliations (16)
- Graduate Institute of Life Sciences, College of Biomedical Sciences, National Defense Medical University, Taipei, Taiwan, R.O.C.
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
- School of Public Health, College of Public Health, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Medical Technology Education Center, School of Medicine, College of Medicine, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Tri-Service General Hospital, Military Digital Medical Center, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Department of Radiology, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Department of Family and Community Medicine, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Division of Cardiology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Division of Nephrology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Department of Neurological Surgery, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Division of Cardiovascular Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical University, Taipei, Taiwan, R.O.C.
- Graduate Institute of Life Sciences, College of Biomedical Sciences, National Defense Medical University, Taipei, Taiwan, R.O.C. [email protected].
- Medical Technology Education Center, School of Medicine, College of Medicine, National Defense Medical University, Taipei, Taiwan, R.O.C. [email protected].
- Tri-Service General Hospital, Military Digital Medical Center, National Defense Medical University, Taipei, Taiwan, R.O.C. [email protected].
Abstract
While deep convolutional neural networks (DCNNs) have achieved remarkable performance in chest X-ray (CXR) interpretation, their success typically depends on access to large-scale, expertly annotated datasets. However, collecting such data in real-world clinical settings can be difficult because of limited labeling resources, privacy concerns, and patient variability. In this study, we applied a multimodal Transformer pretrained on free-text reports and their paired CXRs to evaluate the effectiveness of this method in settings with limited labeled data. Our dataset consisted of more than 1 million CXRs, each accompanied by a report from a board-certified radiologist and 31 structured labels. The results indicated that a linear model trained on embeddings from the pretrained model achieved AUCs of 0.907 and 0.903 on the internal and external test sets, respectively, using only 128 cases and 384 controls; these results were comparable to those of a DenseNet trained on the entire dataset, whose AUCs were 0.908 and 0.903, respectively. Additionally, we demonstrated similar results by extending this approach to a subset annotated with structured echocardiographic reports. Furthermore, the multimodal model exhibited excellent small-sample learning capabilities when tested on external validation sets such as CheXpert and ChestX-ray14. This research substantially reduces the sample size necessary for future artificial intelligence advancements in CXR interpretation.
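The core evaluation described above is a linear probe: embeddings from the frozen, pretrained image encoder are fed to a simple linear classifier trained on a small labeled subset and scored with AUC. The sketch below illustrates that setup only; it is not the authors' code, and the embedding dimension, random placeholder data, and scikit-learn classifier choice are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): linear probing on frozen
# embeddings from a pretrained multimodal encoder, using a small labeled subset.
# Embedding dimension, data arrays, and classifier settings are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Placeholder embeddings: in practice each row would be the frozen image
# encoder's output for one CXR (e.g. a 512-dimensional vector per image).
n_cases, n_controls, dim = 128, 384, 512
X_train = rng.normal(size=(n_cases + n_controls, dim))
y_train = np.concatenate([np.ones(n_cases), np.zeros(n_controls)])

# Placeholder held-out set standing in for the internal/external test sets.
X_test = rng.normal(size=(1000, dim))
y_test = rng.integers(0, 2, size=1000)

# Linear probe: a logistic-regression classifier on the frozen embeddings.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# Evaluate with AUC, the metric reported in the abstract.
auc = roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1])
print(f"test AUC: {auc:.3f}")
```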