A lightweight transformer-based hybrid encoder-decoder model for chest X-ray medical report generation.

March 11, 2026

DOI: 10.1038/s41598-026-40710-4 PMID: 41813717

Authors

Ucan M, Kaya B, Kaya M, Alhajj R

Affiliations (6)

Department of Computer Technologies, Dicle University, 21200, Diyarbakir, Turkey. [email protected].
Department of Electronics and Automation, Firat University, 23119, Elazig, Turkey.
Department of Computer Engineering, Firat University, 23119, Elazig, Turkey. [email protected].
Department of Computer Science, University of Calgary, Calgary, AB, Canada. [email protected].
Department of Computer Engineering, İstanbul Medipol University, İstanbul, Turkey. [email protected].
Department of Health Informatics, University of Southern Denmark, Odense, Denmark. [email protected].

Abstract

Diagnosing diseases from medical images and reporting the findings at the paragraph level is a significant challenge for deep learning-based autonomous systems. Existing work focuses primarily on achieving high accuracy and often pays less attention to the computational cost of training and testing. The goal of this work is to build a hybrid encoder-decoder architecture with low computational cost and high performance that can generate medical reports autonomously. In our architecture, called FAST-MRG, a transformer-based encoder enriched with distillation techniques extracts features from the images, and a generative pre-trained transformer decoder generates paragraph-level text from the extracted features. The success of the architecture was measured through numerical analysis with word-matching evaluation metrics, temporal analysis and observational analysis. The architecture was trained and tested on chest X-ray images and reports from the Indiana University Chest X-ray collection dataset. FAST-MRG achieved scores of 0.373, 0.226 and 0.332 on the BLEU-1, METEOR and ROUGE evaluation metrics, respectively. It also has an average time efficiency of 66% compared to previous work using similar GPU environments. The experiments show that the architecture produces meaningful reports that can support doctors in diagnosis and treatment. Results are presented not only as average values but also as a density distribution graph, and the test results are analyzed in depth. With its low runtime and high performance, the proposed architecture can serve as a basis for future work.
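
To make the word-matching evaluation concrete, the following is a minimal sketch of how a BLEU-1 score is commonly computed: clipped unigram precision multiplied by a brevity penalty. It assumes lowercased whitespace tokenization and a single reference report per generated report. The function name `bleu1` and the sample sentences are illustrative only; the abstract does not state which evaluation toolkit or tokenization the authors used, so this is not their implementation.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Unigram BLEU for one candidate against one reference.

    Assumption: lowercased whitespace tokenization; real evaluation
    toolkits may tokenize differently and support multiple references.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens:
        return 0.0

    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(ref_tokens)

    # Clipped matches: a candidate word is credited at most as many
    # times as it occurs in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(cand_tokens)

    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(cand_tokens), len(ref_tokens)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    return brevity_penalty * precision


if __name__ == "__main__":
    generated = "the heart is normal in size"
    reference = "the heart size is normal"
    print(round(bleu1(generated, reference), 3))
```

Corpus-level scores such as those reported above are usually aggregated over all test reports rather than averaged from per-report values, so this per-sentence sketch only illustrates the underlying matching idea.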

Topics

Radiography, Thoracic
Deep Learning
Image Processing, Computer-Assisted
Journal Article
