
Soft label-guided transformer for radiology report generation.

Authors

Liu X,Xin J,Shen Q,Huang Z,Wang Z

Affiliations (5)

  • College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China.
  • College of Computer Science and Engineering, Northeastern University, Shenyang, 110169, China.
  • College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China.
  • School of Physics and Engineering Technology, University of York, York, YO10 5DD, United Kingdom.
  • College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China.

Abstract

Radiology reports describe imaging findings and diagnostic results, and they are an important reference for physicians' treatment decisions. Generating these reports automatically reduces physicians' workload and improves work efficiency. However, existing report generation methods treat the task as image-to-text conversion, producing reports directly from medical images, and do not fully model the radiologist's diagnostic process of "examine first, describe later". As a result, they often generate only generic descriptions of normal findings and struggle to describe specific lesion features accurately. To address this issue, we mimic the way radiologists work: first check whether the patient has a given disease, then use learned medical knowledge to describe the images and form a report. We propose a soft label-guided transformer (SLGT) for radiology report generation. First, pseudo-labels are obtained for the samples, and a soft label-guided attention mechanism highlights features related to the disease labels in the encoding stage. Second, text features from the decoding stage are aligned with image features, and the generated text features are used to guide the latent representations. Finally, a hybrid loss is designed that combines losses for text generation, disease classification, and visual-textual alignment. Optimizing SLGT with the hybrid loss lets the model learn richer features that are more relevant to disease abnormalities, which improves its performance. SLGT is evaluated on the widely used IU X-ray, MIMIC-CXR, and COV-CTR datasets. The experiments show that SLGT outperforms previous state-of-the-art models on all three datasets. This work improves the performance of automatic medical report generation, making its application in computer-aided diagnosis feasible.
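
The abstract names three loss terms but does not give their form or weighting. As an illustration only, a hybrid loss of this kind could be sketched as a weighted sum. Everything below is an assumption rather than the authors' formulation: the function names, the weights `w_cls` and `w_align`, the use of token-level cross-entropy for generation, binary cross-entropy against soft disease labels for classification, and a cosine distance for visual-textual alignment.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; labels may be soft values in [0, 1]."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def cosine_alignment_loss(image_vec, text_vec):
    """1 - cosine similarity between pooled image and text features."""
    dot = sum(a * b for a, b in zip(image_vec, text_vec))
    norm = math.sqrt(sum(a * a for a in image_vec)) * math.sqrt(
        sum(b * b for b in text_vec)
    )
    if norm == 0.0:
        return 1.0  # treat a zero vector as unaligned
    return 1.0 - dot / norm


def hybrid_loss(token_logits, token_targets, disease_probs, disease_labels,
                image_vec, text_vec, w_cls=1.0, w_align=1.0):
    """Hypothetical weighted sum of generation, classification, alignment."""
    gen = sum(
        cross_entropy(l, t) for l, t in zip(token_logits, token_targets)
    ) / len(token_targets)
    cls = binary_cross_entropy(disease_probs, disease_labels)
    align = cosine_alignment_loss(image_vec, text_vec)
    return gen + w_cls * cls + w_align * align


if __name__ == "__main__":
    loss = hybrid_loss(
        token_logits=[[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]],  # two report tokens
        token_targets=[0, 1],
        disease_probs=[0.8, 0.2],    # predicted disease probabilities
        disease_labels=[0.9, 0.1],   # soft pseudo-labels
        image_vec=[0.3, 0.7, 0.1],
        text_vec=[0.2, 0.8, 0.0],
    )
    print(f"hybrid loss: {loss:.4f}")
```

In a sketch like this, the weights control how strongly the classification and alignment terms shape the shared features relative to the generation objective; the values actually used for SLGT are not stated in the abstract.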

Topics

Journal Article
