LUNETR: Language-Infused UNETR for precise pancreatic tumor segmentation in 3D medical image.

Authors

Shi Z, Zhang R, Wei X, Yu C, Xie H, Hu Z, Chen X, Zhang Y, Xie B, Luo Z, Peng W, Xie X, Li F, Long X, Li L, Hu L

Affiliations (5)

  • School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China.
  • Department of Radiology, Zhuzhou Hospital Affiliated to Xiangya School of Medicine, Central South University, Zhuzhou 412002, China.
  • Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha 410011, China.
  • School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China.
  • Department of Radiology, Zhuzhou Hospital Affiliated to Xiangya School of Medicine, Central South University, Zhuzhou 412002, China.

Abstract

The identification of early micro-lesions and adjacent blood vessels in CT scans plays a pivotal role in the clinical diagnosis of pancreatic cancer, given its aggressive nature and high fatality rate. Despite the widespread application of deep learning methods to this task, several challenges persist: (1) the complex background of abdominal CT scans makes it difficult to localize potential micro-tumors accurately; (2) the subtle contrast between micro-lesions within pancreatic tissue and the surrounding tissue makes it hard for models to capture their features; and (3) tumors that invade adjacent blood vessels pose significant barriers to surgical procedures. To address these challenges, we propose LUNETR (Language-Infused UNETR), a multimodal encoder model that combines textual and image information for precise medical image segmentation. Integrating an autoencoding language model through cross-attention enables our model to leverage semantic associations between textual and image data, thereby facilitating precise localization of potential pancreatic micro-tumors. Additionally, we designed a Multi-scale Aggregation Attention (MSAA) module that captures both the spatial and channel characteristics of global multi-scale image data, enhancing the model's ability to extract features of micro-lesions embedded within pancreatic tissue. Furthermore, to enable precise segmentation of pancreatic tumors and nearby blood vessels and to address the scarcity of multimodal medical datasets, we collaborated with Zhuzhou Central Hospital to construct a multimodal dataset comprising CT images and corresponding pathology reports from 135 pancreatic cancer patients. In our experiments, LUNETR outperforms current state-of-the-art models, and incorporating the semantic encoder improves the average Dice score for pancreatic tumor segmentation by 2.23%. On the Medical Segmentation Decathlon (MSD) liver and lung cancer datasets, our model improves the average Dice score by 4.31% and 3.67%, respectively, demonstrating the efficacy of LUNETR.
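
The abstract does not spell out how the language features are fused with the image features, so the following is only a generic illustration of cross-attention, not the authors' LUNETR implementation. In this minimal NumPy sketch, flattened 3D image features act as queries and text-encoder token embeddings (for example, from a pathology report) act as keys and values. The function names, tensor shapes, and random projection weights are all hypothetical stand-ins for learned components.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention in which image tokens attend to text tokens.

    image_tokens: (N_img, d)     flattened 3D image features, used as queries
    text_tokens:  (N_txt, d_txt) text-encoder token embeddings, used as keys/values
    Returns the fused image tokens (N_img, d) and the attention map (N_img, N_txt).
    """
    q = image_tokens @ w_q                   # (N_img, d)
    k = text_tokens @ w_k                    # (N_txt, d)
    v = text_tokens @ w_v                    # (N_txt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product, (N_img, N_txt)
    attn = softmax(scores, axis=-1)          # each row sums to 1 over the text tokens
    # Residual connection: keep the image features and add the attended text context.
    return image_tokens + attn @ v, attn


rng = np.random.default_rng(0)
d, d_txt = 32, 48
# Hypothetical (D, H, W, d) feature map from a 3D image encoder.
feature_map = rng.standard_normal((4, 8, 8, d))
image_tokens = feature_map.reshape(-1, d)                 # (256, d)
# Hypothetical embeddings for 12 tokens of a pathology report.
text_tokens = rng.standard_normal((12, d_txt))
# Random stand-ins for learned projection weights.
w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_k = rng.standard_normal((d_txt, d)) / np.sqrt(d_txt)
w_v = rng.standard_normal((d_txt, d)) / np.sqrt(d_txt)

fused, attn = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
fused_map = fused.reshape(feature_map.shape)              # back to (D, H, W, d)
print(fused_map.shape, attn.shape)  # (4, 8, 8, 32) (256, 12)
```

Using image tokens as queries keeps the output aligned with the voxel grid, so the fused features can be reshaped and passed on to a UNETR-style decoder. A trained model would typically use learned multi-head projections (for example, PyTorch's `torch.nn.MultiheadAttention`) rather than fixed random matrices.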
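
The reported gains are expressed as Dice scores. For reference, here is a minimal NumPy sketch of the standard Dice coefficient for binary 3D masks, Dice = 2|P ∩ G| / (|P| + |G|), where P is the predicted mask and G the ground truth. The function name, the toy volumes, and the convention of scoring two empty masks as 1.0 are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient 2|P ∩ G| / (|P| + |G|) for binary masks of any shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a common convention)
    return float(2.0 * intersection / denom)


vol_shape = (4, 4, 4)
gt = np.zeros(vol_shape, dtype=bool)
gt[1:3, 1:3, 1:3] = True    # 8-voxel ground-truth "lesion"
pred = np.zeros(vol_shape, dtype=bool)
pred[1:3, 1:3, 2:4] = True  # 8-voxel prediction shifted by one slice; 4 voxels overlap

print(dice_score(pred, gt))  # 2 * 4 / (8 + 8) = 0.5
# An "average Dice score" is usually the mean of per-case scores.
print(np.mean([dice_score(pred, gt), dice_score(gt, gt)]))  # 0.75
```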

Topics

Pancreatic Neoplasms; Deep Learning; Imaging, Three-Dimensional; Language; Image Processing, Computer-Assisted; Journal Article; Review