A monocular endoscopic image depth estimation method based on a window-adaptive asymmetric dual-branch Siamese network.

Authors

Chong N, Yang F, Wei K

Affiliations (4)

  • School of Information and Intelligence Engineering, Tianjin Renai College, Jinjing Road, Jinghai District, Tianjin, 301636, China. [email protected].
  • School of Electronic and Information Engineering, Hebei University of Technology, No.5340 Xiping Road, Beichen District, Tianjin, 300401, China. [email protected].
  • School of Electronic and Information Engineering, Hebei University of Technology, No.5340 Xiping Road, Beichen District, Tianjin, 300401, China. [email protected].
  • Department of Orthopaedics, Tianjin Fifth Central Hospital, No. 41 Zhejiang Road, Binhai New Area, Tianjin, 300450, China.

Abstract

Minimally invasive surgery involves entering the body through small incisions or natural orifices, using a medical endoscope for observation and clinical procedures. However, traditional endoscopic images often suffer from low texture and uneven illumination, which can negatively impact surgical and diagnostic outcomes. To address these challenges, many researchers have applied deep learning methods to enhance the processing of endoscopic images. This paper proposes a monocular medical endoscopic image depth estimation method based on a window-adaptive asymmetric dual-branch Siamese network. In this network, one branch focuses on processing global image information, while the other concentrates on local details. An improved lightweight Squeeze-and-Excitation (SE) module is added to the final layer of each branch, dynamically adjusting the inter-channel weights through self-attention. The outputs from both branches are then integrated by a lightweight cross-attention feature fusion module, enabling cross-branch feature interaction and enhancing the overall feature representation capability of the network. Extensive ablation and comparative experiments were conducted on medical datasets (EAD2019, Hamlyn, M2caiSeg, UCL) and a non-medical dataset (NYUDepthV2), with both qualitative and quantitative results, measured in terms of RMSE, AbsRel, FLOPs, and running time, demonstrating the superiority of the proposed model. Additionally, comparisons with CT images show good organ boundary matching capability, highlighting the potential of our method for clinical applications. The key code of this paper is available at: https://github.com/superchongcnn/AttenAdapt_DE.
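The SE module described in the abstract follows the standard squeeze-and-excitation pattern: globally pool each channel, pass the pooled vector through a bottleneck of two fully connected layers, and use a sigmoid gate to reweight the channels. The sketch below is a generic NumPy illustration of that pattern under assumed shapes and weights, not the authors' actual implementation (which is available at the linked repository); `w1`, `w2`, and the reduction ratio are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Generic Squeeze-and-Excitation channel attention sketch.

    feature_map: (C, H, W) activations from one branch.
    w1: (C // r, C) squeeze FC weights (r = reduction ratio, assumed).
    w2: (C, C // r) excitation FC weights.
    """
    # Squeeze: global average pooling over spatial dimensions -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gate in (0, 1)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: reweight each channel by its learned importance
    return feature_map * s[:, None, None]
```

Because the gate values lie in (0, 1), the block can only attenuate channels relative to one another; the network learns which channels to preserve, which is what "dynamically adjusting the inter-channel weights" refers to.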

Topics

Endoscopy; Image Processing, Computer-Assisted; Journal Article