Transformer-based robotic ultrasound 3D tracking for capsule robot in GI tract.
Authors
Affiliations (8)
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada.
- Discipline of Medical Engineering, University of Newcastle, Newcastle, NSW, Australia.
- Division of Cardiology, Department of Medicine, University of Toronto, Toronto, ON, Canada.
- Department of Materials Science and Engineering, University of Toronto, Toronto, ON, Canada.
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada.
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada. [email protected].
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada. [email protected].
- University of Toronto Robotics Institute, University of Toronto, Toronto, ON, Canada. [email protected].
Abstract
Ultrasound (US) imaging is a promising modality for real-time monitoring of robotic capsule endoscopes navigating the gastrointestinal (GI) tract. It offers high temporal resolution and safety but is limited by a narrow field of view, low visibility in gas-filled regions, and difficulty detecting out-of-plane motion. This work addresses these issues with a novel robotic ultrasound tracking system capable of long-distance 3D tracking and active re-localization when the capsule is lost due to motion or artifacts. We develop a hybrid deep learning tracking framework that combines convolutional neural networks (CNNs) with a transformer backbone: the CNN component efficiently encodes spatial features, while the transformer captures long-range contextual dependencies in B-mode US images. The model is integrated with a robotic arm that adaptively scans for and tracks the capsule. System performance is evaluated on ex vivo colon phantoms under varying imaging conditions, with physical perturbations introduced to simulate realistic clinical scenarios. The proposed system achieved continuous 3D tracking over distances exceeding 90 cm, with a mean centroid localization error of 1.5 mm and over 90% detection accuracy. We also demonstrated 3D tracking in a more complex workspace featuring two curved sections that simulate anatomical challenges, indicating strong resilience to motion-induced artifacts and geometric variability. The system maintained real-time tracking at 9-12 FPS and re-localized the capsule within seconds of tracking loss, even under gas artifacts and acoustic shadowing. In summary, this study presents a hybrid CNN-transformer system for automatic, real-time 3D ultrasound tracking of capsule robots over long distances. The method reliably handles occlusions, view loss, and image artifacts, offers millimeter-level tracking accuracy, and significantly reduces clinical workload through autonomous detection and re-localization. Future work includes improving probe-tissue interaction handling and validating performance in live animal and human trials to assess physiological impacts.
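As a rough illustration of the hybrid architecture described in the abstract, the sketch below shows one way a CNN feature encoder could feed a transformer encoder over flattened feature-map tokens, with output heads for the capsule centroid and a detection confidence. This is a minimal PyTorch sketch under assumed choices: the class and layer names (CNNTransformerTracker, centroid_head, conf_head), the layer sizes, the mean-pooling step, and the 256x256 input are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a hybrid CNN-transformer tracker for B-mode US frames.
# Assumptions (not from the paper): single-channel input, three-stage CNN,
# learned positional embeddings, mean pooling, centroid + confidence heads.
import torch
import torch.nn as nn


class CNNTransformerTracker(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # CNN backbone: encodes local spatial features of the B-mode frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Learned positional embedding over the flattened feature grid
        # (assumes inputs are resized so the grid is at most 32x32 tokens).
        self.pos_embed = nn.Parameter(torch.zeros(1, 32 * 32, d_model))
        # Transformer encoder: captures long-range context across the frame.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Heads: capsule centroid (normalized x, y) and detection confidence.
        self.centroid_head = nn.Linear(d_model, 2)
        self.conf_head = nn.Linear(d_model, 1)

    def forward(self, frames):                     # frames: (B, 1, H, W)
        feats = self.backbone(frames)              # (B, C, H', W')
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C)
        tokens = tokens + self.pos_embed[:, : tokens.size(1)]
        ctx = self.encoder(tokens)                 # global context per token
        pooled = ctx.mean(dim=1)                   # simple mean pooling
        centroid = torch.sigmoid(self.centroid_head(pooled))  # in [0, 1]^2
        conf = torch.sigmoid(self.conf_head(pooled))
        return centroid, conf


# Example: one 256x256 B-mode frame -> normalized centroid + confidence.
model = CNNTransformerTracker()
frame = torch.randn(1, 1, 256, 256)
xy, p = model(frame)
```

In a robotic tracking loop, the confidence output could serve as the trigger for the active re-localization behavior the abstract describes: when it drops below a threshold for several consecutive frames, the arm would re-enter a scanning pattern until the capsule is re-detected.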