Back to all papers

Toward Next-generation Medical Vision Backbones: Modeling Finer-grained Long-range Visual Dependency

September 14, 2025arxiv logopreprintarXiv: 2509.11328v1

Authors

Mingyuan Meng

Abstract

Medical Image Computing (MIC) is a broad research topic covering both pixel-wise (e.g., segmentation, registration) and image-wise (e.g., classification, regression) vision tasks. Effective analysis demands models that capture both global long-range context and local subtle visual characteristics, necessitating fine-grained long-range visual dependency modeling. Compared to Convolutional Neural Networks (CNNs) that are limited by intrinsic locality, transformers excel at long-range modeling; however, due to the high computational loads of self-attention, transformers typically cannot process high-resolution features (e.g., full-scale image features before downsampling or patch embedding) and thus face difficulties in modeling fine-grained dependency among subtle medical image details. Concurrently, Multi-layer Perceptron (MLP)-based visual models are recognized as computation/memory-efficient alternatives in modeling long-range visual dependency but have yet to be widely investigated in the MIC community. This doctoral research advances deep learning-based MIC by investigating effective long-range visual dependency modeling. It first presents innovative use of transformers for both pixel- and image-wise medical vision tasks. The focus then shifts to MLPs, pioneeringly developing MLP-based visual models to capture fine-grained long-range visual dependency in medical images. Extensive experiments confirm the critical role of long-range dependency modeling in MIC and reveal a key finding: MLPs provide feasibility in modeling finer-grained long-range dependency among higher-resolution medical features containing enriched anatomical/pathological details. This finding establishes MLPs as a superior paradigm over transformers/CNNs, consistently enhancing performance across various medical vision tasks and paving the way for next-generation medical vision backbones.

Topics

cs.CV

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.