Page 21 of 59587 results

MM2CT: MR-to-CT translation for multi-modal image fusion with mamba

Chaohui Gong, Zhiying Wu, Zisheng Huang, Gaofeng Meng, Zhen Lei, Hongbin Liu

arXiv preprint, Aug 7 2025
Magnetic resonance (MR)-to-computed tomography (CT) translation offers significant advantages, including the elimination of the radiation exposure associated with CT scans and the mitigation of imaging artifacts caused by patient motion. Existing approaches are based on single-modality MR-to-CT translation, and little research has explored multimodal fusion. To address this limitation, we introduce Multi-modal MR to CT (MM2CT) translation, an innovative Mamba-based framework for multi-modal medical image synthesis that leverages T1- and T2-weighted MRI data. Mamba overcomes both the limited local receptive field of CNNs and the high computational complexity of Transformers. MM2CT leverages this advantage to retain long-range dependency modeling while integrating multi-modal MR features. Additionally, we incorporate a dynamic local convolution module and a dynamic enhancement module to improve MRI-to-CT synthesis. Experiments on a public pelvis dataset demonstrate that MM2CT achieves state-of-the-art performance in terms of Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Our code is publicly available at https://github.com/Gots-ch/MM2CT.
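PSNR, one of the two metrics cited above, compares a synthesized CT with the reference CT as 10·log10(R²/MSE), where R is the intensity range. A minimal sketch on flattened intensity lists follows; the choice of `data_range` (for example, the HU window used for normalization) is an assumption here, since the abstract does not state MM2CT's evaluation protocol.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], data_range: float) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length intensity sequences.

    data_range is the assumed maximum intensity span (e.g. a normalized HU window);
    it is an illustrative parameter, not MM2CT's documented setting.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy 4-voxel "images": two voxels off by 10 units, so MSE = 50.
print(psnr([0, 100, 200, 300], [0, 110, 190, 300], data_range=300))
```

Because PSNR depends on `data_range`, values are only comparable between papers that normalize intensities the same way.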

Enhancing Domain Generalization in Medical Image Segmentation With Global and Local Prompts.

Zhao C, Li X

PubMed papers, Aug 7 2025
Enhancing domain generalization (DG) is a crucial and compelling research pursuit within the field of medical image segmentation, owing to the inherent heterogeneity observed in medical images. The recent success with large-scale pre-trained vision models (PVMs), such as Vision Transformer (ViT), inspires us to explore their application in this specific area. While a straightforward strategy involves fine-tuning the PVM using supervised signals from the source domains, this approach overlooks the domain shift issue and neglects the rich knowledge inherent in the instances themselves. To overcome these limitations, we introduce a novel framework enhanced by global and local prompts (GLPs). Specifically, to adapt PVM in the medical DG scenario, we explicitly separate domain-shared and domain-specific knowledge in the form of GLPs. Furthermore, we develop an individualized domain adapter to intricately investigate the relationship between each target domain sample and the source domains. To harness the inherent knowledge within instances, we devise two innovative regularization terms from both the consistency and anatomy perspectives, encouraging the model to preserve instance discriminability and organ position invariance. Extensive experiments and in-depth discussions in both vanilla and semi-supervised DG scenarios deriving from five diverse medical datasets consistently demonstrate the superior segmentation performance achieved by GLP. Our code and datasets are publicly available at https://github.com/xmed-lab/GLP.

X-UNet: A novel global context-aware collaborative fusion U-shaped network with progressive feature fusion of codec for medical image segmentation.

Xu S, Chen Y, Zhang X, Sun F, Chen S, Ou Y, Luo C

PubMed papers, Aug 7 2025
Due to the inductive bias of convolutions, CNNs perform hierarchical feature extraction efficiently in medical image segmentation. However, the local-correlation assumption underlying this inductive bias limits the ability of convolutions to capture global information, which has led Transformer-based methods to surpass CNNs in some segmentation tasks in recent years. Although combining CNNs with Transformers can solve this problem, it also introduces computational complexity and a considerable number of parameters. In addition, narrowing the encoder-decoder semantic gap for high-quality mask generation is a key challenge, which recent works address by aggregating features from different skip connections. However, this often results in semantic mismatches and additional noise. In this paper, we propose a novel segmentation method, X-UNet, whose backbone employs the CFGC (Collaborative Fusion with Global Context-aware) module. The CFGC module enables multi-scale feature extraction and effective global context modeling. In addition, we employ the CSPF (Cross Split-channel Progressive Fusion) module to progressively align and fuse features from corresponding encoder and decoder stages through channel-wise operations, offering a novel approach to feature integration. Experimental results demonstrate that X-UNet, with fewer computations and parameters, achieves superior performance on various medical image datasets. The code and models are available at https://github.com/XSJ0410/X-UNet.

CT-GRAPH: Hierarchical Graph Attention Network for Anatomy-Guided CT Report Generation

Hamza Kalisch, Fabian Hörst, Jens Kleesiek, Ken Herrmann, Constantin Seibold

arXiv preprint, Aug 7 2025
As medical imaging is central to diagnostic processes, automating the generation of radiology reports has become increasingly relevant to assist radiologists with their heavy workloads. Most current methods rely solely on global image features, failing to capture fine-grained organ relationships crucial for accurate reporting. To this end, we propose CT-GRAPH, a hierarchical graph attention network that explicitly models radiological knowledge by structuring anatomical regions into a graph, linking fine-grained organ features to coarser anatomical systems and a global patient context. Our method leverages pretrained 3D medical feature encoders to obtain global and organ-level features by utilizing anatomical masks. These features are further refined within the graph and then integrated into a large language model to generate detailed medical reports. We evaluate our approach for the task of report generation on the large-scale chest CT dataset CT-RATE. We provide an in-depth analysis of pretrained feature encoders for CT report generation and show that our method achieves a substantial absolute improvement of 7.9% in F1 score over current state-of-the-art methods. The code is publicly available at https://github.com/hakal104/CT-GRAPH.

UltimateSynth: MRI Physics for Pan-Contrast AI

Adams, R., Huynh, K. M., Zhao, W., Hu, S., Lyu, W., Ahmad, S., Ma, D., Yap, P.-T.

bioRxiv preprint, Aug 7 2025
Magnetic resonance imaging (MRI) is commonly used in healthcare for its ability to generate diverse tissue contrasts without ionizing radiation. However, this flexibility complicates downstream analysis, as computational tools are often tailored to specific types of MRI and lack generalizability across the full spectrum of scans used in healthcare. Here, we introduce a versatile framework for the development and validation of AI models that can robustly process and analyze the full spectrum of scans achievable with MRI, enabling model deployment across scanner models, scan sequences, and age groups. Core to our framework is UltimateSynth, a technology that combines tissue physiology and MR physics in synthesizing realistic images across a comprehensive range of meaningful contrasts. This pan-contrast capability bolsters the AI development life cycle through efficient data labeling, generalizable model training, and thorough performance benchmarking. We showcase the effectiveness of UltimateSynth by training an off-the-shelf U-Net to generalize anatomical segmentation across any MR contrast. The U-Net yields highly robust tissue volume estimates, with variability under 4% across 150,000 unique-contrast images, 3.8% across 2,000+ low-field 0.3T scans, and 3.5% across 8,000+ images spanning the human lifespan from ages 0 to 100.
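The abstract does not describe UltimateSynth's physics model. As a minimal illustration of how MR physics maps tissue properties to image contrast, the sketch below uses the textbook spin-echo signal approximation S = PD·(1 − exp(−TR/T1))·exp(−TE/T2). The tissue values are rough illustrative numbers, and the names `Tissue` and `spin_echo_signal` are hypothetical, not part of UltimateSynth.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tissue:
    """Illustrative tissue parameters: relative proton density and relaxation times in ms."""
    pd: float
    t1_ms: float
    t2_ms: float


def spin_echo_signal(tissue: Tissue, tr_ms: float, te_ms: float) -> float:
    """Approximate spin-echo magnitude S = PD * (1 - exp(-TR/T1)) * exp(-TE/T2).

    Assumes TR >> TE and ignores noise, coil sensitivity, and field inhomogeneity.
    """
    t1_recovery = 1.0 - math.exp(-tr_ms / tissue.t1_ms)
    t2_decay = math.exp(-te_ms / tissue.t2_ms)
    return tissue.pd * t1_recovery * t2_decay


# Rough, illustrative values (assumptions, not taken from UltimateSynth).
WHITE_MATTER = Tissue(pd=0.7, t1_ms=800.0, t2_ms=80.0)
CSF = Tissue(pd=1.0, t1_ms=4000.0, t2_ms=2000.0)

# Changing (TR, TE) flips which tissue appears brighter.
for name, tr, te in [("T1-weighted", 500.0, 15.0), ("T2-weighted", 4000.0, 100.0)]:
    wm, csf = spin_echo_signal(WHITE_MATTER, tr, te), spin_echo_signal(CSF, tr, te)
    print(f"{name}: WM={wm:.3f}, CSF={csf:.3f}")
```

Sweeping TR and TE over a grid in this way yields many contrasts from one set of tissue maps, which is the general idea behind physics-based contrast synthesis; a realistic simulator would model far more effects.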

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Can Zhao, Pengfei Guo, Dong Yang, Yucheng Tang, Yufan He, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu

arXiv preprint, Aug 7 2025
Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability, working only for specific body regions or voxel spacings, (2) slow inference, a common issue for diffusion models, and (3) weak alignment with input conditions, a critical issue for medical imaging. MAISI, a previously proposed framework, addresses the generalizability issue but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast, high-quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss that increases sensitivity to the region of interest. Our experiments show that MAISI-v2 achieves SOTA image quality with a 33× acceleration for the latent diffusion model. We also conduct a downstream segmentation experiment showing that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
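Rectified flow trains a network to predict a velocity field whose trajectories from noise (t = 0) to data (t = 1) are close to straight lines. Straight trajectories can be integrated accurately with very few Euler steps, which is where the speed-up comes from. The sketch below replaces the trained network with a hand-written toy velocity field pointing straight at a fixed target. It is not MAISI-v2's sampler, the helper names are hypothetical, and the 33× figure comes from the paper, not from this code.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def euler_sample(velocity: Velocity, x0: Sequence[float], num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (data) with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field below never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_line_velocity(target: Sequence[float]) -> Velocity:
    """Toy stand-in for a trained network: points from x straight at a fixed target.

    On the path x_t = (1 - t) * x0 + t * target this equals target - x0,
    the constant velocity that rectified flow regresses onto.
    """
    return lambda x, t: [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]


noise = [0.3, -1.2, 0.8]
data = [1.0, 2.0, 3.0]
# On a perfectly straight path a single Euler step already reaches the target.
print(euler_sample(straight_line_velocity(data), noise, num_steps=1))
```

A trained model's trajectories are only approximately straight, so in practice a small number of steps (rather than one) is used.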

MedCLIP-SAMv2: Towards universal text-driven medical image segmentation.

Koleilat T, Asgariandehkordi H, Rivaz H, Xiao Y

PubMed papers, Aug 7 2025
Segmentation of anatomical structures and pathologies in medical images is essential for modern disease diagnosis, clinical research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing robust segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is an active field of research. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks with SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels in a weakly supervised paradigm to enhance segmentation quality further. Extensive validation across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework. Our code is available at https://github.com/HealthX-Lab/MedCLIP-SAMv2.

MLAgg-UNet: Advancing Medical Image Segmentation with Efficient Transformer and Mamba-Inspired Multi-Scale Sequence.

Jiang J, Lei S, Li H, Sun Y

PubMed papers, Aug 7 2025
Transformers and state space sequence models (SSMs) have attracted interest in biomedical image segmentation for their ability to capture long-range dependencies. However, traditional visual state space (VSS) methods suffer from the incompatibility of image tokens with the autoregressive assumption. Although Transformer attention does not require this assumption, its high computational cost limits effective channel-wise information utilization. To overcome these limitations, we propose the Mamba-Like Aggregated UNet (MLAgg-UNet), which introduces a Mamba-inspired mechanism to enrich Transformer channel representations and exploit the implicit autoregressive characteristics of the U-shaped architecture. To establish dependencies among image tokens at a single scale, the Mamba-Like Aggregated Attention (MLAgg) block is designed to balance representational ability and computational efficiency. Inspired by the human foveal vision system, the Mamba macro-structure, and differential attention, the MLAgg block can slide its focus over each image token, suppress irrelevant tokens, and simultaneously strengthen channel-wise information utilization. Moreover, leveraging causal relationships between consecutive low-level and high-level features in the U-shaped architecture, we propose the Multi-Scale Mamba Module with Implicit Causality (MSMM) to optimize complementary information across scales. Embedded within skip connections, this module enhances semantic consistency between encoder and decoder features. Extensive experiments on four benchmark datasets, AbdomenMRI, ACDC, BTCV, and EndoVis17, which cover MRI, CT, and endoscopy modalities, demonstrate that the proposed MLAgg-UNet consistently outperforms state-of-the-art CNN-based, Transformer-based, and Mamba-based methods. Specifically, it achieves improvements of at least 1.24%, 0.20%, 0.33%, and 0.39% in DSC scores on these datasets, respectively.
These results highlight the model's ability to effectively capture feature correlations and integrate complementary multi-scale information, providing a robust solution for medical image segmentation. The implementation is publicly available at https://github.com/aticejiang/MLAgg-UNet.
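DSC is the Dice similarity coefficient, 2|A∩B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch on flattened binary masks follows; returning 1.0 for two empty masks is a convention chosen here, and the abstract does not say how the paper aggregates DSC across classes or cases.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (assumed here): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping voxel out of two foreground voxels in each mask.
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))
```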

Improving 3D Thin Vessel Segmentation in Brain TOF-MRA via a Dual-space Context-Aware Network.

Shan W, Li X, Wang X, Li Q, Wang Z

PubMed papers, Aug 6 2025
3D cerebrovascular segmentation poses a significant challenge, akin to locating a line within a vast 3D environment. This complexity can be substantially reduced by projecting the vessels onto a 2D plane, enabling easier segmentation. In this paper, we create a vessel-segmentation-friendly space using a clinical visualization technique called maximum intensity projection (MIP). Leveraging this, we propose a Dual-space Context-Aware Network (DCANet) for 3D vessel segmentation, designed to capture even the finest vessel structures accurately. DCANet begins by transforming a magnetic resonance angiography (MRA) volume into a 3D Regional-MIP volume, where each Regional-MIP slice is constructed by projecting adjacent MRA slices. This transformation highlights vessels as prominent continuous curves rather than the small circular or ellipsoidal cross-sections seen in MRA slices. DCANet encodes vessels separately in the MRA and the projected Regional-MIP spaces and introduces the Regional-MIP Image Fusion Block (MIFB) between these dual spaces to selectively integrate contextual features from Regional-MIP into MRA. Following dual-space encoding, DCANet employs a Dual-mask Spatial Guidance TransFormer (DSGFormer) decoder to focus on vessel regions while effectively excluding background areas, which reduces the learning burden and improves segmentation accuracy. We benchmark DCANet on four datasets: two public datasets, TubeTK and IXI-IOP, and two in-house datasets, Xiehe and IXI-HH. The results demonstrate that DCANet achieves superior performance, with improvements in average DSC values of at least 2.26%, 2.17%, 2.62%, and 2.58% for thin vessels, respectively. Code is available at https://github.com/shanwq/DCANet.
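The Regional-MIP transform described above can be sketched as a sliding-window maximum along the slice axis. The window half-width `half_width`, the clipping at the volume borders, and the nested-list `[z][y][x]` layout are assumptions for illustration; the abstract does not state how many adjacent slices DCANet projects.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def regional_mip(volume: Volume, half_width: int) -> Volume:
    """Output slice z is the voxel-wise maximum over input slices z-h..z+h (clipped at the borders)."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    depth = len(volume)
    out: Volume = []
    for z in range(depth):
        window = volume[max(0, z - half_width): min(depth, z + half_width + 1)]
        rows, cols = len(volume[z]), len(volume[z][0])
        out.append([[max(s[r][c] for s in window) for c in range(cols)] for r in range(rows)])
    return out


# A vessel crossing the slices diagonally appears as one bright voxel per MRA slice...
mra = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
# ...but as a continuous segment in the middle Regional-MIP slice.
print(regional_mip(mra, half_width=1)[1])
```

With `half_width=0` the transform returns the input unchanged; larger windows make thin vessels more continuous at the cost of mixing in more background.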

DDTracking: A Deep Generative Framework for Diffusion MRI Tractography with Streamline Local-Global Spatiotemporal Modeling

Yijie Li, Wei Zhang, Xi Zhu, Ye Wu, Yogesh Rathi, Lauren J. O'Donnell, Fan Zhang

arXiv preprint, Aug 6 2025
This paper presents DDTracking, a novel deep generative framework for diffusion MRI tractography that formulates streamline propagation as a conditional denoising diffusion process. In DDTracking, we introduce a dual-pathway encoding network that jointly models local spatial encoding (capturing fine-scale structural details at each streamline point) and global temporal dependencies (ensuring long-range consistency across the entire streamline). Furthermore, we design a conditional diffusion model module, which leverages the learned local and global embeddings to predict streamline propagation orientations for tractography in an end-to-end trainable manner. We conduct a comprehensive evaluation across diverse, independently acquired dMRI datasets, including both synthetic and clinical data. Experiments on two well-established benchmarks with ground truth (ISMRM Challenge and TractoInferno) demonstrate that DDTracking largely outperforms current state-of-the-art tractography methods. Furthermore, our results highlight DDTracking's strong generalizability across heterogeneous datasets, spanning varying health conditions, age groups, imaging protocols, and scanner types. Collectively, DDTracking offers anatomically plausible and robust tractography, presenting a scalable, adaptable, and end-to-end learnable solution for broad dMRI applications. Code is available at: https://github.com/yishengpoxiao/DDtracking.git
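DDTracking's contribution is the learned, diffusion-based orientation predictor. The tracking loop around any such predictor is standard, and the sketch below shows only that outer loop. A hypothetical `predict_direction` callable stands in for the model; it receives the full streamline so far, loosely mirroring the global context described above. The step size, the sign-flip rule, and termination by returning `None` are assumptions, not details from the paper.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float, float]
Predictor = Callable[[List[Point]], Optional[Point]]


def track(seed: Point, predict_direction: Predictor,
          step_size: float = 0.5, max_steps: int = 1000) -> List[Point]:
    """Propagate a streamline from seed, one fixed-length step per predicted orientation."""
    streamline: List[Point] = [seed]
    prev_dir: Optional[Point] = None
    for _ in range(max_steps):
        d = predict_direction(streamline)  # hypothetical model call on the history so far
        if d is None:
            break  # predictor signals termination
        norm = math.sqrt(sum(c * c for c in d))
        if norm == 0.0:
            break  # no usable orientation
        unit = (d[0] / norm, d[1] / norm, d[2] / norm)
        # Fiber orientations are sign-ambiguous: keep the step consistent with the previous one.
        if prev_dir is not None and sum(a * b for a, b in zip(unit, prev_dir)) < 0:
            unit = (-unit[0], -unit[1], -unit[2])
        x, y, z = streamline[-1]
        streamline.append((x + step_size * unit[0], y + step_size * unit[1], z + step_size * unit[2]))
        prev_dir = unit
    return streamline


def toy_predictor(points: List[Point]) -> Optional[Point]:
    """Hypothetical stand-in: alternates the sign of an x-axis orientation, stops after 5 points."""
    if len(points) >= 5:
        return None
    return (1.0, 0.0, 0.0) if len(points) % 2 else (-2.0, 0.0, 0.0)


print(track((0.0, 0.0, 0.0), toy_predictor))
```

The sign flip keeps the toy streamline moving steadily along +x even though the predictor alternates between opposite orientations.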