Sort by:
Page 15 of 45448 results

DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder

Dan He, Weisheng Li, Guofen Wang, Yuping Huang, Shiqiang Liu

arxiv logopreprintJun 18 2025
Multimodal medical image fusion (MMIF) extracts the most meaningful information from multiple source images, enabling a more comprehensive and accurate diagnosis. Achieving high-quality fusion results requires a careful balance of brightness, color, contrast, and detail; this ensures that the fused images effectively display relevant anatomical structures and reflect the functional status of the tissues. However, existing MMIF methods have limited capacity to capture detailed features during conventional training and suffer from insufficient cross-modal feature interaction, leading to suboptimal fused image quality. To address these issues, this study proposes a two-stage diffusion model-based fusion network (DM-FNet) to achieve unified MMIF. In Stage I, a diffusion process trains UNet for image reconstruction. UNet captures detailed information through progressive denoising and represents multilevel data, providing a rich set of feature representations for the subsequent fusion network. In Stage II, noisy images at various steps are input into the fusion network to enhance the model's feature recognition capability. Three key fusion modules are also integrated to process medical images from different modalities adaptively. Ultimately, the robust network structure and a hybrid loss function are integrated to harmonize the fused image's brightness, color, contrast, and detail, enhancing its quality and information density. The experimental results across various medical image types demonstrate that the proposed method performs exceptionally well regarding objective evaluation metrics. The fused image preserves appropriate brightness, a comprehensive distribution of radioactive tracers, rich textures, and clear edges. The code is available at https://github.com/HeDan-11/DM-FNet.

Interactive prototype learning and self-learning for few-shot medical image segmentation.

Song Y, Xu C, Wang B, Du X, Chen J, Zhang Y, Li S

pubmed logopapersJun 18 2025
Few-shot learning alleviates the heavy dependence of medical image segmentation on large-scale labeled data, but it shows strong performance gaps when dealing with new tasks compared with traditional deep learning. Existing methods mainly learn the class knowledge of a few known (support) samples and extend it to unknown (query) samples. However, the large distribution differences between the support image and the query image lead to serious deviations in the transfer of class knowledge, which can be specifically summarized as two segmentation challenges: Intra-class inconsistency and Inter-class similarity, blurred and confused boundaries. In this paper, we propose a new interactive prototype learning and self-learning network to solve the above challenges. First, we propose a deep encoding-decoding module to learn the high-level features of the support and query images to build peak prototypes with the greatest semantic information and provide semantic guidance for segmentation. Then, we propose an interactive prototype learning module to improve intra-class feature consistency and reduce inter-class feature similarity by conducting mid-level features-based mean prototype interaction and high-level features-based peak prototype interaction. Last, we propose a query features-guided self-learning module to separate foreground and background at the feature level and combine low-level feature maps to complement boundary information. Our model achieves competitive segmentation performance on benchmark datasets and shows substantial improvement in generalization ability.

D2Diff : A Dual Domain Diffusion Model for Accurate Multi-Contrast MRI Synthesis

Sanuwani Dayarathna, Himashi Peiris, Kh Tohidul Islam, Tien-Tsin Wong, Zhaolin Chen

arxiv logopreprintJun 18 2025
Multi contrast MRI synthesis is inherently challenging due to the complex and nonlinear relationships among different contrasts. Each MRI contrast highlights unique tissue properties, but their complementary information is difficult to exploit due to variations in intensity distributions and contrast specific textures. Existing methods for multi contrast MRI synthesis primarily utilize spatial domain features, which capture localized anatomical structures but struggle to model global intensity variations and distributed patterns. Conversely, frequency domain features provide structured inter contrast correlations but lack spatial precision, limiting their ability to retain finer details. To address this, we propose a dual domain learning framework that integrates spatial and frequency domain information across multiple MRI contrasts for enhanced synthesis. Our method employs two mutually trained denoising networks, one conditioned on spatial domain and the other on frequency domain contrast features through a shared critic network. Additionally, an uncertainty driven mask loss directs the models focus toward more critical regions, further improving synthesis accuracy. Extensive experiments show that our method outperforms SOTA baselines, and the downstream segmentation performance highlights the diagnostic value of the synthetic results.

Frequency-Calibrated Membership Inference Attacks on Medical Image Diffusion Models

Xinkai Zhao, Yuta Tokuoka, Junichiro Iwasawa, Keita Oda

arxiv logopreprintJun 17 2025
The increasing use of diffusion models for image generation, especially in sensitive areas like medical imaging, has raised significant privacy concerns. Membership Inference Attack (MIA) has emerged as a potential approach to determine if a specific image was used to train a diffusion model, thus quantifying privacy risks. Existing MIA methods often rely on diffusion reconstruction errors, where member images are expected to have lower reconstruction errors than non-member images. However, applying these methods directly to medical images faces challenges. Reconstruction error is influenced by inherent image difficulty, and diffusion models struggle with high-frequency detail reconstruction. To address these issues, we propose a Frequency-Calibrated Reconstruction Error (FCRE) method for MIAs on medical image diffusion models. By focusing on reconstruction errors within a specific mid-frequency range and excluding both high-frequency (difficult to reconstruct) and low-frequency (less informative) regions, our frequency-selective approach mitigates the confounding factor of inherent image difficulty. Specifically, we analyze the reverse diffusion process, obtain the mid-frequency reconstruction error, and compute the structural similarity index score between the reconstructed and original images. Membership is determined by comparing this score to a threshold. Experiments on several medical image datasets demonstrate that our FCRE method outperforms existing MIA methods.

A Semi-supervised Ultrasound Image Segmentation Network Integrating Enhanced Mask Learning and Dynamic Temperature-controlled Self-distillation.

Xu L, Huang Y, Zhou H, Mao Q, Yin W

pubmed logopapersJun 16 2025
Ultrasound imaging is widely used in clinical practice due to its advantages of no radiation and real-time capability. However, its image quality is often degraded by speckle noise, low contrast, and blurred boundaries, which pose significant challenges for automatic segmentation. In recent years, deep learning methods have achieved notable progress in ultrasound image segmentation. Nonetheless, these methods typically require large-scale annotated datasets, incur high computational costs, and suffer from slow inference speeds, limiting their clinical applicability. To overcome these limitations, we propose EML-DMSD, a novel semi-supervised segmentation network that combines Enhanced Mask Learning (EML) and Dynamic Temperature-Controlled Multi-Scale Self-Distillation (DMSD). The EML module improves the model's robustness to noise and boundary ambiguity, while the DMSD module introduces a teacher-free, multi-scale self-distillation strategy with dynamic temperature adjustment to boost inference efficiency and reduce reliance on extensive resources. Experiments on multiple ultrasound benchmark datasets demonstrate that EML-DMSD achieves superior segmentation accuracy with efficient inference, highlighting its strong generalization ability and clinical potential.

PRO: Projection Domain Synthesis for CT Imaging

Kang Chen, Bin Huang, Xuebin Yang, Junyan Zhang, Qiegen Liu

arxiv logopreprintJun 16 2025
Synthesizing high quality CT images remains a signifi-cant challenge due to the limited availability of annotat-ed data and the complex nature of CT imaging. In this work, we present PRO, a novel framework that, to the best of our knowledge, is the first to perform CT image synthesis in the projection domain using latent diffusion models. Unlike previous approaches that operate in the image domain, PRO learns rich structural representa-tions from raw projection data and leverages anatomi-cal text prompts for controllable synthesis. This projec-tion domain strategy enables more faithful modeling of underlying imaging physics and anatomical structures. Moreover, PRO functions as a foundation model, capa-ble of generalizing across diverse downstream tasks by adjusting its generative behavior via prompt inputs. Experimental results demonstrated that incorporating our synthesized data significantly improves perfor-mance across multiple downstream tasks, including low-dose and sparse-view reconstruction, even with limited training data. These findings underscore the versatility and scalability of PRO in data generation for various CT applications. These results highlight the potential of projection domain synthesis as a powerful tool for data augmentation and robust CT imaging. Our source code is publicly available at: https://github.com/yqx7150/PRO.

ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs

Bikram Keshari Parida, Anusree P. Sunilkumar, Abhijit Sen, Wonsang You

arxiv logopreprintJun 16 2025
Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX) providing 2D oral cavity representations, and Cone-Beam Computed Tomography (CBCT) offering detailed 3D anatomical information. While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but presents drawbacks including higher costs, increased radiation exposure, and limited accessibility. Existing reconstruction models further complicate the process by requiring CBCT flattening or prior dental arch information, often unavailable clinically. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from single PX. Our key innovations include: (1) enhancing the NeBLa framework with Vision Transformers for improved reconstruction capabilities without requiring CBCT flattening or prior dental arch information, (2) implementing a novel horseshoe-shaped point sampling strategy with non-intersecting rays that eliminates intermediate density aggregation required by existing models due to intersecting rays, reducing sampling point computations by $52 \%$, (3) replacing CNN-based U-Net with a hybrid ViT-CNN architecture for superior global and local feature extraction, and (4) implementing learnable hash positional encoding for better higher-dimensional representation of 3D sample points compared to existing Fourier-based dense positional encoding. Experiments demonstrate that ViT-NeBLa significantly outperforms prior state-of-the-art methods both quantitatively and qualitatively, offering a cost-effective, radiation-efficient alternative for enhanced dental diagnostics.

TCFNet: Bidirectional face-bone transformation via a Transformer-based coarse-to-fine point movement network.

Zhang R, Jie B, He Y, Wang J

pubmed logopapersJun 16 2025
Computer-aided surgical simulation is a critical component of orthognathic surgical planning, where accurately simulating face-bone shape transformations is significant. The traditional biomechanical simulation methods are limited by their computational time consumption levels, labor-intensive data processing strategies and low accuracy. Recently, deep learning-based simulation methods have been proposed to view this problem as a point-to-point transformation between skeletal and facial point clouds. However, these approaches cannot process large-scale points, have limited receptive fields that lead to noisy points, and employ complex preprocessing and postprocessing operations based on registration. These shortcomings limit the performance and widespread applicability of such methods. Therefore, we propose a Transformer-based coarse-to-fine point movement network (TCFNet) to learn unique, complicated correspondences at the patch and point levels for dense face-bone point cloud transformations. This end-to-end framework adopts a Transformer-based network and a local information aggregation network (LIA-Net) in the first and second stages, respectively, which reinforce each other to generate precise point movement paths. LIA-Net can effectively compensate for the neighborhood precision loss of the Transformer-based network by modeling local geometric structures (edges, orientations and relative position features). The previous global features are employed to guide the local displacement using a gated recurrent unit. Inspired by deformable medical image registration, we propose an auxiliary loss that can utilize expert knowledge for reconstructing critical organs. Our framework is an unsupervised algorithm, and this loss is optional. Compared with the existing state-of-the-art (SOTA) methods on gathered datasets, TCFNet achieves outstanding evaluation metrics and visualization results. The code is available at https://github.com/Runshi-Zhang/TCFNet.

PRO: Projection Domain Synthesis for CT Imaging

Kang Chen, Bin Huang, Xuebin Yang, Junyan Zhang, Qiegen Liu

arxiv logopreprintJun 16 2025
Synthesizing high quality CT projection data remains a significant challenge due to the limited availability of annotated data and the complex nature of CT imaging. In this work, we present PRO, a projection domain synthesis foundation model for CT imaging. To the best of our knowledge, this is the first study that performs CT synthesis in the projection domain. Unlike previous approaches that operate in the image domain, PRO learns rich structural representations from raw projection data and leverages anatomical text prompts for controllable synthesis. This projection domain strategy enables more faithful modeling of underlying imaging physics and anatomical structures. Moreover, PRO functions as a foundation model, capable of generalizing across diverse downstream tasks by adjusting its generative behavior via prompt inputs. Experimental results demonstrated that incorporating our synthesized data significantly improves performance across multiple downstream tasks, including low-dose and sparse-view reconstruction. These findings underscore the versatility and scalability of PRO in data generation for various CT applications. These results highlight the potential of projection domain synthesis as a powerful tool for data augmentation and robust CT imaging. Our source code is publicly available at: https://github.com/yqx7150/PRO.

MoNetV2: Enhanced Motion Network for Freehand 3D Ultrasound Reconstruction

Mingyuan Luo, Xin Yang, Zhongnuo Yan, Yan Cao, Yuanji Zhang, Xindi Hu, Jin Wang, Haoxuan Ding, Wei Han, Litao Sun, Dong Ni

arxiv logopreprintJun 16 2025
Three-dimensional (3D) ultrasound (US) aims to provide sonographers with the spatial relationships of anatomical structures, playing a crucial role in clinical diagnosis. Recently, deep-learning-based freehand 3D US has made significant advancements. It reconstructs volumes by estimating transformations between images without external tracking. However, image-only reconstruction poses difficulties in reducing cumulative drift and further improving reconstruction accuracy, particularly in scenarios involving complex motion trajectories. In this context, we propose an enhanced motion network (MoNetV2) to enhance the accuracy and generalizability of reconstruction under diverse scanning velocities and tactics. First, we propose a sensor-based temporal and multi-branch structure that fuses image and motion information from a velocity perspective to improve image-only reconstruction accuracy. Second, we devise an online multi-level consistency constraint that exploits the inherent consistency of scans to handle various scanning velocities and tactics. This constraint exploits both scan-level velocity consistency, path-level appearance consistency, and patch-level motion consistency to supervise inter-frame transformation estimation. Third, we distill an online multi-modal self-supervised strategy that leverages the correlation between network estimation and motion information to further reduce cumulative errors. Extensive experiments clearly demonstrate that MoNetV2 surpasses existing methods in both reconstruction quality and generalizability performance across three large datasets.
Page 15 of 45448 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.