MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation

Peiting Tian, Xi Chen, Haixia Bi, Fan Li

arXiv preprint · Jun 30, 2025
Medical image segmentation plays a crucial role in clinical diagnosis and treatment planning, where accurate boundary delineation is essential for precise lesion localization, organ identification, and quantitative assessment. In recent years, deep learning-based methods have significantly advanced segmentation accuracy. However, two major challenges remain. First, the performance of these methods heavily relies on large-scale annotated datasets, which are often difficult to obtain in medical scenarios due to privacy concerns and high annotation costs. Second, clinically challenging scenarios, such as low contrast in certain imaging modalities and blurry lesion boundaries caused by malignancy, still pose obstacles to precise segmentation. To address these challenges, we propose MedSAM-CA, an architecture-level fine-tuning approach that mitigates reliance on extensive manual annotations by adapting the pretrained foundation model, Medical Segment Anything (MedSAM). MedSAM-CA introduces two key components: the Convolutional Attention-Enhanced Boundary Refinement Network (CBR-Net) and the Attention-Enhanced Feature Fusion Block (Atte-FFB). CBR-Net operates in parallel with the MedSAM encoder to recover boundary information potentially overlooked by long-range attention mechanisms, leveraging hierarchical convolutional processing. Atte-FFB, embedded in the MedSAM decoder, fuses multi-level fine-grained features from skip connections in CBR-Net with global representations upsampled within the decoder to enhance boundary delineation accuracy. Experiments on publicly available datasets covering dermoscopy, CT, and MRI imaging modalities validate the effectiveness of MedSAM-CA. On the dermoscopy dataset, MedSAM-CA achieves 94.43% Dice with only 2% of the full training data, reaching 97.25% of full-data training performance and demonstrating strong effectiveness in low-resource clinical settings.
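
As a rough illustration of the fusion idea described for Atte-FFB, the PyTorch sketch below gates concatenated fine (CNN skip) and coarse (upsampled decoder) features with channel attention. The block name comes from the abstract, but the internal design here is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtteFFB(nn.Module):
    """Attention-weighted fusion of fine (skip) and coarse (decoder) features (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel-attention gate computed from the concatenated features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine: fine-grained feature from a CBR-Net skip connection
        # coarse: global feature upsampled within the MedSAM decoder
        coarse = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear", align_corners=False)
        merged = torch.cat([fine, coarse], dim=1)
        weights = self.gate(merged)                     # per-channel attention weights in (0, 1)
        return self.project(merged) * weights + coarse  # gated fusion with a residual path

fused = AtteFFB(64)(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 32, 32))
```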

GUSL: A Novel and Efficient Machine Learning Model for Prostate Segmentation on MRI

Jiaxin Yang, Vasileios Magoulianitis, Catherine Aurelia Christie Alexander, Jintang Xue, Masatomo Kaneko, Giovanni Cacciamani, Andre Abreu, Vinay Duddalwar, C. -C. Jay Kuo, Inderbir S. Gill, Chrysostomos Nikias

arXiv preprint · Jun 30, 2025
Prostate and zonal segmentation is a crucial step for clinical diagnosis of prostate cancer (PCa). Computer-aided diagnosis tools for prostate segmentation are based on the deep learning (DL) paradigm. However, deep neural networks are perceived as "black-box" solutions by physicians, making them less practical for deployment in the clinical setting. In this paper, we introduce a feed-forward machine learning model, named Green U-shaped Learning (GUSL), suitable for medical image segmentation without backpropagation. GUSL introduces a multi-layer regression scheme for coarse-to-fine segmentation. Its feature extraction is based on a linear model, which enables seamless interpretability. GUSL also introduces a mechanism for attention on the prostate boundaries, an error-prone region, by employing regression to refine the predictions through residue correction. In addition, a two-step pipeline approach is used to mitigate the class imbalance inherent in medical imaging problems. After conducting experiments on two publicly available datasets and one private dataset, in both prostate gland and zonal segmentation tasks, GUSL achieves state-of-the-art performance compared with DL-based models. Notably, GUSL features a very energy-efficient pipeline, with a model size several times smaller and lower complexity than competing solutions. In all datasets, GUSL achieved a Dice Similarity Coefficient (DSC) greater than 0.9 for gland segmentation. Considering also its lightweight model size and transparency in feature extraction, it offers a competitive and practical package for medical imaging applications.
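
A minimal sketch of the coarse-to-fine, backpropagation-free idea: a first regressor produces a coarse per-voxel prediction, and a second regressor corrects its residue near the decision boundary. The feature vectors, the Ridge regressors, and the boundary definition below are illustrative assumptions, not GUSL's actual components.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 32))          # per-voxel feature vectors (placeholder)
labels = rng.random(5000)                       # soft ground-truth membership in [0, 1]

coarse = Ridge(alpha=1.0).fit(features, labels)              # stage 1: coarse regression
residue = labels - coarse.predict(features)                  # prediction error
boundary = np.abs(coarse.predict(features) - 0.5) < 0.2      # error-prone boundary region
refiner = Ridge(alpha=1.0).fit(features[boundary], residue[boundary])  # stage 2: residue model

refined = coarse.predict(features)
refined[boundary] += refiner.predict(features[boundary])     # residue-corrected prediction
```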

VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation

Peng Huang, Junhu Fu, Bowen Guo, Zeju Li, Yuanyuan Wang, Yi Guo

arXiv preprint · Jun 30, 2025
As the appearance of medical images is influenced by multiple underlying factors, generative models require rich attribute information beyond labels to produce realistic and diverse images. For instance, generating an image of a skin lesion with specific patterns demands descriptions that go beyond diagnosis, such as shape, size, texture, and color. However, such detailed descriptions are not always accessible. To address this, we explore a framework, termed Visual Attribute Prompts (VAP)-Diffusion, to leverage external knowledge from pre-trained Multi-modal Large Language Models (MLLMs) to improve the quality and diversity of medical image generation. First, to derive descriptions from MLLMs without hallucination, we design a series of prompts following Chain-of-Thoughts for common medical imaging tasks, including dermatologic, colorectal, and chest X-ray images. Generated descriptions are utilized during training and stored across different categories. During testing, descriptions are randomly retrieved from the corresponding category for inference. Moreover, to make the generator robust to unseen combinations of descriptions at test time, we propose a Prototype Condition Mechanism that restricts test embeddings to be similar to those from training. Experiments on three common types of medical imaging across four datasets verify the effectiveness of VAP-Diffusion.
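
The description bank behind this scheme can be pictured as below: attribute descriptions are generated once per training image, stored by class, and sampled at inference. The prompt wording and the `query_mllm` stub are hypothetical placeholders, not the paper's prompts or API.

```python
import random
from collections import defaultdict

PROMPT = (
    "Describe this {category} image step by step: "
    "first the shape, then the size, then the texture, then the color."
)

def query_mllm(image_path: str, category: str) -> str:
    # Placeholder for a call to a pretrained multi-modal LLM.
    return f"description of {image_path} ({category})"

description_bank = defaultdict(list)

def add_training_image(image_path: str, category: str) -> None:
    # Descriptions are generated during training and stored per category.
    description_bank[category].append(query_mllm(image_path, category))

def sample_condition(category: str) -> str:
    # At test time, retrieve a stored description from the same category.
    return random.choice(description_bank[category])

add_training_image("lesion_001.png", "melanoma")
print(sample_condition("melanoma"))
```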

Diffusion Model-based Data Augmentation Method for Fetal Head Ultrasound Segmentation

Fangyijie Wang, Kevin Whelan, Félix Balado, Guénolé Silvestre, Kathleen M. Curran

arXiv preprint · Jun 30, 2025
Medical image data is less accessible than in other domains due to privacy and regulatory constraints. In addition, labeling requires costly, time-intensive manual image annotation by clinical experts. To overcome these challenges, synthetic medical data generation offers a promising solution. Generative AI (GenAI), employing generative deep learning models, has proven effective at producing realistic synthetic images. This study proposes a novel mask-guided GenAI approach using diffusion models to generate synthetic fetal head ultrasound images paired with segmentation masks. These synthetic pairs augment real datasets for supervised fine-tuning of the Segment Anything Model (SAM). Our results show that the synthetic data captures real image features effectively, and this approach reaches state-of-the-art fetal head segmentation, especially when trained with a limited number of real image-mask pairs. In particular, the segmentation reaches Dice scores of 94.66% and 94.38% using a handful of ultrasound images from the Spanish and African cohorts, respectively. Our code, models, and data are available on GitHub.
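
The augmentation recipe could look roughly like this: each real mask seeds several synthetic image-mask pairs, and the combined pool is used for supervised fine-tuning of SAM. The `sample_synthetic_pair` stub stands in for the mask-guided diffusion sampler and is an assumption, as is the 10x augmentation factor.

```python
import random
from dataclasses import dataclass

@dataclass
class Pair:
    image: str   # path to an ultrasound image
    mask: str    # path to the fetal-head segmentation mask

def sample_synthetic_pair(mask: str) -> Pair:
    # Placeholder: a mask-guided diffusion model would synthesize an image
    # consistent with this mask.
    return Pair(image=f"synthetic_for_{mask}", mask=mask)

real_pairs = [Pair(f"us_{i}.png", f"mask_{i}.png") for i in range(20)]   # handful of real pairs
synthetic_pairs = [sample_synthetic_pair(p.mask) for p in real_pairs for _ in range(10)]

training_set = real_pairs + synthetic_pairs
random.shuffle(training_set)
# training_set would then feed supervised fine-tuning of SAM.
```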

Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound

Yuhao Huang, Yueyue Xu, Haoran Dou, Jiaxiao Deng, Xin Yang, Hongyu Zheng, Dong Ni

arXiv preprint · Jun 30, 2025
Congenital uterine anomalies (CUAs) can lead to infertility, miscarriage, preterm birth, and an increased risk of pregnancy complications. Compared to traditional 2D ultrasound (US), 3D US can reconstruct the coronal plane, providing a clear visualization of the uterine morphology for accurate assessment of CUAs. In this paper, we propose an intelligent system for simultaneous automated plane localization and CUA diagnosis. Our highlights are: 1) we develop a denoising diffusion model with local (plane) and global (volume/text) guidance, using an adaptive weighting strategy to optimize attention allocation to different conditions; 2) we introduce a reinforcement learning-based framework with unsupervised rewards to extract the key slice summary from redundant sequences, fully integrating information across multiple planes to reduce learning difficulty; 3) we provide text-driven uncertainty modeling for coarse prediction, and leverage it to adjust the classification probability for overall performance improvement. Extensive experiments on a large 3D uterine US dataset show the efficacy of our method in both plane localization and CUA diagnosis. Code is available at https://github.com/yuhoo0302/CUA-US.
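
One plausible reading of the adaptive weighting over local (plane) and global (volume/text) guidance is sketched below for a classifier-free-guidance-style sampler; the softmax-over-guidance-norms rule is purely an assumption, not the released mechanism.

```python
import torch

def adaptive_guidance(eps_uncond, eps_plane, eps_global, base_scale=2.0):
    """Combine two conditional noise predictions with adaptive per-sample weights (assumed rule)."""
    g_plane = eps_plane - eps_uncond            # local (plane) guidance direction
    g_global = eps_global - eps_uncond          # global (volume/text) guidance direction
    norms = torch.stack([g_plane.flatten(1).norm(dim=1),
                         g_global.flatten(1).norm(dim=1)], dim=1)
    w = torch.softmax(norms, dim=1)             # adaptive weights, summing to 1 per sample
    w_plane = w[:, 0].view(-1, 1, 1, 1)
    w_global = w[:, 1].view(-1, 1, 1, 1)
    return eps_uncond + base_scale * (w_plane * g_plane + w_global * g_global)

eps = adaptive_guidance(torch.randn(2, 1, 64, 64),
                        torch.randn(2, 1, 64, 64),
                        torch.randn(2, 1, 64, 64))
```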

FD-DiT: Frequency Domain-Directed Diffusion Transformer for Low-Dose CT Reconstruction

Qiqing Liu, Guoquan Wei, Zekun Zhou, Yiyang Wen, Liu Shi, Qiegen Liu

arXiv preprint · Jun 30, 2025
Low-dose computed tomography (LDCT) reduces radiation exposure but suffers from image artifacts and loss of detail due to quantum and electronic noise, potentially impacting diagnostic accuracy. Transformers combined with diffusion models have been a promising approach for image generation. Nevertheless, existing methods exhibit limitations in preserving fine-grained image details. To address this issue, a frequency domain-directed diffusion transformer (FD-DiT) is proposed for LDCT reconstruction. FD-DiT centers on a diffusion strategy that progressively introduces noise until the distribution statistically aligns with that of LDCT data, followed by denoising processing. Furthermore, we employ a frequency decoupling technique to concentrate noise primarily in the high-frequency domain, thereby facilitating effective capture of essential anatomical structures and fine details. A hybrid denoising network is then utilized to optimize the overall data reconstruction process. To enhance the capability of recognizing high-frequency noise, we incorporate sliding sparse local attention to leverage the sparsity and locality of shallow-layer information, propagating it via skip connections to improve feature representation. Finally, we propose a learnable dynamic fusion strategy for optimal component integration. Experimental results demonstrate that, at identical dose levels, LDCT images reconstructed by FD-DiT exhibit superior noise and artifact suppression compared to state-of-the-art methods.
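
The frequency decoupling idea, injecting noise mainly into high-frequency bins so that low-frequency anatomy is preserved, can be sketched with an FFT mask as below; the radial cutoff and noise level are arbitrary assumptions, not the paper's schedule.

```python
import torch

def add_high_frequency_noise(image: torch.Tensor, cutoff: float = 0.1, sigma: float = 0.5) -> torch.Tensor:
    """Add Gaussian noise only to frequency bins above a radial cutoff."""
    h, w = image.shape[-2:]
    fy = torch.fft.fftfreq(h).view(-1, 1)
    fx = torch.fft.fftfreq(w).view(1, -1)
    high_pass = ((fy ** 2 + fx ** 2).sqrt() > cutoff).float()   # 1 for high-frequency bins
    spectrum = torch.fft.fft2(image)
    noise_spectrum = torch.fft.fft2(sigma * torch.randn_like(image))
    spectrum = spectrum + noise_spectrum * high_pass             # perturb high frequencies only
    return torch.fft.ifft2(spectrum).real

noisy = add_high_frequency_noise(torch.randn(1, 1, 256, 256))
```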

Contrastive Learning with Diffusion Features for Weakly Supervised Medical Image Segmentation

Dewen Zeng, Xinrong Hu, Yu-Jen Chen, Yawen Wu, Xiaowei Xu, Yiyu Shi

arXiv preprint · Jun 30, 2025
Weakly supervised semantic segmentation (WSSS) methods using class labels often rely on class activation maps (CAMs) to localize objects. However, traditional CAM-based methods struggle with partial activations and imprecise object boundaries due to optimization discrepancies between classification and segmentation. Recently, the conditional diffusion model (CDM) has been used as an alternative for generating segmentation masks in WSSS, leveraging its strong image generation capabilities tailored to specific class distributions. By modifying or perturbing the condition during diffusion sampling, the related objects can be highlighted in the generated images. Yet, the saliency maps generated by CDMs are prone to noise from background alterations during reverse diffusion. To alleviate this problem, we introduce Contrastive Learning with Diffusion Features (CLDF), a novel method that uses contrastive learning to train a pixel decoder to map the diffusion features from a frozen CDM to a low-dimensional embedding space for segmentation. Specifically, we integrate gradient maps generated from the CDM's external classifier with CAMs to identify foreground and background pixels with fewer false positives/negatives for contrastive learning, enabling robust pixel embedding learning. Experimental results on four segmentation tasks from two public medical datasets demonstrate that our method significantly outperforms existing baselines.
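
A bare-bones version of pixel-level contrastive learning over foreground/background (pseudo-)labels is shown below; the loss form and the random labels are illustrative, whereas the paper derives its labels from CAMs and classifier gradient maps.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1):
    # embeddings: (N, D) pixel embeddings; labels: (N,) with 1 = foreground, 0 = background
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                          # pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # positive pairs share a label
    eye = torch.eye(len(labels), dtype=torch.bool)
    logits = sim.masked_fill(eye, float("-inf"))           # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    positives = same & ~eye
    return -(log_prob[positives]).mean()                   # average over positive pairs

loss = pixel_contrastive_loss(torch.randn(64, 32), torch.randint(0, 2, (64,)))
```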

Efficient Cerebral Infarction Segmentation Using U-Net and U-Net3+ Models.

Yuce E, Sahin ME, Ulutas H, Erkoç MF

PubMed paper · Jun 30, 2025
Cerebral infarction remains a leading cause of mortality and long-term disability globally, underscoring the critical importance of early diagnosis and timely intervention to enhance patient outcomes. This study introduces a novel approach to cerebral infarction segmentation using a new dataset comprising MRI scans of 110 patients, retrospectively collected from Yozgat Bozok University Research Hospital. Two convolutional neural network architectures, the basic U-Net and the advanced U-Net3+, are employed to segment infarction regions with high precision. Ground-truth annotations are generated under the supervision of an experienced radiologist, and data augmentation techniques are applied to address dataset limitations, resulting in 6732 balanced images for training, validation, and testing. Performance evaluation is conducted using metrics such as the Dice score, Intersection over Union (IoU), pixel accuracy, and specificity. The basic U-Net achieved superior performance with a Dice score of 0.8947, a mean IoU of 0.8798, a pixel accuracy of 0.9963, and a specificity of 0.9984, outperforming U-Net3+ despite its simpler architecture. U-Net3+, with its complex structure and advanced features, delivered competitive results, highlighting the potential trade-off between model complexity and performance in medical imaging tasks. This study underscores the significance of leveraging deep learning for precise and efficient segmentation of cerebral infarction. The results demonstrate the capability of CNN-based architectures to support medical decision-making, offering a promising pathway for advancing stroke diagnosis and treatment planning.
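
For reference, the overlap metrics reported above (Dice score and IoU) for binary masks reduce to the short computation below; the smoothing constant is a conventional choice, not taken from the paper.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Dice score and IoU for two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

d, j = dice_and_iou(np.random.rand(256, 256) > 0.5, np.random.rand(256, 256) > 0.5)
```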

Evaluation of Cone-Beam Computed Tomography Images with Artificial Intelligence.

Arı T, Bayrakdar IS, Çelik Ö, Bilgir E, Kuran A, Orhan K

PubMed paper · Jun 30, 2025
This study aims to evaluate the success of artificial intelligence models developed using convolutional neural network-based algorithms on CBCT images. Labeling was performed with the segmentation method for 15 different conditions, namely caries, restorative filling material, root-canal filling material, dental implant, implant supported crown, crown, pontic, impacted tooth, supernumerary tooth, residual root, osteosclerotic area, periapical lesion, radiolucent jaw lesion, radiopaque jaw lesion, and mixed appearing jaw lesion, on a dataset consisting of 300 CBCT images. In model development, the Mask R-CNN architecture and the ResNet-101 model were used as a transfer learning method. The success metrics of the models were calculated with the confusion matrix method. When the F1 scores of the developed models were evaluated, the highest score (1.00) was obtained for dental implant and the lowest for mixed appearing jaw lesion. The F1 scores were, respectively: dental implant 1.00, root canal filling material 0.99, implant supported crown 0.98, restorative filling material 0.98, radiopaque jaw lesion 0.97, crown 0.96, pontic 0.96, impacted tooth 0.95, caries 0.94, residual tooth root 0.94, radiolucent jaw lesion 0.94, osteosclerotic area 0.90, periapical lesion 0.90, supernumerary tooth 0.87, and mixed appearing jaw lesion 0.80. Interpreting CBCT images is a time-consuming process that requires expertise. In the era of digital transformation, artificial intelligence-based systems that can automatically evaluate images and convert them into report format as a decision support mechanism will contribute to reducing the workload of physicians, thus increasing the time allocated to the interpretation of pathologies.
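
The per-class F1 scores above follow directly from confusion-matrix counts, as in the sketch below; the example counts are arbitrary, not values from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion-matrix counts for a single class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f1_score(tp=95, fp=3, fn=4))   # ~0.96 for a hypothetical class
```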

Deep learning for automated, motion-resolved tumor segmentation in radiotherapy.

Sarkar S, Teo PT, Abazeed ME

PubMed paper · Jun 30, 2025
Accurate tumor delineation is foundational to radiotherapy. In the era of deep learning, the automation of this labor-intensive and variation-prone process is increasingly tractable. We developed a deep neural network model to segment gross tumor volumes (GTVs) in the lung and propagate them across 4D CT images to generate an internal target volume (ITV), capturing tumor motion during respiration. Using a multicenter cohort-based registry from 9 clinics across 2 health systems, we trained a 3D UNet model (iSeg) on pre-treatment CT images and corresponding GTV masks (n = 739, 5-fold cross-validation) and validated it on two independent cohorts (n = 161; n = 102). The internal cohort achieved a median Dice similarity coefficient (DSC) of 0.73 [IQR: 0.62-0.80], with comparable performance in the external cohorts (DSC = 0.70 [0.52-0.78] and 0.71 [0.59-0.79]), indicating successful multi-site validation. iSeg matched human inter-observer variability and was robust to image quality and tumor motion (DSC = 0.77 [0.68-0.86]). Machine-generated ITVs were significantly smaller than physician-delineated contours (p < 0.0001), indicating more precise delineation. Notably, higher false-positive voxel rates (regions segmented by the machine but not the human) were associated with increased local failure (HR: 1.01 per voxel, p = 0.03), suggesting the clinical relevance of these discordant regions. These results mark a leap in automated target volume segmentation and suggest that machine delineation can enhance the accuracy, reproducibility, and efficiency of this core task in radiotherapy.
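
A common way to derive an ITV from per-phase GTV masks is the voxel-wise union across the respiratory cycle, sketched below; how iSeg propagates the GTV between phases is abstracted away here, so treat this only as an illustration of the ITV definition.

```python
import numpy as np

def itv_from_phases(phase_masks: list[np.ndarray]) -> np.ndarray:
    # phase_masks: one binary GTV mask per respiratory phase of the 4D CT
    itv = np.zeros_like(phase_masks[0], dtype=bool)
    for mask in phase_masks:
        itv |= mask.astype(bool)        # union captures the tumor motion envelope
    return itv

phases = [np.random.rand(64, 64, 64) > 0.97 for _ in range(10)]
itv = itv_from_phases(phases)
```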