Page 8 of 45448 results

MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation

Peiting Tian, Xi Chen, Haixia Bi, Fan Li

arXiv preprint, Jun 30 2025
Medical image segmentation plays a crucial role in clinical diagnosis and treatment planning, where accurate boundary delineation is essential for precise lesion localization, organ identification, and quantitative assessment. In recent years, deep learning-based methods have significantly advanced segmentation accuracy. However, two major challenges remain. First, the performance of these methods heavily relies on large-scale annotated datasets, which are often difficult to obtain in medical scenarios due to privacy concerns and high annotation costs. Second, clinically challenging scenarios, such as low contrast in certain imaging modalities and blurry lesion boundaries caused by malignancy, still pose obstacles to precise segmentation. To address these challenges, we propose MedSAM-CA, an architecture-level fine-tuning approach that mitigates reliance on extensive manual annotations by adapting the pretrained foundation model, Medical Segment Anything (MedSAM). MedSAM-CA introduces two key components: the Convolutional Attention-Enhanced Boundary Refinement Network (CBR-Net) and the Attention-Enhanced Feature Fusion Block (Atte-FFB). CBR-Net operates in parallel with the MedSAM encoder to recover boundary information potentially overlooked by long-range attention mechanisms, leveraging hierarchical convolutional processing. Atte-FFB, embedded in the MedSAM decoder, fuses multi-level fine-grained features from skip connections in CBR-Net with global representations upsampled within the decoder to enhance boundary delineation accuracy. Experiments on publicly available datasets covering dermoscopy, CT, and MRI imaging modalities validate the effectiveness of MedSAM-CA. On the dermoscopy dataset, MedSAM-CA achieves 94.43% Dice with only 2% of the full training data, reaching 97.25% of full-data training performance and demonstrating strong effectiveness in low-resource clinical settings.

GUSL: A Novel and Efficient Machine Learning Model for Prostate Segmentation on MRI

Jiaxin Yang, Vasileios Magoulianitis, Catherine Aurelia Christie Alexander, Jintang Xue, Masatomo Kaneko, Giovanni Cacciamani, Andre Abreu, Vinay Duddalwar, C. -C. Jay Kuo, Inderbir S. Gill, Chrysostomos Nikias

arXiv preprint, Jun 30 2025
Prostate and zonal segmentation is a crucial step for clinical diagnosis of prostate cancer (PCa). Computer-aided diagnosis tools for prostate segmentation are based on the deep learning (DL) paradigm. However, deep neural networks are perceived as "black-box" solutions by physicians, thus making them less practical for deployment in the clinical setting. In this paper, we introduce a feed-forward machine learning model, named Green U-shaped Learning (GUSL), suitable for medical image segmentation without backpropagation. GUSL introduces a multi-layer regression scheme for coarse-to-fine segmentation. Its feature extraction is based on a linear model, which enables seamless interpretability during feature extraction. Also, GUSL introduces a mechanism for attention on the prostate boundaries, which is an error-prone region, by employing regression to refine the predictions through residue correction. In addition, a two-step pipeline approach is used to mitigate the class imbalance, an issue inherent in medical imaging problems. After conducting experiments on two publicly available datasets and one private dataset, in both prostate gland and zonal segmentation tasks, GUSL achieves state-of-the-art performance among other DL-based models. Notably, GUSL features a very energy-efficient pipeline, since it has a model size several times smaller and less complexity than the rest of the solutions. In all datasets, GUSL achieved a Dice Similarity Coefficient (DSC) performance greater than $0.9$ for gland segmentation. Considering also its lightweight model size and transparency in feature extraction, it offers a competitive and practical package for medical imaging applications.
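For readers unfamiliar with the metric quoted above, the Dice Similarity Coefficient of a predicted mask P against a reference mask T is 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch in plain Python, not code from the paper; the function name, the flat 0/1 mask representation, and the toy masks are all assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks (flat sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (hypothetical, not data from the paper).
pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)  # 3 shared pixels, 4 + 4 labeled pixels
print(score)
```

A DSC above 0.9, as reported for gland segmentation, means the overlap is large relative to the combined size of the two masks.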

MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation

Sunggu Kyung, Jinyoung Seo, Hyunseok Lim, Dongyeong Kim, Hyungbin Park, Jimin Sung, Jihyun Kim, Wooyoung Jo, Yoojin Nam, Namkug Kim

arXiv preprint, Jun 29 2025
The recent release of RadGenome-Chest CT has significantly advanced CT-based report generation. However, existing methods primarily focus on global features, making it challenging to capture region-specific details, which may cause certain abnormalities to go unnoticed. To address this, we propose MedRegion-CT, a region-focused Multi-Modal Large Language Model (MLLM) framework, featuring three key innovations. First, we introduce Region Representative ($R^2$) Token Pooling, which utilizes a 2D-wise pretrained vision model to efficiently extract 3D CT features. This approach generates global tokens representing overall slice features and region tokens highlighting target areas, enabling the MLLM to process comprehensive information effectively. Second, a universal segmentation model generates pseudo-masks, which are then processed by a mask encoder to extract region-centric features. This allows the MLLM to focus on clinically relevant regions, using six predefined region masks. Third, we leverage segmentation results to extract patient-specific attributions, including organ size, diameter, and locations. These are converted into text prompts, enriching the MLLM's understanding of patient-specific contexts. To ensure rigorous evaluation, we conducted benchmark experiments on report generation using the RadGenome-Chest CT. MedRegion-CT achieved state-of-the-art performance, outperforming existing methods in natural language generation quality and clinical relevance while maintaining interpretability. The code for our framework is publicly available.

CA-Diff: Collaborative Anatomy Diffusion for Brain Tissue Segmentation

Qilong Xing, Zikai Song, Yuteng Ye, Yuke Chen, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang

arXiv preprint, Jun 28 2025
Segmentation of brain structures from MRI is crucial for evaluating brain morphology, yet existing CNN and transformer-based methods struggle to delineate complex structures accurately. While current diffusion models have shown promise in image segmentation, they are inadequate when applied directly to brain MRI because they neglect anatomical information. To address this, we propose Collaborative Anatomy Diffusion (CA-Diff), a framework integrating spatial anatomical features to enhance the segmentation accuracy of the diffusion model. Specifically, we introduce a distance field as an auxiliary anatomical condition to provide global spatial context, alongside a collaborative diffusion process to model its joint distribution with anatomical structures, enabling effective utilization of anatomical features for segmentation. Furthermore, we introduce a consistency loss to refine relationships between the distance field and anatomical structures and design a time-adapted channel attention module to enhance the U-Net feature fusion procedure. Extensive experiments show that CA-Diff outperforms state-of-the-art (SOTA) methods.

Inpainting is All You Need: A Diffusion-based Augmentation Method for Semi-supervised Medical Image Segmentation

Xinrong Hu, Yiyu Shi

arXiv preprint, Jun 28 2025
Collecting pixel-level labels for medical datasets can be a laborious and expensive process, and enhancing segmentation performance with a scarcity of labeled data is a crucial challenge. This work introduces AugPaint, a data augmentation framework that utilizes inpainting to generate image-label pairs from limited labeled data. AugPaint leverages latent diffusion models, known for their ability to generate high-quality in-domain images with low overhead, and adapts the sampling process for the inpainting task without the need for retraining. Specifically, given a pair of image and label mask, we crop the area labeled as foreground and condition on it during the reverse denoising process at every noise level. The masked background area is gradually filled in, and all generated images are paired with the label mask. This approach ensures an accurate match between synthetic images and label masks, setting it apart from existing dataset generation methods. The generated images serve as valuable supervision for training downstream segmentation models, effectively addressing the challenge of limited annotations. We conducted extensive evaluations of our data augmentation method on four public medical image segmentation datasets, including CT, MRI, and skin imaging. Results across all datasets demonstrate that AugPaint outperforms state-of-the-art label-efficient methodologies, significantly improving segmentation performance.

Quantifying Sagittal Craniosynostosis Severity: A Machine Learning Approach With CranioRate.

Tao W, Somorin TJ, Kueper J, Dixon A, Kass N, Khan N, Iyer K, Wagoner J, Rogers A, Whitaker R, Elhabian S, Goldstein JA

PubMed paper, Jun 27 2025
Objective: To develop and validate machine learning (ML) models for objective and comprehensive quantification of sagittal craniosynostosis (SCS) severity, enhancing clinical assessment, management, and research. Design: A cross-sectional study that combined the analysis of computed tomography (CT) scans and expert ratings. Setting: The study was conducted at a children's hospital and a major computer imaging institution. Our survey collected expert ratings from participating surgeons. Participants: The study included 195 patients with nonsyndromic SCS, 221 patients with nonsyndromic metopic craniosynostosis (CS), and 178 age-matched controls. Fifty-four craniofacial surgeons participated in rating 20 patients' head CT scans. Interventions: Computed tomography scans for cranial morphology assessment and a radiographic diagnosis of nonsyndromic SCS. Main Outcomes: Accuracy of the proposed Sagittal Severity Score (SSS) in predicting expert ratings compared to cephalic index (CI). Secondary outcomes compared Likert ratings with SCS status, the predictive power of skull-based versus skin-based landmarks, and assessments of an unsupervised ML model, the Cranial Morphology Deviation (CMD), as an alternative without ratings. Results: The SSS achieved significantly higher accuracy in predicting expert responses than CI (P < .05). Likert ratings outperformed SCS status in supervising ML models to quantify within-group variations. Skin-based landmarks demonstrated predictive power equivalent to that of skull-based landmarks (P < .05, threshold 0.02). The CMD demonstrated a strong correlation with the SSS (Pearson coefficient: 0.92, Spearman coefficient: 0.90, P < .01). Conclusions: The SSS and CMD can provide accurate, consistent, and comprehensive quantification of SCS severity. Implementing these data-driven ML models can significantly advance CS care through standardized assessments, enhanced precision, and informed surgical planning.
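The abstract reports both a Pearson and a Spearman coefficient between the CMD and the SSS. As a reminder of how the two differ, here is a minimal sketch in plain Python: Pearson measures linear association on the raw values, while Spearman is Pearson applied to ranks (with tied values sharing their average rank). The function names and the toy data are assumptions for illustration and have nothing to do with the study's scores.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (hypothetical): y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

On this toy data the Spearman coefficient is exactly 1 because the ordering is perfectly preserved, while the Pearson coefficient falls slightly below 1 because the relationship is curved.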

Advanced glaucoma disease segmentation and classification with grey wolf optimized U-Net++ and capsule networks.

Govindharaj I, Deva Priya W, Soujanya KLS, Senthilkumar KP, Shantha Shalini K, Ravichandran S

PubMed paper, Jun 27 2025
Early detection of glaucoma is vital for preserving vision, as the disease remains one of the leading causes of blindness worldwide. Current glaucoma screening strategies rely on expert interpretation and on complex, time-consuming procedures that delay both diagnosis and intervention. This research presents an automated glaucoma diagnostic system that combines optimized segmentation with classification. The proposed segmentation approach implements an enhanced version of U-Net++, with dynamic parameter control provided by Grey Wolf Optimization (GWO), to segment optic disc and cup regions in retinal fundus images. GWO mimics wolf-pack hunting strategies to adjust parameters dynamically, which enables the model to capture diverse textural patterns in the images. The system uses a capsule network (CapsNet) for classification because it preserves spatial relationships, allowing glaucoma-related patterns to be detected precisely. The developed system achieves an accuracy of 95.1% in the segmentation and classification tasks, better than typical approaches. The automated system improves both the speed and the precision of clinical diagnosis. The tool stands out for its high detection accuracy and reliability, making it well suited to early-stage clinical glaucoma diagnosis and to scalable healthcare deployment. To develop an advanced automated glaucoma diagnostic system by integrating an optimized U-Net++ segmentation model with a Capsule Network (CapsNet) classifier, enhanced through Grey Wolf Optimization Algorithm (GWOA), for precise segmentation of optic disc and cup regions and accurate glaucoma classification from retinal fundus images. This study proposes a two-phase computer-assisted diagnosis (CAD) framework.
In the segmentation phase, an enhanced U-Net++ model, optimized by GWOA, is employed to accurately delineate the optic disc and cup regions in fundus images. The optimization dynamically tunes hyperparameters based on grey wolf hunting behavior for improved segmentation precision. In the classification phase, a CapsNet architecture is used to maintain spatial hierarchies and effectively classify images as glaucomatous or normal based on segmented outputs. The performance of the proposed model was validated using the ORIGA retinal fundus image dataset, and evaluated against conventional approaches. The proposed GWOA-UNet++ and CapsNet framework achieved a segmentation and classification accuracy of 95.1%, outperforming existing benchmark models such as MTA-CS, ResFPN-Net, DAGCN, MRSNet and AGCT. The model demonstrated robustness against image irregularities, including variations in optic disc size and fundus image quality, and showed superior performance across accuracy, sensitivity, specificity, precision, and F1-score metrics. The developed automated glaucoma detection system exhibits enhanced diagnostic accuracy, efficiency, and reliability, offering significant potential for early-stage glaucoma detection and clinical decision support. Future work will involve large-scale multi-ethnic dataset validation, integration with clinical workflows, and deployment as a mobile or cloud-based screening tool.

A two-step automatic identification of contrast phases for abdominal CT images based on residual networks.

Liu Q, Jiang J, Wu K, Zhang Y, Sun N, Luo J, Ba T, Lv A, Liu C, Yin Y, Yang Z, Xu H

PubMed paper, Jun 27 2025
To develop a deep learning model based on Residual Networks (ResNet) for the automated and accurate identification of contrast phases in abdominal CT images. A dataset of 1175 abdominal contrast-enhanced CT scans was retrospectively collected for the model development, and another independent dataset of 215 scans from five hospitals was collected for external testing. Each contrast phase was independently annotated by two radiologists. A ResNet-based model was developed to automatically classify phases into the early arterial phase (EAP) or late arterial phase (LAP), portal venous phase (PVP), and delayed phase (DP). Strategy A identified EAP or LAP, PVP, and DP in one step. Strategy B used a two-step approach: first classifying images as arterial phase (AP), PVP, and DP, then further classifying AP images into EAP or LAP. Model performance and strategy comparison were evaluated. In the internal test set, the overall accuracy of the two-step strategy was 98.3% (283/288; p < 0.001), significantly higher than that of the one-step strategy (91.7%, 264/288; p < 0.001). In the external test set, the two-step model achieved an overall accuracy of 99.1% (639/645), with sensitivities of 95.1% (EAP), 99.4% (LAP), 99.5% (PVP), and 99.5% (DP). The proposed two-step ResNet-based model provides highly accurate and robust identification of contrast phases in abdominal CT images, outperforming the conventional one-step strategy. Automated and accurate identification of contrast phases in abdominal CT images provides a robust tool for improving image quality control and establishes a strong foundation for AI-driven applications, particularly those leveraging contrast-enhanced abdominal imaging data. Accurate identification of contrast phases is crucial in abdominal CT imaging. The two-step ResNet-based model achieved superior accuracy across internal and external datasets. Automated phase classification strengthens imaging quality control and supports precision AI applications.
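The two-step strategy described above can be sketched as a simple routing function, and the reported percentages can be re-derived from the counts given in the abstract. This is only an illustrative sketch in plain Python: the function names, the dictionary-based toy "image", and the stand-in classifiers are assumptions, and the real models are ResNet-based networks that are not reproduced here.

```python
def two_step_phase(image, coarse_classifier, arterial_classifier):
    """Classify coarsely first, then refine only the arterial-phase cases."""
    coarse = coarse_classifier(image)      # expected: "AP", "PVP", or "DP"
    if coarse == "AP":
        return arterial_classifier(image)  # expected: "EAP" or "LAP"
    return coarse


def accuracy_percent(correct, total):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Toy stand-ins for trained networks (hypothetical): they read labels
# stored on the toy "image" instead of running a model.
toy_coarse = lambda img: img["coarse"]
toy_arterial = lambda img: img["fine"]

example = {"coarse": "AP", "fine": "LAP"}
predicted = two_step_phase(example, toy_coarse, toy_arterial)

# Counts taken from the abstract.
internal_two_step = accuracy_percent(283, 288)
internal_one_step = accuracy_percent(264, 288)
external_two_step = accuracy_percent(639, 645)
print(predicted, internal_two_step, internal_one_step, external_two_step)
```

The recomputed values agree with the figures quoted in the abstract (98.3%, 91.7%, and 99.1%).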

Prospective quality control in chest radiography based on the reconstructed 3D human body.

Tan Y, Ye Z, Ye J, Hou Y, Li S, Liang Z, Li H, Tang J, Xia C, Li Z

PubMed paper, Jun 27 2025
Chest radiography requires effective quality control (QC) to reduce high retake rates. However, existing QC measures are all retrospective and implemented after exposure, often necessitating retakes when image quality fails to meet standards and thereby increasing radiation exposure to patients. To address this issue, we proposed a 3D human body (3D-HB) reconstruction algorithm to realize prospective QC. Our objective was to investigate the feasibility of using the reconstructed 3D-HB for prospective QC in chest radiography and evaluate its impact on retake rates. Approach: This prospective study included patients indicated for posteroanterior (PA) and lateral (LA) chest radiography in May 2024. A 3D-HB reconstruction algorithm integrating the SMPL-X model and the HybrIK-X algorithm was proposed to convert patients' 2D images into 3D-HBs. QC metrics regarding patient positioning and collimation were assessed using chest radiographs (reference standard) and 3D-HBs, with results compared using ICCs, linear regression, and receiver operating characteristic curves. For retake rate evaluation, a real-time 3D-HB visualization interface was developed and chest radiography was conducted in two four-week phases: the first without prospective QC and the second with prospective QC. Retake rates between the two phases were compared using chi-square tests. Main results: 324 participants were included (mean age, 42 years ± 19 [SD]; 145 men; 324 PA and 294 LA examinations). The ICCs for the clavicle and midaxillary line angles were 0.80 and 0.78, respectively. Linear regression showed good relation for clavicle angles (R²: 0.655) and midaxillary line angles (R²: 0.616). In PA chest radiography, the AUCs of 3D-HBs were 0.89, 0.87, 0.91 and 0.92 for assessing scapula rotation, lateral tilt, centered positioning and central X-ray alignment respectively, with 97% accuracy in collimation assessment.
In LA chest radiography, the AUCs of 3D-HBs were 0.87, 0.84, 0.87 and 0.88 for assessing arms raised, chest rotation, centered positioning and central X-ray alignment respectively, with 94% accuracy in collimation assessment. In retake rate evaluation, 3995 PA and 3295 LA chest radiographs were recorded. The implementation of prospective QC based on the 3D-HB reduced retake rates from 8.6% to 3.5% (PA) and 19.6% to 4.9% (LA) (p < .001). Significance: The reconstructed 3D-HB is a feasible tool for prospective QC in chest radiography, providing real-time feedback on patient positioning and collimation before exposure. Prospective QC based on the reconstructed 3D-HB has the potential to reshape the future of radiography QC by significantly reducing retake rates and improving clinical standardization.
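To put the retake-rate figures in perspective, the short sketch below converts the before and after rates quoted in the abstract into absolute reductions (percentage points) and relative reductions (percent of the baseline). It is plain arithmetic on the published rates; the function and variable names are assumptions for illustration, and no per-phase counts are used because the abstract does not give them.

```python
def absolute_reduction(before, after):
    """Drop in retake rate, in percentage points."""
    return round(before - after, 1)


def relative_reduction(before, after):
    """Drop in retake rate as a percentage of the baseline rate."""
    return round(100 * (before - after) / before, 1)


# Retake rates (%) quoted in the abstract: without vs. with prospective QC.
pa_before, pa_after = 8.6, 3.5
la_before, la_after = 19.6, 4.9

pa_abs = absolute_reduction(pa_before, pa_after)
pa_rel = relative_reduction(pa_before, pa_after)
la_abs = absolute_reduction(la_before, la_after)
la_rel = relative_reduction(la_before, la_after)
print(pa_abs, pa_rel, la_abs, la_rel)
```

Under this arithmetic, the PA retake rate falls by 5.1 percentage points (about 59% of baseline) and the LA retake rate by 14.7 percentage points (75% of baseline).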

Noise-Inspired Diffusion Model for Generalizable Low-Dose CT Reconstruction

Qi Gao, Zhihao Chen, Dong Zeng, Junping Zhang, Jianhua Ma, Hongming Shan

arXiv preprint, Jun 27 2025
The generalization of deep learning-based low-dose computed tomography (CT) reconstruction models to doses unseen in the training data is important and remains challenging. Previous efforts heavily rely on paired data to improve the generalization performance and robustness through collecting either diverse CT data for re-training or a few test data for fine-tuning. Recently, diffusion models have shown promising and generalizable performance in low-dose CT (LDCT) reconstruction; however, they may produce unrealistic structures because CT image noise deviates from a Gaussian distribution and because the prior information obtained under the guidance of noisy LDCT images is imprecise. In this paper, we propose a noise-inspired diffusion model for generalizable LDCT reconstruction, termed NEED, which tailors diffusion models to the noise characteristics of each domain. First, we propose a novel shifted Poisson diffusion model to denoise projection data, which aligns the diffusion process with the noise model in pre-log LDCT projections. Second, we devise a doubly guided diffusion model to refine reconstructed images, which leverages LDCT images and initial reconstructions to more accurately locate prior information and enhance reconstruction fidelity. By cascading these two diffusion models for dual-domain reconstruction, our NEED requires only normal-dose data for training and can be effectively extended to various unseen dose levels during testing via a time step matching strategy. Extensive qualitative, quantitative, and segmentation-based evaluations on two datasets demonstrate that our NEED consistently outperforms state-of-the-art methods in reconstruction and generalization performance. Source code is made available at https://github.com/qgao21/NEED.
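The abstract does not spell out the noise model it refers to. As background, a commonly used description of pre-log CT measurements, and the shifted Poisson approximation usually associated with it, is sketched below. The symbols ($y$ for the measured pre-log signal, $I_0$ for the incident photon count, $p$ for the line integral, $\sigma_e^2$ for the electronic noise variance) are assumptions chosen for illustration and may not match the paper's notation or its exact formulation.

$$
y \sim \mathrm{Poisson}\!\left(I_0 e^{-p}\right) + \mathcal{N}\!\left(0, \sigma_e^2\right),
\qquad
y + \sigma_e^2 \;\overset{\text{approx.}}{\sim}\; \mathrm{Poisson}\!\left(I_0 e^{-p} + \sigma_e^2\right)
$$

The shift by $\sigma_e^2$ is chosen so that the mean and variance of the approximating Poisson variable both equal $I_0 e^{-p} + \sigma_e^2$, matching the first two moments of the shifted measurement.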