
Query Nearby: Offset-Adjusted Mask2Former enhances small-organ segmentation

Xin Zhang, Dongdong Meng, Sheng Li

arXiv preprint · Jun 6, 2025
Medical segmentation plays an important role in clinical applications like radiation therapy and surgical guidance, but acquiring clinically acceptable results is difficult. In recent years, progress has been made with transformer-based models, such as architectures that combine attention mechanisms with CNNs. In particular, transformer-based segmentation models can extract global information more effectively, compensating for the drawback of CNN modules that focus on local features. However, using transformer architectures is not easy, because training transformer-based models can be resource-demanding. Moreover, owing to the distinct characteristics of medical images, especially for mid-sized and small organs with compact regions, their results are often unsatisfactory. For example, using ViT to segment medical images directly gives a DSC of less than 50%, far below the clinically acceptable score of 80%. In this paper, we use Mask2Former with deformable attention to reduce computation and propose offset adjustment strategies that encourage sampling points to stay within the same organ during attention-weight computation, thereby better integrating compact foreground information. Additionally, we utilize the 4th feature map in Mask2Former to provide a coarse location of organs, and employ an FCN-based auxiliary head with Dice loss to help train Mask2Former more quickly. We show that our model achieves state-of-the-art (SOTA) performance on the HaNSeg and SegRap2023 datasets, especially on mid-sized and small organs. Our code is available at https://github.com/earis/Offsetadjustment_Background-location_Decoder_Mask2former.
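As a rough illustration of the deformable-attention sampling described above, the following PyTorch sketch samples features at offset locations around each query's reference point and bounds the offsets so sampling stays near that reference; the module name, offset-bounding rule, and all hyperparameters are assumptions for illustration, not the paper's offset-adjustment strategy.

```python
# Minimal sketch of single-head deformable-attention sampling. The tanh-based
# offset bounding is a hypothetical stand-in for the paper's offset adjustment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSampling(nn.Module):
    def __init__(self, dim: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        self.offset_proj = nn.Linear(dim, 2 * n_points)   # per-query (dx, dy) offsets
        self.weight_proj = nn.Linear(dim, n_points)        # per-point attention weights

    def forward(self, queries, ref_points, feat):
        # queries: (B, Q, C); ref_points: (B, Q, 2) in [0, 1]; feat: (B, C, H, W)
        B, Q, _ = queries.shape
        offsets = self.offset_proj(queries).view(B, Q, self.n_points, 2)
        # Hypothetical adjustment: keep sampling close to the reference point so
        # points tend to remain inside the same compact foreground region.
        offsets = 0.1 * torch.tanh(offsets)
        locs = (ref_points.unsqueeze(2) + offsets).clamp(0, 1)          # (B, Q, P, 2)
        grid = locs * 2 - 1                                              # to [-1, 1]
        sampled = F.grid_sample(feat, grid, align_corners=False)         # (B, C, Q, P)
        weights = self.weight_proj(queries).softmax(-1)                  # (B, Q, P)
        return (sampled.permute(0, 2, 3, 1) * weights.unsqueeze(-1)).sum(2)  # (B, Q, C)
```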

Full Conformal Adaptation of Medical Vision-Language Models

Julio Silva-Rodríguez, Leo Fillioux, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Ismail Ben Ayed, Jose Dolz

arXiv preprint · Jun 6, 2025
Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability and are being progressively integrated into medical image analysis. Although their discriminative potential has been widely explored, their reliability remains overlooked. This work investigates their behavior under the increasingly popular split conformal prediction (SCP) framework, which theoretically guarantees a given error level on output sets by leveraging a labeled calibration set. However, the zero-shot performance of VLMs is inherently limited, and common practice involves few-shot transfer learning pipelines, which break the rigid exchangeability assumptions of SCP. To alleviate this issue, we propose full conformal adaptation, a novel setting for jointly adapting and conformalizing pre-trained foundation models, which operates transductively over each test data point using a few-shot adaptation set. Moreover, we complement this framework with SS-Text, a novel training-free linear probe solver for VLMs that alleviates the computational cost of such a transductive approach. We provide comprehensive experiments using 3 different modality-specialized medical VLMs and 9 adaptation tasks. Our framework requires exactly the same data as SCP and provides consistent relative improvements of up to 27% in set efficiency while maintaining the same coverage guarantees.
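To make the split conformal prediction guarantee referenced above concrete, here is a minimal sketch of SCP for a classifier with a softmax-based nonconformity score; the score choice and variable names are illustrative, and this is the standard split procedure, not the paper's full conformal adaptation.

```python
# Minimal split conformal prediction: calibrate a score threshold on held-out
# labeled data, then return prediction sets with (1 - alpha) marginal coverage.
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Include every class whose nonconformity does not exceed the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Example with random probabilities standing in for any (e.g. few-shot adapted)
# VLM classifier head.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=10)
sets = conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1)
```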

Ensemble of weak spectral total-variation learners: a PET-CT case study.

Rosenberg A, Kennedy J, Keidar Z, Zeevi YY, Gilboa G

PubMed · Jun 5, 2025
When solving computer vision problems through machine learning, one often encounters a lack of sufficient training data. To mitigate this, we propose the use of ensembles of weak learners based on spectral total-variation (STV) features (Gilboa G. 2014 A total variation spectral framework for scale and texture analysis. SIAM J. Imaging Sci. 7, 1937-1961. (doi:10.1137/130930704)). The features are related to nonlinear eigenfunctions of the total-variation subgradient and can characterize textures well at various scales. It was shown (Burger M, Gilboa G, Moeller M, Eckardt L, Cremers D. 2016 Spectral decompositions using one-homogeneous functionals. SIAM J. Imaging Sci. 9, 1374-1408. (doi:10.1137/15m1054687)) that, in the one-dimensional case, orthogonal features are generated, whereas in two dimensions the features are empirically lowly correlated. Ensemble learning theory advocates the use of lowly correlated weak learners. We thus propose to design ensembles using learners based on STV features. To show the effectiveness of this paradigm, we examine a hard real-world medical imaging problem: the predictive value of computed tomography (CT) data for high uptake in positron emission tomography (PET) for patients suspected of skeletal metastases. The database consists of 457 scans with 1524 unique pairs of registered CT and PET slices. Our approach is compared with deep-learning methods and with radiomics features, showing that STV learners perform best (AUC=[Formula: see text]), compared with neural nets (AUC=[Formula: see text]) and radiomics (AUC=[Formula: see text]). We observe that fine STV scales in CT images are especially indicative of the presence of high uptake in PET. This article is part of the theme issue 'Partial differential equations in data science'.
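The ensemble-of-weak-learners idea can be sketched as one shallow learner per STV scale band with averaged predictions; the band-wise data layout (`X_bands`) and the use of shallow decision trees are assumptions for illustration, and the STV decomposition itself is not shown.

```python
# Minimal sketch: fit one weak learner per (assumed precomputed) STV band and
# average the positive-class probabilities. X_bands: (n_samples, n_bands, n_feats).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_stv_ensemble(X_bands, y, max_depth=2):
    learners = []
    for b in range(X_bands.shape[1]):
        clf = DecisionTreeClassifier(max_depth=max_depth).fit(X_bands[:, b, :], y)
        learners.append(clf)
    return learners

def predict_proba_ensemble(learners, X_bands):
    # Average positive-class probability across the per-band weak learners.
    probs = [clf.predict_proba(X_bands[:, b, :])[:, 1]
             for b, clf in enumerate(learners)]
    return np.mean(probs, axis=0)
```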

Automatic cervical tumors segmentation in PET/MRI by parallel encoder U-net.

Liu S, Tan Z, Gong T, Tang X, Sun H, Shang F

PubMed · Jun 5, 2025
Automatic segmentation of cervical tumors is important for quantitative analysis and radiotherapy planning. A parallel encoder U-Net (PEU-Net) integrating the multi-modality information of PET/MRI was proposed to segment cervical tumors; it consists of two parallel encoders with the same structure for PET and MR images. The features of the two modalities were extracted separately and fused at each layer of the decoder. A Res2Net module on the skip connections aggregated features at various scales and refined the segmentation. PET/MRI images of 165 patients with cervical cancer were included in this study. U-Net, TransUNet, and nnU-Net with single- or multi-modality input (PET and/or T2WI) were used for comparison. The Dice similarity coefficient (DSC) on volume data, and the DSC and 95th percentile of the Hausdorff distance (HD95) on tumor slices, were calculated to evaluate performance. The proposed PEU-Net exhibited the best performance (3D DSC: 0.726 ± 0.204, HD95: 4.603 ± 4.579 mm), and its 2D DSC (0.871 ± 0.113) was comparable to the best result of TransUNet with PET/MRI (0.873 ± 0.125). The networks with multi-modality input outperformed those with single-modality input. The results show that the proposed PEU-Net uses multi-modality information more effectively through its redesigned structure and achieves competitive performance.
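A minimal sketch of the parallel-encoder idea follows: two identical encoders for PET and T2WI whose features are fused level by level before decoding. Channel sizes and fusion by concatenation are illustrative and do not reproduce the published PEU-Net or its Res2Net skip connections.

```python
# Two-level toy example of parallel encoders with per-level feature fusion.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TwoStageParallelEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_pet = nn.ModuleList([conv_block(1, 16), conv_block(16, 32)])
        self.enc_mr = nn.ModuleList([conv_block(1, 16), conv_block(16, 32)])
        self.pool = nn.MaxPool2d(2)
        self.fuse = nn.ModuleList([conv_block(32, 16), conv_block(64, 32)])
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = conv_block(32 + 16, 16)
        self.head = nn.Conv2d(16, 1, 1)

    def forward(self, pet, mr):
        p1, m1 = self.enc_pet[0](pet), self.enc_mr[0](mr)
        p2, m2 = self.enc_pet[1](self.pool(p1)), self.enc_mr[1](self.pool(m1))
        f1 = self.fuse[0](torch.cat([p1, m1], dim=1))   # fused level-1 features
        f2 = self.fuse[1](torch.cat([p2, m2], dim=1))   # fused level-2 features
        x = self.dec(torch.cat([self.up(f2), f1], dim=1))
        return torch.sigmoid(self.head(x))              # tumor probability map
```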

Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images

Rifat Sadik, Tanvir Rahman, Arpan Bhattacharjee, Bikash Chandra Halder, Ismail Hossain

arXiv preprint · Jun 5, 2025
Deep learning models have shown remarkable success in dermatological image analysis, offering potential for automated skin disease diagnosis. Previously, convolutional neural network (CNN) based architectures achieved immense popularity and success in computer vision (CV) tasks like skin image recognition, generation, and video analysis. With the emergence of transformer-based models, such CV tasks are now frequently carried out using these models. Vision Transformers (ViTs) are one such family of transformer-based models that have shown success in computer vision. They use self-attention mechanisms to achieve state-of-the-art performance across various tasks. However, their reliance on global attention mechanisms makes them susceptible to adversarial perturbations. This paper investigates the susceptibility of ViTs for medical images to adversarial watermarking, a method that adds so-called imperceptible perturbations in order to fool models. By generating adversarial watermarks through Projected Gradient Descent (PGD), we examine the transferability of such attacks to CNNs and analyze the performance of a defense mechanism, adversarial training. Results indicate that while performance is not compromised for clean images, ViTs become much more vulnerable to adversarial attacks, with an accuracy drop to as low as 27.6%. Nevertheless, adversarial training raises it back to 90.0%.
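For reference, a minimal L-infinity PGD attack of the kind used to generate such adversarial watermarks might look as follows; epsilon, step size, and iteration count are illustrative values, not the paper's settings.

```python
# Minimal L-infinity PGD: iteratively ascend the loss and project the
# perturbation back onto the epsilon-ball around the clean images.
import torch

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    loss_fn = torch.nn.CrossEntropyLoss()
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()
            adv = images + (adv - images).clamp(-eps, eps)   # project to eps-ball
            adv = adv.clamp(0, 1)                             # keep valid pixel range
    return adv.detach()
```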

StrokeNeXt: an automated stroke classification model using computed tomography and magnetic resonance images.

Ekingen E, Yildirim F, Bayar O, Akbal E, Sercek I, Hafeez-Baig A, Dogan S, Tuncer T

PubMed · Jun 5, 2025
Stroke ranks among the leading causes of disability and death worldwide. Timely detection can reduce its impact. Machine learning delivers powerful tools for image-based diagnosis. This study introduces StrokeNeXt, a lightweight convolutional neural network (CNN) for computed tomography (CT) and magnetic resonance (MR) scans, and couples it with deep feature engineering (DFE) to improve accuracy and facilitate clinical deployment. We assembled a multimodal dataset of CT and MR images, each labeled as stroke or control. StrokeNeXt employs a ConvNeXt-inspired block and a squeeze-and-excitation (SE) unit across four stages: stem, StrokeNeXt block, downsampling, and output. In the DFE pipeline, StrokeNeXt extracts features from fixed-size patches, iterative neighborhood component analysis (INCA) selects the top features, and a t-algorithm-based k-nearest neighbors (tkNN) classifier performs the final classification. StrokeNeXt achieved 93.67% test accuracy on the assembled dataset; integrating DFE raised accuracy to 97.06%. This combined approach outperformed StrokeNeXt alone and reduced classification time. StrokeNeXt paired with DFE offers an effective solution for stroke detection on CT and MR images. Its high accuracy and small number of learnable parameters make it lightweight and suitable for integration into clinical workflows. This research lays a foundation for real-time decision support in emergency and radiology settings.
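A simplified sketch of the feature-engineering stage is given below: rank deep features, try nested top-k subsets, and keep the subset with the best cross-validated kNN accuracy. The ANOVA F-score ranking and plain 1-NN classifier are stand-ins for the paper's INCA selector and tkNN classifier.

```python
# Iterative feature selection over nested top-k subsets, scored by kNN accuracy.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def iterative_feature_selection(X, y, k_grid=range(50, 501, 50)):
    scores, _ = f_classif(X, y)
    order = np.argsort(scores)[::-1]          # features ranked by F-score
    best_idx, best_acc = None, -np.inf
    for k in k_grid:
        idx = order[:k]
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                              X[:, idx], y, cv=5).mean()
        if acc > best_acc:
            best_idx, best_acc = idx, acc
    return best_idx, best_acc
```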

A radiogenomics study on ¹⁸F-FDG PET/CT in endometrial cancer by a novel deep learning segmentation algorithm.

Li X, Shi W, Zhang Q, Lin X, Sun H

PubMed · Jun 5, 2025
To create an automated PET/CT segmentation method and radiomics model to forecast mismatch repair (MMR) and TP53 gene expression in endometrial cancer patients, and to examine the effect of gene expression variability on image texture features. We generated two datasets in this retrospective, exploratory study. The first, with 123 histopathologically confirmed patient cases, was used to develop an endometrial cancer segmentation model. The second dataset, including 249 patients for MMR and 179 for TP53 mutation prediction, was derived from PET/CT exams and immunohistochemical analysis. A PET-based Attention U-Net was used for segmentation, followed by region growing with co-registered PET and CT images. Feature models were constructed using PET, CT, and combined data, with model selection based on performance comparison. Our segmentation model achieved 99.99% training accuracy and a Dice coefficient of 97.35%, with 99.93% validation accuracy and a Dice coefficient of 84.81%. The combined PET + CT model demonstrated superior predictive power for both genes, with AUCs of 0.8146 and 0.8102 for MMR, and 0.8833 and 0.8150 for TP53, in the training and test sets respectively. MMR-related protein heterogeneity and TP53 expression differences were predominantly seen in PET images. An efficient deep learning algorithm for endometrial cancer segmentation has been established, highlighting the enhanced predictive power of integrated PET and CT radiomics for MMR and TP53 expression. The study underscores the distinct influences of MMR and TP53 gene expression on tumor characteristics.
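One way the region-growing refinement could be sketched is to grow from the hottest PET voxel inside the network's coarse mask; the seed choice, tolerance, and use of scikit-image's flood fill are assumptions for illustration rather than the authors' exact post-processing.

```python
# Refine a coarse PET-derived mask by intensity-based region growing.
import numpy as np
from skimage.segmentation import flood

def grow_from_pet_mask(pet_slice, coarse_mask, tolerance=0.15):
    # pet_slice: 2D float array; coarse_mask: 2D boolean array from the network.
    if not coarse_mask.any():
        return coarse_mask
    # Seed at the hottest voxel inside the coarse prediction.
    seed = np.unravel_index(np.argmax(pet_slice * coarse_mask), pet_slice.shape)
    norm = (pet_slice - pet_slice.min()) / (pet_slice.ptp() + 1e-8)
    grown = flood(norm, seed, tolerance=tolerance)
    return coarse_mask | grown
```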

GNNs surpass transformers in tumor medical image segmentation.

Xiao H, Yang G, Li Z, Yi C

PubMed · Jun 5, 2025
To assess the suitability of Transformer-based architectures for medical image segmentation and investigate the potential advantages of Graph Neural Networks (GNNs) in this domain. We analyze the limitations of the Transformer, which models medical images as sequences of image patches, limiting its flexibility in capturing complex and irregular tumor structures. To address this, we propose U-GNN, a pure GNN-based U-shaped architecture designed for medical image segmentation. U-GNN retains the U-Net-inspired inductive bias while leveraging GNNs' topological modeling capabilities. The architecture consists of Vision GNN blocks stacked into a U-shaped structure. Additionally, we introduce the concept of multi-order similarity and propose a zero-computation-cost approach to incorporating higher-order similarity in graph construction. Each Vision GNN block segments the image into patch nodes, constructs multi-order similarity graphs, and aggregates node features via multi-order node information aggregation. Experimental evaluations on multi-organ and cardiac segmentation datasets demonstrate that U-GNN significantly outperforms existing CNN- and Transformer-based models, achieving a 6% improvement in Dice Similarity Coefficient (DSC) and an 18% reduction in Hausdorff Distance (HD) compared with state-of-the-art methods. The source code will be released upon paper acceptance.
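As a rough sketch of building patch-node similarity graphs with more than first-order neighborhoods, the snippet below constructs a k-NN graph over patch features and aggregates 1-hop and 2-hop information; the dense adjacency and 2-hop reachability trick are illustrative and do not reproduce U-GNN's multi-order similarity construction or its zero-cost variant.

```python
# Build a k-NN graph over patch-node features and aggregate multi-hop neighbors.
import torch

def knn_adjacency(x, k=8):
    # x: (N, C) patch-node features; returns a dense boolean adjacency (N, N).
    dist = torch.cdist(x, x)                                  # pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]      # skip self
    adj = torch.zeros(x.size(0), x.size(0), dtype=torch.bool)
    adj.scatter_(1, idx, True)
    return adj

def multi_order_aggregate(x, adj):
    a1 = adj.float()
    a2 = ((a1 @ a1) > 0).float()                              # 2-hop reachability
    deg1 = a1.sum(1, keepdim=True).clamp(min=1)
    deg2 = a2.sum(1, keepdim=True).clamp(min=1)
    # Concatenate self, 1st-order, and 2nd-order mean-aggregated features.
    return torch.cat([x, a1 @ x / deg1, a2 @ x / deg2], dim=-1)
```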

Stable Vision Concept Transformers for Medical Diagnosis

Lijie Hu, Songning Lai, Yuan Hua, Shu Yang, Jingfeng Zhang, Di Wang

arXiv preprint · Jun 5, 2025
Transparency is a paramount concern in the medical field, prompting researchers to delve into the realm of explainable AI (XAI). Among these XAI methods, Concept Bottleneck Models (CBMs) aim to restrict the model's latent space to human-understandable high-level concepts by generating a conceptual layer for extracting conceptual features, and they have drawn much attention recently. However, existing methods rely solely on concept features to determine the model's predictions, overlooking the intrinsic feature embeddings within medical images. To address this utility gap between the original models and concept-based models, we propose the Vision Concept Transformer (VCT). Furthermore, despite their benefits, CBMs have been found to negatively impact model performance and to fail to provide stable explanations when faced with input perturbations, which limits their application in the medical field. To address this faithfulness issue, we further propose the Stable Vision Concept Transformer (SVCT), based on VCT, which leverages the vision transformer (ViT) as its backbone and incorporates a conceptual layer. SVCT employs conceptual features to enhance decision-making by fusing them with image features, and it ensures model faithfulness through the integration of Denoised Diffusion Smoothing. Comprehensive experiments on four medical datasets demonstrate that our VCT and SVCT maintain accuracy while remaining interpretable compared with baselines. Furthermore, even when subjected to perturbations, our SVCT consistently provides faithful explanations, thus meeting the needs of the medical field.
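A minimal sketch of fusing a conceptual layer with backbone image features, in the spirit described above, is shown below; the dimensions, sigmoid concept scores, and fusion by concatenation are assumptions, not the published SVCT architecture.

```python
# Concept layer fused with backbone features before the final classifier.
import torch
import torch.nn as nn

class ConceptFusionHead(nn.Module):
    def __init__(self, feat_dim=768, n_concepts=32, n_classes=4):
        super().__init__()
        self.concept_layer = nn.Linear(feat_dim, n_concepts)        # concept scores
        self.classifier = nn.Linear(feat_dim + n_concepts, n_classes)

    def forward(self, image_feats):
        # image_feats: (B, feat_dim), e.g. a ViT [CLS] embedding.
        concepts = torch.sigmoid(self.concept_layer(image_feats))   # interpretable scores
        fused = torch.cat([image_feats, concepts], dim=-1)          # keep raw embeddings too
        return self.classifier(fused), concepts
```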

SAM-aware Test-time Adaptation for Universal Medical Image Segmentation

Jianghao Wu, Yicheng Wu, Yutong Xie, Wenjia Bai, You Zhang, Feilong Tang, Yulong Li, Yasmeen George, Imran Razzak

arXiv preprint · Jun 5, 2025
Universal medical image segmentation using the Segment Anything Model (SAM) remains challenging due to its limited adaptability to medical domains. Existing adaptations, such as MedSAM, enhance SAM's performance in medical imaging but at the cost of reduced generalization to unseen data. Therefore, in this paper, we propose SAM-aware Test-Time Adaptation (SAM-TTA), a fundamentally different pipeline that preserves the generalization of SAM while improving its segmentation performance in medical imaging via a test-time framework. SAM-TTA tackles two key challenges: (1) input-level discrepancies caused by differences in image acquisition between natural and medical images and (2) semantic-level discrepancies due to fundamental differences in object definition between natural and medical domains (e.g., clear boundaries vs. ambiguous structures). Specifically, our SAM-TTA framework comprises (1) Self-adaptive Bezier Curve-based Transformation (SBCT), which adaptively converts single-channel medical images into three-channel SAM-compatible inputs while maintaining structural integrity, to mitigate the input gap between medical and natural images, and (2) Dual-scale Uncertainty-driven Mean Teacher adaptation (DUMT), which employs consistency learning to align SAM's internal representations to medical semantics, enabling efficient adaptation without auxiliary supervision or expensive retraining. Extensive experiments on five public datasets demonstrate that our SAM-TTA outperforms existing TTA approaches and even surpasses fully fine-tuned models such as MedSAM in certain scenarios, establishing a new paradigm for universal medical image segmentation. Code can be found at https://github.com/JianghaoWu/SAM-TTA.
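To illustrate the single-channel-to-three-channel idea, the sketch below applies three fixed Bezier intensity curves to a normalized image; treating intensity as the curve parameter and using fixed control points are simplifying assumptions, whereas SBCT adapts the transformation at test time.

```python
# Map a single-channel image to three channels via per-channel intensity curves.
import numpy as np

def bezier_curve(x, p1, p2):
    # Cubic Bezier on [0, 1] with endpoints (0, 0) and (1, 1); p1, p2 are the
    # control-point heights. Intensity is used directly as the curve parameter,
    # which is an approximation of a true parametric Bezier mapping.
    t = x
    return 3 * (1 - t) ** 2 * t * p1 + 3 * (1 - t) * t ** 2 * p2 + t ** 3

def to_three_channels(img, control_points=((0.2, 0.8), (0.5, 0.5), (0.8, 0.2))):
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)   # normalize to [0, 1]
    channels = [bezier_curve(img, p1, p2) for p1, p2 in control_points]
    return np.stack(channels, axis=-1)                          # (H, W, 3)
```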