Page 17 of 45448 results

InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation

Daniya Najiha Abdul Kareem, Abdul Hannan, Mubashir Noman, Jean Lahoud, Mustansar Fiaz, Hisham Cholakkal

arXiv preprint, Jun 13 2025
Accurate microscopic medical image segmentation plays a crucial role in diagnosing various cancerous cells and identifying tumors. Driven by advancements in deep learning, convolutional neural networks (CNNs) and transformer-based models have been extensively studied to enhance receptive fields and improve medical image segmentation tasks. However, they often struggle to capture complex cellular and tissue structures in challenging scenarios such as background clutter and object overlap. Moreover, their reliance on the availability of large datasets for improved performance, along with the high computational cost, limits their practicality. To address these issues, we propose an efficient framework for the segmentation task, named InceptionMamba, which encodes multi-stage rich features and offers both performance and computational efficiency. Specifically, we exploit semantic cues to capture both low-frequency and high-frequency regions to enrich the multi-stage features and handle blurred region boundaries (e.g., cell boundaries). These enriched features are input to a hybrid model that combines an Inception depth-wise convolution with a Mamba block to maintain high efficiency and capture inherent variations in the scales and shapes of the regions of interest. The enriched features are then fused with low-resolution features to obtain the final segmentation mask. Our model achieves state-of-the-art performance on two challenging microscopic segmentation datasets (SegPC21 and GlaS) and two skin lesion segmentation datasets (ISIC2017 and ISIC2018), while reducing computational cost by about 5 times compared to the previous best-performing method.
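The low-/high-frequency split the abstract describes can be illustrated with a minimal sketch: a box-filter low-pass yields the low-frequency map, and the residual carries the high-frequency boundary detail. All function names here are ours, and the real model operates on learned feature maps, not raw pixels:

```python
def box_blur(img, k=3):
    """Naive k x k box filter with edge clamping (low-frequency component)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / (k * k)
    return out

def frequency_split(img, k=3):
    """Return (low, high) where high = img - low; the high-frequency
    residual is large near boundaries and zero in flat regions."""
    low = box_blur(img, k)
    high = [[img[y][x] - low[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
    return low, high
```

On a step-edge image the high-frequency map is nonzero only near the edge, which is exactly the kind of cue that helps with blurred cell boundaries.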

Task Augmentation-Based Meta-Learning Segmentation Method for Retinopathy.

Wang J, Mateen M, Xiang D, Zhu W, Shi F, Huang J, Sun K, Dai J, Xu J, Zhang S, Chen X

PubMed, Jun 12 2025
Deep learning (DL) requires large amounts of labeled data, which is extremely time-consuming and labor-intensive to obtain for medical image segmentation tasks. Meta-learning focuses on developing learning strategies that enable quick adaptation to new tasks with limited labeled data. However, rich-class medical image segmentation datasets for constructing meta-learning multi-tasks are currently unavailable. In addition, data collected from various healthcare sites and devices may present significant distribution differences, potentially degrading a model's performance. In this paper, we propose a task augmentation-based meta-learning method for retinal image segmentation (TAMS) to reduce the labor-intensive annotation demand. A retinal Lesion Simulation Algorithm (LSA) is proposed to automatically generate multi-class retinal disease datasets with pixel-level segmentation labels, such that meta-learning tasks can be augmented without collecting data from various sources. In addition, a novel simulation function library is designed to control the generation process and ensure interpretability. Moreover, a generative simulation network (GSNet) with an improved adversarial training strategy is introduced to maintain high-quality representations of complex retinal diseases. TAMS is evaluated on three different OCT and CFP image datasets, and comprehensive experiments demonstrate that TAMS achieves superior segmentation performance compared to state-of-the-art models.
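The core idea of lesion simulation, generating (image, mask) pairs so that meta-learning tasks can be augmented without new data collection, can be sketched with a toy ellipse-based simulator. The paper's LSA uses a simulation function library and an adversarially trained GSNet; everything below is a hypothetical stand-in:

```python
import random

def simulate_lesion(img, mask, cy, cx, ry, rx, intensity=0.8):
    """Paint an elliptical 'lesion' into img and record it in the label
    mask. A toy stand-in for a lesion simulation step: every simulated
    pixel automatically gets a pixel-level segmentation label."""
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            if ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0:
                img[y][x] = intensity
                mask[y][x] = 1
    return img, mask

def augment_tasks(n_tasks, h=32, w=32, seed=0):
    """Generate n_tasks (image, mask) pairs usable as meta-learning episodes."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        img = [[0.1] * w for _ in range(h)]   # flat synthetic background
        mask = [[0] * w for _ in range(h)]
        cy, cx = rng.randint(8, h - 8), rng.randint(8, w - 8)
        ry, rx = rng.randint(2, 5), rng.randint(2, 5)
        tasks.append(simulate_lesion(img, mask, cy, cx, ry, rx))
    return tasks
```

The key property, free pixel-level labels for every simulated lesion, is what removes the annotation bottleneck the abstract highlights.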

Simulation-free workflow for lattice radiation therapy using deep learning predicted synthetic computed tomography: A feasibility study.

Zhu L, Yu NY, Ahmed SK, Ashman JB, Toesca DS, Grams MP, Deufel CL, Duan J, Chen Q, Rong Y

PubMed, Jun 12 2025
Lattice radiation therapy (LRT) is a form of spatially fractionated radiation therapy that allows increased total dose delivery aiming for improved treatment response without an increase in toxicities, commonly utilized for palliation of bulky tumors. The LRT treatment planning process is complex, while eligible patients often have an urgent need for expedited treatment start. In this study, we aimed to develop a simulation-free workflow for volumetric modulated arc therapy (VMAT)-based LRT planning via deep learning-predicted synthetic CT (sCT) to expedite treatment initiation. Two deep learning models were initially trained using a 3D U-Net architecture to generate sCT from diagnostic CTs (dCT) of the thoracic and abdominal regions using a training dataset of 50 patients. The models were then tested on an independent dataset of 15 patients using image similarity analysis, assessing mean absolute error (MAE) and structural similarity index measure (SSIM) as metrics. VMAT-based LRT plans were generated based on sCT and recalculated on the planning CT (pCT) for dosimetric accuracy comparison. Differences in dose volume histogram (DVH) metrics between pCT and sCT plans were assessed using the Wilcoxon signed-rank test. The final sCT prediction model demonstrated high image similarity to pCT, with an MAE and SSIM of 38.93 ± 14.79 Hounsfield Units (HU) and 0.92 ± 0.05 for the thoracic region, and 73.60 ± 22.90 HU and 0.90 ± 0.03 for the abdominal region, respectively. There were no statistically significant differences between sCT and pCT plans in terms of organ-at-risk and target volume DVH parameters, including maximum dose (Dmax), mean dose (Dmean), dose delivered to 90% (D90%) and 50% (D50%) of target volume, except for minimum dose (Dmin) and dose delivered to 10% of target volume (D10%).
With demonstrated high image similarity and adequate dose agreement between sCT and pCT, our study is a proof-of-concept for using deep learning predicted sCT for a simulation-free treatment planning workflow for VMAT-based LRT.
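The two image-similarity metrics reported above can be stated concretely. A minimal sketch, using a single-window SSIM over the whole image rather than the sliding Gaussian window used in practice (so treat the SSIM here as an illustrative simplification):

```python
import math

def mae(a, b):
    """Mean absolute error between two equally sized images (e.g. in HU)."""
    n = len(a) * len(a[0])
    return sum(abs(a[y][x] - b[y][x])
               for y in range(len(a)) for x in range(len(a[0]))) / n

def global_ssim(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM computed over one global window: combines luminance, contrast,
    and structure terms; 1.0 means identical images."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    n = len(flat_a)
    mu_a, mu_b = sum(flat_a) / n, sum(flat_b) / n
    var_a = sum((v - mu_a) ** 2 for v in flat_a) / n
    var_b = sum((v - mu_b) ** 2 for v in flat_b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(flat_a, flat_b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

Identical sCT and pCT would give MAE = 0 and SSIM = 1; the reported 38.93 HU / 0.92 for the thorax quantify how close the prediction comes to that ideal.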

Med-URWKV: Pure RWKV With ImageNet Pre-training For Medical Image Segmentation

Zhenhuan Zhou

arXiv preprint, Jun 12 2025
Medical image segmentation is a fundamental and key technology in computer-aided diagnosis and treatment. Previous methods can be broadly classified into three categories: convolutional neural network (CNN) based, Transformer based, and hybrid architectures that combine both. However, each of them has its own limitations, such as restricted receptive fields in CNNs or the computational overhead caused by the quadratic complexity of Transformers. Recently, the Receptance Weighted Key Value (RWKV) model has emerged as a promising alternative for various vision tasks, offering strong long-range modeling capabilities with linear computational complexity. Some studies have also adapted RWKV to medical image segmentation tasks, achieving competitive performance. However, most of these studies focus on modifications to the Vision-RWKV (VRWKV) mechanism and train models from scratch, without exploring the potential advantages of leveraging pre-trained VRWKV models for medical image segmentation tasks. In this paper, we propose Med-URWKV, a pure RWKV-based architecture built upon the U-Net framework, which incorporates ImageNet-based pretraining to further explore the potential of RWKV in medical image segmentation tasks. To the best of our knowledge, Med-URWKV is the first pure RWKV segmentation model in the medical field that can directly reuse a large-scale pre-trained VRWKV encoder. Experimental results on seven datasets demonstrate that Med-URWKV achieves comparable or even superior segmentation performance compared to other carefully optimized RWKV models trained from scratch. This validates the effectiveness of using a pretrained VRWKV encoder in enhancing model performance. The codes will be released.
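RWKV's appeal here is long-range modeling at linear cost. A heavily simplified, scalar-channel sketch of a WKV-style recurrence (omitting the bonus term and learned per-channel decay of the real operator) shows how each output can be a key-weighted, time-decayed average of past values computed in O(T) with constant state:

```python
import math

def wkv_recurrence(keys, values, decay=0.5):
    """Didactic reduction of an RWKV-style weighted key-value scan.
    Maintains a running numerator/denominator so each step costs O(1),
    unlike attention's O(T) per step."""
    num, den = 0.0, 0.0
    outs = []
    d = math.exp(-decay)          # exponential time-decay factor
    for k, v in zip(keys, values):
        w = math.exp(k)           # key turns into a positive weight
        num = d * num + w * v     # decayed running sum of weighted values
        den = d * den + w         # decayed running normaliser
        outs.append(num / den)
    return outs
```

A large key at the latest step dominates the average, which is how the mechanism attends to salient recent tokens without quadratic cost.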

MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models

Yu Huang, Zelin Peng, Yichen Zhao, Piao Yang, Xiaokang Yang, Wei Shen

arXiv preprint, Jun 12 2025
Medical image segmentation is crucial for clinical diagnosis, yet existing models are limited by their reliance on explicit human instructions and lack the active reasoning capabilities to understand complex clinical questions. While recent advancements in multimodal large language models (MLLMs) have improved medical question-answering (QA) tasks, most methods struggle to generate precise segmentation masks, limiting their application in automatic medical diagnosis. In this paper, we introduce medical image reasoning segmentation, a novel task that aims to generate segmentation masks based on complex and implicit medical instructions. To address this, we propose MedSeg-R, an end-to-end framework that leverages the reasoning abilities of MLLMs to interpret clinical questions while also capable of producing corresponding precise segmentation masks for medical images. It is built on two core components: 1) a global context understanding module that interprets images and comprehends complex medical instructions to generate multi-modal intermediate tokens, and 2) a pixel-level grounding module that decodes these tokens to produce precise segmentation masks and textual responses. Furthermore, we introduce MedSeg-QA, a large-scale dataset tailored for the medical image reasoning segmentation task. It includes over 10,000 image-mask pairs and multi-turn conversations, automatically annotated using large language models and refined through physician reviews. Experiments show MedSeg-R's superior performance across several benchmarks, achieving high segmentation accuracy and enabling interpretable textual analysis of medical images.

Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

Andrea Moglia, Matteo Leccardi, Matteo Cavicchioli, Alice Maccarini, Marco Marcon, Luca Mainardi, Pietro Cerveri

arXiv preprint, Jun 12 2025
Following the successful paradigm shift of large language models, leveraging pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) set a milestone in the segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We start with an introduction to the fundamental concepts underpinning their development. Then, we provide a taxonomy of the different variants of SAM (zero-shot, few-shot, fine-tuning, adapters), of the recent SAM 2, of other innovative models trained on images alone, and of models trained on both text and images. We thoroughly analyze their performance at the level of both primary research and best-in-literature, followed by a rigorous comparison with state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.

RealKeyMorph: Keypoints in Real-world Coordinates for Resolution-agnostic Image Registration

Mina C. Moghadam, Alan Q. Wang, Omer Taub, Martin R. Prince, Mert R. Sabuncu

arXiv preprint, Jun 12 2025
Many real-world settings require registration of a pair of medical images that differ in spatial resolution, which may arise from differences in image acquisition parameters like pixel spacing, slice thickness, and field-of-view. However, all previous machine learning-based registration techniques resample images onto a fixed resolution. This is suboptimal because resampling can introduce artifacts due to interpolation. To address this, we present RealKeyMorph (RKM), a resolution-agnostic method for image registration. RKM is an extension of KeyMorph, a registration framework that works by training a network to learn corresponding keypoints for a given pair of images, after which a closed-form keypoint matching step is used to derive the transformation that aligns them. To avoid resampling and enable operating on the raw data, RKM outputs keypoints in the real-world coordinates of the scanner. To do this, we leverage the affine matrix produced by the scanner (e.g., MRI machine) that encodes the mapping from voxel coordinates to real-world coordinates. By transforming keypoints into real-world space and integrating this into the training process, RKM effectively enables the extracted keypoints to be resolution-agnostic. In our experiments, we demonstrate the advantages of RKM on the registration task for orthogonal 2D stacks of abdominal MRIs, as well as 3D volumes with varying resolutions in brain datasets.
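The header affine the paper leverages maps voxel indices to scanner coordinates in homogeneous form. A minimal sketch of that mapping (NIfTI-style convention; the helper name is ours):

```python
def voxel_to_world(affine, ijk):
    """Map a voxel index (i, j, k) to scanner (real-world) coordinates in mm
    using the 4x4 affine stored in the image header: world = A @ [i,j,k,1]."""
    i, j, k = ijk
    v = (i, j, k, 1.0)  # homogeneous coordinates
    return tuple(sum(affine[r][c] * v[c] for c in range(4)) for r in range(3))
```

The point of working in world space is that the same anatomy lands at the same coordinates regardless of resolution: a keypoint at voxel (5, 5) in a 2 mm acquisition and at voxel (10, 10) in a 1 mm acquisition of the same subject map to one world position, so the matching step never needs resampling.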

PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis

Marzieh Oghbaie, Teresa Araújo, Hrvoje Bogunović

arXiv preprint, Jun 12 2025
Background and Objective: Prototype-based methods improve interpretability by learning fine-grained part-prototypes; however, their visualization in the input pixel space is not always consistent with human-understandable biomarkers. In addition, well-known prototype-based approaches typically learn extremely granular prototypes that are less interpretable in medical imaging, where both the presence and extent of biomarkers and lesions are critical. Methods: To address these challenges, we propose PiPViT (Patch-based Visual Interpretable Prototypes), an inherently interpretable prototypical model for image recognition. Leveraging a vision transformer (ViT), PiPViT captures long-range dependencies among patches to learn robust, human-interpretable prototypes that approximate lesion extent using only image-level labels. Additionally, PiPViT benefits from contrastive learning and multi-resolution input processing, which enables effective localization of biomarkers across scales. Results: We evaluated PiPViT on retinal OCT image classification across four datasets, where it achieved competitive quantitative performance compared to state-of-the-art methods while delivering more meaningful explanations. Moreover, quantitative evaluation on a hold-out test set confirms that the learned prototypes are semantically and clinically relevant. We believe PiPViT can transparently explain its decisions and assist clinicians in understanding diagnostic outcomes. GitHub page: https://github.com/marziehoghbaie/PiPViT
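Prototype-based scoring of the kind described, per-patch similarity maps that localize extent plus an image-level presence score, can be sketched as follows (illustrative only; PiPViT's actual scoring head may differ):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def prototype_scores(patch_embeddings, prototypes):
    """For each prototype, return (similarity map over patches, max score).
    The map approximates lesion extent; the max acts as an image-level
    presence score, so only image-level labels are needed for training."""
    results = []
    for p in prototypes:
        sim_map = [cosine(e, p) for e in patch_embeddings]
        results.append((sim_map, max(sim_map)))
    return results
```

Because the per-patch map is inspected directly, the model's evidence for a prediction is visible rather than inferred post hoc.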

Preclinical Investigation of Artificial Intelligence-Assisted Implant Surgery Planning for Single Tooth Defects: A Case Series Study.

Ma H, Wu Y, Bai H, Xu Z, Ding P, Deng X, Tang Z

PubMed, Jun 12 2025
Dental implant surgery has become a prevalent treatment option for patients with single tooth defects. However, the success of this surgery relies heavily on precise planning and execution. This study investigates the application of artificial intelligence (AI) in assisting the planning process of dental implant surgery for single tooth defects. Single tooth defects in the oral cavity pose a significant challenge in restorative dentistry. Dental implant restoration has emerged as an effective solution for rehabilitating such defects. However, the complexity of the procedure and the need for accurate treatment planning necessitate the integration of advanced technologies. In this study, we propose the utilisation of AI to enhance the precision and efficiency of implant surgery planning for single tooth defects. A total of twenty patients with single tooth loss were enrolled. Cone-beam computed tomography (CBCT) and intra-oral scans were obtained and imported into the AI-dentist software for 3D reconstruction. AI assisted in implant selection, tooth position identification, and crown fabrication. Evaluation included subjective verification and objective assessments. A paired samples t-test was used to compare planning times (dentist vs. AI), with a significance level of p < 0.05. Twenty patients (9 male, 11 female; mean age 59.5 ± 11.86 years) with single missing teeth participated in this study. Implant margins were carefully positioned: 3.05 ± 1.44 mm from adjacent roots, 2.52 ± 0.65 mm from bone plate edges, 3.05 ± 1.44 mm from sinus/canal, and 3.85 ± 1.23 mm from gingival height. Manual planning (21.50 ± 4.87 min) was statistically significantly slower than AI (11.84 ± 3.22 min, p < 0.01). Implant planning met 100% buccolingual/proximal/distal bone volume criteria and 90% sinus/canal distance criteria. Two patients required sinus lifting and bone grafting due to insufficient bone volume. 
This study highlights the promising role of AI in enhancing the precision and efficiency of dental implant surgery planning for single tooth defects. Further studies are necessary to validate the effectiveness and safety of AI-assisted planning in a larger patient population.
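The paired samples t-test used above to compare manual and AI planning times reduces to a short computation over per-patient differences. A stdlib-only sketch (the p < 0.05 threshold would then be checked against the t critical value for the resulting degrees of freedom):

```python
import math

def paired_t(xs, ys):
    """Paired-samples t statistic and degrees of freedom for two matched
    measurement lists (e.g. manual vs. AI planning time per patient)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)                        # mean / standard error
    return t, n - 1
```

For 20 patients (df = 19) the two-sided critical value at p = 0.05 is about 2.09, so a difference as large as the reported 21.50 vs. 11.84 minutes comfortably clears significance.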

Accelerated MRI in temporomandibular joints using AI-assisted compressed sensing technique: a feasibility study.

Ye Z, Lyu X, Zhao R, Fan P, Yang S, Xia C, Li Z, Xiong X

PubMed, Jun 12 2025
To investigate the feasibility of accelerated MRI with artificial intelligence-assisted compressed sensing (ACS) technique in the temporomandibular joint (TMJ) and compare its performance with parallel imaging (PI) protocol and standard (STD) protocol. Participants with TMJ-related symptoms were prospectively enrolled from April 2023 to May 2024, and underwent bilateral TMJ imaging examinations using ACS protocol (6:08 min), PI protocol (10:57 min), and STD protocol (13:28 min). Overall image quality and visibility of TMJ-relevant structures were qualitatively evaluated on a 4-point Likert scale. Quantitative analysis of signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of TMJ disc, condyle, and lateral pterygoid muscle (LPM) was performed. Diagnostic agreement on joint effusion and disc displacement among protocols and investigators was assessed by Fleiss' kappa analysis. A total of 51 participants (16 male and 35 female) with 102 TMJs were included. The overall image quality and most structures of the ACS protocol were rated significantly higher than the STD protocol (all p < 0.05), and similar to the PI protocol. For quantitative analysis, the ACS protocol demonstrated significantly higher SNR and CNR than the STD protocol in the TMJ disc, condyle, and LPM (all p < 0.05), and the ACS protocol showed comparable SNR to the PI protocol in most sequences. Good to excellent inter-protocol and inter-observer agreement was observed for diagnosing TMJ abnormalities (κ = 0.699-1.000). Accelerated MRI with the ACS technique can significantly reduce the acquisition time of TMJ imaging, while providing superior or equivalent image quality and great diagnostic agreement with PI and STD protocols.
Question: Patients with TMJ disorders often cannot endure long MRI examinations due to orofacial pain, necessitating accelerated MRI to improve patient comfort.
Findings: The ACS technique can significantly reduce acquisition time in TMJ imaging while providing superior or equivalent image quality.
Clinical relevance: The time-saving ACS technique improves image quality and achieves excellent diagnostic agreement in the evaluation of joint effusion and disc displacement. It helps optimize clinical MRI workflow in patients with TMJ disorders.
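The SNR and CNR figures reported here follow the textbook ROI definitions, which a short sketch makes explicit (ROI selection and noise-correction conventions vary by protocol, so treat this as the plain definitions rather than the study's exact pipeline):

```python
import math

def mean_sd(vals):
    """Sample mean and standard deviation of a list of ROI pixel values."""
    m = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))
    return m, sd

def snr(signal_roi, noise_roi):
    """SNR = mean signal intensity / SD of background noise."""
    m, _ = mean_sd(signal_roi)
    _, sd_noise = mean_sd(noise_roi)
    return m / sd_noise

def cnr(roi_a, roi_b, noise_roi):
    """CNR = |mean(A) - mean(B)| / SD of background noise,
    e.g. TMJ disc vs. lateral pterygoid muscle against air."""
    ma, _ = mean_sd(roi_a)
    mb, _ = mean_sd(roi_b)
    _, sd_noise = mean_sd(noise_roi)
    return abs(ma - mb) / sd_noise
```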