Page 3 of 59584 results

Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation

Jinpeng Lu, Linghan Cai, Yinda Chen, Guo Tang, Songhan Jiang, Haoyuan Shi, Zhiwei Xiong

arXiv preprint, Sep 26 2025
Lightweight 3D medical image segmentation remains constrained by a fundamental "efficiency / robustness conflict", particularly when processing complex anatomical structures and heterogeneous modalities. In this paper, we study how to redesign the framework based on the characteristics of high-dimensional 3D images, and explore data synergy to overcome the fragile representation of lightweight methods. Our approach, VeloxSeg, begins with a deployable and extensible dual-stream CNN-Transformer architecture composed of Paired Window Attention (PWA) and Johnson-Lindenstrauss lemma-guided convolution (JLC). For each 3D image, we invoke a "glance-and-focus" principle, where PWA rapidly retrieves multi-scale information, and JLC ensures robust local feature extraction with minimal parameters, significantly enhancing the model's ability to operate with low computational budget. Followed by an extension of the dual-stream architecture that incorporates modal interaction into the multi-scale image-retrieval process, VeloxSeg efficiently models heterogeneous modalities. Finally, Spatially Decoupled Knowledge Transfer (SDKT) via Gram matrices injects the texture prior extracted by a self-supervised network into the segmentation network, yielding stronger representations than baselines at no extra inference cost. Experimental results on multimodal benchmarks show that VeloxSeg achieves a 26% Dice improvement, alongside increasing GPU throughput by 11x and CPU by 48x. Codes are available at https://github.com/JinPLu/VeloxSeg.
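
For context, the Johnson-Lindenstrauss lemma referenced in the title states that points in a high-dimensional space can be mapped into far fewer dimensions by a random linear projection while approximately preserving pairwise distances. The sketch below illustrates only the lemma itself, not VeloxSeg's JLC layer; the function names, the dimensions (1000 to 256), and the seeds are all illustrative assumptions.

```python
import math
import random

def random_projection(vectors, k, seed=0):
    """Project each d-dimensional vector to k dimensions with a Gaussian matrix."""
    rng = random.Random(seed)
    d = len(vectors[0])
    # Entries are drawn from N(0, 1/k) so squared lengths are preserved in expectation.
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(m * x for m, x in zip(row, v)) for row in matrix] for v in vectors]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical feature vectors in 1000 dimensions.
rng = random.Random(1)
d, k = 1000, 256
u = [rng.random() for _ in range(d)]
v = [rng.random() for _ in range(d)]

pu, pv = random_projection([u, v], k)
# The ratio of projected to original distance should be close to 1.
ratio = distance(pu, pv) / distance(u, v)
print(f"distance ratio after projection: {ratio:.3f}")
```

The distortion shrinks as k grows, which is why a reduced channel width can still retain much of the geometry of the original features.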

InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning

Guanghao Zhu, Zhitian Hou, Zeyu Liu, Zhijie Sang, Congkai Xie, Hongxia Yang

arXiv preprint, Sep 26 2025
Multimodal large language models (MLLMs) have shown remarkable potential in various domains, yet their application in the medical field is hindered by several challenges. General-purpose MLLMs often lack the specialized knowledge required for medical tasks, leading to uncertain or hallucinatory responses. Knowledge distillation from advanced models struggles to capture domain-specific expertise in radiology and pharmacology. Additionally, the computational cost of continual pretraining with large-scale medical data poses significant efficiency challenges. To address these issues, we propose InfiMed-Foundation-1.7B and InfiMed-Foundation-4B, two medical-specific MLLMs designed to deliver state-of-the-art performance in medical applications. We combine high-quality general-purpose and medical multimodal data and propose a novel five-dimensional quality assessment framework to curate high-quality multimodal medical datasets. We employ low-to-high image resolution and multimodal sequence packing to enhance training efficiency, enabling the integration of extensive medical data. Furthermore, a three-stage supervised fine-tuning process ensures effective knowledge extraction for complex medical tasks. Evaluated on the MedEvalKit framework, InfiMed-Foundation-1.7B outperforms Qwen2.5VL-3B, while InfiMed-Foundation-4B surpasses HuatuoGPT-V-7B and MedGemma-27B-IT, demonstrating superior performance in medical visual question answering and diagnostic tasks. By addressing key challenges in data quality, training efficiency, and domain-specific knowledge extraction, our work paves the way for more reliable and effective AI-driven solutions in healthcare. The InfiMed-Foundation-4B model is available at https://huggingface.co/InfiX-ai/InfiMed-Foundation-4B.

Conditional Virtual Imaging for Few-Shot Vascular Image Segmentation.

He Y, Ge R, Tang H, Liu Y, Su M, Coatrieux JL, Shu H, Chen Y, He Y

PubMed paper, Sep 25 2025
In the field of medical image processing, vascular image segmentation plays a crucial role in clinical diagnosis, treatment planning, prognosis, and medical decision-making. Accurate and automated segmentation of vascular images can assist clinicians in understanding the vascular network structure, leading to more informed medical decisions. However, manual annotation of vascular images is time-consuming and challenging due to the fine and low-contrast vascular branches, especially in the medical imaging domain where annotation requires specialized knowledge and clinical expertise. Data-driven deep learning models struggle to achieve good performance when only a small number of annotated vascular images are available. To address this issue, this paper proposes a novel Conditional Virtual Imaging (CVI) framework for few-shot vascular image segmentation learning. The framework combines limited annotated data with extensive unlabeled data to generate high-quality images, effectively improving the accuracy and robustness of segmentation learning. Our approach primarily includes two innovations: First, aligned image-mask pair generation, which leverages the powerful image generation capabilities of large pre-trained models to produce high-quality vascular images with complex structures using only a few training images; Second, the Dual-Consistency Learning (DCL) strategy, which simultaneously trains the generator and segmentation model, allowing them to learn from each other and maximize the utilization of limited data. Experimental results demonstrate that our CVI framework can generate high-quality medical images and effectively enhance the performance of segmentation models in few-shot scenarios. Our code will be made publicly available online.

An open deep learning-based framework and model for tooth instance segmentation in dental CBCT.

Zhou Y, Xu Y, Khalil B, Nalley A, Tarce M

PubMed paper, Sep 25 2025
Current dental CBCT segmentation tools often lack accuracy, accessibility, or comprehensive anatomical coverage. To address this, we constructed a densely annotated dental CBCT dataset and developed a deep learning model, OralSeg, for tooth-level instance segmentation, which is then deployed as a one-click tool and made freely accessible for non-commercial use. We established a standardized annotated dataset covering 35 key oral anatomical structures and employed UNetR as the backbone network, combining Swin Transformer and the spatial Mamba module for multi-scale residual feature fusion. The OralSeg model was designed and optimized for precise instance segmentation of dental CBCT images, and integrated into the 3D Slicer platform, providing a graphical user interface for one-click segmentation. OralSeg achieved a Dice similarity coefficient of 0.8316 ± 0.0305 on CBCT instance segmentation in a comparison with SwinUNETR and 3D U-Net. The model significantly improves segmentation performance, especially in complex oral anatomical structures, such as apical areas, alveolar bone margins, and mandibular nerve canals. The OralSeg model presented in this study provides an effective solution for instance segmentation of dental CBCT images. The tool allows clinical dentists and researchers with no AI background to perform one-click segmentation, and may be applicable in various clinical and research contexts. OralSeg can offer researchers and clinicians a user-friendly tool for tooth-level instance segmentation, which may assist in clinical diagnosis, educational training, and research, and contribute to the broader adoption of digital dentistry in precision medicine.
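
The Dice similarity coefficient reported above measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). A minimal sketch, assuming binary masks flattened to lists of 0/1 voxels; `dice_coefficient` and the toy masks are illustrative and are not taken from the OralSeg code. For instance segmentation, a score like this would typically be computed per tooth label and then averaged, which is also an assumption here.

```python
def dice_coefficient(pred, ref):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy example: 3 foreground voxels in each mask, 2 of them shared.
pred = [1, 1, 0, 0, 1, 0]
ref = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, ref)
print(f"Dice: {score:.4f}")
```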

Mammo-CLIP Dissect: A Framework for Analysing Mammography Concepts in Vision-Language Models

Suaiba Amina Salahuddin, Teresa Dorszewski, Marit Almenning Martiniussen, Tone Hovda, Antonio Portaluri, Solveig Thrun, Michael Kampffmeyer, Elisabeth Wetzer, Kristoffer Wickstrøm, Robert Jenssen

arXiv preprint, Sep 25 2025
Understanding what deep learning (DL) models learn is essential for the safe deployment of artificial intelligence (AI) in clinical settings. While previous work has focused on pixel-based explainability methods, less attention has been paid to the textual concepts learned by these models, which may better reflect the reasoning used by clinicians. We introduce Mammo-CLIP Dissect, the first concept-based explainability framework for systematically dissecting DL vision models trained for mammography. Leveraging a mammography-specific vision-language model (Mammo-CLIP) as a "dissector," our approach labels neurons at specified layers with human-interpretable textual concepts and quantifies their alignment to domain knowledge. Using Mammo-CLIP Dissect, we investigate three key questions: (1) how concept learning differs between DL vision models trained on general image datasets versus mammography-specific datasets; (2) how fine-tuning for downstream mammography tasks affects concept specialisation; and (3) which mammography-relevant concepts remain underrepresented. We show that models trained on mammography data capture more clinically relevant concepts and align more closely with radiologists' workflows than models not trained on mammography data. Fine-tuning for task-specific classification enhances the capture of certain concept categories (e.g., benign calcifications) but can reduce coverage of others (e.g., density-related features), indicating a trade-off between specialisation and generalisation. Our findings show that Mammo-CLIP Dissect provides insights into how convolutional neural networks (CNNs) capture mammography-specific knowledge. By comparing models across training data and fine-tuning regimes, we reveal how domain-specific training and task-specific adaptation shape concept learning. Code and concept set are available: https://github.com/Suaiba/Mammo-CLIP-Dissect.

PHASE: Personalized Head-based Automatic Simulation for Electromagnetic Properties in 7T MRI.

Lu Z, Liang H, Lu M, Martin D, Hardy BM, Dawant BM, Wang X, Yan X, Huo Y

PubMed paper, Sep 25 2025
Accurate and individualized human head models are becoming increasingly important for electromagnetic (EM) simulations. These simulations depend on precise anatomical representations to realistically model electric and magnetic field distributions, particularly when evaluating Specific Absorption Rate (SAR) within safety guidelines. State of the art simulations use the Virtual Population due to limited public resources and the impracticality of manually annotating patient data at scale. This paper introduces Personalized Head-based Automatic Simulation for EM properties (PHASE), an automated open-source toolbox that generates high-resolution, patient-specific head models for EM simulations using paired T1-weighted (T1w) magnetic resonance imaging (MRI) and computed tomography (CT) scans with 14 tissue labels. To evaluate the performance of PHASE models, we conduct semi-automated segmentation and EM simulations on 15 real human patients, serving as the gold standard reference. The PHASE model achieved comparable global SAR and localized SAR averaged over 10 grams of tissue (SAR-10 g), demonstrating its potential as a promising tool for generating large-scale human model datasets in the future. The code and models of PHASE toolbox have been made publicly available: https://github.com/hrlblab/PHASE.

Clinical deployment and prospective validation of an AI model for limb-length discrepancy measurements using an open-source platform.

Tsai A, Samal S, Lamonica P, Morris N, McNeil J, Pienaar R

PubMed paper, Sep 24 2025
To deploy an AI model to measure limb-length discrepancy (LLD) and prospectively validate its performance. We encoded the inference of an LLD AI model into a docker container, incorporated it into a computational platform for clinical deployment, and conducted two prospective validation studies: a shadow trial (07/2024-9/2024) and a clinical trial (11/2024-01/2025). During each trial period, we queried for LLD EOS scanograms to serve as inputs to our model. For the shadow trial, we hid the AI-annotated outputs from the radiologists, and for the clinical trial, we displayed the AI-annotated output to the radiologists at the time of study interpretation. Afterward, we collected the bilateral femoral and tibial lengths from the radiology reports and compared them against those generated by the AI model. We used median absolute difference (MAD) and interquartile range (IQR) as summary statistics to assess the performance of our model. Our shadow trial consisted of 84 EOS scanograms from 84 children, with 168 femoral and tibial lengths. The MAD (IQR) of the femoral and tibial lengths were 0.2 cm (0.3 cm) and 0.2 cm (0.3 cm), respectively. Our clinical trial consisted of 114 EOS scanograms from 114 children, with 228 femoral and tibial lengths. The MAD (IQR) of the femoral and tibial lengths were 0.3 cm (0.4 cm) and 0.2 cm (0.3 cm), respectively. We successfully employed a computational platform for seamless integration and deployment of an LLD AI model into our clinical workflow, and prospectively validated its performance.
Question: No AI models have been clinically deployed for limb-length discrepancy (LLD) assessment in children, and the prospective validation of these models is unknown.
Findings: We deployed an LLD AI model using a homegrown platform, with prospective trials showing a median absolute difference of 0.2-0.3 cm in estimating bone lengths.
Clinical relevance: An LLD AI model with performance comparable to that of radiologists can serve as a secondary reader in increasing the confidence and accuracy of LLD measurements.
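
The summary statistics used above can be sketched as follows. This is a minimal illustration, assuming the MAD is the median of the absolute differences between AI-generated and report lengths and that the IQR is taken over those same absolute differences; the function name and the five length pairs are hypothetical and are not data from the study.

```python
from statistics import median, quantiles

def mad_and_iqr(ai_lengths_cm, report_lengths_cm):
    """Return (median absolute difference, IQR of absolute differences) in cm."""
    diffs = [abs(a - r) for a, r in zip(ai_lengths_cm, report_lengths_cm)]
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    return median(diffs), q3 - q1

# Hypothetical femoral lengths (cm): AI output vs. radiology report.
ai = [40.1, 38.5, 42.0, 35.2, 39.9]
report = [40.0, 38.9, 42.2, 35.2, 39.4]
mad, iqr = mad_and_iqr(ai, report)
print(f"MAD (IQR): {mad:.1f} cm ({iqr:.1f} cm)")
```

The median and IQR are less sensitive to a few large outliers than the mean and standard deviation, which may be why they suit measurement-agreement data of this kind.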

HiPerformer: A High-Performance Global-Local Segmentation Model with Modular Hierarchical Fusion Strategy

Dayu Tan, Zhenpeng Xu, Yansen Su, Xin Peng, Chunhou Zheng, Weimin Zhong

arXiv preprint, Sep 24 2025
Both local details and global context are crucial in medical image segmentation, and effectively integrating them is essential for achieving high accuracy. However, existing mainstream methods based on CNN-Transformer hybrid architectures typically employ simple feature fusion techniques such as serial stacking, endpoint concatenation, or pointwise addition, which struggle to address the inconsistencies between features and are prone to information conflict and loss. To address the aforementioned challenges, we innovatively propose HiPerformer. The encoder of HiPerformer employs a novel modular hierarchical architecture that dynamically fuses multi-source features in parallel, enabling layer-wise deep integration of heterogeneous information. The modular hierarchical design not only retains the independent modeling capability of each branch in the encoder, but also ensures sufficient information transfer between layers, effectively avoiding the degradation of features and information loss that come with traditional stacking methods. Furthermore, we design a Local-Global Feature Fusion (LGFF) module to achieve precise and efficient integration of local details and global semantic information, effectively alleviating the feature inconsistency problem and resulting in a more comprehensive feature representation. To further enhance multi-scale feature representation capabilities and suppress noise interference, we also propose a Progressive Pyramid Aggregation (PPA) module to replace traditional skip connections. Experiments on eleven public datasets demonstrate that the proposed method outperforms existing segmentation techniques, demonstrating higher segmentation accuracy and robustness. The code is available at https://github.com/xzphappy/HiPerformer.

An Anisotropic Cross-View Texture Transfer with Multi-Reference Non-Local Attention for CT Slice Interpolation

Kwang-Hyun Uhm, Hyunjun Cho, Sung-Hoo Hong, Seung-Won Jung

arXiv preprint, Sep 24 2025
Computed tomography (CT) is one of the most widely used non-invasive imaging modalities for medical diagnosis. In clinical practice, CT images are usually acquired with large slice thicknesses due to the high cost of memory storage and operation time, resulting in an anisotropic CT volume with much lower inter-slice resolution than in-plane resolution. Since such inconsistent resolution may lead to difficulties in disease diagnosis, deep learning-based volumetric super-resolution methods have been developed to improve inter-slice resolution. Most existing methods conduct single-image super-resolution on the through-plane or synthesize intermediate slices from adjacent slices; however, the anisotropic characteristic of 3D CT volume has not been well explored. In this paper, we propose a novel cross-view texture transfer approach for CT slice interpolation by fully utilizing the anisotropic nature of 3D CT volume. Specifically, we design a unique framework that takes high-resolution in-plane texture details as a reference and transfers them to low-resolution through-plane images. To this end, we introduce a multi-reference non-local attention module that extracts meaningful features for reconstructing through-plane high-frequency details from multiple in-plane images. Through extensive experiments, we demonstrate that our method performs significantly better in CT slice interpolation than existing competing methods on public CT datasets including a real-paired benchmark, verifying the effectiveness of the proposed framework. The source code of this work is available at https://github.com/khuhm/ACVTT.

ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression

Tom Burgert, Oliver Stoll, Paolo Rota, Begüm Demir

arXiv preprint, Sep 24 2025
The hypothesis that Convolutional Neural Networks (CNNs) are inherently texture-biased has shaped much of the discourse on feature use in deep learning. We revisit this hypothesis by examining limitations in the cue-conflict experiment by Geirhos et al. To address these limitations, we propose a domain-agnostic framework that quantifies feature reliance through systematic suppression of shape, texture, and color cues, avoiding the confounds of forced-choice conflicts. By evaluating humans and neural networks under controlled suppression conditions, we find that CNNs are not inherently texture-biased but predominantly rely on local shape features. Nonetheless, this reliance can be substantially mitigated through modern training strategies or architectures (ConvNeXt, ViTs). We further extend the analysis across computer vision, medical imaging, and remote sensing, revealing that reliance patterns differ systematically: computer vision models prioritize shape, medical imaging models emphasize color, and remote sensing models exhibit a stronger reliance towards texture. Code is available at https://github.com/tomburgert/feature-reliance.