Sort by:
Page 9 of 85846 results

TotalRegistrator: Towards a Lightweight Foundation Model for CT Image Registration

Xuan Loc Pham, Gwendolyn Vuurberg, Marjan Doppen, Joey Roosen, Tip Stille, Thi Quynh Ha, Thuy Duong Quach, Quoc Vu Dang, Manh Ha Luu, Ewoud J. Smit, Hong Son Mai, Mattias Heinrich, Bram van Ginneken, Mathias Prokop, Alessa Hering

arxiv logopreprintAug 6 2025
Image registration is a fundamental technique in the analysis of longitudinal and multi-phase CT images within clinical practice. However, most existing methods are tailored for single-organ applications, limiting their generalizability to other anatomical regions. This work presents TotalRegistrator, an image registration framework capable of aligning multiple anatomical regions simultaneously using a standard UNet architecture and a novel field decomposition strategy. The model is lightweight, requiring only 11GB of GPU memory for training. To train and evaluate our method, we constructed a large-scale longitudinal dataset comprising 695 whole-body (thorax-abdomen-pelvic) paired CT scans from individual patients acquired at different time points. We benchmarked TotalRegistrator against a generic classical iterative algorithm and a recent foundation model for image registration. To further assess robustness and generalizability, we evaluated our model on three external datasets: the public thoracic and abdominal datasets from the Learn2Reg challenge, and a private multiphase abdominal dataset from a collaborating hospital. Experimental results on the in-house dataset show that the proposed approach generally surpasses baseline methods in multi-organ abdominal registration, with a slight drop in lung alignment performance. On out-of-distribution datasets, it achieved competitive results compared to leading single-organ models, despite not being fine-tuned for those tasks, demonstrating strong generalizability. The source code will be publicly available at: https://github.com/DIAGNijmegen/oncology_image_registration.git.

Augmentation-based Domain Generalization and Joint Training from Multiple Source Domains for Whole Heart Segmentation

Franz Thaler, Darko Stern, Gernot Plank, Martin Urschler

arxiv logopreprintAug 6 2025
As the leading cause of death worldwide, cardiovascular diseases motivate the development of more sophisticated methods to analyze the heart and its substructures from medical images like Computed Tomography (CT) and Magnetic Resonance (MR). Semantic segmentations of important cardiac structures that represent the whole heart are useful to assess patient-specific cardiac morphology and pathology. Furthermore, accurate semantic segmentations can be used to generate cardiac digital twin models which allows e.g. electrophysiological simulation and personalized therapy planning. Even though deep learning-based methods for medical image segmentation achieved great advancements over the last decade, retaining good performance under domain shift -- i.e. when training and test data are sampled from different data distributions -- remains challenging. In order to perform well on domains known at training-time, we employ a (1) balanced joint training approach that utilizes CT and MR data in equal amounts from different source domains. Further, aiming to alleviate domain shift towards domains only encountered at test-time, we rely on (2) strong intensity and spatial augmentation techniques to greatly diversify the available training data. Our proposed whole heart segmentation method, a 5-fold ensemble with our contributions, achieves the best performance for MR data overall and a performance similar to the best performance for CT data when compared to a model trained solely on CT. With 93.33% DSC and 0.8388 mm ASSD for CT and 89.30% DSC and 1.2411 mm ASSD for MR data, our method demonstrates great potential to efficiently obtain accurate semantic segmentations from which patient-specific cardiac twin models can be generated.

A Comprehensive Framework for Uncertainty Quantification of Voxel-wise Supervised Models in IVIM MRI

Nicola Casali, Alessandro Brusaferri, Giuseppe Baselli, Stefano Fumagalli, Edoardo Micotti, Gianluigi Forloni, Riaz Hussein, Giovanna Rizzo, Alfonso Mastropietro

arxiv logopreprintAug 6 2025
Accurate estimation of intravoxel incoherent motion (IVIM) parameters from diffusion-weighted MRI remains challenging due to the ill-posed nature of the inverse problem and high sensitivity to noise, particularly in the perfusion compartment. In this work, we propose a probabilistic deep learning framework based on Deep Ensembles (DE) of Mixture Density Networks (MDNs), enabling estimation of total predictive uncertainty and decomposition into aleatoric (AU) and epistemic (EU) components. The method was benchmarked against non probabilistic neural networks, a Bayesian fitting approach and a probabilistic network with single Gaussian parametrization. Supervised training was performed on synthetic data, and evaluation was conducted on both simulated and two in vivo datasets. The reliability of the quantified uncertainties was assessed using calibration curves, output distribution sharpness, and the Continuous Ranked Probability Score (CRPS). MDNs produced more calibrated and sharper predictive distributions for the D and f parameters, although slight overconfidence was observed in D*. The Robust Coefficient of Variation (RCV) indicated smoother in vivo estimates for D* with MDNs compared to Gaussian model. Despite the training data covering the expected physiological range, elevated EU in vivo suggests a mismatch with real acquisition conditions, highlighting the importance of incorporating EU, which was allowed by DE. Overall, we present a comprehensive framework for IVIM fitting with uncertainty quantification, which enables the identification and interpretation of unreliable estimates. The proposed approach can also be adopted for fitting other physical models through appropriate architectural and simulation adjustments.

Conditional Fetal Brain Atlas Learning for Automatic Tissue Segmentation

Johannes Tischer, Patric Kienast, Marlene Stümpflen, Gregor Kasprian, Georg Langs, Roxane Licandro

arxiv logopreprintAug 6 2025
Magnetic Resonance Imaging (MRI) of the fetal brain has become a key tool for studying brain development in vivo. Yet, its assessment remains challenging due to variability in brain maturation, imaging protocols, and uncertain estimates of Gestational Age (GA). To overcome these, brain atlases provide a standardized reference framework that facilitates objective evaluation and comparison across subjects by aligning the atlas and subjects in a common coordinate system. In this work, we introduce a novel deep-learning framework for generating continuous, age-specific fetal brain atlases for real-time fetal brain tissue segmentation. The framework combines a direct registration model with a conditional discriminator. Trained on a curated dataset of 219 neurotypical fetal MRIs spanning from 21 to 37 weeks of gestation. The method achieves high registration accuracy, captures dynamic anatomical changes with sharp structural detail, and robust segmentation performance with an average Dice Similarity Coefficient (DSC) of 86.3% across six brain tissues. Furthermore, volumetric analysis of the generated atlases reveals detailed neurotypical growth trajectories, providing valuable insights into the maturation of the fetal brain. This approach enables individualized developmental assessment with minimal pre-processing and real-time performance, supporting both research and clinical applications. The model code is available at https://github.com/cirmuw/fetal-brain-atlas

Benchmarking Uncertainty and its Disentanglement in multi-label Chest X-Ray Classification

Simon Baur, Wojciech Samek, Jackie Ma

arxiv logopreprintAug 6 2025
Reliable uncertainty quantification is crucial for trustworthy decision-making and the deployment of AI models in medical imaging. While prior work has explored the ability of neural networks to quantify predictive, epistemic, and aleatoric uncertainties using an information-theoretical approach in synthetic or well defined data settings like natural image classification, its applicability to real life medical diagnosis tasks remains underexplored. In this study, we provide an extensive uncertainty quantification benchmark for multi-label chest X-ray classification using the MIMIC-CXR-JPG dataset. We evaluate 13 uncertainty quantification methods for convolutional (ResNet) and transformer-based (Vision Transformer) architectures across a wide range of tasks. Additionally, we extend Evidential Deep Learning, HetClass NNs, and Deep Deterministic Uncertainty to the multi-label setting. Our analysis provides insights into uncertainty estimation effectiveness and the ability to disentangle epistemic and aleatoric uncertainties, revealing method- and architecture-specific strengths and limitations.

NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding

Zelin Peng, Yichen Zhao, Yu Huang, Piao Yang, Feilong Tang, Zhengqin Xu, Xiaokang Yang, Wei Shen

arxiv logopreprintAug 6 2025
Computer-aided medical image analysis is crucial for disease diagnosis and treatment planning, yet limited annotated datasets restrict medical-specific model development. While vision-language models (VLMs) like CLIP offer strong generalization capabilities, their direct application to medical imaging analysis is impeded by a significant domain gap. Existing approaches to bridge this gap, including prompt learning and one-way modality interaction techniques, typically focus on introducing domain knowledge to a single modality. Although this may offer performance gains, it often causes modality misalignment, thereby failing to unlock the full potential of VLMs. In this paper, we propose \textbf{NEARL-CLIP} (i\underline{N}teracted qu\underline{E}ry \underline{A}daptation with o\underline{R}thogona\underline{L} Regularization), a novel cross-modality interaction VLM-based framework that contains two contributions: (1) Unified Synergy Embedding Transformer (USEformer), which dynamically generates cross-modality queries to promote interaction between modalities, thus fostering the mutual enrichment and enhancement of multi-modal medical domain knowledge; (2) Orthogonal Cross-Attention Adapter (OCA). OCA introduces an orthogonality technique to decouple the new knowledge from USEformer into two distinct components: the truly novel information and the incremental knowledge. By isolating the learning process from the interference of incremental knowledge, OCA enables a more focused acquisition of new information, thereby further facilitating modality interaction and unleashing the capability of VLMs. Notably, NEARL-CLIP achieves these two contributions in a parameter-efficient style, which only introduces \textbf{1.46M} learnable parameters.

Automated ultrasound doppler angle estimation using deep learning

Nilesh Patil, Ajay Anand

arxiv logopreprintAug 6 2025
Angle estimation is an important step in the Doppler ultrasound clinical workflow to measure blood velocity. It is widely recognized that incorrect angle estimation is a leading cause of error in Doppler-based blood velocity measurements. In this paper, we propose a deep learning-based approach for automated Doppler angle estimation. The approach was developed using 2100 human carotid ultrasound images including image augmentation. Five pre-trained models were used to extract images features, and these features were passed to a custom shallow network for Doppler angle estimation. Independently, measurements were obtained by a human observer reviewing the images for comparison. The mean absolute error (MAE) between the automated and manual angle estimates ranged from 3.9{\deg} to 9.4{\deg} for the models evaluated. Furthermore, the MAE for the best performing model was less than the acceptable clinical Doppler angle error threshold thus avoiding misclassification of normal velocity values as a stenosis. The results demonstrate potential for applying a deep-learning based technique for automated ultrasound Doppler angle estimation. Such a technique could potentially be implemented within the imaging software on commercial ultrasound scanners.

TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation

Zunhui Xia, Hongxing Li, Libin Lan

arxiv logopreprintAug 6 2025
In recent years, transformer-based methods have achieved remarkable progress in medical image segmentation due to their superior ability to capture long-range dependencies. However, these methods typically suffer from two major limitations. First, their computational complexity scales quadratically with the input sequences. Second, the feed-forward network (FFN) modules in vanilla Transformers typically rely on fully connected layers, which limits models' ability to capture local contextual information and multiscale features critical for precise semantic segmentation. To address these issues, we propose an efficient medical image segmentation network, named TCSAFormer. The proposed TCSAFormer adopts two key ideas. First, it incorporates a Compressed Attention (CA) module, which combines token compression and pixel-level sparse attention to dynamically focus on the most relevant key-value pairs for each query. This is achieved by pruning globally irrelevant tokens and merging redundant ones, significantly reducing computational complexity while enhancing the model's ability to capture relationships between tokens. Second, it introduces a Dual-Branch Feed-Forward Network (DBFFN) module as a replacement for the standard FFN to capture local contextual features and multiscale information, thereby strengthening the model's feature representation capability. We conduct extensive experiments on three publicly available medical image segmentation datasets: ISIC-2018, CVC-ClinicDB, and Synapse, to evaluate the segmentation performance of TCSAFormer. Experimental results demonstrate that TCSAFormer achieves superior performance compared to existing state-of-the-art (SOTA) methods, while maintaining lower computational overhead, thus achieving an optimal trade-off between efficiency and accuracy.

Small Lesions-aware Bidirectional Multimodal Multiscale Fusion Network for Lung Disease Classification

Jianxun Yu, Ruiquan Ge, Zhipeng Wang, Cheng Yang, Chenyu Lin, Xianjun Fu, Jikui Liu, Ahmed Elazab, Changmiao Wang

arxiv logopreprintAug 6 2025
The diagnosis of medical diseases faces challenges such as the misdiagnosis of small lesions. Deep learning, particularly multimodal approaches, has shown great potential in the field of medical disease diagnosis. However, the differences in dimensionality between medical imaging and electronic health record data present challenges for effective alignment and fusion. To address these issues, we propose the Multimodal Multiscale Cross-Attention Fusion Network (MMCAF-Net). This model employs a feature pyramid structure combined with an efficient 3D multi-scale convolutional attention module to extract lesion-specific features from 3D medical images. To further enhance multimodal data integration, MMCAF-Net incorporates a multi-scale cross-attention module, which resolves dimensional inconsistencies, enabling more effective feature fusion. We evaluated MMCAF-Net on the Lung-PET-CT-Dx dataset, and the results showed a significant improvement in diagnostic accuracy, surpassing current state-of-the-art methods. The code is available at https://github.com/yjx1234/MMCAF-Net

Towards Globally Predictable k-Space Interpolation: A White-box Transformer Approach

Chen Luo, Qiyu Jin, Taofeng Xie, Xuemei Wang, Huayu Wang, Congcong Liu, Liming Tang, Guoqing Chen, Zhuo-Xu Cui, Dong Liang

arxiv logopreprintAug 6 2025
Interpolating missing data in k-space is essential for accelerating imaging. However, existing methods, including convolutional neural network-based deep learning, primarily exploit local predictability while overlooking the inherent global dependencies in k-space. Recently, Transformers have demonstrated remarkable success in natural language processing and image analysis due to their ability to capture long-range dependencies. This inspires the use of Transformers for k-space interpolation to better exploit its global structure. However, their lack of interpretability raises concerns regarding the reliability of interpolated data. To address this limitation, we propose GPI-WT, a white-box Transformer framework based on Globally Predictable Interpolation (GPI) for k-space. Specifically, we formulate GPI from the perspective of annihilation as a novel k-space structured low-rank (SLR) model. The global annihilation filters in the SLR model are treated as learnable parameters, and the subgradients of the SLR model naturally induce a learnable attention mechanism. By unfolding the subgradient-based optimization algorithm of SLR into a cascaded network, we construct the first white-box Transformer specifically designed for accelerated MRI. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art approaches in k-space interpolation accuracy while providing superior interpretability.
Page 9 of 85846 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.