
NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding

Zelin Peng, Yichen Zhao, Yu Huang, Piao Yang, Feilong Tang, Zhengqin Xu, Xiaokang Yang, Wei Shen

arXiv preprint · Aug 6, 2025
Computer-aided medical image analysis is crucial for disease diagnosis and treatment planning, yet limited annotated datasets restrict the development of medical-specific models. While vision-language models (VLMs) like CLIP offer strong generalization capabilities, their direct application to medical imaging analysis is impeded by a significant domain gap. Existing approaches to bridge this gap, including prompt learning and one-way modality interaction techniques, typically focus on introducing domain knowledge to a single modality. Although this may offer performance gains, it often causes modality misalignment, thereby failing to unlock the full potential of VLMs. In this paper, we propose NEARL-CLIP (iNteracted quEry Adaptation with oRthogonaL Regularization), a novel cross-modality interaction VLM-based framework that makes two contributions: (1) the Unified Synergy Embedding Transformer (USEformer), which dynamically generates cross-modality queries to promote interaction between modalities, thus fostering the mutual enrichment and enhancement of multi-modal medical domain knowledge; (2) the Orthogonal Cross-Attention Adapter (OCA). OCA introduces an orthogonality technique to decouple the new knowledge from USEformer into two distinct components: truly novel information and incremental knowledge. By isolating the learning process from the interference of incremental knowledge, OCA enables a more focused acquisition of new information, thereby further facilitating modality interaction and unleashing the capability of VLMs. Notably, NEARL-CLIP delivers these two contributions in a parameter-efficient manner, introducing only 1.46M learnable parameters.
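As a rough illustration of the orthogonality idea, here is a minimal PyTorch sketch assuming a simple vector decomposition; the split and the penalty are illustrative stand-ins, not the paper's exact OCA formulation:

    import torch
    import torch.nn.functional as F

    def orthogonal_decompose(new_feat, base_feat, eps=1e-8):
        # Project the adapter output onto the frozen feature direction:
        # the parallel part plays the role of "incremental knowledge",
        # the orthogonal residual that of "truly novel information".
        scale = (new_feat * base_feat).sum(-1, keepdim=True) / (
            (base_feat * base_feat).sum(-1, keepdim=True) + eps)
        parallel = scale * base_feat
        return new_feat - parallel, parallel

    clip_feat = torch.randn(4, 512)    # frozen CLIP features (stand-in)
    adapter_out = torch.randn(4, 512)  # learnable adapter output (stand-in)
    novel, incremental = orthogonal_decompose(adapter_out, clip_feat)
    # Orthogonal regularizer: discourage the adapter from merely
    # re-scaling what the frozen backbone already encodes.
    ortho_loss = F.cosine_similarity(adapter_out, clip_feat, dim=-1).pow(2).mean()

Decoupling the residual this way is what lets training focus on information the frozen backbone cannot already express.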

TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation

Zunhui Xia, Hongxing Li, Libin Lan

arXiv preprint · Aug 6, 2025
In recent years, transformer-based methods have achieved remarkable progress in medical image segmentation due to their superior ability to capture long-range dependencies. However, these methods typically suffer from two major limitations. First, their computational complexity scales quadratically with input sequence length. Second, the feed-forward network (FFN) modules in vanilla Transformers typically rely on fully connected layers, which limits the model's ability to capture local contextual information and the multiscale features critical for precise semantic segmentation. To address these issues, we propose an efficient medical image segmentation network named TCSAFormer. The proposed TCSAFormer adopts two key ideas. First, it incorporates a Compressed Attention (CA) module, which combines token compression and pixel-level sparse attention to dynamically focus on the most relevant key-value pairs for each query. This is achieved by pruning globally irrelevant tokens and merging redundant ones, significantly reducing computational complexity while enhancing the model's ability to capture relationships between tokens. Second, it introduces a Dual-Branch Feed-Forward Network (DBFFN) module as a replacement for the standard FFN to capture local contextual features and multiscale information, thereby strengthening the model's feature representation capability. We conduct extensive experiments on three publicly available medical image segmentation datasets, ISIC-2018, CVC-ClinicDB, and Synapse, to evaluate the segmentation performance of TCSAFormer. Experimental results demonstrate that TCSAFormer achieves superior performance compared to existing state-of-the-art (SOTA) methods while maintaining lower computational overhead, thus achieving an optimal trade-off between efficiency and accuracy.
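A minimal sketch of the two ingredients, assuming plain dot-product attention; the uniform pair-merging used for compression here is a simplification of the CA module's pruning-and-merging rules:

    import torch
    import torch.nn.functional as F

    def topk_sparse_attention(q, k, v, topk=8):
        # Each query attends only to its top-k highest-scoring keys.
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        vals, idx = scores.topk(topk, dim=-1)
        sparse = torch.full_like(scores, float('-inf'))
        sparse.scatter_(-1, idx, vals)        # keep top-k, mask the rest
        return F.softmax(sparse, dim=-1) @ v

    x = torch.randn(2, 64, 32)                    # (batch, tokens, dim)
    compressed = x.reshape(2, 32, 2, 32).mean(2)  # merge token pairs: 64 -> 32
    out = topk_sparse_attention(x, compressed, compressed)

Compressing keys/values and sparsifying the attention map both cut the quadratic cost term, which is the efficiency argument the abstract makes.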

Automated detection of zygomatic fractures on spiral computed tomography using a deep learning model.

Yari A, Fasih P, Kamali Hakim L, Asadi A

PubMed · Aug 6, 2025
The aim of this study was to evaluate the performance of the YOLOv8 deep learning model for detecting zygomatic fractures. Computed tomography scans with zygomatic fractures were collected, with all slices annotated to identify fracture lines across seven categories: zygomaticomaxillary suture, zygomatic arch, zygomaticofrontal suture, sphenozygomatic suture, orbital floor, zygomatic body, and maxillary sinus wall. The images were divided into training, validation, and test datasets in a 6:2:2 ratio. Performance metrics were calculated for each category. A total of 13,988 axial and 14,107 coronal slices were retrieved. The trained algorithm achieved an accuracy of 94.2-97.9% across categories. Recall exceeded 90% in all categories, with sphenozygomatic suture fractures having the highest value (96.6%). Average precision was highest for zygomatic arch fractures (0.827) and lowest for zygomatic body fractures (0.692). The highest F1 score was 96.7%, for zygomaticomaxillary suture fractures, and the lowest was 82.1%, for zygomatic body fractures. Area under the curve (AUC) values were likewise highest for zygomaticomaxillary suture fractures (0.943) and lowest for zygomatic body fractures (0.876). The YOLOv8 model demonstrated promising results in the automated detection of zygomatic fractures, achieving the highest performance in identifying fractures of the zygomaticomaxillary suture and zygomatic arch.
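The abstract names YOLOv8 but not the tooling; a minimal training sketch assuming the common ultralytics package, with a hypothetical dataset config zygoma_fractures.yaml that lists the seven fracture categories in YOLO format:

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                 # pretrained checkpoint
    model.train(data="zygoma_fractures.yaml",  # assumed dataset config
                epochs=100, imgsz=640)         # assumed hyperparameters
    metrics = model.val()                      # per-class precision/recall/mAP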

A Comprehensive Framework for Uncertainty Quantification of Voxel-wise Supervised Models in IVIM MRI

Nicola Casali, Alessandro Brusaferri, Giuseppe Baselli, Stefano Fumagalli, Edoardo Micotti, Gianluigi Forloni, Riaz Hussein, Giovanna Rizzo, Alfonso Mastropietro

arXiv preprint · Aug 6, 2025
Accurate estimation of intravoxel incoherent motion (IVIM) parameters from diffusion-weighted MRI remains challenging due to the ill-posed nature of the inverse problem and high sensitivity to noise, particularly in the perfusion compartment. In this work, we propose a probabilistic deep learning framework based on Deep Ensembles (DE) of Mixture Density Networks (MDNs), enabling estimation of total predictive uncertainty and its decomposition into aleatoric (AU) and epistemic (EU) components. The method was benchmarked against non-probabilistic neural networks, a Bayesian fitting approach, and a probabilistic network with a single-Gaussian parametrization. Supervised training was performed on synthetic data, and evaluation was conducted on both simulated data and an in vivo dataset. The reliability of the quantified uncertainties was assessed using calibration curves, output distribution sharpness, and the Continuous Ranked Probability Score (CRPS). MDNs produced better-calibrated and sharper predictive distributions for the diffusion coefficient D and perfusion fraction f, although slight overconfidence was observed in the pseudo-diffusion coefficient D*. The Robust Coefficient of Variation (RCV) indicated smoother in vivo estimates of D* with MDNs than with the single-Gaussian model. Although the training data covered the expected physiological range, elevated EU in vivo suggests a mismatch with real acquisition conditions, highlighting the importance of quantifying EU, which the DE approach makes possible. Overall, we present a comprehensive framework for IVIM fitting with uncertainty quantification, which enables the identification and interpretation of unreliable estimates. The proposed approach can also be adapted to fit other physical models through appropriate architectural and simulation adjustments.
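Because training is supervised on synthetic data, examples can be simulated directly from the standard bi-exponential IVIM signal model; the sketch below uses illustrative parameter ranges and Gaussian noise (assumptions, not the study's stated protocol) and shows the law-of-total-variance split a Deep Ensemble uses to separate AU from EU:

    import numpy as np

    def ivim_signal(b, d, d_star, f, s0=1.0):
        # Bi-exponential IVIM: perfusion (D*, f) + diffusion (D) compartments.
        return s0 * (f * np.exp(-b * d_star) + (1 - f) * np.exp(-b * d))

    rng = np.random.default_rng(0)
    b_values = np.array([0, 10, 20, 50, 100, 200, 400, 800])  # s/mm^2
    n = 10_000
    d = rng.uniform(0.0005, 0.003, n)        # assumed D range (mm^2/s)
    d_star = rng.uniform(0.005, 0.1, n)      # assumed D* range
    f = rng.uniform(0.0, 0.4, n)             # assumed perfusion fraction range
    clean = ivim_signal(b_values[None, :], d[:, None], d_star[:, None], f[:, None])
    noisy = clean + rng.normal(0, 0.02, clean.shape)  # Rician in practice

    # Deep-ensemble uncertainty split (law of total variance):
    mu = rng.random((5, n))        # stand-in per-member predictive means
    var = rng.random((5, n))       # stand-in per-member predictive variances
    aleatoric = var.mean(axis=0)   # average member variance (AU)
    epistemic = mu.var(axis=0)     # spread of member means (EU)
    total = aleatoric + epistemic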

Assessing the spatial relationship between mandibular third molars and the inferior alveolar canal using a deep learning-based approach: a proof-of-concept study.

Lyu W, Lou S, Huang J, Huang Z, Zheng H, Liao H, Qiao Y, OuYang K

PubMed · Aug 6, 2025
The distance between the mandibular third molar (M3) and the mandibular canal (MC) is a key factor in assessing the risk of injury to the inferior alveolar nerve (IAN). However, existing deep learning systems have not yet been able to accurately quantify the M3-MC distance in 3D space. The aim of this study was to develop and validate a deep learning-based system for accurate measurement of M3-MC spatial relationships in cone-beam computed tomography (CBCT) images and to evaluate its accuracy against conventional methods. We propose an innovative approach for low-resource environments, using DeepLabV3+ for semantic segmentation of CBCT-extracted 2D images, followed by multi-category 3D reconstruction and visualization. Based on the reconstructed model, we applied the KD-Tree algorithm to measure the minimum spatial distance between M3 and MC. Through internal validation with randomly selected CBCT images, we compared the AI system, conventional measurement methods on CBCT, and the gold standard measured by senior experts. Statistical analysis was performed using one-way ANOVA with Tukey HSD post-hoc tests (p < 0.05), employing multiple error metrics for comprehensive evaluation. One-way ANOVA revealed significant differences among the measurement methods, and subsequent Tukey HSD post-hoc tests showed significant differences between the AI reconstruction model and the conventional methods. Compared with the gold standard, the AI system achieved a mean error (ME) of 0.19, a mean absolute error (MAE) of 0.18, a mean square error (MSE) of 0.69, a root mean square error (RMSE) of 0.83, and a coefficient of determination (R²) of 0.96 (p < 0.01). These results indicate that the proposed AI system is highly accurate and reliable for M3-MC distance measurement and provides a powerful tool for preoperative risk assessment of M3 extraction.
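The KD-Tree distance step is straightforward once both structures are reconstructed as point clouds; a minimal sketch using scipy.spatial.cKDTree (the paper does not specify a library, and the point clouds here are random stand-ins):

    import numpy as np
    from scipy.spatial import cKDTree

    m3_points = np.random.rand(5000, 3) * 10       # M3 surface points (mm), stand-in
    mc_points = np.random.rand(5000, 3) * 10 + 8   # MC surface points, stand-in

    tree = cKDTree(mc_points)                 # index the canal points once
    dists, idx = tree.query(m3_points, k=1)   # nearest canal point per M3 point
    min_distance = dists.min()                # minimum M3-MC distance
    i = dists.argmin()
    closest_pair = (m3_points[i], mc_points[idx[i]])

The tree makes each nearest-neighbour query roughly logarithmic in the number of canal points, avoiding an exhaustive all-pairs distance computation.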

Towards Globally Predictable k-Space Interpolation: A White-box Transformer Approach

Chen Luo, Qiyu Jin, Taofeng Xie, Xuemei Wang, Huayu Wang, Congcong Liu, Liming Tang, Guoqing Chen, Zhuo-Xu Cui, Dong Liang

arXiv preprint · Aug 6, 2025
Interpolating missing data in k-space is essential for accelerating imaging. However, existing methods, including convolutional neural network-based deep learning, primarily exploit local predictability while overlooking the inherent global dependencies in k-space. Recently, Transformers have demonstrated remarkable success in natural language processing and image analysis due to their ability to capture long-range dependencies. This inspires the use of Transformers for k-space interpolation to better exploit its global structure. However, their lack of interpretability raises concerns regarding the reliability of interpolated data. To address this limitation, we propose GPI-WT, a white-box Transformer framework based on Globally Predictable Interpolation (GPI) for k-space. Specifically, we formulate GPI from the perspective of annihilation as a novel k-space structured low-rank (SLR) model. The global annihilation filters in the SLR model are treated as learnable parameters, and the subgradients of the SLR model naturally induce a learnable attention mechanism. By unfolding the subgradient-based optimization algorithm of SLR into a cascaded network, we construct the first white-box Transformer specifically designed for accelerated MRI. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art approaches in k-space interpolation accuracy while providing superior interpretability.
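The annihilation view rests on k-space being locally linearly predictable, which makes a Hankel-structured lift of the data low rank; a 1D toy illustration (our construction, not the paper's full SLR operator):

    import numpy as np

    def hankel_from_kspace(kspace, patch=8):
        # Stack sliding windows of k-space into a Hankel matrix; annihilation
        # (linear-predictability) relations make this matrix low rank.
        rows = len(kspace) - patch + 1
        return np.stack([kspace[i:i + patch] for i in range(rows)])

    t = np.arange(128)
    # A two-component exponential signal is predictable from two neighbours.
    kspace = np.exp(2j * np.pi * 0.10 * t) + 0.5 * np.exp(2j * np.pi * 0.27 * t)
    H = hankel_from_kspace(kspace)
    print(np.linalg.matrix_rank(H, tol=1e-6))   # -> 2, one per component

In the paper's framework, the annihilation filters that certify this low rank are learned, and unrolling the SLR optimization is what yields the attention-like white-box architecture.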

MCA-GAN: A lightweight Multi-scale Context-Aware Generative Adversarial Network for MRI reconstruction.

Hou B, Du H

PubMed · Aug 6, 2025
Magnetic Resonance Imaging (MRI) is widely utilized in medical imaging due to its high resolution and non-invasive nature. However, the prolonged acquisition time significantly limits its clinical applicability. Although traditional compressed sensing (CS) techniques can accelerate MRI acquisition, they often lead to degraded reconstruction quality under high undersampling rates. Deep learning-based methods, including CNN- and GAN-based approaches, have improved reconstruction performance, yet are limited by their local receptive fields, making it challenging to effectively capture long-range dependencies. Moreover, these models typically exhibit high computational complexity, which hinders their efficient deployment in practical scenarios. To address these challenges, we propose a lightweight Multi-scale Context-Aware Generative Adversarial Network (MCA-GAN), which enhances MRI reconstruction through dual-domain generators that collaboratively optimize both k-space and image-domain representations. MCA-GAN integrates several lightweight modules, including Depthwise Separable Local Attention (DWLA) for efficient local feature extraction, Adaptive Group Rearrangement Block (AGRB) for dynamic inter-group feature optimization, Multi-Scale Spatial Context Modulation Bridge (MSCMB) for multi-scale feature fusion in skip connections, and Channel-Spatial Multi-Scale Self-Attention (CSMS) for improved global context modeling. Extensive experiments conducted on the IXI, MICCAI 2013, and MRNet knee datasets demonstrate that MCA-GAN consistently outperforms existing methods in terms of PSNR and SSIM. Compared to SepGAN, the latest lightweight model, MCA-GAN achieves a 27.3% reduction in parameter size and a 19.6% reduction in computational complexity, while attaining the shortest reconstruction time among all compared methods. Furthermore, MCA-GAN exhibits robust performance across various undersampling masks and acceleration rates. Cross-dataset generalization experiments further confirm its ability to maintain competitive reconstruction quality, underscoring its strong generalization potential. Overall, MCA-GAN improves MRI reconstruction quality while significantly reducing computational cost through a lightweight architecture and multi-scale feature fusion, offering an efficient and accurate solution for accelerated MRI.
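As background for the DWLA module's efficiency claim, here is the standard depthwise-separable factorization in PyTorch (the actual DWLA adds a local attention mechanism on top of this; only the generic depthwise/pointwise split is shown):

    import torch
    import torch.nn as nn

    class DepthwiseSeparableBlock(nn.Module):
        # Depthwise conv mixes spatially within each channel; the 1x1
        # pointwise conv mixes channels. Parameter count drops from
        # C*C*k*k (dense conv) to C*k*k + C*C.
        def __init__(self, channels, kernel_size=3):
            super().__init__()
            self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                       padding=kernel_size // 2, groups=channels)
            self.pointwise = nn.Conv2d(channels, channels, 1)
            self.act = nn.GELU()

        def forward(self, x):
            return self.act(self.pointwise(self.depthwise(x)))

    x = torch.randn(1, 32, 64, 64)           # (batch, channels, H, W)
    y = DepthwiseSeparableBlock(32)(x)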

ATLASS: An AnaTomicaLly-Aware Self-Supervised Learning Framework for Generalizable Retinal Disease Detection.

Khan AA, Ahmad KM, Shafiq S, Akram MU, Shao J

PubMed · Aug 6, 2025
Medical imaging, particularly retinal fundus photography, plays a crucial role in early disease detection and treatment for various ocular disorders. However, the development of robust diagnostic systems using deep learning remains constrained by the scarcity of expertly annotated data, which is time-consuming and expensive. Self-Supervised Learning (SSL) has emerged as a promising solution, but existing models fail to effectively incorporate critical domain knowledge specific to retinal anatomy. This potentially limits their clinical relevance and diagnostic capability. We address this issue by introducing an anatomically aware SSL framework that strategically integrates domain expertise through specialized masking of vital retinal structures during pretraining. Our approach leverages vessel and optic disc segmentation maps to guide the SSL process, enabling the development of clinically relevant feature representations without extensive labeled data. The framework combines a Vision Transformer with dual-masking strategies and anatomically informed loss functions to preserve structural integrity during feature learning. Comprehensive evaluation across multiple datasets demonstrates our method's competitive performance in diverse retinal disease classification tasks, including diabetic retinopathy grading, glaucoma detection, age-related macular degeneration identification, and multi-disease classification. The evaluation results establish the effectiveness of anatomically-aware SSL in advancing automated retinal disease diagnosis while addressing the fundamental challenge of limited labeled medical data.
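A minimal sketch of one way a segmentation map can steer SSL masking; ranking patches by vessel density and masking the richest ones is our illustrative assumption, not the paper's exact dual-masking strategy:

    import torch

    def anatomy_guided_mask(vessel_map, patch=16, mask_ratio=0.5):
        # Rank image patches by vessel density and preferentially mask the
        # most vessel-rich ones, forcing the model to reconstruct anatomy.
        B, H, W = vessel_map.shape
        density = vessel_map.reshape(B, H // patch, patch, W // patch, patch)
        density = density.mean(dim=(2, 4)).flatten(1)   # per-patch density
        n_mask = int(mask_ratio * density.shape[1])
        idx = density.argsort(dim=1, descending=True)[:, :n_mask]
        mask = torch.zeros_like(density, dtype=torch.bool)
        mask.scatter_(1, idx, True)                     # True = masked patch
        return mask

    seg = (torch.rand(2, 224, 224) > 0.9).float()  # stand-in vessel map
    mask = anatomy_guided_mask(seg)                # (2, 196) boolean patch mask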

Equivariant Spatiotemporal Transformers with MDL-Guided Feature Selection for Malignancy Detection in Dynamic PET

Dadashkarimi, M.

medRxiv preprint · Aug 6, 2025
Dynamic Positron Emission Tomography (PET) scans offer rich spatiotemporal data for detecting malignancies, but their high dimensionality and noise pose significant challenges. We introduce a novel framework, the Equivariant Spatiotemporal Transformer with MDL-Guided Feature Selection (EST-MDL), which integrates group-theoretic symmetries, Kolmogorov complexity, and Minimum Description Length (MDL) principles. By enforcing spatial and temporal symmetries (e.g., translations and rotations) and leveraging MDL for robust feature selection, our model achieves improved generalization and interpretability. Evaluated on three real-world PET datasets (LUNG-PET, BRAIN-PET, and BREAST-PET), our approach achieves AUCs of 0.94, 0.92, and 0.95, respectively, outperforming CNNs, Vision Transformers (ViTs), and Graph Neural Networks (GNNs) in AUC, sensitivity, specificity, and computational efficiency. This framework offers a robust, interpretable solution for malignancy detection in clinical settings.
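A toy sketch of two-part MDL scoring for feature selection; the BIC-style model cost and the exhaustive subset search are simplifying assumptions of ours, standing in for the paper's Kolmogorov-complexity-motivated criterion:

    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import LogisticRegression

    def description_length(X, y, features):
        # Two-part code: bits for the labels given the model (negative
        # log-likelihood) plus bits for the model itself.
        n = len(y)
        clf = LogisticRegression(max_iter=1000).fit(X[:, features], y)
        prob = clf.predict_proba(X[:, features])
        data_bits = -np.log2(prob[np.arange(n), y] + 1e-12).sum()
        model_bits = 0.5 * len(features) * np.log2(n)
        return data_bits + model_bits

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    y = (X[:, 0] + X[:, 2] > 0).astype(int)   # only features 0 and 2 matter
    subsets = [list(c) for r in (1, 2, 3) for c in combinations(range(6), r)]
    best = min(subsets, key=lambda s: description_length(X, y, s))  # typically [0, 2]

Shorter total code length rewards subsets that explain the labels without paying for superfluous features, which is the MDL rationale for selection.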

CAPoxy: a feasibility study to investigate multispectral imaging in nailfold capillaroscopy

Taylor-Williams, M., Khalil, I., Manning, J., Dinsdale, G., Berks, M., Porcu, L., Wilkinson, S., Bohndiek, S., Murray, A.

medRxiv preprint · Aug 5, 2025
Background: Nailfold capillaroscopy enables visualisation of structural abnormalities in the microvasculature of patients with systemic sclerosis (SSc). The objective of this feasibility study was to determine whether multispectral imaging (MSI) could provide functional assessment (differences in haemoglobin concentration or oxygenation) of capillaries to aid discrimination between healthy controls and patients with SSc. MSI of nailfold capillaries visualises the smallest blood vessels and the impact of SSc on angiogenesis and vessel deformation, making it suitable for evaluating oxygenation-sensitive imaging techniques. Imaging of the nailfold capillaries offers tissue-specific oxygenation information, unlike pulse oximetry, which measures arterial blood oxygenation at a single point.

Methods: The CAPoxy study was a single-centre, cross-sectional feasibility study of nailfold capillary multispectral imaging, comparing a cohort of patients with SSc to controls. A nine-band multispectral camera was used to image 22 individuals (10 patients with SSc and 12 controls). Linear mixed-effects models and summary statistics were used to compare the different regions of the nailfold (capillaries, surrounding edges, and outside area) between SSc and controls. A machine learning model was used to compare the two groups.

Results: Patients with SSc exhibited higher indicators of haemoglobin concentration in the capillary and adjacent regions than controls, with significant differences in the regions surrounding the capillaries (p<0.001). There were also spectral differences between the SSc and control groups that could indicate differences in oxygenation of the capillaries and surrounding tissue. Additionally, a machine learning model distinguished patients with SSc from healthy controls with an accuracy of 84%, suggesting potential for multispectral imaging to classify SSc based on structural and functional microvascular changes.

Conclusions: The data indicate that multispectral imaging differentiates patients with SSc from controls based on differences in vascular function. Further work to develop a targeted spectral camera would improve the contrast between patients with SSc and controls, enabling better imaging.

Key messages: Multispectral imaging holds promise for providing functional oxygenation measurement in nailfold capillaroscopy. Significant oxygenation differences between individuals with systemic sclerosis and healthy controls can be detected with multispectral imaging in the tissue surrounding capillaries.
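A minimal sketch of the region-wise comparison with a linear mixed-effects model, using statsmodels on stand-in data (the column names and the haemoglobin-sensitive index are hypothetical):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "subject": np.repeat(np.arange(22), 3),
        "group": np.repeat(["SSc"] * 10 + ["control"] * 12, 3),
        "region": ["capillary", "edge", "outside"] * 22,
    })
    # Hypothetical haemoglobin-sensitive index derived from the nine bands.
    df["hb_index"] = rng.normal(size=len(df)) + (df["group"] == "SSc") * 0.5

    # Random intercept per subject; fixed effects for group, region, interaction.
    fit = smf.mixedlm("hb_index ~ group * region", df, groups=df["subject"]).fit()
    print(fit.summary())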
