
SD-MAD: Sign-Driven Few-shot Multi-Anomaly Detection in Medical Images

Kaiyu Guo, Tan Pan, Chen Jiang, Zijian Wang, Brian C. Lovell, Limei Han, Yuan Cheng, Mahsa Baktashmotlagh

arXiv preprint · May 22, 2025
Medical anomaly detection (AD) is crucial for early clinical intervention, yet it faces challenges due to limited access to high-quality medical imaging data, caused by privacy concerns and data silos. Few-shot learning has emerged as a promising approach to alleviate these limitations by leveraging the large-scale prior knowledge embedded in vision-language models (VLMs). Recent advances in few-shot medical AD have treated normal and abnormal cases as a one-class classification problem, often overlooking the distinctions among multiple anomaly categories. In this paper, we therefore propose a framework tailored for few-shot medical anomaly detection in scenarios where multiple anomaly categories must be identified. To capture the detailed radiological signs of medical anomaly categories, our framework incorporates diverse textual descriptions for each category generated by a large language model (LLM), under the assumption that different anomalies in medical images may share common radiological signs within each category. Specifically, we introduce SD-MAD, a two-stage Sign-Driven few-shot Multi-Anomaly Detection framework: (i) radiological signs are aligned with anomaly categories by amplifying inter-anomaly discrepancy; (ii) aligned signs are further selected to mitigate the under-fitting and uncertain-sample issues caused by limited medical data, using an automatic sign selection strategy at inference. Moreover, we propose three protocols to comprehensively quantify the performance of multi-anomaly detection. Extensive experiments illustrate the effectiveness of our method.
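
To make the sign-driven idea concrete, here is a minimal sketch (not the authors' implementation) of scoring an image embedding against LLM-generated radiological-sign embeddings per anomaly category, keeping only the best-matching signs in the spirit of automatic sign selection; the embedding dimensionality, sign lists, and category names are hypothetical.

```python
# Illustrative sketch only (not the SD-MAD code): score an image embedding against
# per-category radiological-sign text embeddings and keep the top-k matching signs,
# loosely mimicking automatic sign selection. All embeddings here are random stand-ins.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def category_scores(image_emb, sign_embs_per_category, top_k=3):
    """sign_embs_per_category: dict mapping category name -> (n_signs, dim) array."""
    scores = {}
    for category, sign_embs in sign_embs_per_category.items():
        sims = np.array([cosine(image_emb, s) for s in sign_embs])
        scores[category] = float(np.sort(sims)[-top_k:].mean())  # keep best-matching signs
    return scores

# usage with random placeholders (dim = 512, as in CLIP-like VLMs)
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
signs = {"consolidation": rng.normal(size=(8, 512)), "effusion": rng.normal(size=(8, 512))}
print(category_scores(image_emb, signs))
```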

CMRINet: Joint Groupwise Registration and Segmentation for Cardiac Function Quantification from Cine-MRI

Mohamed S. Elmahdy, Marius Staring, Patrick J. H. de Koning, Samer Alabed, Mahan Salehi, Faisal Alandejani, Michael Sharkey, Ziad Aldabbagh, Andrew J. Swift, Rob J. van der Geest

arXiv preprint · May 22, 2025
Accurate and efficient quantification of cardiac function is essential for estimating the prognosis of cardiovascular diseases (CVDs). One of the most commonly used metrics for evaluating cardiac pumping performance is left ventricular ejection fraction (LVEF). However, LVEF can be affected by factors such as inter-observer variability and varying preload and afterload conditions, which reduce its reproducibility. Additionally, cardiac dysfunction does not always manifest as an alteration in LVEF, for example in heart failure and cardiotoxicity. Alternative measures that provide a relatively load-independent quantitative assessment of myocardial contractility are myocardial strain and strain rate. By using LVEF in combination with myocardial strain, it is possible to obtain a thorough description of cardiac function. Automated estimation of LVEF and other volumetric measures from cine-MRI sequences can be achieved with segmentation models, while strain calculation requires estimating tissue displacement between sequential frames, which can be accomplished with registration models. These tasks are often performed separately, potentially limiting the assessment of cardiac function. To address this issue, we propose an end-to-end deep learning (DL) model that jointly performs groupwise (GW) registration and segmentation of cardiac cine-MRI images. The proposed anatomically guided deep GW network was trained and validated on a large dataset of 4-chamber view cine-MRI image series from 374 subjects. A quantitative comparison with conventional GW registration using elastix and two DL-based methods showed that the proposed model improved performance and substantially reduced computation time.
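
As a rough illustration of what a joint objective for groupwise registration and segmentation can look like, the sketch below combines a Dice segmentation term with a temporal-variance groupwise alignment term; this is an assumed formulation for illustration, not the CMRINet loss, and all weights and shapes are placeholders.

```python
# Assumed joint objective for illustration only: Dice segmentation loss plus a
# groupwise alignment term that penalizes temporal intensity variance after warping
# all cine frames to a common reference. Not the CMRINet loss.
import torch

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def groupwise_variance_loss(warped_frames):
    """warped_frames: (T, H, W) frames after warping; good alignment -> low variance."""
    return warped_frames.var(dim=0).mean()

def joint_loss(seg_pred, seg_gt, warped_frames, lam=0.5):
    return dice_loss(seg_pred, seg_gt) + lam * groupwise_variance_loss(warped_frames)

# toy tensors standing in for network outputs
seg_pred = torch.rand(1, 128, 128)                     # predicted myocardium probability
seg_gt = (torch.rand(1, 128, 128) > 0.5).float()       # placeholder ground truth
warped = torch.rand(25, 128, 128)                      # 25 warped cine frames
print(joint_loss(seg_pred, seg_gt, warped))
```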

SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation

Guohao Huo, Ruiting Dai, Hao Tang

arXiv preprint · May 22, 2025
To address the challenge of complex pathological feature extraction in automated cardiac MRI segmentation, this study proposes an innovative dual-encoder architecture named SAMba-UNet. The framework achieves cross-modal feature collaborative learning by integrating the vision foundation model SAM2, the state-space model Mamba, and the classical UNet. To mitigate domain discrepancies between medical and natural images, a Dynamic Feature Fusion Refiner is designed, which enhances small lesion feature extraction through multi-scale pooling and a dual-path calibration mechanism across channel and spatial dimensions. Furthermore, a Heterogeneous Omni-Attention Convergence Module (HOACM) is introduced, combining global contextual attention with branch-selective emphasis mechanisms to effectively fuse SAM2's local positional semantics and Mamba's long-range dependency modeling capabilities. Experiments on the ACDC cardiac MRI dataset demonstrate that the proposed model achieves a Dice coefficient of 0.9103 and an HD95 boundary error of 1.0859 mm, significantly outperforming existing methods, particularly in boundary localization for complex pathological structures such as right ventricular anomalies. This work provides an efficient and reliable solution for automated cardiac disease diagnosis, and the code will be open-sourced.
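
The sketch below illustrates, in simplified form, how features from two heterogeneous encoder streams can be calibrated along channel and spatial dimensions and then fused; it is a generic, assumption-laden stand-in, not the paper's Dynamic Feature Fusion Refiner or HOACM.

```python
# Generic dual-stream fusion block for illustration; not the paper's modules.
import torch
import torch.nn as nn

class DualPathFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * channels, 2 * channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3), nn.Sigmoid())
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat_a, feat_b):           # two encoder streams, same spatial size
        x = torch.cat([feat_a, feat_b], dim=1)
        x = x * self.channel_gate(x)             # channel-wise calibration
        x = x * self.spatial_gate(x)             # spatial calibration
        return self.proj(x)                      # fuse back into one stream

fused = DualPathFusion(64)(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))
print(fused.shape)                               # torch.Size([1, 64, 32, 32])
```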

CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering

Yuren Mao, Wenyi Xu, Yuyang Qin, Yunjun Gao

arXiv preprint · May 22, 2025
A Computed Tomography (CT) scan produces 3D volumetric medical data that can be viewed as hundreds of cross-sectional images (slices), providing detailed anatomical information for diagnosis. For radiologists, creating CT radiology reports is time-consuming and error-prone. A visual question answering (VQA) system that can answer radiologists' questions about anatomical regions in a CT scan, and even automatically generate a radiology report, is urgently needed. However, existing VQA systems cannot adequately handle the CT radiology question answering (CTQA) task because (1) anatomic complexity makes CT images difficult to understand, and (2) the spatial relationships across hundreds of slices are difficult to capture. To address these issues, this paper proposes CT-Agent, a multimodal agentic framework for CTQA. CT-Agent adopts anatomically independent tools to break down the anatomic complexity; furthermore, it efficiently captures cross-slice spatial relationships with a global-local token compression strategy. Experimental results on two 3D chest CT datasets, CT-RATE and RadGenome-ChestCT, verify the superior performance of CT-Agent.
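
A hedged sketch of one possible global-local token compression scheme is shown below: coarse tokens summarize every k-th slice for global context, while full-resolution tokens are kept only around the slice being queried. The strides, radii, and token shapes are assumptions, not CT-Agent's actual strategy.

```python
# Hypothetical global-local token compression; strides and radii are assumptions.
import torch

def compress_tokens(slice_tokens, query_slice, local_radius=8, global_stride=16):
    """slice_tokens: (num_slices, tokens_per_slice, dim)."""
    num_slices = slice_tokens.shape[0]
    global_part = slice_tokens[::global_stride].mean(dim=1)   # one coarse token per sampled slice
    lo = max(0, query_slice - local_radius)
    hi = min(num_slices, query_slice + local_radius + 1)
    local_part = slice_tokens[lo:hi].flatten(0, 1)            # full detail near the queried region
    return torch.cat([global_part, local_part], dim=0)

tokens = torch.rand(300, 49, 768)   # e.g., 300 slices, 7x7 patches each, ViT-Base width
print(compress_tokens(tokens, query_slice=150).shape)
```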

Render-FM: A Foundation Model for Real-time Photorealistic Volumetric Rendering

Zhongpai Gao, Meng Zheng, Benjamin Planche, Anwesa Choudhuri, Terrence Chen, Ziyan Wu

arXiv preprint · May 22, 2025
Volumetric rendering of Computed Tomography (CT) scans is crucial for visualizing complex 3D anatomical structures in medical imaging. Current high-fidelity approaches, especially neural rendering techniques, require time-consuming per-scene optimization, limiting clinical applicability due to computational demands and poor generalizability. We propose Render-FM, a novel foundation model for direct, real-time volumetric rendering of CT scans. Render-FM employs an encoder-decoder architecture that directly regresses 6D Gaussian Splatting (6DGS) parameters from CT volumes, eliminating per-scan optimization through large-scale pre-training on diverse medical data. By integrating robust feature extraction with the expressive power of 6DGS, our approach efficiently generates high-quality, real-time interactive 3D visualizations across diverse clinical CT data. Experiments demonstrate that Render-FM achieves visual fidelity comparable or superior to specialized per-scan methods while drastically reducing preparation time from nearly an hour to seconds for a single inference step. This advancement enables seamless integration into real-time surgical planning and diagnostic workflows. The project page is: https://gaozhongpai.github.io/renderfm/.
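
To illustrate the feedforward idea of regressing splatting parameters directly from a volume (so no per-scan optimization loop is needed at inference), here is a toy 3D encoder that outputs one parameter vector per coarse voxel; the parameterization and network are hypothetical stand-ins, not Render-FM.

```python
# Toy feedforward regressor from a CT volume to per-voxel splat parameters; purely
# illustrative (the 14-dim parameterization and layer sizes are assumptions).
import torch
import torch.nn as nn

class VolumeToGaussians(nn.Module):
    def __init__(self, params_per_gaussian=14):   # e.g., mean(3)+scale(3)+rotation(4)+opacity(1)+color(3)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(64, params_per_gaussian, 1))  # one Gaussian per coarse voxel

    def forward(self, ct):                          # ct: (B, 1, D, H, W)
        out = self.net(ct)                          # (B, 14, D/4, H/4, W/4)
        return out.flatten(2).transpose(1, 2)       # (B, num_gaussians, 14)

gaussians = VolumeToGaussians()(torch.rand(1, 1, 64, 64, 64))
print(gaussians.shape)                              # torch.Size([1, 4096, 14])
```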

VET-DINO: Learning Anatomical Understanding Through Multi-View Distillation in Veterinary Imaging

Andre Dourson, Kylie Taylor, Xiaoli Qiao, Michael Fitzke

arXiv preprint · May 21, 2025
Self-supervised learning has emerged as a powerful paradigm for training deep neural networks, particularly in medical imaging where labeled data is scarce. While current approaches typically rely on synthetic augmentations of single images, we propose VET-DINO, a framework that leverages a unique characteristic of medical imaging: the availability of multiple standardized views from the same study. Using a series of clinical veterinary radiographs from the same patient study, we enable models to learn view-invariant anatomical structures and develop an implied 3D understanding from 2D projections. We demonstrate our approach on a dataset of 5 million veterinary radiographs from 668,000 canine studies. Through extensive experimentation, including view synthesis and downstream task performance, we show that learning from real multi-view pairs leads to superior anatomical understanding compared to purely synthetic augmentations. VET-DINO achieves state-of-the-art performance on various veterinary imaging tasks. Our work establishes a new paradigm for self-supervised learning in medical imaging that leverages domain-specific properties rather than merely adapting natural image techniques.
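
The core self-distillation step with two real views from the same study can be sketched as below; the loss follows the standard DINO cross-entropy between a sharpened teacher and a student, while the backbone here is a trivial placeholder rather than the VET-DINO model.

```python
# Standard DINO-style loss applied to two real views from one study; the "backbone"
# is a trivial placeholder, and the teacher would be an EMA copy of the student in practice.
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, temp_s=0.1, temp_t=0.04):
    """Cross-entropy between a sharpened teacher distribution and the student."""
    t = F.softmax(teacher_out / temp_t, dim=-1).detach()
    s = F.log_softmax(student_out / temp_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(224 * 224, 256))
teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(224 * 224, 256))
teacher.load_state_dict(student.state_dict())

view_a = torch.rand(4, 1, 224, 224)   # e.g., lateral radiograph
view_b = torch.rand(4, 1, 224, 224)   # e.g., ventrodorsal radiograph from the same study
print(dino_loss(student(view_a), teacher(view_b)))
```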

Machine Learning Derived Blood Input for Dynamic PET Images of Rat Heart

Shubhrangshu Debsarkar, Bijoy Kundu

arXiv preprint · May 21, 2025
Dynamic FDG PET imaging studies of n = 52 rats, comprising 26 control Wistar-Kyoto (WKY) rats and 26 experimental spontaneously hypertensive rats (SHR), were performed longitudinally at 1, 2, 3, 5, 9, 12, and 18 months of age using a Siemens microPET and an Albira trimodal scanner. A 15-parameter dual-output model correcting for spill-over contamination and partial volume effects, with peak-fitting cost functions, was developed for simultaneous estimation of the model-corrected blood input function (MCIF) and kinetic rate constants from dynamic FDG PET images of rat heart in vivo. The major drawbacks of this model are its dependence on manual annotation of the Image-Derived Input Function (IDIF) and on manual determination of crucial model parameters to compute the MCIF. To overcome these limitations, we performed semi-automated segmentation and then formulated a Long Short-Term Memory (LSTM) network to predict the MCIF on test data from a concatenation of IDIFs and myocardial inputs, comparing the predictions with the reference modeled MCIF. Thresholding along 2D plane slices with two thresholds, T1 for the high-intensity myocardium and T2 for the lower-intensity rings, was used to segment the LV blood pool. The resulting IDIF and myocardial TACs were used to compute the corresponding reference (model) MCIF for all datasets. The segmented IDIF and myocardium formed the input to the LSTM network. A k-fold cross-validation structure with a 33:8:11 split and 5 folds was used to build the model and evaluate the performance of the LSTM network across all datasets. To overcome the sparseness of the data at later time points, midpoint interpolation was used to increase the density of data points beyond time = 10 minutes. The model using midpoint interpolation achieved a 56.4% improvement in Mean Squared Error (MSE) over the previous approach.
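
A minimal sketch of the two ingredients described above, an LSTM mapping concatenated IDIF and myocardial time-activity curves to a predicted MCIF, and midpoint interpolation to densify sparse late time points, is given below; the layer sizes and sequence lengths are assumptions, not the paper's configuration.

```python
# Illustrative only: layer sizes, sequence lengths, and shapes are assumptions.
import torch
import torch.nn as nn

def midpoint_interpolate(times, values):
    """Insert midpoints between consecutive samples to densify a sparse late tail."""
    t = torch.cat([times, (times[:-1] + times[1:]) / 2])
    v = torch.cat([values, (values[:-1] + values[1:]) / 2])
    order = torch.argsort(t)
    return t[order], v[order]

class MCIFNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, idif, myo):                  # each: (B, T) time-activity curves
        x = torch.stack([idif, myo], dim=-1)       # concatenate the two inputs per time step
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)          # predicted MCIF, (B, T)

t, v = midpoint_interpolate(torch.linspace(0, 60, 20), torch.rand(20))
print(MCIFNet()(torch.rand(4, 40), torch.rand(4, 40)).shape)   # torch.Size([4, 40])
```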

Reconsider the Template Mesh in Deep Learning-based Mesh Reconstruction

Fengting Zhang, Boxu Liang, Qinghao Liu, Min Liu, Xiang Chen, Yaonan Wang

arXiv preprint · May 21, 2025
Mesh reconstruction is a cornerstone process across various applications, including in-silico trials, digital twins, surgical planning, and navigation. Recent advancements in deep learning have notably increased mesh reconstruction speeds. Yet traditional methods predominantly rely on deforming a standardised template mesh for individual subjects, which overlooks the unique anatomical variations between them and may compromise the fidelity of the reconstructions. In this paper, we propose an adaptive-template-based mesh reconstruction network (ATMRN), which generates adaptive templates from the given images for the subsequent deformation, moving beyond the constraints of a single, fixed template. Our approach, validated on cortical magnetic resonance (MR) images from the OASIS dataset, sets a new benchmark in voxel-to-cortex mesh reconstruction, achieving an average symmetric surface distance of 0.267 mm across four cortical structures. Our proposed method is generic and can easily be transferred to other image modalities and anatomical structures.
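
The adaptive-template idea can be sketched roughly as follows: an image-derived feature first predicts subject-specific offsets to a canonical template, and a second head predicts the final deformation. This is a hypothetical illustration; the vertex count, feature dimension, and module names are placeholders, not the ATMRN architecture.

```python
# Hypothetical adaptive-template sketch; vertex count and feature size are placeholders.
import torch
import torch.nn as nn

class AdaptiveTemplateMesh(nn.Module):
    def __init__(self, num_vertices=2562, feat_dim=128):
        super().__init__()
        self.canonical = nn.Parameter(torch.rand(num_vertices, 3), requires_grad=False)
        self.template_head = nn.Linear(feat_dim, num_vertices * 3)   # subject-specific template
        self.deform_head = nn.Linear(feat_dim, num_vertices * 3)     # final deformation

    def forward(self, image_feat):                                   # (B, feat_dim) image encoding
        B = image_feat.shape[0]
        template = self.canonical + self.template_head(image_feat).view(B, -1, 3)
        return template + self.deform_head(image_feat).view(B, -1, 3)

mesh = AdaptiveTemplateMesh()(torch.rand(2, 128))
print(mesh.shape)   # torch.Size([2, 2562, 3])
```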

X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography

Yifan Liu, Wuyang Li, Weihao Yu, Chenxin Li, Alexandre Alahi, Max Meng, Yixuan Yuan

arXiv preprint · May 21, 2025
Computed Tomography serves as an indispensable tool in clinical workflows, providing non-invasive visualization of internal anatomical structures. Existing CT reconstruction works are limited by small-capacity model architectures, inflexible volume representations, and small-scale training data. In this paper, we present X-GRM (X-ray Gaussian Reconstruction Model), a large feedforward model for reconstructing 3D CT from sparse-view 2D X-ray projections. X-GRM employs a scalable transformer-based architecture to encode an arbitrary number of sparse X-ray inputs, where tokens from different views are integrated efficiently. The tokens are then decoded into a new volume representation, named Voxel-based Gaussian Splatting (VoxGS), which enables efficient CT volume extraction and differentiable X-ray rendering. To support the training of X-GRM, we collect ReconX-15K, a large-scale CT reconstruction dataset containing around 15,000 CT/X-ray pairs across diverse organs, including the chest, abdomen, pelvis, and teeth. This combination of a high-capacity model, flexible volume representation, and large-scale training data empowers our model to produce high-quality reconstructions from various testing inputs, including in-domain and out-of-domain X-ray projections. Project Page: https://github.com/CUHK-AIM-Group/X-GRM.
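
A hedged sketch of the general recipe, a transformer encoder over patch tokens from a variable number of X-ray views, pooled and decoded into a coarse voxel grid, is shown below; it is not X-GRM or VoxGS, and all sizes are illustrative assumptions.

```python
# Illustrative sparse-view encoder; sizes, patch size, and pooling are assumptions.
import torch
import torch.nn as nn

class SparseViewEncoder(nn.Module):
    def __init__(self, dim=256, voxel_grid=16):
        super().__init__()
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_voxels = nn.Linear(dim, voxel_grid ** 3)
        self.voxel_grid = voxel_grid

    def forward(self, views):                           # (B, num_views, 1, H, W)
        B, V = views.shape[:2]
        x = self.patch_embed(views.flatten(0, 1))       # (B*V, dim, h, w)
        x = x.flatten(2).transpose(1, 2)                # (B*V, h*w, dim)
        x = x.reshape(B, V * x.shape[1], x.shape[2])    # concatenate tokens from all views
        feats = self.encoder(x).mean(dim=1)             # (B, dim)
        g = self.voxel_grid
        return self.to_voxels(feats).view(B, g, g, g)   # coarse volume features

vol = SparseViewEncoder()(torch.rand(1, 4, 1, 256, 256))
print(vol.shape)   # torch.Size([1, 16, 16, 16])
```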

SAMA-UNet: Enhancing Medical Image Segmentation with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning

Saqib Qamar, Mohd Fazil, Parvez Ahmad, Ghulam Muhammad

arXiv preprint · May 21, 2025
Medical image segmentation plays an important role in various clinical applications, but existing models often struggle with computational inefficiencies and the challenges posed by complex medical data. State Space Sequence Models (SSMs) have demonstrated promise in modeling long-range dependencies with linear computational complexity, yet their application to medical image segmentation remains hindered by incompatibilities with image tokens and autoregressive assumptions. Moreover, it is difficult to balance capturing local fine-grained information with modeling global semantic dependencies. To address these challenges, we introduce SAMA-UNet, a novel architecture for medical image segmentation. A key innovation is the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which integrates contextual self-attention with dynamic weight modulation to prioritise the most relevant features based on local and global contexts. This approach reduces computational complexity and improves the representation of complex image features across multiple scales. We also introduce the Causal-Resonance Multi-Scale Module (CR-MSM), which enhances the flow of information between the encoder and decoder through causal resonance learning. This mechanism allows the model to automatically adjust feature resolution and causal dependencies across scales, leading to better semantic alignment between low-level and high-level features in U-shaped architectures. Experiments on MRI, CT, and endoscopy images show that SAMA-UNet achieves higher segmentation accuracy than current CNN-, Transformer-, and Mamba-based methods. The implementation is publicly available on GitHub.
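
As a simplified illustration of dynamic weight modulation between a local convolutional path and a global attention path, the sketch below mixes the two with a learned per-sample gate; it is a generic stand-in, not the SAMA block.

```python
# Generic local/global mixing with a learned gate; not the SAMA block itself.
import torch
import torch.nn as nn

class LocalGlobalMix(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):                                    # (B, C, H, W)
        B, C, H, W = x.shape
        local_feat = self.local(x)                           # fine-grained convolutional path
        seq = x.flatten(2).transpose(1, 2)                   # (B, H*W, C)
        global_feat, _ = self.attn(seq, seq, seq)            # global self-attention path
        global_feat = global_feat.transpose(1, 2).reshape(B, C, H, W)
        g = self.gate(x)                                     # per-sample mixing weight in [0, 1]
        return g * local_feat + (1 - g) * global_feat

out = LocalGlobalMix(64)(torch.rand(1, 64, 16, 16))
print(out.shape)   # torch.Size([1, 64, 16, 16])
```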