Sort by:
Page 1 of 110 results

Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

Yinuo Wang, Yue Zeng, Kai Chen, Cai Meng, Chao Pan, Zhouping Tang

arxiv logopreprintMay 14 2025
Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurring boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH binary classification and subtyping. Methods: We utilized a dataset provided by RSNA, comprising 192 NCCT volumes. The study compares various MLLMs, including GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet V2, with conventional deep learning models, including ResNet50 and Vision Transformer. Carefully crafted prompts were used to guide MLLMs in tasks such as ICH presence, subtype classification, localization, and volume estimation. Results: The results indicate that in the ICH binary classification task, traditional deep learning models outperform MLLMs comprehensively. For subtype classification, MLLMs also exhibit inferior performance compared to traditional deep learning models, with Gemini 2.0 Flash achieving an macro-averaged precision of 0.41 and a macro-averaged F1 score of 0.31. Conclusion: While MLLMs excel in interactive capabilities, their overall accuracy in ICH subtyping is inferior to deep networks. However, MLLMs enhance interpretability through language interactions, indicating potential in medical imaging analysis. Future efforts will focus on model refinement and developing more precise MLLMs to improve performance in three-dimensional medical image processing.

Highly Undersampled MRI Reconstruction via a Single Posterior Sampling of Diffusion Models

Jin Liu, Qing Lin, Zhuang Xiong, Shanshan Shan, Chunyi Liu, Min Li, Feng Liu, G. Bruce Pike, Hongfu Sun, Yang Gao

arxiv logopreprintMay 13 2025
Incoherent k-space under-sampling and deep learning-based reconstruction methods have shown great success in accelerating MRI. However, the performance of most previous methods will degrade dramatically under high acceleration factors, e.g., 8$\times$ or higher. Recently, denoising diffusion models (DM) have demonstrated promising results in solving this issue; however, one major drawback of the DM methods is the long inference time due to a dramatic number of iterative reverse posterior sampling steps. In this work, a Single Step Diffusion Model-based reconstruction framework, namely SSDM-MRI, is proposed for restoring MRI images from highly undersampled k-space. The proposed method achieves one-step reconstruction by first training a conditional DM and then iteratively distilling this model. Comprehensive experiments were conducted on both publicly available fastMRI images and an in-house multi-echo GRE (QSM) subject. Overall, the results showed that SSDM-MRI outperformed other methods in terms of numerical metrics (PSNR and SSIM), qualitative error maps, image fine details, and latent susceptibility information hidden in MRI phase images. In addition, the reconstruction time for a 320*320 brain slice of SSDM-MRI is only 0.45 second, which is only comparable to that of a simple U-net, making it a highly effective solution for MRI reconstruction tasks.

ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao

arxiv logopreprintMay 12 2025
Accurate multi-modal medical image translation requires ha-rmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for preserving modality-specific edge and texture details, and Mamba's selective state-space modeling for efficient long- and short-range feature dependencies. Structurally, our dual-resolution framework leverages SAM2's image encoder to capture organ-scale semantics from high-resolution inputs, while a parallel CNNs branch extracts fine-grained local features. The Robust Feature Fusion Network (RFFN) integrates these epresentations, and the Bidirectional Mamba Residual Network (BMRN) models spatial dependencies using spiral scanning and bidirectional state-space dynamics. A three-stage skip fusion decoder enhances edge and texture fidelity. We employ Efficient Low-Rank Adaptation (LoRA+) fine-tuning to enable precise domain specialization while maintaining the foundational capabilities of the pre-trained components. Extensive experimental validation on the SynthRAD2023 and BraTS2019 datasets demonstrates that ABS-Mamba outperforms state-of-the-art methods, delivering high-fidelity cross-modal synthesis that preserves anatomical semantics and structural details to enhance diagnostic accuracy in clinical applications. The code is available at https://github.com/gatina-yone/ABS-Mamba

Multi-Plane Vision Transformer for Hemorrhage Classification Using Axial and Sagittal MRI Data

Badhan Kumar Das, Gengyan Zhao, Boris Mailhe, Thomas J. Re, Dorin Comaniciu, Eli Gibson, Andreas Maier

arxiv logopreprintMay 12 2025
Identifying brain hemorrhages from magnetic resonance imaging (MRI) is a critical task for healthcare professionals. The diverse nature of MRI acquisitions with varying contrasts and orientation introduce complexity in identifying hemorrhage using neural networks. For acquisitions with varying orientations, traditional methods often involve resampling images to a fixed plane, which can lead to information loss. To address this, we propose a 3D multi-plane vision transformer (MP-ViT) for hemorrhage classification with varying orientation data. It employs two separate transformer encoders for axial and sagittal contrasts, using cross-attention to integrate information across orientations. MP-ViT also includes a modality indication vector to provide missing contrast information to the model. The effectiveness of the proposed model is demonstrated with extensive experiments on real world clinical dataset consists of 10,084 training, 1,289 validation and 1,496 test subjects. MP-ViT achieved substantial improvement in area under the curve (AUC), outperforming the vision transformer (ViT) by 5.5% and CNN-based architectures by 1.8%. These results highlight the potential of MP-ViT in improving performance for hemorrhage detection when different orientation contrasts are needed.

Robust & Precise Knowledge Distillation-based Novel Context-Aware Predictor for Disease Detection in Brain and Gastrointestinal

Saif Ur Rehman Khan, Muhammad Nabeel Asim, Sebastian Vollmer, Andreas Dengel

arxiv logopreprintMay 9 2025
Medical disease prediction, particularly through imaging, remains a challenging task due to the complexity and variability of medical data, including noise, ambiguity, and differing image quality. Recent deep learning models, including Knowledge Distillation (KD) methods, have shown promising results in brain tumor image identification but still face limitations in handling uncertainty and generalizing across diverse medical conditions. Traditional KD methods often rely on a context-unaware temperature parameter to soften teacher model predictions, which does not adapt effectively to varying uncertainty levels present in medical images. To address this issue, we propose a novel framework that integrates Ant Colony Optimization (ACO) for optimal teacher-student model selection and a novel context-aware predictor approach for temperature scaling. The proposed context-aware framework adjusts the temperature based on factors such as image quality, disease complexity, and teacher model confidence, allowing for more robust knowledge transfer. Additionally, ACO efficiently selects the most appropriate teacher-student model pair from a set of pre-trained models, outperforming current optimization methods by exploring a broader solution space and better handling complex, non-linear relationships within the data. The proposed framework is evaluated using three publicly available benchmark datasets, each corresponding to a distinct medical imaging task. The results demonstrate that the proposed framework significantly outperforms current state-of-the-art methods, achieving top accuracy rates: 98.01% on the MRI brain tumor (Kaggle) dataset, 92.81% on the Figshare MRI dataset, and 96.20% on the GastroNet dataset. This enhanced performance is further evidenced by the improved results, surpassing existing benchmarks of 97.24% (Kaggle), 91.43% (Figshare), and 95.00% (GastroNet).

Improved Brain Tumor Detection in MRI: Fuzzy Sigmoid Convolution in Deep Learning

Muhammad Irfan, Anum Nawaz, Riku Klen, Abdulhamit Subasi, Tomi Westerlund, Wei Chen

arxiv logopreprintMay 8 2025
Early detection and accurate diagnosis are essential to improving patient outcomes. The use of convolutional neural networks (CNNs) for tumor detection has shown promise, but existing models often suffer from overparameterization, which limits their performance gains. In this study, fuzzy sigmoid convolution (FSC) is introduced along with two additional modules: top-of-the-funnel and middle-of-the-funnel. The proposed methodology significantly reduces the number of trainable parameters without compromising classification accuracy. A novel convolutional operator is central to this approach, effectively dilating the receptive field while preserving input data integrity. This enables efficient feature map reduction and enhances the model's tumor detection capability. In the FSC-based model, fuzzy sigmoid activation functions are incorporated within convolutional layers to improve feature extraction and classification. The inclusion of fuzzy logic into the architecture improves its adaptability and robustness. Extensive experiments on three benchmark datasets demonstrate the superior performance and efficiency of the proposed model. The FSC-based architecture achieved classification accuracies of 99.17%, 99.75%, and 99.89% on three different datasets. The model employs 100 times fewer parameters than large-scale transfer learning architectures, highlighting its computational efficiency and suitability for detecting brain tumors early. This research offers lightweight, high-performance deep-learning models for medical imaging applications.

FF-PNet: A Pyramid Network Based on Feature and Field for Brain Image Registration

Ying Zhang, Shuai Guo, Chenxi Sun, Yuchen Zhu, Jinhai Xiang

arxiv logopreprintMay 8 2025
In recent years, deformable medical image registration techniques have made significant progress. However, existing models still lack efficiency in parallel extraction of coarse and fine-grained features. To address this, we construct a new pyramid registration network based on feature and deformation field (FF-PNet). For coarse-grained feature extraction, we design a Residual Feature Fusion Module (RFFM), for fine-grained image deformation, we propose a Residual Deformation Field Fusion Module (RDFFM). Through the parallel operation of these two modules, the model can effectively handle complex image deformations. It is worth emphasizing that the encoding stage of FF-PNet only employs traditional convolutional neural networks without any attention mechanisms or multilayer perceptrons, yet it still achieves remarkable improvements in registration accuracy, fully demonstrating the superior feature decoding capabilities of RFFM and RDFFM. We conducted extensive experiments on the LPBA and OASIS datasets. The results show our network consistently outperforms popular methods in metrics like the Dice Similarity Coefficient.

Advancing 3D Medical Image Segmentation: Unleashing the Potential of Planarian Neural Networks in Artificial Intelligence

Ziyuan Huang, Kevin Huggins, Srikar Bellur

arxiv logopreprintMay 7 2025
Our study presents PNN-UNet as a method for constructing deep neural networks that replicate the planarian neural network (PNN) structure in the context of 3D medical image data. Planarians typically have a cerebral structure comprising two neural cords, where the cerebrum acts as a coordinator, and the neural cords serve slightly different purposes within the organism's neurological system. Accordingly, PNN-UNet comprises a Deep-UNet and a Wide-UNet as the nerve cords, with a densely connected autoencoder performing the role of the brain. This distinct architecture offers advantages over both monolithic (UNet) and modular networks (Ensemble-UNet). Our outcomes on a 3D MRI hippocampus dataset, with and without data augmentation, demonstrate that PNN-UNet outperforms the baseline UNet and several other UNet variants in image segmentation.

An imageless magnetic resonance framework for fast and cost-effective decision-making

Alba González-Cebrián, Pablo García-Cristóbal, Fernando Galve, Efe Ilıcak, Viktor Van Der Valk, Marius Staring, Andrew Webb, Joseba Alonso

arxiv logopreprintMay 7 2025
Magnetic Resonance Imaging (MRI) is the gold standard in countless diagnostic procedures, yet hardware complexity, long scans, and cost preclude rapid screening and point-of-care use. We introduce Imageless Magnetic Resonance Diagnosis (IMRD), a framework that bypasses k-space sampling and image reconstruction by analyzing raw one-dimensional MR signals. We identify potentially impactful embodiments where IMRD requires only optimized pulse sequences for time-domain contrast, minimal low-field hardware, and pattern recognition algorithms to answer clinical closed queries and quantify lesion burden. As a proof of concept, we simulate multiple sclerosis lesions in silico within brain phantoms and deploy two extremely fast protocols (approximately 3 s), with and without spatial information. A 1D convolutional neural network achieves AUC close to 0.95 for lesion detection and R2 close to 0.99 for volume estimation. We also perform robustness tests under reduced signal-to-noise ratio, partial signal omission, and relaxation-time variability. By reframing MR signals as direct diagnostic metrics, IMRD paves the way for fast, low-cost MR screening and monitoring in resource-limited environments.

3D Brain MRI Classification for Alzheimer Diagnosis Using CNN with Data Augmentation

Thien Nhan Vo, Bac Nam Ho, Thanh Xuan Truong

arxiv logopreprintMay 7 2025
A three-dimensional convolutional neural network was developed to classify T1-weighted brain MRI scans as healthy or Alzheimer. The network comprises 3D convolution, pooling, batch normalization, dense ReLU layers, and a sigmoid output. Using stochastic noise injection and five-fold cross-validation, the model achieved test set accuracy of 0.912 and area under the ROC curve of 0.961, an improvement of approximately 0.027 over resizing alone. Sensitivity and specificity both exceeded 0.90. These results align with prior work reporting up to 0.10 gain via synthetic augmentation. The findings demonstrate the effectiveness of simple augmentation for 3D MRI classification and motivate future exploration of advanced augmentation methods and architectures such as 3D U-Net and vision transformers.
Page 1 of 110 results
Show
per page
Get Started

Upload your X-ray image and get interpretation.

Upload now →

Disclaimer: X-ray Interpreter's AI-generated results are for informational purposes only and not a substitute for professional medical advice. Always consult a healthcare professional for medical diagnosis and treatment.