Sort by:
Page 336 of 3423413 results

X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography

Yifan Liu, Wuyang Li, Weihao Yu, Chenxin Li, Alexandre Alahi, Max Meng, Yixuan Yuan

arxiv logopreprintMay 21 2025
Computed Tomography serves as an indispensable tool in clinical workflows, providing non-invasive visualization of internal anatomical structures. Existing CT reconstruction works are limited to small-capacity model architecture, inflexible volume representation, and small-scale training data. In this paper, we present X-GRM (X-ray Gaussian Reconstruction Model), a large feedforward model for reconstructing 3D CT from sparse-view 2D X-ray projections. X-GRM employs a scalable transformer-based architecture to encode an arbitrary number of sparse X-ray inputs, where tokens from different views are integrated efficiently. Then, tokens are decoded into a new volume representation, named Voxel-based Gaussian Splatting (VoxGS), which enables efficient CT volume extraction and differentiable X-ray rendering. To support the training of X-GRM, we collect ReconX-15K, a large-scale CT reconstruction dataset containing around 15,000 CT/X-ray pairs across diverse organs, including the chest, abdomen, pelvis, and tooth etc. This combination of a high-capacity model, flexible volume representation, and large-scale training data empowers our model to produce high-quality reconstructions from various testing inputs, including in-domain and out-domain X-ray projections. Project Page: https://github.com/CUHK-AIM-Group/X-GRM.

SAMA-UNet: Enhancing Medical Image Segmentation with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning

Saqib Qamar, Mohd Fazil, Parvez Ahmad, Ghulam Muhammad

arxiv logopreprintMay 21 2025
Medical image segmentation plays an important role in various clinical applications, but existing models often struggle with the computational inefficiencies and challenges posed by complex medical data. State Space Sequence Models (SSMs) have demonstrated promise in modeling long-range dependencies with linear computational complexity, yet their application in medical image segmentation remains hindered by incompatibilities with image tokens and autoregressive assumptions. Moreover, it is difficult to achieve a balance in capturing both local fine-grained information and global semantic dependencies. To address these challenges, we introduce SAMA-UNet, a novel architecture for medical image segmentation. A key innovation is the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which integrates contextual self-attention with dynamic weight modulation to prioritise the most relevant features based on local and global contexts. This approach reduces computational complexity and improves the representation of complex image features across multiple scales. We also suggest the Causal-Resonance Multi-Scale Module (CR-MSM), which enhances the flow of information between the encoder and decoder by using causal resonance learning. This mechanism allows the model to automatically adjust feature resolution and causal dependencies across scales, leading to better semantic alignment between the low-level and high-level features in U-shaped architectures. Experiments on MRI, CT, and endoscopy images show that SAMA-UNet performs better in segmentation accuracy than current methods using CNN, Transformer, and Mamba. The implementation is publicly available at GitHub.

Customized GPT-4V(ision) for radiographic diagnosis: can large language model detect supernumerary teeth?

Aşar EM, İpek İ, Bi Lge K

pubmed logopapersMay 21 2025
With the growing capabilities of language models like ChatGPT to process text and images, this study evaluated their accuracy in detecting supernumerary teeth on periapical radiographs. A customized GPT-4V model (CGPT-4V) was also developed to assess whether domain-specific training could improve diagnostic performance compared to standard GPT-4V and GPT-4o models. One hundred eighty periapical radiographs (90 with and 90 without supernumerary teeth) were evaluated using GPT-4 V, GPT-4o, and a fine-tuned CGPT-4V model. Each image was assessed separately with the standardized prompt "Are there any supernumerary teeth in the radiograph above?" to avoid contextual bias. Three dental experts scored the responses using a three-point Likert scale for positive cases and a binary scale for negatives. Chi-square tests and ROC analysis were used to compare model performances (p < 0.05). Among the three models, CGPT-4 V exhibited the highest accuracy, detecting supernumerary teeth correctly in 91% of cases, compared to 77% for GPT-4o and 63% for GPT-4V. The CGPT-4V model also demonstrated a significantly lower false positive rate (16%) than GPT-4V (42%). A statistically significant difference was found between CGPT-4V and GPT-4o (p < 0.001), while no significant difference was observed between GPT-4V and CGPT-4V or between GPT-4V and GPT-4o. Additionally, CGPT-4V successfully identified multiple supernumerary teeth in radiographs where present. These findings highlight the diagnostic potential of customized GPT models in dental radiology. Future research should focus on multicenter validation, seamless clinical integration, and cost-effectiveness to support real-world implementation.

X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography

Yifan Liu, Wuyang Li, Weihao Yu, Chenxin Li, Alexandre Alahi, Max Meng, Yixuan Yuan

arxiv logopreprintMay 21 2025
Computed Tomography serves as an indispensable tool in clinical workflows, providing non-invasive visualization of internal anatomical structures. Existing CT reconstruction works are limited to small-capacity model architecture and inflexible volume representation. In this work, we present X-GRM (X-ray Gaussian Reconstruction Model), a large feedforward model for reconstructing 3D CT volumes from sparse-view 2D X-ray projections. X-GRM employs a scalable transformer-based architecture to encode sparse-view X-ray inputs, where tokens from different views are integrated efficiently. Then, these tokens are decoded into a novel volume representation, named Voxel-based Gaussian Splatting (VoxGS), which enables efficient CT volume extraction and differentiable X-ray rendering. This combination of a high-capacity model and flexible volume representation, empowers our model to produce high-quality reconstructions from various testing inputs, including in-domain and out-domain X-ray projections. Our codes are available at: https://github.com/CUHK-AIM-Group/X-GRM.

"DCSLK: Combined Large Kernel Shared Convolutional Model with Dynamic Channel Sampling".

Li Z, Luo S, Li H, Li Y

pubmed logopapersMay 20 2025
This study centers around the competition between Convolutional Neural Networks (CNNs) with large convolutional kernels and Vision Transformers in the domain of computer vision, delving deeply into the issues pertaining to parameters and computational complexity that stem from the utilization of large convolutional kernels. Even though the size of the convolutional kernels has been extended up to 51×51, the enhancement of performance has hit a plateau, and moreover, striped convolution incurs a performance degradation. Enlightened by the hierarchical visual processing mechanism inherent in humans, this research innovatively incorporates a shared parameter mechanism for large convolutional kernels. It synergizes the expansion of the receptive field enabled by large convolutional kernels with the extraction of fine-grained features facilitated by small convolutional kernels. To address the surging number of parameters, a meticulously designed parameter sharing mechanism is employed, featuring fine-grained processing in the central region of the convolutional kernel and wide-ranging parameter sharing in the periphery. This not only curtails the parameter count and mitigates the model complexity but also sustains the model's capacity to capture extensive spatial relationships. Additionally, in light of the problems of spatial feature information loss and augmented memory access during the 1×1 convolutional channel compression phase, this study further puts forward a dynamic channel sampling approach, which markedly elevates the accuracy of tumor subregion segmentation. To authenticate the efficacy of the proposed methodology, a comprehensive evaluation has been conducted on three brain tumor segmentation datasets, namely BraTs2020, BraTs2024, and Medical Segmentation Decathlon Brain 2018. The experimental results evince that the proposed model surpasses the current mainstream ConvNet and Transformer architectures across all performance metrics, proffering novel research perspectives and technical stratagems for the realm of medical image segmentation.

End-to-end Cortical Surface Reconstruction from Clinical Magnetic Resonance Images

Jesper Duemose Nielsen, Karthik Gopinath, Andrew Hoopes, Adrian Dalca, Colin Magdamo, Steven Arnold, Sudeshna Das, Axel Thielscher, Juan Eugenio Iglesias, Oula Puonti

arxiv logopreprintMay 20 2025
Surface-based cortical analysis is valuable for a variety of neuroimaging tasks, such as spatial normalization, parcellation, and gray matter (GM) thickness estimation. However, most tools for estimating cortical surfaces work exclusively on scans with at least 1 mm isotropic resolution and are tuned to a specific magnetic resonance (MR) contrast, often T1-weighted (T1w). This precludes application using most clinical MR scans, which are very heterogeneous in terms of contrast and resolution. Here, we use synthetic domain-randomized data to train the first neural network for explicit estimation of cortical surfaces from scans of any contrast and resolution, without retraining. Our method deforms a template mesh to the white matter (WM) surface, which guarantees topological correctness. This mesh is further deformed to estimate the GM surface. We compare our method to recon-all-clinical (RAC), an implicit surface reconstruction method which is currently the only other tool capable of processing heterogeneous clinical MR scans, on ADNI and a large clinical dataset (n=1,332). We show a approximately 50 % reduction in cortical thickness error (from 0.50 to 0.24 mm) with respect to RAC and better recovery of the aging-related cortical thinning patterns detected by FreeSurfer on high-resolution T1w scans. Our method enables fast and accurate surface reconstruction of clinical scans, allowing studies (1) with sample sizes far beyond what is feasible in a research setting, and (2) of clinical populations that are difficult to enroll in research studies. The code is publicly available at https://github.com/simnibs/brainnet.

Expert-guided StyleGAN2 image generation elevates AI diagnostic accuracy for maxillary sinus lesions.

Zeng P, Song R, Chen S, Li X, Li H, Chen Y, Gong Z, Cai G, Lin Y, Shi M, Huang K, Chen Z

pubmed logopapersMay 20 2025
The progress of artificial intelligence (AI) research in dental medicine is hindered by data acquisition challenges and imbalanced distributions. These problems are especially apparent when planning to develop AI-based diagnostic or analytic tools for various lesions, such as maxillary sinus lesions (MSL) including mucosal thickening and polypoid lesions. Traditional unsupervised generative models struggle to simultaneously control the image realism, diversity, and lesion-type specificity. This study establishes an expert-guided framework to overcome these limitations to elevate AI-based diagnostic accuracy. A StyleGAN2 framework was developed for generating clinically relevant MSL images (such as mucosal thickening and polypoid lesion) under expert control. The generated images were then integrated into training datasets to evaluate their effect on ResNet50's diagnostic performance. Here we show: 1) Both lesion subtypes achieve satisfactory fidelity metrics, with structural similarity indices (SSIM > 0.996) and maximum mean discrepancy values (MMD < 0.032), and clinical validation scores close to those of real images; 2) Integrating baseline datasets with synthetic images significantly enhances diagnostic accuracy for both internal and external test sets, particularly improving area under the precision-recall curve (AUPRC) by approximately 8% and 14% for mucosal thickening and polypoid lesions in the internal test set, respectively. The StyleGAN2-based image generation tool effectively addressed data scarcity and imbalance through high-quality MSL image synthesis, consequently boosting diagnostic model performance. This work not only facilitates AI-assisted preoperative assessment for maxillary sinus lift procedures but also establishes a methodological framework for overcoming data limitations in medical image analysis.

A multi-modal model integrating MRI habitat and clinicopathology to predict platinum sensitivity in patients with high-grade serous ovarian cancer: a diagnostic study.

Bi Q, Ai C, Meng Q, Wang Q, Li H, Zhou A, Shi W, Lei Y, Wu Y, Song Y, Xiao Z, Li H, Qiang J

pubmed logopapersMay 20 2025
Platinum resistance of high-grade serous ovarian cancer (HGSOC) cannot currently be recognized by specific molecular biomarkers. We aimed to compare the predictive capacity of various models integrating MRI habitat, whole slide images (WSIs), and clinical parameters to predict platinum sensitivity in HGSOC patients. A retrospective study involving 998 eligible patients from four hospitals was conducted. MRI habitats were clustered using K-means algorithm on multi-parametric MRI. Following feature extraction and selection, a Habitat model was developed. Vision Transformer (ViT) and multi-instance learning were trained to derive the patch-level prediction and WSI-level prediction on hematoxylin and eosin (H&E)-stained WSIs, respectively, forming a Pathology model. Logistic regression (LR) was used to create a Clinic model. A multi-modal model integrating Clinic, Habitat, and Pathology (CHP) was constructed using Multi-Head Attention (MHA) and compared with the unimodal models and Ensemble multi-modal models. The area under the curve (AUC) and integrated discrimination improvement (IDI) value were used to assess model performance and gains. In the internal validation cohort and the external test cohort, the Habitat model showed the highest AUCs (0.722 and 0.685) compared to the Clinic model (0.683 and 0.681) and the Pathology model (0.533 and 0.565), respectively. The AUCs (0.789 and 0.807) of the multi-modal model interating CHP based on MHA were highest than those of any unimodal models and Ensemble multi-modal models, with positive IDI values. MRI-based habitat imaging showed potentials to predict platinum sensitivity in HGSOC patients. Multi-modal integration of CHP based on MHA was helpful to improve prediction performance.

RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection

Wenjun Hou, Yi Cheng, Kaishuai Xu, Heng Li, Yan Hu, Wenjie Li, Jiang Liu

arxiv logopreprintMay 20 2025
Large language models (LLMs) have demonstrated remarkable capabilities in various domains, including radiology report generation. Previous approaches have attempted to utilize multimodal LLMs for this task, enhancing their performance through the integration of domain-specific knowledge retrieval. However, these approaches often overlook the knowledge already embedded within the LLMs, leading to redundant information integration and inefficient utilization of learned representations. To address this limitation, we propose RADAR, a framework for enhancing radiology report generation with supplementary knowledge injection. RADAR improves report generation by systematically leveraging both the internal knowledge of an LLM and externally retrieved information. Specifically, it first extracts the model's acquired knowledge that aligns with expert image-based classification outputs. It then retrieves relevant supplementary knowledge to further enrich this information. Finally, by aggregating both sources, RADAR generates more accurate and informative radiology reports. Extensive experiments on MIMIC-CXR, CheXpert-Plus, and IU X-ray demonstrate that our model outperforms state-of-the-art LLMs in both language quality and clinical accuracy

TransMedSeg: A Transferable Semantic Framework for Semi-Supervised Medical Image Segmentation

Mengzhu Wang, Jiao Li, Shanshan Wang, Long Lan, Huibin Tan, Liang Yang, Guoli Yang

arxiv logopreprintMay 20 2025
Semi-supervised learning (SSL) has achieved significant progress in medical image segmentation (SSMIS) through effective utilization of limited labeled data. While current SSL methods for medical images predominantly rely on consistency regularization and pseudo-labeling, they often overlook transferable semantic relationships across different clinical domains and imaging modalities. To address this, we propose TransMedSeg, a novel transferable semantic framework for semi-supervised medical image segmentation. Our approach introduces a Transferable Semantic Augmentation (TSA) module, which implicitly enhances feature representations by aligning domain-invariant semantics through cross-domain distribution matching and intra-domain structural preservation. Specifically, TransMedSeg constructs a unified feature space where teacher network features are adaptively augmented towards student network semantics via a lightweight memory module, enabling implicit semantic transformation without explicit data generation. Interestingly, this augmentation is implicitly realized through an expected transferable cross-entropy loss computed over the augmented teacher distribution. An upper bound of the expected loss is theoretically derived and minimized during training, incurring negligible computational overhead. Extensive experiments on medical image datasets demonstrate that TransMedSeg outperforms existing semi-supervised methods, establishing a new direction for transferable representation learning in medical image analysis.
Page 336 of 3423413 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.