
From Explainable to Explained AI: Ideas for Falsifying and Quantifying Explanations

Yoni Schirris, Eric Marcus, Jonas Teuwen, Hugo Horlings, Efstratios Gavves

arXiv preprint · Aug 9, 2025
Explaining deep learning models is essential for clinical integration of medical image analysis systems. A good explanation reveals whether a model depends on spurious features that undermine generalization and harm a subset of patients or, conversely, whether it captures novel biological insights. Although techniques like GradCAM can identify influential features, they are measurement tools that do not themselves form an explanation. We propose a human-machine-VLM interaction system tailored to explaining classifiers in computational pathology, including multi-instance learning for whole-slide images. Our proof of concept comprises (1) an AI-integrated slide viewer to run sliding-window experiments that test the claims of an explanation, and (2) quantification of an explanation's predictiveness using general-purpose vision-language models. The results demonstrate that this approach allows us to qualitatively test the claims of an explanation and to quantitatively distinguish between competing explanations. It offers a practical path from explainable AI to explained AI in digital pathology and beyond. Code and prompts are available at https://github.com/nki-ai/x2x.
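
The sliding-window idea above can be illustrated with a simple occlusion experiment: mask each patch in turn and record how the classifier's score changes, to check whether a claimed region actually drives the prediction. This is a minimal sketch with a toy stand-in classifier, not the authors' slide viewer or their multi-instance pipeline.

```python
# Minimal sketch of a sliding-window occlusion test (toy classifier, not the
# authors' viewer): mask each patch and record the drop in the positive-class
# probability, which indicates whether that region supports the prediction.
import torch
import torch.nn as nn

def occlusion_map(model: nn.Module, image: torch.Tensor, window: int = 32, stride: int = 32):
    """image: (1, C, H, W); returns a (rows, cols) sensitivity grid."""
    model.eval()
    with torch.no_grad():
        base = model(image).softmax(dim=-1)[0, 1].item()  # prob. of positive class
        _, _, H, W = image.shape
        rows, cols = (H - window) // stride + 1, (W - window) // stride + 1
        sens = torch.zeros(rows, cols)
        for i in range(rows):
            for j in range(cols):
                masked = image.clone()
                masked[:, :, i*stride:i*stride+window, j*stride:j*stride+window] = 0.0
                sens[i, j] = base - model(masked).softmax(dim=-1)[0, 1].item()
    return sens

# toy usage with a hypothetical stand-in classifier (replace with the real model)
toy = nn.Sequential(nn.Conv2d(3, 8, 3, 2, 1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                    nn.Flatten(), nn.Linear(8, 2))
print(occlusion_map(toy, torch.randn(1, 3, 128, 128)).shape)
```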

DiffUS: Differentiable Ultrasound Rendering from Volumetric Imaging

Noe Bertramo, Gabriel Duguey, Vivek Gopalakrishnan

arXiv preprint · Aug 9, 2025
Intraoperative ultrasound imaging provides real-time guidance during numerous surgical procedures, but its interpretation is complicated by noise, artifacts, and poor alignment with high-resolution preoperative MRI/CT scans. To bridge the gap between preoperative planning and intraoperative guidance, we present DiffUS, a physics-based, differentiable ultrasound renderer that synthesizes realistic B-mode images from volumetric imaging. DiffUS first converts 3D MRI scans into acoustic impedance volumes using a machine learning approach. Next, we simulate ultrasound beam propagation using ray tracing with coupled reflection-transmission equations. DiffUS formulates wave propagation as a sparse linear system that captures multiple internal reflections. Finally, we reconstruct B-mode images via depth-resolved echo extraction across a fan-shaped acquisition geometry, incorporating realistic artifacts including speckle noise and depth-dependent degradation. DiffUS is entirely implemented as differentiable tensor operations in PyTorch, enabling gradient-based optimization for downstream applications such as slice-to-volume registration and volumetric reconstruction. Evaluation on the ReMIND dataset demonstrates DiffUS's ability to generate anatomically accurate ultrasound images from brain MRI data.
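
To make the differentiable-rendering idea concrete, here is a minimal sketch of a single A-line computed from an acoustic impedance profile, keeping only first-order reflections; the paper's sparse linear system for multiple internal reflections, the fan-shaped B-mode geometry, and the artifact models are deliberately omitted.

```python
# Minimal sketch of a differentiable A-line (single ray) from an impedance
# profile: reflection coefficients at each interface, attenuated by the two-way
# transmission through shallower interfaces. First-order reflections only.
import torch

def aline_from_impedance(z: torch.Tensor) -> torch.Tensor:
    """z: (D,) acoustic impedance along one ray; returns (D-1,) echo amplitudes."""
    r = (z[1:] - z[:-1]) / (z[1:] + z[:-1])        # reflection coeff. per interface
    t = torch.cumprod(1 - r[:-1] ** 2, dim=0)      # two-way transmission above each depth
    trans = torch.cat([torch.ones(1, dtype=z.dtype), t])
    return r * trans                               # echo returned from each depth

# gradients flow back to the impedance values, so an impedance volume predicted
# from MRI could in principle be optimized against an observed ultrasound signal
z = torch.tensor([1.5, 1.6, 1.4, 1.7, 1.65], requires_grad=True)
echo = aline_from_impedance(z)
echo.abs().sum().backward()
print(echo.detach(), z.grad)
```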

Text Embedded Swin-UMamba for DeepLesion Segmentation

Ruida Cheng, Tejas Sudharshan Mathai, Pritam Mukherjee, Benjamin Hou, Qingqing Zhu, Zhiyong Lu, Matthew McAuliffe, Ronald M. Summers

arXiv preprint · Aug 8, 2025
Segmentation of lesions on CT enables automatic measurement for clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation. The publicly available ULS23 DeepLesion dataset was used along with short-form descriptions of the findings from the reports. On the test dataset, a high Dice score of 82% and a low Hausdorff distance of 6.58 pixels were obtained for lesion segmentation. The proposed Text-Swin-UMamba model outperformed prior approaches: a 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001), and it surpassed the purely image-based xLSTM-UNet and nnUNet models by 1.74% and 0.22%, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba
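
One generic way to inject report text into a segmentation backbone is cross-attention between image tokens and text-token embeddings. The sketch below shows that pattern only; the Swin-UMamba backbone, the LLM text encoder, and the dimensions are stand-ins, not the paper's architecture.

```python
# Minimal sketch of fusing a report-text embedding into image features with
# cross-attention (hypothetical dims; not the Text-Swin-UMamba implementation).
import torch
import torch.nn as nn

class TextImageFusion(nn.Module):
    def __init__(self, img_dim=256, txt_dim=768, heads=8):
        super().__init__()
        self.proj = nn.Linear(txt_dim, img_dim)
        self.attn = nn.MultiheadAttention(img_dim, heads, batch_first=True)

    def forward(self, feat, txt):
        """feat: (B, C, H, W) image features; txt: (B, T, txt_dim) token embeddings."""
        B, C, H, W = feat.shape
        q = feat.flatten(2).transpose(1, 2)   # (B, HW, C) image tokens as queries
        kv = self.proj(txt)                   # (B, T, C) projected text tokens
        fused, _ = self.attn(q, kv, kv)       # text-conditioned image tokens
        return (q + fused).transpose(1, 2).view(B, C, H, W)

fusion = TextImageFusion()
out = fusion(torch.randn(2, 256, 16, 16), torch.randn(2, 12, 768))
print(out.shape)  # (2, 256, 16, 16)
```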

Deep Learning Chest X-Ray Age, Epigenetic Aging Clocks and Associations with Age-Related Subclinical Disease in the Project Baseline Health Study.

Chandra J, Short S, Rodriguez F, Maron DJ, Pagidipati N, Hernandez AF, Mahaffey KW, Shah SH, Kiel DP, Lu MT, Raghu VK

PubMed paper · Aug 8, 2025
Chronological age is an important component of medical risk scores and decision-making. However, there is considerable variability in how individuals age. We recently published an open-source deep learning model to assess biological age from chest radiographs (CXR-Age), which predicts all-cause and cardiovascular mortality better than chronological age. Here, we compare CXR-Age to two established epigenetic aging clocks (first-generation Horvath Age; second-generation DNAm PhenoAge) to test which is more strongly associated with cardiopulmonary disease and frailty. Our cohort consisted of 2,097 participants from the Project Baseline Health Study, a prospective cohort study of individuals from four US sites. We compared the association between the different aging clocks and measures of cardiopulmonary disease, frailty, and protein abundance collected at the participant's first annual visit using linear regression models adjusted for common confounders. We found that CXR-Age was associated with coronary calcium, cardiovascular risk factors, worsening pulmonary function, increased frailty, and abundance in plasma of two proteins implicated in neuroinflammation and aging. Associations with DNAm PhenoAge were weaker for pulmonary function and for all metrics in middle-aged adults. We identified thirteen proteins that were associated with DNAm PhenoAge, one of which (CDH13) was also associated with CXR-Age. No associations were found with Horvath Age. These results suggest that CXR-Age may serve as a better metric of cardiopulmonary aging than epigenetic aging clocks, especially in midlife adults.
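
For readers unfamiliar with confounder-adjusted association testing, the sketch below shows the general pattern on synthetic data; the column names, confounders, and effect sizes are illustrative and are not taken from the study.

```python
# Minimal sketch of a confounder-adjusted linear regression of an outcome on an
# aging clock, using synthetic toy data (illustrative variables only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "cxr_age": rng.normal(55, 10, n),
    "chron_age": rng.normal(55, 10, n),
    "sex": rng.integers(0, 2, n),
    "smoking": rng.integers(0, 2, n),
})
# toy outcome loosely driven by the aging clock plus noise
df["fev1_pct"] = 100 - 0.4 * df["cxr_age"] + rng.normal(0, 8, n)

# association of the clock with pulmonary function, adjusted for confounders
model = smf.ols("fev1_pct ~ cxr_age + chron_age + C(sex) + C(smoking)", data=df).fit()
print(model.params["cxr_age"], model.pvalues["cxr_age"])
```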

SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation

Guido Manni, Clemente Lauretti, Loredana Zollo, Paolo Soda

arXiv preprint · Aug 8, 2025
Deep learning has revolutionized medical imaging, but its effectiveness is severely limited by insufficient labeled training data. This paper introduces a novel GAN-based semi-supervised learning framework specifically designed for low labeled-data regimes, evaluated across settings with 5 to 50 labeled samples per class. Our approach integrates three specialized neural networks -- a generator for class-conditioned image translation, a discriminator for authenticity assessment and classification, and a dedicated classifier -- within a three-phase training framework. The method alternates between supervised training on limited labeled data and unsupervised learning that leverages abundant unlabeled images through image-to-image translation rather than generation from noise. We employ ensemble-based pseudo-labeling that combines confidence-weighted predictions from the discriminator and classifier with temporal consistency through exponential moving averaging, enabling reliable label estimation for unlabeled data. Comprehensive evaluation across eleven MedMNIST datasets demonstrates that our approach achieves statistically significant improvements over six state-of-the-art GAN-based semi-supervised methods, with particularly strong performance in the extreme 5-shot setting where the scarcity of labeled data is most challenging. The framework maintains its superiority across all evaluated settings (5, 10, 20, and 50 shots per class). Our approach offers a practical solution for medical imaging applications where annotation costs are prohibitive, enabling robust classification performance even with minimal labeled data. Code is available at https://github.com/GuidoManni/SPARSE.
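
The pseudo-labeling step described above can be sketched as follows: combine discriminator and classifier predictions weighted by their own confidence, keep only confident samples, and maintain a teacher by exponential moving averaging. The thresholds, weights, and EMA momentum here are illustrative, not the paper's settings.

```python
# Minimal sketch of confidence-weighted ensemble pseudo-labeling with an EMA
# teacher, in the spirit of the description above (illustrative hyperparameters).
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, m: float = 0.99):
    """Exponential moving average of student weights into the teacher."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

@torch.no_grad()
def pseudo_labels(disc_logits, clf_logits, threshold=0.9):
    """Combine discriminator and classifier predictions, weighted by confidence."""
    pd_, pc = disc_logits.softmax(-1), clf_logits.softmax(-1)
    wd, wc = pd_.max(-1).values, pc.max(-1).values
    probs = (wd.unsqueeze(-1) * pd_ + wc.unsqueeze(-1) * pc) / (wd + wc).unsqueeze(-1)
    conf, labels = probs.max(-1)
    keep = conf > threshold          # only confident unlabeled samples get labels
    return labels[keep], keep

labels, keep = pseudo_labels(torch.randn(8, 4), torch.randn(8, 4))
print(labels, keep.sum().item())
```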

Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Lung Nodule Malignancy Prediction.

Zhuang L, Tabatabaei SMH, Salehi-Rad R, Tran LM, Aberle DR, Prosper AE, Hsu W

PubMed paper · Aug 8, 2025
Machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists' assessments of nodules, guiding the model to learn clinically relevant, robust, and explainable imaging features for predicting lung cancer. We obtained 938 low-dose CT scans from the National Lung Screening Trial (NLST) with 1,246 nodules and semantic features. Additionally, the Lung Image Database Consortium dataset contains 1,018 CT scans, with 2,625 lesions annotated for nodule characteristics. Three external datasets were obtained from UCLA Health, the LUNGx Challenge, and the Duke Lung Cancer Screening. We fine-tuned a pretrained Contrastive Language-Image Pretraining (CLIP) model with a parameter-efficient fine-tuning approach to align imaging and semantic text features and predict the one-year lung cancer diagnosis. Our model outperformed state-of-the-art (SOTA) models on the NLST test set with an AUROC of 0.901 and an AUPRC of 0.776. It also showed robust performance on the external datasets. Using CLIP, we also obtained zero-shot predictions of semantic features such as nodule margin (AUROC: 0.812), nodule consistency (0.812), and pleural attachment (0.840). Our approach surpasses the SOTA models in predicting lung cancer across datasets collected from diverse clinical settings, providing explainable outputs that aid clinicians in understanding the basis of model predictions. This approach also prevents the model from learning shortcuts and generalizes across clinical settings. The code is available at https://github.com/luotingzhuang/CLIP_nodule.
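
The zero-shot semantic-feature step can be illustrated with a generic, off-the-shelf CLIP checkpoint: score an image against a pair of text prompts describing the two ends of a semantic attribute. The checkpoint, prompts, and the blank stand-in image below are all illustrative; this is not the authors' fine-tuned model.

```python
# Minimal sketch of zero-shot semantic-feature scoring with a generic CLIP
# checkpoint from the Hugging Face Hub (not the authors' fine-tuned model).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# hypothetical prompt pair for one semantic feature (nodule margin)
prompts = ["a lung nodule with a smooth, well-defined margin",
           "a lung nodule with a spiculated, ill-defined margin"]
image = Image.new("RGB", (224, 224))  # stand-in for a nodule patch rendered as RGB

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # (1, 2) image-text similarity
print(logits.softmax(dim=-1))                   # zero-shot margin score
```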

Towards MR-Based Trochleoplasty Planning

Michael Wehrli, Alicia Durrer, Paul Friedrich, Sidaty El Hadramy, Edwin Li, Luana Brahaj, Carol C. Hasler, Philippe C. Cattin

arXiv preprint · Aug 8, 2025
To treat Trochlear Dysplasia (TD), current approaches rely mainly on low-resolution clinical Magnetic Resonance (MR) scans and surgical intuition. The surgeries are planned based on surgeons' experience, have limited adoption of minimally invasive techniques, and lead to inconsistent outcomes. We propose a pipeline that generates super-resolved, patient-specific 3D pseudo-healthy target morphologies from conventional clinical MR scans. First, we compute an isotropic super-resolved MR volume using an Implicit Neural Representation (INR). Next, we segment the femur, tibia, patella, and fibula with a multi-label custom-trained network. Finally, we train a Wavelet Diffusion Model (WDM) to generate pseudo-healthy target morphologies of the trochlear region. In contrast to prior work producing pseudo-healthy low-resolution 3D MR images, our approach enables the generation of sub-millimeter-resolved 3D shapes suitable for pre- and intraoperative use. These can serve as preoperative blueprints for reshaping the femoral groove while preserving the native patella articulation. Furthermore, and in contrast to other work, our pipeline does not require a CT scan, reducing radiation exposure. We evaluated our approach on 25 TD patients and show that our target morphologies significantly improve the sulcus angle (SA) and trochlear groove depth (TGD). The code and interactive visualization are available at https://wehrlimi.github.io/sr-3d-planning/.
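
The INR super-resolution step amounts to fitting a coordinate network to the anisotropic clinical volume and then querying it on a denser, isotropic grid. Below is a minimal sketch of that idea; the Fourier-feature encoding, network size, and training loop are illustrative choices, not the paper's configuration.

```python
# Minimal sketch of an implicit neural representation for isotropic resampling:
# fit a coordinate MLP to sampled (coordinate, intensity) pairs from the thick-
# slice volume, then query it on any denser grid afterwards.
import torch
import torch.nn as nn

class INR(nn.Module):
    def __init__(self, n_freq=8, hidden=128):
        super().__init__()
        self.freqs = 2.0 ** torch.arange(n_freq) * torch.pi     # Fourier frequencies
        self.net = nn.Sequential(nn.Linear(3 * 2 * n_freq, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, xyz):                      # xyz: (N, 3) in [-1, 1]
        enc = xyz[..., None] * self.freqs        # (N, 3, n_freq)
        enc = torch.cat([enc.sin(), enc.cos()], dim=-1).flatten(1)
        return self.net(enc)

# toy fit: coords/intensities would come from the clinical (thick-slice) volume
inr, coords, vals = INR(), torch.rand(1024, 3) * 2 - 1, torch.rand(1024, 1)
opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(inr(coords), vals)
    loss.backward()
    opt.step()
# query a sub-millimeter isotropic grid afterwards: inr(dense_coords)
```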

An Anisotropic Cross-View Texture Transfer with Multi-Reference Non-Local Attention for CT Slice Interpolation.

Uhm KH, Cho H, Hong SH, Jung SW

PubMed paper · Aug 8, 2025
Computed tomography (CT) is one of the most widely used non-invasive imaging modalities for medical diagnosis. In clinical practice, CT images are usually acquired with large slice thicknesses due to the high cost of memory storage and operation time, resulting in an anisotropic CT volume with much lower inter-slice resolution than in-plane resolution. Since such inconsistent resolution may lead to difficulties in disease diagnosis, deep learning-based volumetric super-resolution methods have been developed to improve inter-slice resolution. Most existing methods conduct single-image super-resolution on the through-plane or synthesize intermediate slices from adjacent slices; however, the anisotropic characteristic of 3D CT volume has not been well explored. In this paper, we propose a novel cross-view texture transfer approach for CT slice interpolation by fully utilizing the anisotropic nature of 3D CT volume. Specifically, we design a unique framework that takes high-resolution in-plane texture details as a reference and transfers them to low-resolution through-plane images. To this end, we introduce a multi-reference non-local attention module that extracts meaningful features for reconstructing through-plane high-frequency details from multiple in-plane images. Through extensive experiments, we demonstrate that our method performs significantly better in CT slice interpolation than existing competing methods on public CT datasets including a real-paired benchmark, verifying the effectiveness of the proposed framework. The source code of this work is available at https://github.com/khuhm/ACVTT.
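
The core cross-view idea is that a low-resolution through-plane feature map queries texture features from several high-resolution in-plane reference slices. The sketch below shows one loose reading of multi-reference non-local attention with illustrative dimensions; it is not the ACVTT module itself.

```python
# Minimal sketch of multi-reference non-local attention: through-plane features
# attend to features from R in-plane reference slices (illustrative dims).
import torch
import torch.nn as nn

class MultiRefNonLocal(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lr_feat, ref_feats):
        """lr_feat: (B, C, H, W); ref_feats: (B, R, C, H, W) from R in-plane slices."""
        B, C, H, W = lr_feat.shape
        q = lr_feat.flatten(2).transpose(1, 2)                           # (B, HW, C)
        kv = ref_feats.flatten(3).permute(0, 1, 3, 2).reshape(B, -1, C)  # (B, R*HW, C)
        out, _ = self.attn(q, kv, kv)
        return lr_feat + out.transpose(1, 2).view(B, C, H, W)            # residual texture transfer

m = MultiRefNonLocal()
print(m(torch.randn(2, 64, 16, 16), torch.randn(2, 3, 64, 16, 16)).shape)
```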

impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction

Maria Boyko, Aleksandra Beliaeva, Dmitriy Kornilov, Alexander Bernstein, Maxim Sharaev

arXiv preprint · Aug 8, 2025
The use of diverse modalities, such as omics, medical images, and clinical data, can not only improve the performance of prognostic models but also deepen the understanding of disease mechanisms and facilitate the development of novel treatment approaches. However, medical data are complex, often incomplete, and frequently contain missing modalities, making their effective handling crucial for training multimodal models. We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using the TCGA-GBM/LGG and BraTS datasets, integrating five modalities: genetic (DNAm, RNA-seq), imaging (MRI, WSI), and clinical data. By addressing missing data during pre-training and enabling efficient resource utilization, impuTMAE surpasses prior multimodal approaches, achieving state-of-the-art performance in glioma patient survival prediction. Our code is available at https://github.com/maryjis/mtcp
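
The masked pre-training objective can be sketched as follows: token embeddings from whichever modalities are present are tagged with a modality embedding, a fraction is replaced by a learnable mask token, and a transformer reconstructs the masked ones. Token sizes, masking ratio, and depth below are illustrative, not the paper's configuration.

```python
# Minimal sketch of masked multi-modal pre-training: reconstruct masked patch
# tokens with a transformer; missing modalities simply contribute no tokens.
import torch
import torch.nn as nn

class MaskedMultiModalAE(nn.Module):
    def __init__(self, dim=128, n_modalities=5, depth=2):
        super().__init__()
        self.mod_emb = nn.Embedding(n_modalities, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, dim)

    def forward(self, tokens, mod_ids, mask_ratio=0.5):
        """tokens: (B, N, dim) patch embeddings; mod_ids: (N,) modality index per token."""
        x = tokens + self.mod_emb(mod_ids)
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        rec = self.head(self.encoder(x))
        return ((rec - tokens) ** 2)[mask].mean()   # reconstruct masked patches only

model = MaskedMultiModalAE()
tok, ids = torch.randn(2, 40, 128), torch.randint(0, 5, (40,))
print(model(tok, ids).item())
```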

X-UNet:A novel global context-aware collaborative fusion U-shaped network with progressive feature fusion of codec for medical image segmentation.

Xu S, Chen Y, Zhang X, Sun F, Chen S, Ou Y, Luo C

PubMed paper · Aug 7, 2025
Due to the inductive bias of convolutions, CNNs perform hierarchical feature extraction efficiently in the field of medical image segmentation. However, the local-correlation assumption of this inductive bias limits the ability of convolutions to capture global information, which has led to Transformer-based methods surpassing CNNs in some segmentation tasks in recent years. Although combining CNNs with Transformers can address this problem, it also introduces considerable computational complexity and additional parameters. In addition, narrowing the encoder-decoder semantic gap for high-quality mask generation is a key challenge, addressed in recent works through feature aggregation from different skip connections. However, this often results in semantic mismatches and additional noise. In this paper, we propose a novel segmentation method, X-UNet, whose backbone employs the CFGC (Collaborative Fusion with Global Context-aware) module. The CFGC module enables multi-scale feature extraction and effective global context modeling. Simultaneously, we employ the CSPF (Cross Split-channel Progressive Fusion) module to progressively align and fuse features from corresponding encoder and decoder stages through channel-wise operations, offering a novel approach to feature integration. Experimental results demonstrate that X-UNet, with fewer computations and parameters, exhibits superior performance on various medical image datasets. The code and models are available at https://github.com/XSJ0410/X-UNet.
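
One way to read the channel-wise progressive fusion idea is to split encoder and decoder features into channel groups and merge them group by group, carrying the fused result forward. The sketch below is only that loose reading with illustrative sizes; it is not the authors' CSPF module.

```python
# Minimal sketch of channel-split progressive fusion for a skip connection:
# fuse encoder/decoder channel groups sequentially, feeding each fused group
# into the next (illustrative sizes; not the paper's exact module).
import torch
import torch.nn as nn

class ChannelSplitFusion(nn.Module):
    def __init__(self, channels=64, groups=4):
        super().__init__()
        self.groups = groups
        g = channels // groups
        self.fuse = nn.ModuleList(nn.Conv2d(2 * g, g, 1) for _ in range(groups))

    def forward(self, enc, dec):
        e, d = enc.chunk(self.groups, dim=1), dec.chunk(self.groups, dim=1)
        out, prev = [], 0
        for k in range(self.groups):
            prev = self.fuse[k](torch.cat([e[k], d[k] + prev], dim=1))  # carry fused group forward
            out.append(prev)
        return torch.cat(out, dim=1)

m = ChannelSplitFusion()
print(m(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)).shape)
```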