Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs

Francesco Di Salvo, Hanh Huyen My Nguyen, Christian Ledig

arXiv preprint · Jul 3, 2025
Deep Learning (DL) has revolutionized medical imaging, yet its adoption is constrained by data scarcity and privacy regulations, limiting access to diverse datasets. Federated Learning (FL) enables decentralized training but suffers from high communication costs and is often restricted to a single downstream task, reducing flexibility. We propose a data-sharing method via Differentially Private (DP) generative models. By adopting foundation models, we extract compact, informative embeddings, reducing redundancy and lowering computational overhead. Clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-aware data distribution, supporting diverse downstream tasks. Our approach, validated across multiple feature extractors, enhances privacy, scalability, and efficiency, outperforming traditional FL classifiers while ensuring differential privacy. Additionally, DP-CVAE produces higher-fidelity embeddings than DP-CGAN while requiring 5× fewer parameters.
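
To make the training recipe concrete, here is a minimal PyTorch sketch of a conditional VAE over pre-extracted embeddings with a simplified differentially private optimization step. The dimensions, the noise calibration, and the aggregate-gradient clipping are assumptions for illustration; proper DP-SGD clips per-example gradients (e.g. via a library such as Opacus) and is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DPCVAE(nn.Module):
    """Conditional VAE over foundation-model embeddings, conditioned on class labels."""
    def __init__(self, embed_dim=768, num_classes=10, latent_dim=64):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, 32)
        self.encoder = nn.Sequential(nn.Linear(embed_dim + 32, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 32, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x, y):
        c = self.label_emb(y)
        h = self.encoder(torch.cat([x, c], dim=-1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.decoder(torch.cat([z, c], dim=-1)), mu, logvar

def elbo_loss(recon, x, mu, logvar):
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return F.mse_loss(recon, x) + kld

def dp_step(model, loss, optimizer, clip_norm=1.0, noise_mult=1.0):
    """Simplified DP step: clip the aggregate gradient and add Gaussian noise.
    Real DP-SGD clips per-example gradients; the noise scale here is illustrative."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    for p in model.parameters():
        if p.grad is not None:
            p.grad += noise_mult * clip_norm * torch.randn_like(p.grad)
    optimizer.step()

model = DPCVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 768), torch.randint(0, 10, (32,))
recon, mu, logvar = model(x, y)
dp_step(model, elbo_loss(recon, x, mu, logvar), opt)
```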

MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention

Zunhui Xia, Hongxing Li, Libin Lan

arXiv preprint · Jul 3, 2025
Medical image recognition serves as a key way to aid clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computational load of feature maps, which is highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is explicitly designed to attend to the most relevant content. In addition, a detailed theoretical analysis demonstrates that MedFormer has superior generality and efficiency compared to existing medical vision transformers. Extensive experiments on a variety of imaging modality datasets consistently show that MedFormer is highly effective at enhancing performance across all three above-mentioned medical image recognition tasks. The code is available at https://github.com/XiaZunhui/MedFormer.
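
The content-aware selection at the heart of DSSA can be illustrated with a single-stage top-k sparse attention sketch in PyTorch; the paper's dual (region- and pixel-level) selection and its exact scoring are not reproduced here, so treat this only as the core idea.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=16):
    """Each query attends only to its top-k highest-scoring keys;
    all other positions are masked out before the softmax."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N, N)
    kth = scores.topk(top_k, dim=-1).values[..., -1:]       # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

B, N, D = 2, 196, 64
q, k, v = (torch.randn(B, N, D) for _ in range(3))
out = topk_sparse_attention(q, k, v)   # (2, 196, 64): each output mixes only 16 values
```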

TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation

Peilin Zhang, Shaouxan Wua, Jun Feng, Zhuo Jin, Zhizezhang Gao, Jingkun Chen, Yaqiong Xing, Xiao Zhang

arXiv preprint · Jul 3, 2025
Background and objective: Medical image segmentation is a core task in various clinical applications. However, acquiring large-scale, fully annotated medical image datasets is both time-consuming and costly. Scribble annotations, as a form of sparse labeling, provide an efficient and cost-effective alternative for medical image segmentation. Yet the sparsity of scribble annotations limits feature learning of the target region and lacks sufficient boundary supervision, which poses significant challenges for training segmentation networks. Methods: We propose TABNet, a novel weakly-supervised medical image segmentation framework consisting of two key components: the triplet augmentation self-recovery (TAS) module and the boundary-aware pseudo-label supervision (BAP) module. The TAS module enhances feature learning through three complementary augmentation strategies: intensity transformation improves the model's sensitivity to texture and contrast variations, cutout forces the network to capture local anatomical structures by masking key regions, and jigsaw augmentation strengthens the modeling of global anatomical layout by disrupting spatial continuity. By guiding the network to recover complete masks from diverse augmented inputs, TAS promotes a deeper semantic understanding of medical images under sparse supervision. The BAP module enhances pseudo-supervision accuracy and boundary modeling by fusing dual-branch predictions into a loss-weighted pseudo-label and introducing a boundary-aware loss for fine-grained contour refinement. Results: Experimental evaluations on two public datasets, ACDC and MSCMRseg, demonstrate that TABNet significantly outperforms state-of-the-art methods for scribble-based weakly supervised segmentation. Moreover, it achieves performance comparable to that of fully supervised methods.
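
A rough PyTorch sketch of the three TAS augmentations, assuming 2D inputs normalized to [0, 1] and image sizes divisible by the jigsaw grid; the patch sizes and parameter ranges below are illustrative, not the paper's.

```python
import torch

def intensity_transform(x, gamma_range=(0.7, 1.5)):
    """Random gamma correction: perturbs texture and contrast (inputs in [0, 1])."""
    gamma = torch.empty(1).uniform_(*gamma_range).item()
    return x.clamp(0, 1) ** gamma

def cutout(x, size=32):
    """Zero out a random square so the network must recover local anatomy."""
    x = x.clone()
    _, h, w = x.shape
    cy, cx = torch.randint(0, h, (1,)).item(), torch.randint(0, w, (1,)).item()
    x[:, max(0, cy - size // 2):cy + size // 2, max(0, cx - size // 2):cx + size // 2] = 0
    return x

def jigsaw(x, grid=4):
    """Shuffle a grid of patches to disrupt global spatial continuity."""
    c, h, w = x.shape
    ph, pw = h // grid, w // grid                    # assumes h, w divisible by grid
    patches = x.unfold(1, ph, ph).unfold(2, pw, pw)  # (C, grid, grid, ph, pw)
    patches = patches.reshape(c, grid * grid, ph, pw)
    patches = patches[:, torch.randperm(grid * grid)].view(c, grid, grid, ph, pw)
    return patches.permute(0, 1, 3, 2, 4).reshape(c, h, w)

img = torch.rand(1, 128, 128)
augmented = jigsaw(cutout(intensity_transform(img)))  # one set of "triplet" views
```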

Transformer attention-based neural network for cognitive score estimation from sMRI data.

Li S, Zhang Y, Zou C, Zhang L, Li F, Liu Q

PubMed · Jul 3, 2025
Accurately predicting cognitive scores from structural MRI holds significant clinical value for understanding the pathological stages of dementia and forecasting Alzheimer's disease (AD). Existing deep learning methods often depend on anatomical priors, overlooking individual-specific structural differences during AD progression. To address these limitations, this work proposes a deep neural network that incorporates Transformer attention to jointly predict multiple cognitive scores, including ADAS, CDRSB, and MMSE. The architecture first employs a 3D convolutional neural network backbone to encode the sMRI, capturing preliminary local structural information. An improved Transformer attention block, integrating 3D positional encoding and a 3D convolutional layer, then adaptively captures discriminative imaging features across the brain, focusing effectively on key cognition-related regions. Finally, an attention-aware regression network enables the joint prediction of multiple clinical scores. Experimental results demonstrate that our method outperforms existing traditional and deep learning methods on the ADNI dataset. Further qualitative analysis reveals that the dementia-related brain regions identified by the model hold important biological significance, effectively enhancing the performance of cognitive score prediction. Our code is publicly available at: https://github.com/lshsx/CTA_MRI.
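
A toy PyTorch sketch of the overall pipeline: a small 3D CNN encodes the sMRI volume, a Transformer encoder with learned position codes attends over the resulting tokens, and per-score heads jointly regress ADAS, CDRSB, and MMSE. The layer sizes and the flattened positional parameter are assumptions; the paper's attention block additionally integrates 3D convolutions, which this sketch omits.

```python
import torch
import torch.nn as nn

class CognitiveScoreNet(nn.Module):
    """3D CNN backbone -> Transformer encoder over spatial tokens -> joint heads."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.backbone = nn.Sequential(   # toy backbone; the paper's is deeper
            nn.Conv3d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, d_model, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((4, 4, 4)),
        )
        self.pos = nn.Parameter(torch.zeros(1, 64, d_model))  # flattened position codes
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.heads = nn.ModuleDict(
            {s: nn.Linear(d_model, 1) for s in ("ADAS", "CDRSB", "MMSE")}
        )

    def forward(self, x):                                   # x: (B, 1, D, H, W) sMRI
        feat = self.backbone(x)                             # (B, d_model, 4, 4, 4)
        tokens = feat.flatten(2).transpose(1, 2) + self.pos # (B, 64, d_model)
        h = self.encoder(tokens).mean(dim=1)                # pooled attention features
        return {name: head(h).squeeze(-1) for name, head in self.heads.items()}

scores = CognitiveScoreNet()(torch.randn(2, 1, 64, 64, 64))  # keys: ADAS, CDRSB, MMSE
```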

BrainAGE latent representation clustering is associated with longitudinal disease progression in early-onset Alzheimer's disease.

Manouvriez D, Kuchcinski G, Roca V, Sillaire AR, Bertoux M, Delbeuck X, Pruvo JP, Lecerf S, Pasquier F, Lebouvier T, Lopes R

PubMed · Jul 3, 2025
The early-onset Alzheimer's disease (EOAD) population is clinically, genetically and pathologically heterogeneous. Identifying biomarkers related to disease progression is crucial for advancing clinical trials and improving therapeutic strategies. This study aims to differentiate EOAD patients with varying rates of progression using a Brain Age Gap Estimation (BrainAGE)-based clustering algorithm applied to structural magnetic resonance images (MRI). A retrospective analysis was conducted of a longitudinal cohort of 142 participants who met the criteria for early-onset probable Alzheimer's disease. Participants were assessed clinically, neuropsychologically and with structural MRI at baseline and annually for 6 years. A BrainAGE deep learning model, pre-trained on 3,227 3D T1-weighted MRIs of healthy subjects, was used to extract encoded MRI representations at baseline. K-means clustering was then performed on these encoded representations to stratify the population, and the resulting clusters were analyzed for disease severity, cognitive phenotype and brain volumes at baseline and longitudinally. The optimal number of clusters was determined to be 2. Clusters differed significantly in BrainAGE scores (5.44 [±8] years vs 15.25 [±5] years, p < 0.001). The high-BrainAGE cluster was associated with older age (p = 0.001), a higher proportion of female patients (p = 0.005), and greater disease severity based on Mini Mental State Examination (MMSE) scores (19.32 [±4.62] vs 14.14 [±6.93], p < 0.001) and gray matter volume (0.35 [±0.03] vs 0.32 [±0.02], p < 0.001). Longitudinal analyses revealed significant differences in disease progression (MMSE decline of -2.35 [±0.15] pts/year vs -3.02 [±0.25] pts/year, p = 0.02; CDR 1.58 [±0.10] pts/year vs 1.99 [±0.16] pts/year, p = 0.03). K-means clustering of BrainAGE-encoded representations stratified EOAD patients by rate of disease progression. These findings underscore the potential of BrainAGE as a biomarker for better understanding and managing EOAD.
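
The stratification step reduces to k-means on the encoder's latent representations. A sketch with scikit-learn, using silhouette scores as one plausible way to pick the number of clusters (the study reports an optimum of k = 2, though its selection criterion may differ, and the random array below merely stands in for the encoded baseline MRIs):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
Z = rng.normal(size=(142, 128))   # placeholder for BrainAGE-encoded baseline MRIs

best = (None, -1.0, None)         # (k, silhouette, labels)
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    s = silhouette_score(Z, labels)
    if s > best[1]:
        best = (k, s, labels)

k, s, labels = best
print(f"optimal k = {k} (silhouette = {s:.2f})")
# `labels` can then be related to MMSE decline, CDR slope, gray matter volume, etc.
```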

CT-Mamba: A hybrid convolutional State Space Model for low-dose CT denoising.

Li L, Wei W, Yang L, Zhang W, Dong J, Liu Y, Huang H, Zhao W

PubMed · Jul 3, 2025
Low-dose CT (LDCT) significantly reduces the radiation dose received by patients; however, dose reduction introduces additional noise and artifacts. Denoising methods based on convolutional neural networks (CNNs) face limitations in long-range modeling capability, while Transformer-based denoising methods, although capable of powerful long-range modeling, suffer from high computational complexity. Furthermore, the denoised images predicted by deep learning techniques inevitably differ in noise distribution from normal-dose CT (NDCT) images, which can also affect final image quality and diagnostic outcomes. This paper proposes CT-Mamba, a hybrid convolutional State Space Model for LDCT image denoising. The model combines the local feature extraction strengths of CNNs with Mamba's ability to capture long-range dependencies, enabling it to model both local detail and global context. Additionally, we introduce an innovative spatially coherent Z-shaped scanning scheme to ensure spatial continuity between adjacent pixels in the image. We design a Mamba-driven deep noise power spectrum (NPS) loss function to guide model training, ensuring that the noise texture of denoised LDCT images closely resembles that of NDCT images, thereby enhancing overall image quality and diagnostic value. Experimental results demonstrate that CT-Mamba excels at reducing noise in LDCT images, preserving detail, and optimizing noise texture distribution, while exhibiting higher statistical similarity to the radiomics features of NDCT images. CT-Mamba thus holds promise as a representative approach for applying the Mamba framework to LDCT denoising.
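
The intuition behind an NPS loss can be sketched as a frequency-domain penalty: the spectrum of the noise the model removes should match the spectrum of the true noise. The crude 2D log-spectrum surrogate below is an assumption for illustration only; the paper's loss is learned and Mamba-driven.

```python
import torch

def power_spectrum(img):
    """2D power spectrum over the last two dimensions."""
    return torch.fft.fft2(img).abs() ** 2

def nps_loss(denoised, ldct, ndct, eps=1e-8):
    """The spectrum of the removed noise (ldct - denoised) should match the
    spectrum of the true noise (ldct - ndct); compared in log space for stability."""
    p_hat = power_spectrum(ldct - denoised)
    p_ref = power_spectrum(ldct - ndct)
    return torch.mean((torch.log(p_hat + eps) - torch.log(p_ref + eps)) ** 2)

d, l, n = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
print(nps_loss(d, l, n))
```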

PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset

Michal Golovanevsky, Pranav Mahableshwarkar, Carsten Eickhoff, Ritambhara Singh

arXiv preprint · Jul 3, 2025
Multimodal deep learning holds promise for improving clinical prediction by integrating diverse patient data, including text, imaging, time-series, and structured demographics. Contrastive learning facilitates this integration by producing a unified representation that can be reused across tasks, reducing the need for separate models or encoders. Although contrastive learning has seen success in vision-language domains, its use in clinical settings remains largely limited to image and text pairs. We propose the Pipeline for Contrastive Modality Evaluation and Encoding (PiCME), which systematically assesses five clinical data types from MIMIC: discharge summaries, radiology reports, chest X-rays, demographics, and time-series. We pre-train contrastive models on all 26 combinations of two to five modalities and evaluate their utility on in-hospital mortality and phenotype prediction. To address performance plateaus with more modalities, we introduce a Modality-Gated LSTM that weights each modality according to its contrastively learned importance. Our results show that contrastive models remain competitive with supervised baselines, particularly in three-modality settings. Performance declines beyond three modalities, a drop that supervised models likewise fail to recover. The Modality-Gated LSTM mitigates this drop, improving AUROC from 73.19% to 76.93% and AUPRC from 51.27% to 62.26% in the five-modality setting. We also compare contrastively learned modality importance scores with attribution scores and evaluate generalization across demographic subgroups, highlighting strengths in interpretability and fairness. PiCME is the first to scale contrastive learning across all modality combinations in MIMIC, offering guidance for modality selection, training strategies, and equitable clinical prediction.
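
A minimal PyTorch sketch of a modality-gated LSTM: each modality embedding is scaled by a learned gate before an LSTM fuses the sequence for prediction. Here the gates are free parameters for simplicity; in PiCME they reflect contrastively learned modality importance.

```python
import torch
import torch.nn as nn

class ModalityGatedLSTM(nn.Module):
    """Scale each modality embedding by a learned gate, then fuse with an LSTM."""
    def __init__(self, n_modalities=5, d=256, hidden=128, n_classes=2):
        super().__init__()
        self.gates = nn.Parameter(torch.zeros(n_modalities))  # one gate per modality
        self.lstm = nn.LSTM(d, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, mods):                      # mods: (B, n_modalities, d)
        w = torch.sigmoid(self.gates)             # gate values in (0, 1)
        _, (h, _) = self.lstm(mods * w.view(1, -1, 1))
        return self.head(h[-1])                   # e.g. in-hospital mortality logits

logits = ModalityGatedLSTM()(torch.randn(4, 5, 256))
```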

Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke

Lisa Herzog, Pascal Bühler, Ezequiel de la Rosa, Beate Sick, Susanne Wegener

arXiv preprint · Jul 3, 2025
Mechanical thrombectomy has become the standard of care for stroke due to large vessel occlusion (LVO). However, only 50% of successfully treated patients show a favorable outcome. We developed and evaluated interpretable deep learning models to predict functional outcome in terms of the modified Rankin Scale score, alongside individualized treatment effects (ITEs), using data from 449 LVO stroke patients in a randomized clinical trial. Besides clinical variables, we considered non-contrast CT (NCCT) and CT angiography (CTA) scans, which were integrated using novel foundation models to exploit advanced imaging information. Clinical variables had good predictive power for binary functional outcome (AUC of 0.719 [0.666, 0.774]), which improved slightly when CTA imaging was added (AUC of 0.737 [0.687, 0.795]). Adding NCCT scans, or a combination of NCCT and CTA scans, to clinical features yielded no improvement. The most important clinical predictor of functional outcome was pre-stroke disability. While estimated ITEs were well calibrated to the average treatment effect, discriminatory ability was limited, as indicated by a C-for-Benefit statistic of around 0.55 in all models. In summary, the models allowed us to jointly integrate CT imaging and clinical features while achieving state-of-the-art prediction performance and ITE estimates. Yet further research is needed, particularly to improve ITE estimation.
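
For readers unfamiliar with ITE estimation, a generic T-learner sketch with scikit-learn shows the basic recipe: fit separate outcome models on treated and control arms, then read the per-patient effect as the difference of predicted probabilities. The paper's models are interpretable deep networks, not gradient boosting, and all features below are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(449, 10))     # placeholder clinical + imaging features
t = rng.integers(0, 2, size=449)   # treatment assignment (thrombectomy arm)
y = rng.integers(0, 2, size=449)   # favorable functional outcome (binary)

# Fit separate outcome models for treated and control patients ...
m1 = GradientBoostingClassifier().fit(X[t == 1], y[t == 1])
m0 = GradientBoostingClassifier().fit(X[t == 0], y[t == 0])

# ... and read the ITE as the difference of predicted outcome probabilities.
ite = m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]
```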

A Pan-Organ Vision-Language Model for Generalizable 3D CT Representations.

Beeche C, Kim J, Tavolinejad H, Zhao B, Sharma R, Duda J, Gee J, Dako F, Verma A, Morse C, Hou B, Shen L, Sagreiya H, Davatzikos C, Damrauer S, Ritchie MD, Rader D, Long Q, Chen T, Kahn CE, Chirinos J, Witschey WR

PubMed · Jul 3, 2025
Generalizable foundation models for computed tomographic (CT) imaging data are emerging AI tools anticipated to vastly improve clinical workflow efficiency. However, existing models are typically trained within narrow imaging contexts, including limited anatomical coverage, contrast settings, and clinical indications. These constraints reduce their ability to generalize across the broad spectrum of real-world presentations encountered in volumetric CT imaging. We introduce Percival, a vision-language foundation model trained on over 400,000 CT volumes and paired radiology reports from more than 50,000 participants enrolled in the Penn Medicine BioBank. Percival employs a dual-encoder architecture with a transformer-based image encoder and a BERT-style language encoder, aligned via symmetric contrastive learning. Percival was validated on imaging data from over 20,000 participants, encompassing over 100,000 CT volumes. In image-text recall tasks, Percival outperforms models trained on limited anatomical windows. To assess Percival's clinical knowledge, we evaluated its biologic, phenotypic, and prognostic relevance using laboratory-wide and phenome-wide association studies and survival analyses, uncovering a rich latent structure aligned with physiological measurements and disease phenotypes.
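
The symmetric contrastive alignment is the standard CLIP-style objective. A minimal PyTorch sketch, with the embedding size and temperature chosen as assumptions:

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style objective: matched volume/report pairs lie on the diagonal
    of the similarity matrix; cross-entropy is applied in both directions."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(logits.shape[0], device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = symmetric_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```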

Group-derived and individual disconnection in stroke: recovery prediction and deep graph learning

Bey, P., Dhindsa, K., Rackoll, T., Feldheim, J., Bönstrup, M., Thomalla, G., Schulz, R., Cheng, B., Gerloff, C., Endres, M., Nave, A. H., Ritter, P.

medRxiv preprint · Jul 3, 2025
Recent advances in the treatment of acute ischemic stroke have improved patient outcomes, yet the mechanisms driving long-term disease trajectory are not well understood. Current trends in the literature emphasize the distributed, disruptive impact of stroke lesions on brain network organization. While most studies use population-derived data to investigate lesion interference with healthy tissue, the potential for individualized treatment strategies remains underexplored due to limited availability and ineffective utilization of the necessary clinical imaging data. To validate the potential for individualized patient evaluation, we explored and compared the differential information in network models based on normative and individual data. We further present a novel deep learning approach that provides usable and accurate estimates of individual stroke impact from minimal imaging data, bridging the data gap that hinders individualized treatment planning. We created normative and individual disconnectomes for each of 78 patients (mean age 65.1 years, 32 females) from two independent cohort studies. MRI data and the Barthel Index, as a measure of activities of daily living, were collected in the acute and early sub-acute phase after stroke (baseline) and at three months post stroke. Disconnectomes were described using 12 network metrics, including clustering coefficient and transitivity. Metrics were first compared between disconnectomes and then used as features in a classifier to predict a patient's disease trajectory, as defined by the three-month Barthel Index. We then developed a deep learning architecture based on graph convolution and trained it to predict properties of the individual disconnectomes from the normative disconnectomes. The two disconnectome types showed statistically significant differences in topology and predictive power. Normative disconnectomes included a significantly larger number of connections (N=604 versus N=210 for individual), and agreement between network properties ranged from r2=0.01 for clustering coefficient to r2=0.8 for assortativity, highlighting the impact of disconnectome choice on subsequent analysis. For predicting patient deficit severity, individual data achieved an AUC of 0.94 compared to 0.85 for normative-based features. Our deep learning estimates correlated highly with individual features (mean r2=0.94) and achieved comparable performance, with an AUC of 0.93. We showed that normative data-based analysis of stroke disconnections provides limited information regarding patient recovery, whereas individual data provide higher prognostic precision. We presented a novel approach that curbs the need for individual data while retaining most of the differential information encoding a patient's disease trajectory.
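
The disconnectome comparison rests on standard graph metrics. A sketch with NetworkX on a synthetic lesion-masked connectivity matrix shows the kind of quantities compared; the 0.8 threshold and 90-node parcellation are placeholders, not the study's settings.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((90, 90))
W = (W + W.T) / 2                  # symmetrize the connectivity matrix
np.fill_diagonal(W, 0)
W[W < 0.8] = 0                     # keep only strongly affected connections

G = nx.from_numpy_array(W)
metrics = {
    "clustering_coefficient": nx.average_clustering(G, weight="weight"),
    "transitivity": nx.transitivity(G),
    "assortativity": nx.degree_assortativity_coefficient(G),
    "n_connections": G.number_of_edges(),
}
print(metrics)
```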