VLSM-Ensemble: Ensembling CLIP-based Vision-Language Models for Enhanced Medical Image Segmentation

Julia Dietlmeier, Oluwabukola Grace Adegboro, Vayangi Ganepola, Claudia Mazo, Noel E. O'Connor

arXiv preprint · Sep 5, 2025
Vision-language models and their adaptations to image segmentation tasks present enormous potential for producing highly accurate and interpretable results. However, implementations based on CLIP and BiomedCLIP are still lagging behind more sophisticated architectures such as CRIS. In this work, instead of focusing on text prompt engineering as is the norm, we attempt to narrow this gap by showing how to ensemble vision-language segmentation models (VLSMs) with a low-complexity CNN. By doing so, we achieve a significant Dice score improvement of 6.3% on the BKAI polyp dataset using the ensembled BiomedCLIPSeg, while other datasets exhibit gains ranging from 1% to 6%. Furthermore, we provide initial results on four additional radiology and non-radiology datasets. We conclude that ensembling works differently across these datasets (from outperforming to underperforming the CRIS model), indicating a topic for future investigation by the community. The code is available at https://github.com/juliadietlmeier/VLSM-Ensemble.
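As a hedged illustration of the general idea (the paper's exact ensembling scheme lives in the repository above and is not reproduced here), one simple way to ensemble a VLSM with a low-complexity CNN is to average their per-pixel foreground probabilities before thresholding:

```python
# Illustrative sketch only: fuse a VLSM segmentation output with a small CNN's output
# by weighted averaging of per-pixel probabilities, then score the result with Dice.
import torch

def ensemble_masks(vlsm_logits: torch.Tensor,
                   cnn_logits: torch.Tensor,
                   vlsm_weight: float = 0.5,
                   threshold: float = 0.5) -> torch.Tensor:
    """Fuse two (B, 1, H, W) logit maps into a binary mask by weighted averaging."""
    p_vlsm = torch.sigmoid(vlsm_logits)
    p_cnn = torch.sigmoid(cnn_logits)
    p_ens = vlsm_weight * p_vlsm + (1.0 - vlsm_weight) * p_cnn
    return (p_ens > threshold).float()

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice coefficient between binary masks of identical shape."""
    inter = (pred * target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))

# Toy usage with random tensors standing in for model outputs on a polyp image.
vlsm_out = torch.randn(1, 1, 256, 256)
cnn_out = torch.randn(1, 1, 256, 256)
gt = (torch.rand(1, 1, 256, 256) > 0.5).float()
print(dice_score(ensemble_masks(vlsm_out, cnn_out), gt))
```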

AI-driven and Traditional Radiomic Model for Predicting Muscle Invasion in Bladder Cancer via Multi-parametric Imaging: A Systematic Review and Meta-analysis.

Wang Z, Shi H, Wang Q, Huang Y, Feng M, Yu L, Dong B, Li J, Deng X, Fu S, Zhang G, Wang H

PubMed · Sep 5, 2025
This study systematically evaluates the diagnostic performance of artificial intelligence (AI)-driven and conventional radiomics models in detecting muscle-invasive bladder cancer (MIBC) through meta-analytical approaches. Furthermore, it investigates their potential synergistic value with the Vesical Imaging-Reporting and Data System (VI-RADS) and assesses clinical translation prospects. This study adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We conducted a comprehensive systematic search of PubMed, Web of Science, Embase, and Cochrane Library databases up to May 13, 2025, and manually screened the references of included studies. The quality and risk of bias of the selected studies were assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) and Radiomics Quality Score (RQS) tools. We pooled the area under the curve (AUC), sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and their 95% confidence intervals (95% CI). Additionally, meta-regression and subgroup analyses were performed to identify potential sources of heterogeneity. This meta-analysis incorporated 43 studies comprising 9624 patients. The majority of included studies demonstrated low risk of bias, with a mean RQS of 18.89. Pooled analysis yielded an AUC of 0.92 (95% CI: 0.89-0.94). The aggregate sensitivity and specificity were both 0.86 (95% CI: 0.84-0.87), with heterogeneity indices of I² = 43.58 and I² = 72.76, respectively. The PLR was 5.97 (95% CI: 5.28-6.75, I² = 64.04), while the NLR was 0.17 (95% CI: 0.15-0.19, I² = 37.68). The DOR reached 35.57 (95% CI: 29.76-42.51, I² = 99.92). Notably, all included studies exhibited significant heterogeneity (P < 0.1). Meta-regression and subgroup analyses identified several significant sources of heterogeneity, including: study center type (single-center vs. multi-center), sample size (<100 vs. ≥100 patients), dataset classification (training, validation, testing, or ungrouped), imaging modality (computed tomography [CT] vs. magnetic resonance imaging [MRI]), modeling algorithm (deep learning vs. machine learning vs. other), validation methodology (cross-validation vs. cohort validation), segmentation method (manual vs. [semi]automated), regional differences (China vs. other countries), and risk of bias (high vs. low vs. unclear). AI-driven and traditional radiomic models have exhibited robust diagnostic performance for MIBC. Nevertheless, substantial heterogeneity across studies necessitates validation through multinational, multicenter prospective cohort studies to establish external validity.
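The pooled likelihood ratios and diagnostic odds ratio are tied to sensitivity and specificity by simple identities. The short check below (not part of the study) recomputes them from the pooled sensitivity and specificity of 0.86; the results land close to, but not exactly on, the reported values because the meta-analysis pools each measure directly in a random-effects framework rather than deriving it post hoc.

```python
# Identity check for diagnostic summary measures from sensitivity and specificity.
def diagnostic_ratios(sensitivity: float, specificity: float):
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                           # diagnostic odds ratio
    return plr, nlr, dor

plr, nlr, dor = diagnostic_ratios(0.86, 0.86)
print(f"PLR={plr:.2f}, NLR={nlr:.2f}, DOR={dor:.1f}")
# -> PLR=6.14, NLR=0.16, DOR=37.7 (reported pooled values: 5.97, 0.17, 35.57)
```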

A generalist foundation model and database for open-world medical image segmentation.

Zhang S, Zhang Q, Zhang S, Liu X, Yue J, Lu M, Xu H, Yao J, Wei X, Cao J, Zhang X, Gao M, Shen J, Hao Y, Wang Y, Zhang X, Wu S, Zhang P, Cui S, Wang G

PubMed · Sep 5, 2025
Vision foundation models have demonstrated vast potential in achieving generalist medical segmentation capability, providing a versatile, task-agnostic solution through a single model. However, current generalist models involve simple pre-training on various medical data containing irrelevant information, often resulting in the negative transfer phenomenon and degraded performance. Furthermore, the practical applicability of foundation models across diverse open-world scenarios, especially in out-of-distribution (OOD) settings, has not been extensively evaluated. Here we construct a publicly accessible database, MedSegDB, based on a tree-structured hierarchy and annotated from 129 public medical segmentation repositories and 5 in-house datasets. We further propose a Generalist Medical Segmentation model (MedSegX), a vision foundation model trained with a model-agnostic Contextual Mixture of Adapter Experts (ConMoAE) for open-world segmentation. We conduct a comprehensive evaluation of MedSegX across a range of medical segmentation tasks. Experimental results indicate that MedSegX achieves state-of-the-art performance across various modalities and organ systems in in-distribution (ID) settings. In OOD and real-world clinical settings, MedSegX consistently maintains its performance in both zero-shot and data-efficient generalization, outperforming other foundation models.
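The abstract does not spell out the ConMoAE design, but a generic mixture-of-adapter-experts layer conveys the flavor: a small gating network, conditioned on a context embedding, mixes several low-rank adapters added residually to a frozen backbone feature. The sketch below is illustrative only; all class and parameter names are invented.

```python
# Generic mixture-of-adapter-experts layer (illustrative sketch, not the ConMoAE used
# by MedSegX): a context-conditioned gate mixes low-rank adapter outputs.
import torch
import torch.nn as nn

class MixtureOfAdapterExperts(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, rank: int = 16):
        super().__init__()
        self.down = nn.ModuleList(nn.Linear(dim, rank) for _ in range(num_experts))
        self.up = nn.ModuleList(nn.Linear(rank, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)  # context -> expert weights

    def forward(self, feat: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(context), dim=-1)           # (B, E)
        expert_outs = torch.stack(
            [up(torch.relu(down(feat))) for down, up in zip(self.down, self.up)],
            dim=1)                                                     # (B, E, D)
        mixed = (weights.unsqueeze(-1) * expert_outs).sum(dim=1)      # (B, D)
        return feat + mixed                                           # residual adapter

# Toy usage: a batch of 2 backbone features with per-sample context embeddings.
layer = MixtureOfAdapterExperts(dim=256)
out = layer(torch.randn(2, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 256])
```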

Benchmarking feature projection methods in radiomics.

Demircioğlu A

PubMed · Sep 5, 2025
In radiomics, feature selection methods are primarily used to eliminate redundant features and identify relevant ones. Feature projection methods, such as principal component analysis (PCA), are often avoided due to concerns that recombining features may compromise interpretability. However, since most radiomic features lack inherent semantic meaning, prioritizing interpretability over predictive performance may not be justified. This study investigates whether feature projection methods can improve predictive performance compared to feature selection, as measured by the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), and the F1, F0.5 and F2 scores. Models were trained on a large collection of 50 binary classification radiomic datasets derived from CT and MRI of various organs and representing different clinical outcomes. Evaluation was performed using nested, stratified 5-fold cross-validation with 10 repeats. Nine feature projection methods, including PCA, Kernel PCA, and Non-Negative Matrix Factorization (NMF), were compared to nine selection methods, such as Minimum Redundancy Maximum Relevance (MRMRe), Extremely Randomized Trees (ET), and LASSO, using four classifiers. The results showed that selection methods, particularly ET, MRMRe, Boruta, and LASSO, achieved the highest overall performance. Importantly, performance varied considerably across datasets, and some projection methods, such as NMF, occasionally outperformed all selection methods on individual datasets, indicating their potential utility. However, the average difference between selection methods and projection methods across all datasets was negligible and not statistically significant, suggesting that both perform similarly based solely on methodological considerations. These findings support the notion that, in a typical radiomics study, selection methods should remain the primary approach but also emphasize the importance of considering projection methods in order to achieve the highest performance.
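For readers unfamiliar with the evaluation protocol, the sketch below shows the shape of such a comparison in scikit-learn: nested, stratified cross-validation with an inner grid search, one pipeline built around feature projection (PCA) and one around feature selection, both followed by the same classifier and scored with ROC-AUC. The dataset, hyperparameter grids, and classifier here are placeholders, not those of the study.

```python
# Nested, stratified CV comparison of a projection pipeline vs. a selection pipeline.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

pipelines = {
    "projection_pca": (Pipeline([("scale", StandardScaler()), ("pca", PCA()),
                                 ("clf", LogisticRegression(max_iter=1000))]),
                       {"pca__n_components": [5, 10, 20]}),
    "selection_kbest": (Pipeline([("scale", StandardScaler()),
                                  ("select", SelectKBest(f_classif)),
                                  ("clf", LogisticRegression(max_iter=1000))]),
                        {"select__k": [5, 10, 20]}),
}

for name, (pipe, grid) in pipelines.items():
    tuned = GridSearchCV(pipe, grid, cv=inner, scoring="roc_auc")
    scores = cross_val_score(tuned, X, y, cv=outer, scoring="roc_auc")
    print(f"{name}: mean outer AUC = {scores.mean():.3f}")
```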

Interpretable Semi-federated Learning for Multimodal Cardiac Imaging and Risk Stratification: A Privacy-Preserving Framework.

Liu X, Li S, Zhu Q, Xu S, Jin Q

PubMed · Sep 5, 2025
The growing heterogeneity of cardiac patient data from hospitals and wearables necessitates predictive models that are tailored, comprehensible, and privacy-preserving. This study introduces PerFed-Cardio, a lightweight and interpretable semi-federated learning (Semi-FL) system for real-time cardiovascular risk stratification utilizing multimodal data, including cardiac imaging, physiological signals, and electronic health records (EHR). In contrast to conventional federated learning, where all clients engage uniformly, our methodology employs a personalized Semi-FL approach that enables high-capacity nodes (e.g., hospitals) to conduct comprehensive training, while edge devices (e.g., wearables) refine shared models via modality-specific subnetworks. Cardiac MRI and echocardiography images are analyzed via lightweight convolutional neural networks enhanced with local attention modules to highlight diagnostically significant areas. Physiological characteristics (e.g., ECG, activity) and EHR data are amalgamated through attention-based fusion layers. Model transparency is attained using Local Interpretable Model-agnostic Explanations (LIME) and Grad-CAM, which offer spatial and feature-level elucidations for each prediction. Assessments on authentic multimodal datasets from 123 patients across five simulated institutions indicate that PerFed-Cardio attains an AUC-ROC of 0.972 with an inference latency of 130 ms. The customized model calibration and targeted training diminish communication load by 28%, while maintaining an F1-score above 92% under noisy conditions. These findings underscore PerFed-Cardio as a privacy-conscious, adaptive, and interpretable system for scalable cardiac risk assessment.
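As a generic illustration of attention-based fusion of modality embeddings (the PerFed-Cardio architecture is not specified beyond this abstract, so all names and dimensions below are assumptions), each modality is projected to a shared width and mixed by learned attention weights before a risk head:

```python
# Illustrative attention-based fusion of imaging, signal, and EHR embeddings.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, in_dims: dict, hidden: int = 128):
        super().__init__()
        self.proj = nn.ModuleDict({k: nn.Linear(d, hidden) for k, d in in_dims.items()})
        self.score = nn.Linear(hidden, 1)   # one attention score per modality
        self.head = nn.Linear(hidden, 1)    # cardiovascular risk logit

    def forward(self, inputs: dict) -> torch.Tensor:
        z = torch.stack([torch.tanh(self.proj[k](v)) for k, v in inputs.items()], dim=1)
        attn = torch.softmax(self.score(z), dim=1)   # (B, M, 1)
        fused = (attn * z).sum(dim=1)                # (B, hidden)
        return self.head(fused)                      # (B, 1) risk logit

# Toy usage with randomly generated per-modality embeddings for 4 patients.
model = AttentionFusion({"imaging": 512, "signals": 64, "ehr": 32})
logit = model({"imaging": torch.randn(4, 512),
               "signals": torch.randn(4, 64),
               "ehr": torch.randn(4, 32)})
print(torch.sigmoid(logit).shape)  # torch.Size([4, 1])
```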

Diffusion Generative Models Meet Compressed Sensing, with Applications to Image Data and Financial Time Series

Zhengyi Guo, Jiatu Li, Wenpin Tang, David D. Yao

arXiv preprint · Sep 4, 2025
This paper develops dimension reduction techniques for accelerating diffusion model inference in the context of synthetic data generation. The idea is to integrate compressed sensing into diffusion models: (i) compress the data into a latent space, (ii) train a diffusion model in the latent space, and (iii) apply a compressed sensing algorithm to the samples generated in the latent space, facilitating the efficiency of both model training and inference. Under suitable sparsity assumptions on data, the proposed algorithm is proved to enjoy faster convergence by combining diffusion model inference with sparse recovery. As a byproduct, we obtain an optimal value for the latent space dimension. We also conduct numerical experiments on a range of datasets, including image data (handwritten digits, medical images, and climate data) and financial time series for stress testing.
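A minimal sketch of the three-step pipeline, under strong simplifications: the data are compressed with a random Gaussian sensing matrix, the latent diffusion model is stubbed out, and sparse recovery is performed with an off-the-shelf Lasso solver rather than the algorithm analyzed in the paper.

```python
# (i) compress a sparse signal into a latent space, (ii) placeholder for latent
# diffusion sampling, (iii) recover the ambient signal by sparse recovery (Lasso).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, k = 400, 64, 5                       # ambient dim, latent dim, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)

A = rng.normal(size=(d, n)) / np.sqrt(d)   # random Gaussian sensing matrix
y = A @ x_true                             # (i) compression to the latent space

y_sample = y                               # (ii) stand-in for a latent diffusion sample

lasso = Lasso(alpha=0.01, max_iter=10000)  # (iii) sparse recovery from the latent sample
lasso.fit(A, y_sample)
x_rec = lasso.coef_
print("relative error:", np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))
```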

Geometric-Driven Cross-Modal Registration Framework for Optical Scanning and CBCT Models in AR-Based Maxillofacial Surgical Navigation.

Liu Y, Wang E, Gong M, Tao B, Wu Y, Qi X, Chen X

PubMed · Sep 4, 2025
Accurate preoperative planning for dental implants, especially in edentulous or partially edentulous patients, relies on precise localization of radiographic templates that guide implant positioning. By wearing a patient-specific radiographic template, clinicians can better assess anatomical constraints and plan optimal implant paths. However, due to the low radiopacity of such templates, their spatial position is difficult to determine directly from cone-beam computed tomography (CBCT) scans. To overcome this limitation, high-resolution optical scans of the templates are acquired, providing detailed geometric information for accurate spatial registration. This paper proposes a geometric-driven cross-modal registration framework that aligns the optical scan model of the radiographic template with patient CBCT data, enhancing registration accuracy through geometric feature extraction such as curvature and occlusal contours. A hybrid deep learning workflow further improves robustness, achieving a root mean square error (RMSE) of 1.68 mm and a mean absolute error (MAE) of 1.25 mm. The system also incorporates augmented reality (AR) for real-time surgical navigation. Clinical and phantom experiments validate its effectiveness in supporting precise implant path planning and execution. Our proposed system enhances the efficiency and safety of dental implant surgery by integrating geometric feature extraction, deep learning-based registration, and AR-assisted navigation.
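At the core of such a pipeline sits a rigid alignment between corresponding 3D points from the optical scan and the CBCT-derived surface. The sketch below shows only that step, using the standard Kabsch/SVD solution and an RMSE check on synthetic correspondences; the paper's geometric feature extraction, deep-learning component, and AR navigation are not reproduced.

```python
# Least-squares rigid registration of matched 3D point sets (Kabsch/SVD).
import numpy as np

def rigid_align(src: np.ndarray, dst: np.ndarray):
    """Best-fit rotation R and translation t mapping src (N,3) onto dst (N,3)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

rng = np.random.default_rng(0)
scan_pts = rng.normal(size=(50, 3))                          # optical-scan feature points
theta = np.deg2rad(20)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
cbct_pts = scan_pts @ R_true.T + np.array([5.0, -2.0, 1.0])  # corresponding CBCT points

R, t = rigid_align(scan_pts, cbct_pts)
aligned = scan_pts @ R.T + t
rmse = np.sqrt(np.mean(np.sum((aligned - cbct_pts) ** 2, axis=1)))
print(f"registration RMSE: {rmse:.2e}")
```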

MetaPredictomics: A Comprehensive Approach to Predict Postsurgical Non-Small Cell Lung Cancer Recurrence Using Clinicopathologic, Radiomics, and Organomics Data.

Amini M, Hajianfar G, Salimi Y, Mansouri Z, Zaidi H

PubMed · Sep 3, 2025
Non-small cell lung cancer (NSCLC) is a complex disease characterized by diverse clinical, genetic, and histopathologic traits, necessitating personalized treatment approaches. While numerous biomarkers have been introduced for NSCLC prognostication, no single source of information can provide a comprehensive understanding of the disease. However, integrating biomarkers from multiple sources may offer a holistic view of the disease, enabling more accurate predictions. In this study, we present MetaPredictomics, a framework that integrates clinicopathologic data with PET/CT radiomics from the primary tumor and presumed healthy organs (referred to as "organomics") to predict postsurgical recurrence. A fully automated deep learning-based segmentation model was employed to delineate 19 affected (whole lung and the affected lobe) and presumed healthy organs from CT images of the presurgical PET/CT scans of 145 NSCLC patients sourced from a publicly available data set. Using PyRadiomics, 214 features (107 from CT, 107 from PET) were extracted from the gross tumor volume (GTV) and each segmented organ. In addition, a clinicopathologic feature set was constructed, incorporating clinical characteristics, histopathologic data, gene mutation status, conventional PET imaging biomarkers, and patients' treatment history. The GTV radiomics feature set, each of the organomics feature sets, and the clinicopathologic feature set were each fed to a time-to-event prediction model based on glmboost to establish first-level models. The risk scores obtained from the first-level models were then used as inputs for meta models developed using a stacked ensemble approach. Seeking optimal performance, we assessed meta models built on all combinations of first-level models with a concordance index (C-index) ≥0.6. The performance of all models was evaluated using the average C-index across a unique 3-fold cross-validation scheme for fair comparison. The clinicopathologic model outperformed the other first-level models with a C-index of 0.67, followed closely by the GTV radiomics model with a C-index of 0.65. Among the organomics models, the whole-lung and aorta models achieved the top performance with a C-index of 0.65, while 12 organomics models achieved C-indices of ≥0.6. Meta models significantly outperformed the first-level models, with the top 100 achieving C-indices between 0.703 and 0.731. The clinicopathologic, whole lung, esophagus, pancreas, and GTV models were the most frequently present models in the top 100 meta models, with frequencies of 98, 71, 69, 62, and 61, respectively. In this study, we highlighted the value of maximizing the use of medical imaging for NSCLC recurrence prognostication by incorporating data from various organs, rather than focusing solely on the tumor and its immediate surroundings. This multisource integration proved particularly beneficial in the meta models, where combining clinicopathologic data with tumor radiomics and organomics models significantly enhanced recurrence prediction.
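To make the stacking step concrete, the sketch below (synthetic data, a Cox proportional-hazards meta model instead of the study's glmboost-based machinery) combines first-level risk scores into a meta model and scores it with the concordance index:

```python
# Hedged sketch of stacking first-level risk scores into a survival meta model.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 145  # cohort size mentioned above

# Stand-ins for first-level risk scores (clinicopathologic, GTV radiomics, organomics).
risk = pd.DataFrame({
    "clinicopathologic": rng.normal(size=n),
    "gtv_radiomics": rng.normal(size=n),
    "whole_lung_organomics": rng.normal(size=n),
})
# Synthetic recurrence times driven partly by the clinicopathologic score.
risk["time"] = rng.exponential(scale=np.exp(-0.5 * risk["clinicopathologic"]))
risk["event"] = rng.integers(0, 2, size=n)

meta = CoxPHFitter()
meta.fit(risk, duration_col="time", event_col="event")
meta_risk = meta.predict_partial_hazard(risk)

# Higher risk should correspond to earlier recurrence, hence the negated score.
print("meta-model C-index:", concordance_index(risk["time"], -meta_risk, risk["event"]))
```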

Analog optical computer for AI inference and combinatorial optimization.

Kalinin KP, Gladrow J, Chu J, Clegg JH, Cletheroe D, Kelly DJ, Rahmani B, Brennan G, Canakci B, Falck F, Hansen M, Kleewein J, Kremer H, O'Shea G, Pickup L, Rajmohan S, Rowstron A, Ruhle V, Braine L, Khedekar S, Berloff NG, Gkantsidis C, Parmigiani F, Ballani H

PubMed · Sep 3, 2025
Artificial intelligence (AI) and combinatorial optimization drive applications across science and industry, but their increasing energy demands challenge the sustainability of digital computing. Most unconventional computing systems (refs. 1-7) target either AI or optimization workloads and rely on frequent, energy-intensive digital conversions, limiting efficiency. These systems also face application-hardware mismatches, whether handling memory-bottlenecked neural models, mapping real-world optimization problems or contending with inherent analog noise. Here we introduce an analog optical computer (AOC) that combines analog electronics and three-dimensional optics to accelerate AI inference and combinatorial optimization in a single platform. This dual-domain capability is enabled by a rapid fixed-point search, which avoids digital conversions and enhances noise robustness. With this fixed-point abstraction, the AOC implements emerging compute-bound neural models with recursive reasoning potential and realizes an advanced gradient-descent approach for expressive optimization. We demonstrate the benefits of co-designing the hardware and abstraction, echoing the co-evolution of digital accelerators and deep learning models, through four case studies: image classification, nonlinear regression, medical image reconstruction and financial transaction settlement. Built with scalable, consumer-grade technologies, the AOC paves a promising path for faster and sustainable computing. Its native support for iterative, compute-intensive models offers a scalable analog platform for fostering future innovation in AI and optimization.

From Noisy Labels to Intrinsic Structure: A Geometric-Structural Dual-Guided Framework for Noise-Robust Medical Image Segmentation

Tao Wang, Zhenxuan Zhang, Yuanbo Zhou, Xinlin Zhang, Yuanbin Chen, Tao Tan, Guang Yang, Tong Tong

arXiv preprint · Sep 2, 2025
The effectiveness of convolutional neural networks in medical image segmentation relies on large-scale, high-quality annotations, which are costly and time-consuming to obtain. Even expert-labeled datasets inevitably contain noise arising from subjectivity and coarse delineations, which disrupts feature learning and adversely impacts model performance. To address these challenges, this study proposes a Geometric-Structural Dual-Guided Network (GSD-Net), which integrates geometric and structural cues to improve robustness against noisy annotations. It incorporates a Geometric Distance-Aware module that dynamically adjusts pixel-level weights using geometric features, thereby strengthening supervision in reliable regions while suppressing noise. A Structure-Guided Label Refinement module further refines labels with structural priors, and a Knowledge Transfer module enriches supervision and improves sensitivity to local details. To comprehensively assess its effectiveness, we evaluated GSD-Net on six publicly available datasets: four containing three types of simulated label noise, and two with multi-expert annotations that reflect real-world subjectivity and labeling inconsistencies. Experimental results demonstrate that GSD-Net achieves state-of-the-art performance under noisy annotations, with improvements of 2.52% on Kvasir, 22.76% on Shenzhen, 8.87% on BU-SUC, and 4.59% on BraTS2020 under SR simulated noise. The code is available at https://github.com/ortonwang/GSD-Net.
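The Geometric Distance-Aware module is not detailed in the abstract; as a generic stand-in, the sketch below weights a segmentation loss by distance to the annotated boundary, so pixels near a possibly noisy delineation contribute less than pixels deep inside reliable regions.

```python
# Generic geometry-aware pixel weighting for a segmentation loss (illustration only,
# not the exact GSD-Net module): down-weight pixels close to the label boundary.
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def boundary_distance_weights(mask: np.ndarray, sigma: float = 5.0) -> np.ndarray:
    """Weight map in (0, 1]: low near the label boundary, high far from it."""
    inside = distance_transform_edt(mask)        # distance to background, for fg pixels
    outside = distance_transform_edt(1 - mask)   # distance to foreground, for bg pixels
    dist_to_boundary = np.maximum(inside, outside)
    return 1.0 - np.exp(-dist_to_boundary / sigma)

mask = np.zeros((128, 128), dtype=np.uint8)
mask[32:96, 32:96] = 1                            # toy (possibly noisy) annotation
weights = torch.from_numpy(boundary_distance_weights(mask)).float()

logits = torch.randn(1, 1, 128, 128)              # network output
target = torch.from_numpy(mask).float().view(1, 1, 128, 128)
loss = F.binary_cross_entropy_with_logits(logits, target,
                                           weight=weights.view(1, 1, 128, 128))
print(float(loss))
```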