Latest Papers on Radiology AI. Tags: Mixed Modality

Evaluation of synthetic training data for 3D intraoral reconstruction of cleft patients from single images.

Lingens L, Lill Y, Nalabothu P, Benitez BK, Mueller AA, Gross M, Solenthaler B

•papers•May 24 2025

This study investigates the effectiveness of synthetic training data in predicting 2D landmarks for 3D intraoral reconstruction in cleft lip and palate patients. We take inspiration from existing landmark prediction and 3D reconstruction techniques for faces and demonstrate their potential in medical applications. We generated both real and synthetic datasets from intraoral scans and videos. A convolutional neural network was trained using a Gaussian negative log-likelihood loss function to predict 2D landmarks and their corresponding confidence scores. The predicted landmarks were then used to fit a statistical shape model to generate 3D reconstructions from individual images. We analyzed the model's performance on real patient data and explored the dataset size required to overcome the domain gap between synthetic and real images. Our approach generates satisfying results on synthetic data and shows promise when tested on real data. The method achieves rapid 3D reconstruction from single images and can therefore provide significant value in day-to-day medical work. Our results demonstrate that synthetic training data are viable for training models to predict 2D landmarks and reconstruct 3D meshes in patients with cleft lip and palate. This approach offers an accessible, low-cost alternative to traditional methods, using smartphone technology for noninvasive, rapid, and accurate 3D reconstructions in clinical settings.

Mixed Modality Reconstruction Methodology In Silico Academic Lab GenAI
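The abstract above mentions a Gaussian negative log-likelihood loss that yields both 2D landmarks and confidence scores. The sketch below illustrates the general idea only: it assumes an isotropic 2D Gaussian per landmark with a predicted standard deviation `sigma`, which is not confirmed by the abstract. The function names `gaussian_nll_2d` and `mean_landmark_nll` are hypothetical.

```python
import math

def gaussian_nll_2d(pred_xy, sigma, target_xy):
    """Negative log-likelihood of target_xy under an isotropic 2D Gaussian
    centred on pred_xy with standard deviation sigma (an assumed form)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    dx = target_xy[0] - pred_xy[0]
    dy = target_xy[1] - pred_xy[1]
    sq_dist = dx * dx + dy * dy
    # -log of (1 / (2*pi*sigma^2)) * exp(-sq_dist / (2*sigma^2))
    return math.log(2.0 * math.pi * sigma ** 2) + sq_dist / (2.0 * sigma ** 2)

def mean_landmark_nll(preds, sigmas, targets):
    """Average the per-landmark loss over all landmarks in one image."""
    losses = [gaussian_nll_2d(p, s, t) for p, s, t in zip(preds, sigmas, targets)]
    return sum(losses) / len(losses)
```

Under this form, a large `sigma` lowers the penalty for a badly placed landmark but raises the loss for a well placed one, so the predicted `sigma` can be read as an inverse confidence score.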

TK-Mamba: Marrying KAN with Mamba for Text-Driven 3D Medical Image Segmentation

Haoyu Yang, Yuxiang Cai, Jintao Chen, Xuhong Zhang, Wenhui Lei, Xiaoming Shi, Jianwei Yin, Yankai Jiang

•preprint•May 24 2025

3D medical image segmentation is vital for clinical diagnosis and treatment but is challenged by high-dimensional data and complex spatial dependencies. Traditional single-modality networks, such as CNNs and Transformers, are often limited by computational inefficiency and constrained contextual modeling in 3D settings. We introduce a novel multimodal framework that leverages Mamba and Kolmogorov-Arnold Networks (KAN) as an efficient backbone for long-sequence modeling. Our approach features three key innovations: First, an EGSC (Enhanced Gated Spatial Convolution) module captures spatial information when unfolding 3D images into 1D sequences. Second, we extend Group-Rational KAN (GR-KAN), a Kolmogorov-Arnold Networks variant with rational basis functions, into 3D-Group-Rational KAN (3D-GR-KAN) for 3D medical imaging - its first application in this domain - enabling superior feature representation tailored to volumetric data. Third, a dual-branch text-driven strategy leverages CLIP's text embeddings: one branch swaps one-hot labels for semantic vectors to preserve inter-organ semantic relationships, while the other aligns images with detailed organ descriptions to enhance semantic alignment. Experiments on the Medical Segmentation Decathlon (MSD) and KiTS23 datasets show our method achieving state-of-the-art performance, surpassing existing approaches in accuracy and efficiency. This work highlights the power of combining advanced sequence modeling, extended network architectures, and vision-language synergy to push forward 3D medical image segmentation, delivering a scalable solution for clinical use. The source code is openly available at https://github.com/yhy-whu/TK-Mamba.

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code Benchmark SOTA
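The TK-Mamba abstract above describes GR-KAN as a KAN variant with rational basis functions. As a rough illustration of that idea, the sketch below evaluates a rational function P(x) / (1 + |Q(x)|) and shares one coefficient set across each group of channels. The "safe" denominator form, the grouping scheme, and the names `safe_rational` and `group_rational` are assumptions for illustration; they are not taken from the abstract or the linked repository.

```python
def safe_rational(x, num_coeffs, den_coeffs):
    """Evaluate P(x) / (1 + |Q(x)|).

    P(x) = sum(num_coeffs[i] * x**i), starting at degree 0.
    Q(x) = sum(den_coeffs[j] * x**(j + 1)), starting at degree 1.
    The absolute value keeps the denominator at or above 1 (assumed form).
    """
    numerator = sum(a * x ** i for i, a in enumerate(num_coeffs))
    denominator_poly = sum(b * x ** (j + 1) for j, b in enumerate(den_coeffs))
    return numerator / (1.0 + abs(denominator_poly))

def group_rational(channels, group_params):
    """Apply one shared rational function to each contiguous group of channels.

    channels: list of scalar channel activations.
    group_params: list of (num_coeffs, den_coeffs), one pair per group.
    """
    if len(channels) % len(group_params) != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    group_size = len(channels) // len(group_params)
    out = []
    for idx, value in enumerate(channels):
        num_coeffs, den_coeffs = group_params[idx // group_size]
        out.append(safe_rational(value, num_coeffs, den_coeffs))
    return out
```

Sharing coefficients per group rather than per channel keeps the number of learnable parameters small, which is one plausible reason to group them.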

How We Won the ISLES'24 Challenge by Preprocessing

Tianyi Ren, Juampablo E. Heras Rivera, Hitender Oswal, Yutong Pan, William Henry, Jacob Ruzevick, Mehmet Kurt

•preprint•May 23 2025

Stroke is among the top three causes of death worldwide, and accurate identification of stroke lesion boundaries is critical for diagnosis and treatment. Supervised deep learning methods have emerged as the leading solution for stroke lesion segmentation but require large, diverse, and annotated datasets. The ISLES'24 challenge addresses this need by providing longitudinal stroke imaging data, including CT scans taken on arrival to the hospital and follow-up MRI taken 2-9 days from initial arrival, with annotations derived from follow-up MRI. Importantly, models submitted to the ISLES'24 challenge are evaluated using only CT inputs, requiring prediction of lesion progression that may not be visible in CT scans for segmentation. Our winning solution shows that a carefully designed preprocessing pipeline including deep-learning-based skull stripping and custom intensity windowing is beneficial for accurate segmentation. Combined with a standard large residual nnU-Net architecture for segmentation, this approach achieves a mean test Dice of 28.5 with a standard deviation of 21.27.

Mixed Modality Segmentation Neurological Methodology In Silico Benchmark SOTA
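The ISLES'24 abstract above credits custom intensity windowing and reports results as a Dice score. The sketch below shows what those two operations look like in their simplest form. The window centre and width (40 and 80 Hounsfield units, a commonly used brain window) are illustrative defaults, not the authors' settings, and the names `window_intensity` and `dice_score` are hypothetical.

```python
def window_intensity(hu_values, center=40.0, width=80.0):
    """Clip Hounsfield units to [center - width/2, center + width/2]
    and rescale linearly to [0, 1]. Defaults are illustrative only."""
    lo = center - width / 2.0
    hi = center + width / 2.0
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in hu_values]

def dice_score(pred, target):
    """Dice coefficient between two flat binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total
```

Windowing discards intensity ranges such as bone and air, so the network's input range is spent on the soft-tissue contrast where lesions appear.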

Integrating multi-omics data with artificial intelligence to decipher the role of tumor-infiltrating lymphocytes in tumor immunotherapy.

Xie T, Xue H, Huang A, Yan H, Yuan J

•papers•May 23 2025

Tumor-infiltrating lymphocytes (TILs) are capable of recognizing tumor antigens, impacting tumor prognosis, predicting the efficacy of neoadjuvant therapies, contributing to the development of new cell-based immunotherapies, studying the tumor immune microenvironment, and identifying novel biomarkers. Traditional methods for evaluating TILs primarily rely on histopathological examination using standard hematoxylin and eosin staining or immunohistochemical staining, with manual cell counting under a microscope. These methods are time-consuming and subject to significant observer variability and error. Recently, artificial intelligence (AI) has rapidly advanced in the field of medical imaging, particularly with deep learning algorithms based on convolutional neural networks. AI has shown promise as a powerful tool for the quantitative evaluation of tumor biomarkers. The advent of AI offers new opportunities for the automated and standardized assessment of TILs. This review provides an overview of the advancements in the application of AI for assessing TILs from multiple perspectives. It specifically focuses on AI-driven approaches for identifying TILs in tumor tissue images, automating TILs quantification, recognizing TILs subpopulations, and analyzing the spatial distribution patterns of TILs. The review aims to elucidate the prognostic value of TILs in various cancers, as well as their predictive capacity for responses to immunotherapy and neoadjuvant therapy. Furthermore, the review explores the integration of AI with other emerging technologies, such as single-cell sequencing, multiplex immunofluorescence, spatial transcriptomics, and multimodal approaches, to enhance the comprehensive study of TILs and further elucidate their clinical utility in tumor treatment and prognosis.

Mixed Modality Detection Review Concept Academic Lab GenAI

Feature Preserving Shrinkage on Bayesian Neural Networks via the R2D2 Prior

Tsai Hor Chan, Dora Yan Zhang, Guosheng Yin, Lequan Yu

•preprint•May 23 2025

Bayesian neural networks (BNNs) treat neural network weights as random variables, aiming to provide posterior uncertainty estimates and avoid overfitting by performing inference on the posterior weights. However, the selection of appropriate prior distributions remains a challenging task, and BNNs may suffer from catastrophically inflated variance or poor predictive performance when poor choices are made for the priors. Existing BNN designs apply different priors to weights, but the behaviours of these priors make it difficult to sufficiently shrink noisy signals, or they are prone to overshrinking important signals in the weights. To alleviate this problem, we propose a novel R2D2-Net, which imposes the R^2-induced Dirichlet Decomposition (R2D2) prior on the BNN weights. The R2D2-Net can effectively shrink irrelevant coefficients towards zero, while preventing key features from over-shrinkage. To approximate the posterior distribution of weights more accurately, we further propose a variational Gibbs inference algorithm that combines the Gibbs updating procedure and gradient-based optimization. This strategy enhances stability and consistency in estimation when the variational objective involving the shrinkage parameters is non-convex. We also analyze the evidence lower bound (ELBO) and the posterior concentration rates from a theoretical perspective. Experiments on both natural and medical image classification and uncertainty estimation tasks demonstrate satisfactory performance of our method.

Mixed Modality Classification Methodology In Silico Academic Lab

AutoMiSeg: Automatic Medical Image Segmentation via Test-Time Adaptation of Foundation Models

Xingjian Li, Qifeng Wu, Colleen Que, Yiran Ding, Adithya S. Ubaradka, Jianhua Xing, Tianyang Wang, Min Xu

•preprint•May 23 2025

Medical image segmentation is vital for clinical diagnosis, yet current deep learning methods often demand extensive expert effort, i.e., either through annotating large training datasets or providing prompts at inference time for each new case. This paper introduces a zero-shot and automatic segmentation pipeline that combines off-the-shelf vision-language and segmentation foundation models. Given a medical image and a task definition (e.g., "segment the optic disc in an eye fundus image"), our method uses a grounding model to generate an initial bounding box, followed by a visual prompt boosting module that enhances the prompts, which are then processed by a promptable segmentation model to produce the final mask. To address the challenges of domain gap and result verification, we introduce a test-time adaptation framework featuring a set of learnable adaptors that align the medical inputs with foundation model representations. Its hyperparameters are optimized via Bayesian Optimization, guided by a proxy validation model without requiring ground-truth labels. Our pipeline offers an annotation-efficient and scalable solution for zero-shot medical image segmentation across diverse tasks. Our pipeline is evaluated on seven diverse medical imaging datasets and shows promising results. By proper decomposition and test-time adaptation, our fully automatic pipeline performs competitively with weakly-prompted interactive foundation models.

Mixed Modality Segmentation Methodology In Silico Academic Lab GenAI

EnsembleEdgeFusion: advancing semantic segmentation in microvascular decompression imaging with innovative ensemble techniques.

Dhiyanesh B, Vijayalakshmi M, Saranya P, Viji D

•papers•May 23 2025

Semantic segmentation plays an important part in the investigation of medical images, particularly in the domain of microvascular decompression, where publicly available datasets are scarce and expert annotation is demanding. In response to this challenge, this study presents a meticulously curated dataset comprising 2003 RGB microvascular decompression images, each paired with annotated masks. Extensive data preprocessing and augmentation strategies were employed to strengthen the training dataset, enhancing the robustness of the proposed deep learning model. Several up-to-date semantic segmentation approaches, including DeepLabv3+, U-Net, DilatedFastFCN with JPU, DANet, and a custom Vanilla architecture, were trained and evaluated using diverse performance metrics. Among these models, DeepLabv3+ emerged as a strong contender, notably excelling in F1 score. Ensemble techniques, such as stacking and bagging, were introduced to further improve segmentation performance. Bagging, notably with the Naïve Bayes approach, exhibited significant improvements, underscoring the potential of ensemble methods in medical image segmentation. The proposed EnsembleEdgeFusion technique exhibited superior loss reduction during training compared to DeepLabv3+ and achieved a maximum Mean Intersection over Union (MIoU) score of 77.73%, surpassing other models. Category-wise analysis affirmed its superiority in accurately delineating various categories within the test dataset.

Mixed Modality Segmentation Neurological Methodology In Silico Academic Lab Open Dataset
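The EnsembleEdgeFusion abstract above reports its headline result as Mean Intersection over Union (MIoU). For readers unfamiliar with the metric, the sketch below computes it over flat label lists. It assumes that classes absent from both prediction and ground truth are skipped when averaging, which is one common convention and may differ from the paper's; the name `mean_iou` is hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two flat lists of integer class labels.

    Classes that appear in neither list are left out of the average
    (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, so the mean is 7/12.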

AMVLM: Alignment-Multiplicity Aware Vision-Language Model for Semi-Supervised Medical Image Segmentation.

Pan Q, Li Z, Qiao W, Lou J, Yang Q, Yang G, Ji B

•papers•May 23 2025

Low-quality pseudo labels pose a significant obstacle in semi-supervised medical image segmentation (SSMIS), impeding consistency learning on unlabeled data. Leveraging a vision-language model (VLM) holds promise in improving pseudo label quality by employing textual prompts to delineate segmentation regions, but it faces the challenge of cross-modal alignment uncertainty due to multiple correspondences (multiple images/texts tend to correspond to one text/image). Existing VLMs address this challenge by modeling semantics as distributions, but such distributions lead to semantic degradation. To address these problems, we propose Alignment-Multiplicity Aware Vision-Language Model (AMVLM), a new VLM pre-training paradigm with two novel similarity metric strategies. (i) Cross-modal Similarity Supervision (CSS) proposes a probability distribution transformer to supervise similarity scores across fine-granularity semantics through measuring cross-modal distribution disparities, thus learning cross-modal multiple alignments. (ii) Intra-modal Contrastive Learning (ICL) takes into account the similarity metric of coarse-fine granularity information within each modality to encourage cross-modal semantic consistency. Furthermore, using the pretrained AMVLM, we propose a pioneering text-guided SSMIS network to compensate for the quality deficiencies of pseudo-labels. This network incorporates a text mask generator to produce multimodal supervision information, enhancing pseudo label quality and the model's consistency learning. Extensive experimentation validates the efficacy of our AMVLM-driven SSMIS, showcasing superior performance across four publicly available datasets. The code will be available at: https://github.com/QingtaoPan/AMVLM.

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code

Graph Mamba for Efficient Whole Slide Image Understanding

Jiaxuan Lu, Junyan Shi, Yuhui Lin, Fang Yan, Yue Gao, Shaoting Zhang, Xiaosong Wang

•preprint•May 23 2025

Whole Slide Images (WSIs) in histopathology present a significant challenge for large-scale medical image analysis due to their high resolution, large size, and complex tile relationships. Existing Multiple Instance Learning (MIL) methods, such as Graph Neural Networks (GNNs) and Transformer-based models, face limitations in scalability and computational cost. To bridge this gap, we propose the WSI-GMamba framework, which synergistically combines the relational modeling strengths of GNNs with the efficiency of Mamba, the State Space Model designed for sequence learning. The proposed GMamba block integrates Message Passing, Graph Scanning & Flattening, and feature aggregation via a Bidirectional State Space Model (Bi-SSM), achieving Transformer-level performance with 7× fewer FLOPs. By leveraging the complementary strengths of lightweight GNNs and Mamba, the WSI-GMamba framework delivers a scalable solution for large-scale WSI analysis, offering both high accuracy and computational efficiency for slide-level classification.

Mixed Modality Classification Methodology In Silico Academic Lab

FreqU-FNet: Frequency-Aware U-Net for Imbalanced Medical Image Segmentation

Ruiqi Xing

•preprint•May 23 2025

Medical image segmentation faces persistent challenges due to severe class imbalance and the frequency-specific distribution of anatomical structures. Most conventional CNN-based methods operate in the spatial domain and struggle to capture minority class signals, often affected by frequency aliasing and limited spectral selectivity. Transformer-based models, while powerful in modeling global dependencies, tend to overlook critical local details necessary for fine-grained segmentation. To overcome these limitations, we propose FreqU-FNet, a novel U-shaped segmentation architecture operating in the frequency domain. Our framework incorporates a Frequency Encoder that leverages Low-Pass Frequency Convolution and Daubechies wavelet-based downsampling to extract multi-scale spectral features. To reconstruct fine spatial details, we introduce a Spatial Learnable Decoder (SLD) equipped with an adaptive multi-branch upsampling strategy. Furthermore, we design a frequency-aware loss (FAL) function to enhance minority class learning. Extensive experiments on multiple medical segmentation benchmarks demonstrate that FreqU-FNet consistently outperforms both CNN and Transformer baselines, particularly in handling under-represented classes, by effectively exploiting discriminative frequency bands.

Mixed Modality Segmentation Methodology In Silico Academic Lab
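The FreqU-FNet abstract above mentions Daubechies wavelet-based downsampling. As a minimal illustration of how a wavelet transform halves resolution, the sketch below implements one level of the 1D Haar transform, which is the simplest member of the Daubechies family (db1). The paper's wavelet order, its 2D extension, and its learnable parts are not specified in the abstract, so this is a simplification; the name `haar_dwt_1d` is hypothetical.

```python
import math

def haar_dwt_1d(signal):
    """One level of the orthonormal 1D Haar wavelet transform.

    Returns (approx, detail), each half the length of the input.
    approx holds scaled pairwise sums (low-pass band); detail holds
    scaled pairwise differences (high-pass band, first minus second).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail
```

Unlike strided pooling, the two half-length outputs together preserve the signal's energy, so the high-frequency content is separated into its own band instead of being discarded or aliased.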
