
Flow Matching-Based Data Synthesis for Robust Anatomical Landmark Localization.

Hadzic A, Bogensperger L, Berghold A, Urschler M

PubMed · Aug 29, 2025
Anatomical landmark localization (ALL) plays a crucial role in medical imaging for applications such as therapy planning and surgical interventions. State-of-the-art deep learning methods for ALL are often trained on small datasets due to the scarcity of large, annotated medical data. This constraint often leads to overfitting on the training dataset, which in turn reduces the model's ability to generalize to unseen data. To address these challenges, we propose a multi-channel generative approach utilizing Flow Matching to synthesize diverse annotated images for data augmentation in ALL tasks. Each synthetically generated sample consists of a medical image paired with a multi-channel heatmap that encodes its landmark configuration, from which the corresponding landmark annotations can be derived. We assess the quality of synthetic image-heatmap pairs automatically using a Statistical Shape Model to evaluate landmark plausibility and compute the Fréchet Inception Distance score to quantify image quality. Our results show that pairs synthesized via Flow Matching exhibit superior quality and diversity compared with those generated by other state-of-the-art generative models like Generative Adversarial Networks or diffusion models. Furthermore, we investigate the effect of integrating synthetic data into the training process of an ALL network. In our experiments, the ALL network trained with Flow Matching-generated data demonstrates improved robustness, particularly in scenarios with limited training data or occlusions, compared with baselines that utilize solely real images or synthetic data from alternative generative models.
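For readers unfamiliar with the Flow Matching objective underlying this approach, the following is a minimal, hypothetical sketch of a rectified-flow-style training step over stacked image and heatmap channels; `PairedFlowNet`, `N_LANDMARKS`, and all shapes are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch: a flow-matching training step for joint image +
# multi-channel heatmap synthesis. All names and sizes are illustrative.
import torch
import torch.nn as nn

N_LANDMARKS = 8  # assumed number of landmarks (one heatmap channel each)

class PairedFlowNet(nn.Module):
    """Toy velocity-field network over stacked (image, heatmaps) channels."""
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the scalar time t as an extra input channel.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, t_map], dim=1))

def flow_matching_loss(model, x1):
    """Regress the velocity (x1 - x0) along the straight path
    x_t = (1 - t) * x0 + t * x1 from noise x0 to a data sample x1."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.size(0), device=x1.device)
    x_t = (1 - t.view(-1, 1, 1, 1)) * x0 + t.view(-1, 1, 1, 1) * x1
    v_target = x1 - x0
    return ((model(x_t, t) - v_target) ** 2).mean()

# Usage: each sample stacks 1 image channel with N_LANDMARKS heatmap channels,
# so image and landmark configuration are generated jointly.
model = PairedFlowNet(channels=1 + N_LANDMARKS)
batch = torch.randn(4, 1 + N_LANDMARKS, 64, 64)  # stand-in image/heatmap pairs
loss = flow_matching_loss(model, batch)
loss.backward()
```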

Masked Autoencoder Pretraining and BiXLSTM ResNet Architecture for PET/CT Tumor Segmentation

Moona Mazher, Steven A Niederer, Abdul Qayyum

arXiv preprint · Aug 29, 2025
The accurate segmentation of lesions in whole-body PET/CT imaging is essential for tumor characterization, treatment planning, and response assessment, yet current manual workflows are labor-intensive and prone to inter-observer variability. Automated deep learning methods have shown promise but often remain limited by modality specificity, isolated time points, or insufficient integration of expert knowledge. To address these challenges, we present a two-stage lesion segmentation framework developed for the fourth AutoPET Challenge. In the first stage, a Masked Autoencoder (MAE) is employed for self-supervised pretraining on unlabeled PET/CT and longitudinal CT scans, enabling the extraction of robust modality-specific representations without manual annotations. In the second stage, the pretrained encoder is fine-tuned with a bidirectional XLSTM architecture augmented with ResNet blocks and a convolutional decoder. By jointly leveraging anatomical (CT) and functional (PET) information as complementary input channels, the model achieves improved temporal and spatial feature integration. Evaluation on the AutoPET Task 1 dataset demonstrates that self-supervised pretraining significantly enhances segmentation accuracy, achieving a Dice score of 0.582 compared to 0.543 without pretraining. These findings highlight the potential of combining self-supervised learning with multimodal fusion for robust and generalizable PET/CT lesion segmentation. Code will be available at https://github.com/RespectKnowledge/AutoPet_2025_BxLSTM_UNET_Segmentation
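As a rough illustration of the stage-1 idea, here is a toy masked-autoencoder pretraining step; unlike the original MAE, which feeds only visible patches to the encoder, this simplified sketch substitutes learned mask tokens, and all names and sizes are assumptions rather than the released code.

```python
# Toy MAE pretraining sketch: mask random patches of an unlabeled slice and
# reconstruct them; the loss is computed only on masked patches. The real
# pipeline (3D volumes, BiXLSTM fine-tuning) lives in the linked repository.
import torch
import torch.nn as nn

PATCH, MASK_RATIO = 8, 0.75  # assumed patch size and masking ratio

class TinyMAE(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Linear(PATCH * PATCH, dim)       # patch -> token
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), 2)
        self.decoder = nn.Linear(dim, PATCH * PATCH)     # token -> pixels
        self.mask_token = nn.Parameter(torch.zeros(dim))

    def forward(self, patches, mask):
        tokens = self.embed(patches)
        # Replace masked tokens; the true MAE instead drops them entirely.
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand_as(tokens), tokens)
        return self.decoder(self.encoder(tokens))

def mae_step(model, img):                                 # img: (B, H, W)
    B, H, W = img.shape
    patches = img.unfold(1, PATCH, PATCH).unfold(2, PATCH, PATCH)
    patches = patches.reshape(B, -1, PATCH * PATCH)       # (B, N, P*P)
    mask = torch.rand(B, patches.size(1)) < MASK_RATIO
    recon = model(patches, mask)
    return ((recon - patches) ** 2)[mask].mean()          # masked-only loss

loss = mae_step(TinyMAE(), torch.randn(2, 64, 64))
loss.backward()
```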

Towards Interactive Lesion Segmentation in Whole-Body PET/CT with Promptable Models

Maximilian Rokuss, Yannick Kirchhoff, Fabian Isensee, Klaus H. Maier-Hein

arXiv preprint · Aug 29, 2025
Whole-body PET/CT is a cornerstone of oncological imaging, yet accurate lesion segmentation remains challenging due to tracer heterogeneity, physiological uptake, and multi-center variability. While fully automated methods have advanced substantially, clinical practice benefits from approaches that keep humans in the loop to efficiently refine predicted masks. The autoPET/CT IV challenge addresses this need by introducing interactive segmentation tasks based on simulated user prompts. In this work, we present our submission to Task 1. Building on the winning autoPET III nnU-Net pipeline, we extend the framework with promptable capabilities by encoding user-provided foreground and background clicks as additional input channels. We systematically investigate representations for spatial prompts and demonstrate that Euclidean Distance Transform (EDT) encodings consistently outperform Gaussian kernels. Furthermore, we propose online simulation of user interactions and a custom point sampling strategy to improve robustness under realistic prompting conditions. Our ensemble of EDT-based models, trained with and without external data, achieves the strongest cross-validation performance, reducing both false positives and false negatives compared to baseline models. These results highlight the potential of promptable models to enable efficient, user-guided segmentation workflows in multi-tracer, multi-center PET/CT. Code is publicly available at https://github.com/MIC-DKFZ/autoPET-interactive
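The EDT-versus-Gaussian comparison is concrete enough to sketch: the snippet below shows one plausible way to render user clicks as an extra input channel under each encoding, using SciPy's distance transform; shapes and click coordinates are illustrative.

```python
# Sketch of the two prompt encodings compared in the paper: a Gaussian kernel
# decays quickly around each click, while the Euclidean Distance Transform
# (EDT) gives a smooth global gradient toward the nearest click.
import numpy as np
from scipy.ndimage import distance_transform_edt

def encode_clicks_edt(shape, clicks):
    """EDT encoding: distance from every pixel to its nearest click."""
    seed = np.ones(shape, dtype=bool)
    for c in clicks:
        seed[c] = False                      # clicks form the zero-distance set
    return distance_transform_edt(seed)      # larger = farther from any click

def encode_clicks_gaussian(shape, clicks, sigma=5.0):
    """Gaussian encoding: a bump of width sigma around each click."""
    grid = np.indices(shape).astype(float)   # (ndim, *shape) coordinate grid
    out = np.zeros(shape)
    for c in clicks:
        d2 = sum((grid[i] - c[i]) ** 2 for i in range(len(shape)))
        out = np.maximum(out, np.exp(-d2 / (2 * sigma ** 2)))
    return out

# Two foreground clicks on a 2D slice; the same code applies per 3D volume.
fg_channel = encode_clicks_edt((64, 64), [(20, 20), (40, 45)])
```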

Incomplete Multi-modal Disentanglement Learning with Application to Alzheimer's Disease Diagnosis.

Han K, Hu D, Zhao F, Liu T, Yang F, Li G

PubMed · Aug 29, 2025
Multi-modal neuroimaging data, including magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (PET), have greatly advanced the computer-aided diagnosis of Alzheimer's disease (AD) by providing shared and complementary information. However, the problem of incomplete multi-modal data remains inevitable and challenging. Conventional strategies that exclude subjects with missing data or synthesize missing scans either result in substantial sample reduction or introduce unwanted noise. To address this issue, we propose an Incomplete Multi-modal Disentanglement Learning method (IMDL) for AD diagnosis without missing scan synthesis, a novel model that employs a tiny Transformer to adaptively fuse incomplete multi-modal features extracted by modality-wise variational autoencoders. Specifically, we first design a cross-modality contrastive learning module to encourage modality-wise variational autoencoders to disentangle shared and complementary representations of each modality. Then, to alleviate the potential information gap between the representations obtained from complete and incomplete multi-modal neuroimages, we leverage the technique of adversarial learning to harmonize these representations with two discriminators. Furthermore, we develop a local attention rectification module comprising local attention alignment and multi-instance attention rectification to enhance the localization of atrophic areas associated with AD. This module aligns inter-modality and intra-modality attention within the Transformer, thus making attention weights more explainable. Extensive experiments conducted on ADNI and AIBL datasets demonstrated the superior performance of the proposed IMDL in AD diagnosis, and a further validation on the HABS-HD dataset highlighted its effectiveness for dementia diagnosis using different multi-modal neuroimaging data (i.e., T1-weighted MRI and diffusion tensor imaging).
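A hedged sketch of the cross-modality contrastive learning module's core idea follows: a symmetric InfoNCE loss that pulls the shared-space embeddings of a subject's two modalities together while pushing apart those of different subjects; the function and dimensions are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of cross-modality contrastive alignment: matching
# MRI/PET pairs of the same subject sit on the diagonal of the similarity
# matrix and are treated as positives (InfoNCE, both retrieval directions).
import torch
import torch.nn.functional as F

def cross_modal_nce(z_mri, z_pet, temperature=0.1):
    """z_mri, z_pet: (B, D) shared-space embeddings of the same B subjects."""
    z_mri = F.normalize(z_mri, dim=1)
    z_pet = F.normalize(z_pet, dim=1)
    logits = z_mri @ z_pet.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(z_mri.size(0))       # matching pairs on the diagonal
    # Symmetric loss: MRI -> PET and PET -> MRI retrieval.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = cross_modal_nce(torch.randn(8, 128), torch.randn(8, 128))
```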

Temporal Flow Matching for Learning Spatio-Temporal Trajectories in 4D Longitudinal Medical Imaging

Nico Albert Disch, Yannick Kirchhoff, Robin Peretzke, Maximilian Rokuss, Saikat Roy, Constantin Ulrich, David Zimmerer, Klaus Maier-Hein

arXiv preprint · Aug 29, 2025
Understanding temporal dynamics in medical imaging is crucial for applications such as disease progression modeling, treatment planning and anatomical development tracking. However, most deep learning methods either consider only single temporal contexts, or focus on tasks like classification or regression, limiting their ability to make fine-grained spatial predictions. While some approaches have been explored, they are often limited to single timepoints or specific diseases, or carry other technical restrictions. To address this fundamental gap, we introduce Temporal Flow Matching (TFM), a unified generative trajectory method that (i) aims to learn the underlying temporal distribution, (ii) by design can fall back to a nearest image predictor, i.e. predicting the last context image (LCI), as a special case, and (iii) supports 3D volumes, multiple prior scans, and irregular sampling. Extensive benchmarks on three public longitudinal datasets show that TFM consistently surpasses spatio-temporal methods from natural imaging, establishing a new state-of-the-art and robust baseline for 4D medical image prediction.
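One plausible reading of property (ii) is that the flow starts at the last context image rather than at noise, so a zero velocity field already reproduces the LCI baseline; the sketch below encodes that reading with a toy 3D velocity network and is an assumption about the method's details, not the paper's implementation.

```python
# Speculative sketch: flow matching from the last context scan to the future
# scan. If the learned velocity is zero everywhere, integrating the flow
# returns the last context image (the LCI special case).
import torch
import torch.nn as nn

velocity = nn.Sequential(            # toy stand-in for the trajectory network
    nn.Conv3d(2, 16, 3, padding=1), nn.SiLU(),
    nn.Conv3d(16, 1, 3, padding=1),
)

def tfm_loss(x_last, x_future):
    """Regress the straight-line velocity from the last scan to the target."""
    t = torch.rand(x_last.size(0), 1, 1, 1, 1)
    x_t = (1 - t) * x_last + t * x_future
    t_map = t.expand(-1, 1, *x_t.shape[2:])
    v_pred = velocity(torch.cat([x_t, t_map], dim=1))
    return ((v_pred - (x_future - x_last)) ** 2).mean()

# 3D volumes: (batch, channel, D, H, W); irregular gaps could enter via t.
loss = tfm_loss(torch.randn(2, 1, 16, 32, 32), torch.randn(2, 1, 16, 32, 32))
loss.backward()
```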

Domain Adaptation Techniques for Natural and Medical Image Classification

Ahmad Chaddad, Yihang Wu, Reem Kateb, Christian Desrosiers

arXiv preprint · Aug 28, 2025
Domain adaptation (DA) techniques have the potential to alleviate distribution differences between training and test sets in machine learning by leveraging information from source domains. In image classification, most advances in DA have been made using natural images rather than medical data, which are harder to work with. Moreover, even for natural images, the use of mainstream datasets can lead to performance bias. With the aim of better understanding the benefits of DA for both natural and medical images, this study performs 557 simulation studies using seven widely-used DA techniques for image classification on five natural and eight medical datasets that cover various scenarios, such as out-of-distribution data, dynamic data streams, and limited training samples. Our experiments yield detailed results and insightful observations highlighting the performance and medical applicability of these techniques. Notably, our results have shown the outstanding performance of the Deep Subdomain Adaptation Network (DSAN) algorithm. This algorithm achieved feasible classification accuracy (91.2%) on the COVID-19 dataset using ResNet50 and showed an important accuracy improvement in the dynamic data stream DA scenario (+6.7%) compared to the baseline. Our results also demonstrate that DSAN exhibits a remarkable level of explainability when evaluated on the COVID-19 and skin cancer datasets. These results contribute to the understanding of DA techniques and offer valuable insight into the effective adaptation of models to medical data.
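For orientation, the sketch below captures the subdomain-alignment idea behind DSAN in simplified form: a per-class MMD between source features and pseudo-labeled target features; DSAN's actual weighted LMMD loss is more involved, so treat this as an assumption-laden stand-in rather than the algorithm itself.

```python
# Simplified class-conditional MMD: instead of one global MMD between source
# and target, align each class-wise subdomain (target classes come from
# pseudo-labels) and average the per-class discrepancies.
import torch

def gaussian_mmd(x, y, sigma=1.0):
    """MMD^2 with an RBF kernel between feature sets x: (n, D) and y: (m, D)."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def subdomain_mmd(src_feat, src_lab, tgt_feat, tgt_pseudo, num_classes):
    losses = []
    for c in range(num_classes):
        xs, xt = src_feat[src_lab == c], tgt_feat[tgt_pseudo == c]
        if len(xs) > 1 and len(xt) > 1:      # need samples on both sides
            losses.append(gaussian_mmd(xs, xt))
    return torch.stack(losses).mean() if losses else src_feat.sum() * 0.0

loss = subdomain_mmd(torch.randn(32, 64), torch.randint(0, 3, (32,)),
                     torch.randn(32, 64), torch.randint(0, 3, (32,)), 3)
```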

Dino U-Net: Exploiting High-Fidelity Dense Features from Foundation Models for Medical Image Segmentation

Yifan Gao, Haoyue Li, Feng Yuan, Xiaosong Wang, Xin Gao

arXiv preprint · Aug 28, 2025
Foundation models pre-trained on large-scale natural image datasets offer a powerful paradigm for medical image segmentation. However, effectively transferring their learned representations for precise clinical applications remains a challenge. In this work, we propose Dino U-Net, a novel encoder-decoder architecture designed to exploit the high-fidelity dense features of the DINOv3 vision foundation model. Our architecture introduces an encoder built upon a frozen DINOv3 backbone, which employs a specialized adapter to fuse the model's rich semantic features with low-level spatial details. To preserve the quality of these representations during dimensionality reduction, we design a new fidelity-aware projection module (FAPM) that effectively refines and projects the features for the decoder. We conducted extensive experiments on seven diverse public medical image segmentation datasets. Our results show that Dino U-Net achieves state-of-the-art performance, consistently outperforming previous methods across various imaging modalities. Our framework proves to be highly scalable, with segmentation accuracy consistently improving as the backbone model size increases up to the 7-billion-parameter variant. The findings demonstrate that leveraging the superior, dense-pretrained features from a general-purpose foundation model provides a highly effective and parameter-efficient approach to advance the accuracy of medical image segmentation. The code is available at https://github.com/yifangao112/DinoUNet.
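The encoder design can be caricatured as follows: a frozen dense-feature backbone, a trainable adapter that mixes in low-level spatial detail, and a channel projection toward the decoder. The `backbone` below is a stand-in convolution, since loading DINOv3 itself and the paper's exact FAPM are beyond this sketch.

```python
# Hedged sketch of the encoder idea: frozen backbone features are upsampled,
# fused with a shallow trainable path, and projected for the decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Conv2d(1, 256, 16, stride=16)    # stand-in for DINOv3 features
for p in backbone.parameters():
    p.requires_grad = False                    # the backbone stays frozen

class AdapterProjection(nn.Module):
    """Fuse frozen dense features with shallow spatial detail, then project."""
    def __init__(self, deep_ch=256, shallow_ch=16, out_ch=64):
        super().__init__()
        self.shallow = nn.Conv2d(1, shallow_ch, 3, padding=1)   # low-level path
        self.fuse = nn.Conv2d(deep_ch + shallow_ch, out_ch, 1)  # projection

    def forward(self, img):
        deep = backbone(img)                   # (B, 256, H/16, W/16) semantics
        deep = F.interpolate(deep, size=img.shape[2:],
                             mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([deep, self.shallow(img)], dim=1))

feats = AdapterProjection()(torch.randn(1, 1, 128, 128))  # (1, 64, 128, 128)
```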

The African Breast Imaging Dataset for Equitable Cancer Care: Protocol for an Open Mammogram and Ultrasound Breast Cancer Detection Dataset

Musinguzi, D., Katumba, A., Kawooya, M. G., Malumba, R., Nakatumba-Nabende, J., Achuka, S. A., Adewole, M., Anazodo, U.

medRxiv preprint · Aug 28, 2025
Introduction: Breast cancer is one of the most common cancers globally. Its incidence in Africa has increased sharply, surpassing that in high-income countries. Mortality remains high due to late-stage diagnosis, when treatment is less effective. We propose the first open, longitudinal breast imaging dataset from Africa comprising point-of-care ultrasound scans, mammograms, biopsy pathology, and clinical profiles to support early detection using machine learning. Methods and Analysis: We will engage women through community outreach and train them in self-examination. Those with suspected lesions, particularly with a family history of breast cancer, will be invited to participate. A total of 100 women will undergo baseline assessment at medical centers, including clinical exams, blood tests, and mammograms. Follow-up point-of-care ultrasound scans and clinical data will be collected at 3 and 6 months, with final assessments at 9 months including mammograms. Ethics and Dissemination: The study has been approved by the Institutional Review Boards at ECUREI and the MAI Lab. Findings will be disseminated through peer-reviewed journals and scientific conferences.

Learning What is Worth Learning: Active and Sequential Domain Adaptation for Multi-modal Gross Tumor Volume Segmentation

Jingyun Yang, Guoqing Zhang, Jingge Wang, Yang Li

arXiv preprint · Aug 28, 2025
Accurate gross tumor volume segmentation on multi-modal medical data is critical for radiotherapy planning in nasopharyngeal carcinoma and glioblastoma. Recent advances in deep neural networks have brought promising results in medical image segmentation, leading to an increasing demand for labeled data. Since labeling medical images is time-consuming and labor-intensive, active learning has emerged as a solution to reduce annotation costs by selecting the most informative samples to label and adapting high-performance models with as few labeled samples as possible. Previous active domain adaptation (ADA) methods seek to minimize sample redundancy by selecting samples that are farthest from the source domain. However, such one-off selection can easily cause negative transfer, and access to source medical data is often limited. Moreover, the query strategy for multi-modal medical data remains unexplored. In this work, we propose an active and sequential domain adaptation framework for dynamic multi-modal sample selection in ADA. We derive a query strategy to prioritize labeling and training on the most valuable samples based on their informativeness and representativeness. Empirical validation on diverse gross tumor volume segmentation tasks demonstrates that our method achieves favorable segmentation performance, significantly outperforming state-of-the-art ADA methods. Code is available at the git repository: https://github.com/Hiyoochan/mmActS
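A hypothetical version of such a query score might combine prediction entropy (informativeness) with distance to the already-labeled set as a simple diversity proxy; the paper's actual representativeness measure and weighting are not reproduced here.

```python
# Illustrative query scoring for active learning: rank unlabeled samples by
# entropy plus distance to the labeled pool, then label the top-scoring ones.
import torch

def query_scores(probs, feats, labeled_feats, alpha=0.5):
    """probs: (N, C) softmax outputs; feats: (N, D); labeled_feats: (M, D)."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)  # informativeness
    dist = torch.cdist(feats, labeled_feats).min(dim=1).values   # coverage proxy
    return alpha * entropy + (1 - alpha) * dist

scores = query_scores(torch.softmax(torch.randn(100, 2), dim=1),
                      torch.randn(100, 32), torch.randn(10, 32))
to_label = scores.topk(5).indices    # query the 5 most valuable samples
```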

Mitosis detection in domain shift scenarios: a Mamba-based approach

Gennaro Percannella, Mattia Sarno, Francesco Tortorella, Mario Vento

arXiv preprint · Aug 28, 2025
Mitosis detection in histopathology images plays a key role in tumor assessment. Although machine learning algorithms could be exploited to aid physicians in accurately performing such a task, these algorithms suffer from a significant performance drop when evaluated on images coming from domains different from the training ones. In this work, we propose a Mamba-based approach for mitosis detection under domain shift, inspired by the promising performance demonstrated by Mamba in medical image segmentation tasks. Specifically, our approach exploits a VM-UNet architecture for carrying out the addressed task, as well as stain augmentation operations for further improving model robustness against domain shift. Our approach has been submitted to track 1 of the MItosis DOmain Generalization (MIDOG) challenge. Preliminary experiments, conducted on the MIDOG++ dataset, show that there remains large room for improvement for the proposed method.
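Stain augmentation is the most self-contained piece to illustrate: the sketch below jitters the haematoxylin-eosin-DAB (HED) channels of an H&E patch via scikit-image's color deconvolution, a common trick for domain-shift robustness; the jitter ranges are chosen arbitrarily rather than taken from the paper.

```python
# Sketch of HED stain jitter: decompose an RGB patch into stain channels,
# perturb each channel multiplicatively and additively, and recompose,
# simulating staining variability across labs and scanners.
import numpy as np
from skimage.color import rgb2hed, hed2rgb

def stain_jitter(rgb, alpha=0.05, beta=0.01, rng=None):
    """rgb: float image in [0, 1], shape (H, W, 3)."""
    rng = rng or np.random.default_rng()
    hed = rgb2hed(rgb)                               # decompose into stains
    scale = 1 + rng.uniform(-alpha, alpha, size=3)   # per-stain multiplicative
    shift = rng.uniform(-beta, beta, size=3)         # per-stain additive
    out = hed2rgb(hed * scale + shift)               # recompose to RGB
    return np.clip(out, 0, 1)

patch = np.random.rand(64, 64, 3)                    # stand-in histology patch
augmented = stain_jitter(patch)
```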