
Privacy-Preserving Latent Diffusion-Based Synthetic Medical Image Generation.

Shi Y, Xia W, Niu C, Wiedeman C, Wang G

PubMed · Oct 6 2025
Deep learning methods have impacted almost every research field, demonstrating notable successes in medical imaging tasks such as denoising and super-resolution. However, deep learning requires data at scale, and data sharing is both expensive and prone to privacy leakage. As cutting-edge generative AI models, diffusion models have become dominant because of their rigorous foundation and unprecedented outcomes. Here we propose a latent diffusion approach for data synthesis without compromising patient privacy. In our exemplary case studies, we develop a latent diffusion model to generate medical CT, MRI, and PET images using publicly available datasets. We demonstrate that state-of-the-art deep learning-based denoising/super-resolution networks can be trained on our synthetic data to achieve image quality with no significant difference from what the same network can achieve after being trained on the original data. In our advanced diffusion model, we specifically embed a safeguard mechanism to protect patient privacy effectively and efficiently. Our approach allows privacy-preserving public sharing of diverse big datasets for development of deep models, potentially enabling federated learning at the level of input data instead of local network weights.
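A minimal sketch of how such a latent diffusion model generates an image: a DDPM-style reverse process run in latent space, then decoded. This is illustrative, not the authors' code; `eps_model` (noise predictor), `decoder` (pretrained autoencoder decoder), and the `betas` noise schedule are assumed components.

```python
# Illustrative latent-diffusion sampling loop (assumed components, not the paper's code).
import torch

@torch.no_grad()
def sample_latent_ddpm(eps_model, decoder, shape, betas):
    """Run the reverse diffusion process in latent space, then decode to an image."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(shape)                              # start from Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(z, torch.tensor([t]))           # predicted noise at step t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps) / torch.sqrt(alphas[t])    # posterior mean estimate
        if t > 0:
            z = z + torch.sqrt(betas[t]) * torch.randn_like(z)
    return decoder(z)                                   # decode to synthetic CT/MRI/PET
```

The privacy safeguard the abstract mentions would sit on top of such a sampler (e.g., filtering samples too close to training data) and is not reproduced here.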

Generative AI and foundation models in medical imaging.

Oda M

PubMed · Oct 6 2025
In recent years, generative AI has attracted significant public attention, and its use has been rapidly expanding across a wide range of domains. From creative tasks such as text summarization, idea generation, and source code generation, to the streamlining of medical support tasks like diagnostic report generation and summarization, AI is now deeply involved in many areas. Today's breadth of AI applications is clearly distinct from what was seen before generative AI gained widespread recognition. Representative generative AI services include DALL·E 3 (OpenAI, California, USA) and Stable Diffusion (Stability AI, London, England, UK) for image generation, and ChatGPT (OpenAI, California, USA) and Gemini (Google, California, USA) for text generation. The rise of generative AI has been driven by advances in deep learning models and by the scaling up of data, models, and computational resources in accordance with scaling laws. Moreover, the emergence of foundation models, which are trained on large-scale datasets and possess general-purpose knowledge applicable to various downstream tasks, is creating a new paradigm in AI development. These shifts brought about by generative AI and foundation models also profoundly impact medical image processing, fundamentally changing the framework for AI development in healthcare. This paper provides an overview of the diffusion models used in image-generation AI and the large language models (LLMs) used in text-generation AI, and introduces their applications in medical support. It also discusses foundation models, which are gaining attention alongside generative AI, including their construction methods and applications in the medical field. Finally, the paper explores how to develop foundation models and high-performance AI for medical support by fully utilizing national data and computational resources.

Machine learning-assisted classification of lung cancer: the role of sarcopenia, inflammatory biomarkers, and PET/CT anatomical-metabolic parameters.

Tanyildizi-Kokkulunk H, Alcin G, Cavdar I, Akyel R, Yigit S, Ciftci-Kusbeci T, Caliskan G

PubMed · Oct 6 2025
Accurate differentiation between non-cancerous, benign, and malignant lung lesions remains a diagnostic challenge due to overlapping clinical and imaging characteristics. This study proposes a multimodal machine learning (ML) framework integrating positron emission tomography/computed tomography (PET/CT) anatomical-metabolic parameters, sarcopenia markers, and inflammatory biomarkers to enhance classification performance in lung cancer. A retrospective dataset of 222 patients was analyzed, including demographic variables, functional and morphometric sarcopenia indices, hematological inflammation markers, and PET/CT-derived parameters such as maximum and mean standardized uptake value (SUVmax, SUVmean), metabolic tumor volume (MTV), and total lesion glycolysis (TLG). Five ML algorithms (Logistic Regression, Multi-Layer Perceptron, Support Vector Machine, Extreme Gradient Boosting, and Random Forest) were evaluated using standardized performance metrics. The Synthetic Minority Oversampling Technique (SMOTE) was applied to balance class distributions. Feature importance analysis was conducted using the optimal model, and classification was repeated using the top 15 features. Among the models, Random Forest demonstrated superior predictive performance with a test accuracy of 96%, precision, recall, and F1-score of 0.96, and an average AUC of 0.99. Feature importance analysis revealed SUVmax, SUVmean, total lesion glycolysis, and skeletal muscle index as the leading predictors. A secondary classification using only the top 15 features yielded an even higher test accuracy (97%). These findings underscore the potential of integrating metabolic imaging, physical function, and biochemical inflammation markers in a non-invasive ML-based diagnostic pipeline. The proposed framework demonstrates high accuracy and generalizability and may serve as an effective clinical decision support tool for early lung cancer diagnosis and risk stratification.
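A minimal sketch of the described pipeline: SMOTE class balancing, a Random Forest, feature-importance ranking, and refitting on the top 15 features. Synthetic stand-in data replaces the study's clinical variables.

```python
# Illustrative sketch of the pipeline (stand-in data, not the study's dataset).
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for 222 patients with PET/CT, sarcopenia, and inflammation features.
X, y = make_classification(n_samples=222, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance classes

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_bal, y_bal)
print("full-feature accuracy:", rf.score(X_te, y_te))

# Rank features by importance and refit on the 15 strongest.
top15 = rf.feature_importances_.argsort()[::-1][:15]
rf15 = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_bal[:, top15], y_bal)
print("top-15 accuracy:", rf15.score(X_te[:, top15], y_te))
```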

Multimodal imaging and advanced quantitative techniques for HER-2 status prediction in breast cancer.

Wang Q, Zheng N, Yu Q, Shao S

PubMed · Oct 6 2025
HER-2-positive breast cancer is a biologically distinct subtype, so accurate early assessment of HER-2 status is critical for guiding personalized treatment. HER-2 status is currently determined mainly through immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH) on biopsy or surgical specimens. However, these methods are invasive, susceptible to sampling error, and incapable of real-time, noninvasive monitoring of tumor heterogeneity or treatment response. The development of noninvasive imaging-based predictive methods has therefore gained significant research interest. Multiparametric magnetic resonance imaging (mpMRI) can quantify tumor perfusion parameters (Ktrans, vascular permeability; Ve, extravascular extracellular space) through dynamic contrast-enhanced MRI (DCE-MRI), measure the apparent diffusion coefficient (ADC) using diffusion-weighted imaging (DWI), and obtain metabolic information via positron emission tomography-MRI (PET-MRI); these measures are closely associated with HER-2 expression status. Concurrently, radiomics and deep learning (DL) systematically extract multidimensional features of breast tumors from multimodal imaging data, including morphological parameters (sphericity, surface area), first-order statistical metrics (kurtosis, skewness), and textural features (the gray-level co-occurrence matrix, GLCM, which quantifies the spatial distribution of texture, and the gray-level run-length matrix, GLRLM, which evaluates the size of homogeneous regions), thereby constructing high-dimensional quantitative analysis datasets. By resolving heterogeneity in the spatial distribution of these features, DL algorithms can autonomously mine latent imaging patterns closely related to HER-2 expression and establish noninvasive prediction models. Although traditional single-parameter models, such as the ADC derived from DWI, can provide valuable information about the tumor microenvironment, their predictive efficacy is often constrained by parameter inconsistency and a lack of standardization. As a narrative review, this article argues that multimodal imaging, radiomics, and deep learning are better equipped to capture the complex HER-2-related tumor heterogeneity, thereby providing a stronger theoretical foundation for guiding personalized treatment strategies and prognostic evaluation.
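Two of the texture descriptors named above (GLCM-based) plus the first-order statistics can be computed as in this scikit-image/SciPy sketch; the ROI is a random stand-in, and a full radiomics pipeline (e.g., PyRadiomics) would add shape and GLRLM features as well.

```python
# Illustrative radiomics feature extraction on a stand-in tumor ROI.
import numpy as np
from scipy.stats import kurtosis, skew
from skimage.feature import graycomatrix, graycoprops

roi = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in 2D tumor ROI

glcm = graycomatrix(roi, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
features = {
    "glcm_contrast": graycoprops(glcm, "contrast")[0, 0],        # texture
    "glcm_homogeneity": graycoprops(glcm, "homogeneity")[0, 0],  # texture
    "kurtosis": kurtosis(roi.ravel()),                           # first-order
    "skewness": skew(roi.ravel()),                               # first-order
}
print(features)
```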

Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis

Arnela Hadzic, Simon Johannes Joham, Martin Urschler

arXiv preprint · Oct 6 2025
Generating synthetic CT (sCT) from MRI or CBCT plays a crucial role in enabling MRI-only and CBCT-based adaptive radiotherapy, improving treatment precision while reducing patient radiation exposure. To address this task, we adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI → sCT and CBCT → sCT across three anatomical regions: abdomen, head and neck, and thorax. Validation and testing were performed through the challenge submission system. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints. Future work will explore patch-based training and latent-space flow models to improve resolution and local structural fidelity.
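A minimal conditional flow-matching sketch consistent with this description: regress a velocity field along the straight noise-to-CT path, then integrate it with Euler steps at inference. `unet` and `encoder` are assumed networks, and CT/MRI volumes are assumed to share a shape; this is illustrative, not the challenge submission.

```python
# Illustrative conditional flow matching (linear path, Euler sampler).
import torch

def fm_loss(unet, encoder, ct, mri):
    """Flow-matching regression loss on the straight path from noise to CT."""
    x0 = torch.randn_like(ct)                  # Gaussian noise endpoint
    t = torch.rand(ct.size(0), 1, 1, 1, 1)     # per-sample time in [0, 1]
    xt = (1 - t) * x0 + t * ct                 # point on the linear path
    target_v = ct - x0                         # true (constant) velocity
    cond = encoder(mri)                        # conditioning features from MRI/CBCT
    return ((unet(xt, t, cond) - target_v) ** 2).mean()

@torch.no_grad()
def synthesize_ct(unet, encoder, mri, steps=50):
    x = torch.randn_like(mri)                  # assumes matching volume shapes
    cond = encoder(mri)
    for i in range(steps):                     # explicit Euler integration
        t = torch.full((x.size(0), 1, 1, 1, 1), i / steps)
        x = x + unet(x, t, cond) / steps
    return x
```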

A systematic review on automatic segmentation of renal tumors and cysts using various convolutional neural network architectures in radiological images.

Anusha C, Rao KN, Rao TL

PubMed · Oct 4 2025
Early diagnosis of kidney cancer is crucial for saving lives and enabling better treatment. Medical experts utilize radiological images, such as CT, MRI, and US, together with histopathological analysis, to identify kidney tumors and cysts, providing valuable information on their size, shape, location, and metabolism, thus aiding in diagnosis. In radiological image processing, precise manual segmentation remains difficult despite numerous noteworthy efforts and encouraging results in this field. Thus, there is an urgent need for automatic methods for renal and renal-mass segmentation. In this regard, this article reviews studies on utilizing deep learning models to detect renal masses early in medical imaging examinations, particularly the various convolutional neural network (CNN) models that have demonstrated excellent outcomes in the segmentation of radiological images. Furthermore, we detail the dataset characteristics that the researchers adopted, as well as the accuracy and efficiency metrics obtained using various parameters. However, several studies employed datasets with limited images, whereas only a handful used hundreds of thousands of images, and those examinations did not fully determine tumor and cyst diagnoses. The key goals are to describe recent accomplishments, examine the methodological approaches used by researchers, and recommend potential future research directions.
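For reference, the Dice coefficient that segmentation studies like those reviewed here typically report can be computed for binary masks as follows (a minimal sketch):

```python
# Dice = 2|P ∩ G| / (|P| + |G|) for binary prediction and ground-truth masks.
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))
```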

Multi-Modal Oral Cancer Detection Using Weighted Ensemble Convolutional Neural Networks

Ajo Babu George, Sreehari J R

arXiv preprint · Oct 4 2025
Aims: Late diagnosis of Oral Squamous Cell Carcinoma (OSCC) contributes significantly to its high global mortality rate, with over 50% of cases detected at advanced stages and a 5-year survival rate below 50% according to WHO statistics. This study aims to improve early detection of OSCC by developing a multimodal deep learning framework that integrates clinical, radiological, and histopathological images using a weighted ensemble of DenseNet-121 convolutional neural networks (CNNs). Material and Methods: A retrospective study was conducted using publicly available datasets representing three distinct medical imaging modalities. Each modality-specific dataset was used to train a DenseNet-121 CNN via transfer learning. Augmentation and modality-specific preprocessing were applied to increase robustness. Predictions were fused using a validation-weighted ensemble strategy. Evaluation was performed using accuracy, precision, recall, and F1-score. Results: High validation accuracy was achieved for the radiological (100%) and histopathological (95.12%) modalities, with clinical images performing lower (63.10%) due to visual heterogeneity. The ensemble model demonstrated improved diagnostic robustness with an overall accuracy of 84.58% on a multimodal validation dataset of 55 samples. Conclusion: The multimodal ensemble framework bridges gaps in the current diagnostic workflow by offering a non-invasive, AI-assisted triage tool that enhances early identification of high-risk lesions. It supports clinicians in decision-making, aligning with global oncology guidelines to reduce diagnostic delays and improve patient outcomes.
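The validation-weighted fusion can be sketched as follows; the weights mirror the reported per-modality validation accuracies, and the probability arrays are placeholders.

```python
# Illustrative validation-weighted ensemble fusion (placeholder probabilities).
import numpy as np

def weighted_ensemble(probs_per_model, val_accuracies):
    """Fuse per-model class probabilities, weighting by validation accuracy."""
    w = np.asarray(val_accuracies, dtype=float)
    w = w / w.sum()                                  # normalize weights
    fused = sum(wi * p for wi, p in zip(w, probs_per_model))
    return fused.argmax(axis=-1)                     # predicted class per sample

# Radiological, histopathological, and clinical model outputs for two samples:
probs = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.7, 0.3], [0.4, 0.6]]),
         np.array([[0.5, 0.5], [0.6, 0.4]])]
print(weighted_ensemble(probs, val_accuracies=[1.00, 0.9512, 0.6310]))
```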

MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation

T-Mai Bui, Fares Bougourzi, Fadi Dornaika, Vinh Truong Hoang

arXiv preprint · Oct 4 2025
In recent years, deep learning has shown near-expert performance in segmenting complex medical tissues and tumors. However, existing models are often task-specific, with performance varying across modalities and anatomical regions. Balancing model complexity and performance remains challenging, particularly in clinical settings where both accuracy and efficiency are critical. To address these issues, we propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion (MAF) mechanism to capture local, global, and long-range dependencies. A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency. Additionally, a co-attention gate enhances feature selection by emphasizing relevant spatial and semantic information across scales during both encoding and decoding, improving feature interaction and cross-scale communication. Extensive experiments on multiple benchmark datasets show that our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity. By effectively balancing efficiency and effectiveness, our architecture offers a practical and scalable solution for diverse medical imaging tasks. Source code and trained models will be publicly released upon acceptance to support reproducibility and further research.
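One plausible reading of the co-attention gate, sketched in PyTorch: one branch's features are gated by a spatial attention map derived from both branches. This is an assumption about the design, not the released architecture.

```python
# Hypothetical co-attention gate: feat_b guides a spatial gate over feat_a.
import torch
import torch.nn as nn

class CoAttentionGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        # Combine both branches, derive a [0, 1] spatial map, gate feat_a residually.
        attn = self.gate(torch.relu(self.query(feat_a) + self.key(feat_b)))
        return feat_a + feat_a * attn

gate = CoAttentionGate(64)
out = gate(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```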

LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

Ci-Siang Lin, Min-Hung Chen, Yu-Yang Sheng, Yu-Chiang Frank Wang

arXiv preprint · Oct 3 2025
Multimodal Large Language Models (MLLMs) have achieved strong performance on general visual benchmarks but struggle with out-of-distribution (OOD) tasks in specialized domains such as medical imaging, where labeled data is limited and expensive. We introduce LEAML, a label-efficient adaptation framework that leverages both scarce labeled VQA samples and abundant unlabeled images. Our approach generates domain-relevant pseudo question-answer pairs for unlabeled data using a QA generator regularized by caption distillation. Importantly, we selectively update only those neurons most relevant to question-answering, enabling the QA Generator to efficiently acquire domain-specific knowledge during distillation. Experiments on gastrointestinal endoscopy and sports VQA demonstrate that LEAML consistently outperforms standard fine-tuning under minimal supervision, highlighting the effectiveness of our proposed LEAML framework.
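Selective neuron updating of this kind is commonly implemented by masking gradients before each optimizer step; a sketch under that assumption (LEAML's relevance-scoring rule is not reproduced here, so the masks are taken as given).

```python
# Hypothetical gradient-masking step: only parameters selected by `masks` update.
import torch

def masked_step(model, optimizer, loss, masks):
    """masks: dict of parameter name -> 0/1 tensor with the parameter's shape."""
    optimizer.zero_grad()
    loss.backward()
    for name, param in model.named_parameters():
        if param.grad is not None and name in masks:
            param.grad.mul_(masks[name])   # zero gradients of unselected neurons
    optimizer.step()
```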

Dynamic Prompt Generation for Interactive 3D Medical Image Segmentation Training

Tidiane Camaret Ndir, Alexander Pfefferle, Robin Tibor Schirrmeister

arXiv preprint · Oct 3 2025
Interactive 3D biomedical image segmentation requires efficient models that can iteratively refine predictions based on user prompts. Current foundation models either lack volumetric awareness or suffer from limited interactive capabilities. We propose a training strategy that combines dynamic volumetric prompt generation with content-aware adaptive cropping to optimize the use of the image encoder. Our method simulates realistic user interaction patterns during training while addressing the computational challenges of learning from sequential refinement feedback on a single GPU. For efficient training, we initialize our network using the publicly available weights from the nnInteractive segmentation model. Evaluation on the Foundation Models for Interactive 3D Biomedical Image Segmentation competition demonstrates strong performance with an average final Dice score of 0.6385, normalized surface distance of 0.6614, and area-under-the-curve metrics of 2.4799 (Dice) and 2.5671 (NSD).
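Simulated interaction of this kind is typically driven by sampling corrective clicks from the current error region; a minimal sketch under that assumption (the paper's exact prompt and cropping policy may differ):

```python
# Hypothetical corrective-click sampler for interactive 3D segmentation training.
import numpy as np

def sample_click(pred: np.ndarray, gt: np.ndarray, rng=np.random):
    """Return (z, y, x, is_positive): a click inside the larger error region."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    fn = np.logical_and(gt, ~pred)    # missed foreground -> positive click
    fp = np.logical_and(pred, ~gt)    # spurious foreground -> negative click
    region, positive = (fn, True) if fn.sum() >= fp.sum() else (fp, False)
    if region.sum() == 0:
        return None                   # prediction already matches ground truth
    z, y, x = np.argwhere(region)[rng.randint(region.sum())]
    return int(z), int(y), int(x), positive
```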