SqueezeViX-Net with SOAE: A Prevailing Deep Learning Framework for Accurate Pneumonia Classification using X-Ray and CT Imaging Modalities.

Kavitha N, Anand B

PubMed · Sep 11, 2025
Pneumonia is a serious respiratory illness that, when not diagnosed properly, leads to severe health problems and increased mortality, particularly among at-risk populations. Appropriate treatment requires both swift, accurate diagnosis and correct identification of the pneumonia type. This paper presents SqueezeViX-Net, a deep learning framework designed for pneumonia classification. The model incorporates a Self-Optimized Adaptive Enhancement (SOAE) method that programmatically adjusts the dropout rate during training; this adaptive dropout mechanism improves model fit and stability. SqueezeViX-Net is evaluated on extensive X-ray and CT image collections drawn from publicly accessible Kaggle repositories. It outperformed several established deep learning architectures, including DenseNet-121, ResNet-152V2, and EfficientNet-B7, achieving higher accuracy, precision, recall, and F1-score. Validation across a range of pneumonia datasets comprising both CT and X-ray images demonstrated its ability to handle modality variations. By integrating SOAE, SqueezeViX-Net provides an advanced framework for pneumonia identification in clinical use. Its dynamic learning capability and high precision give it strong diagnostic potential for medical staff, contributing to improved patient treatment outcomes.
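The abstract does not include implementation details, so the sketch below is only a guess at what an adaptive dropout schedule of this kind could look like in PyTorch: the dropout rate is nudged up when validation loss worsens and down when it improves. The update rule, step size, and bounds are assumptions, not SOAE itself.

```python
# Minimal sketch of an adaptive-dropout training hook (assumed update rule;
# the paper does not publish SOAE implementation details).
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, p: float = 0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8)
        )
        self.dropout = nn.Dropout(p)               # rate adjusted during training
        self.classifier = nn.Linear(16 * 8 * 8, 2)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(self.dropout(x))

def adjust_dropout(model, val_loss, prev_val_loss, step=0.05, lo=0.1, hi=0.7):
    """Raise dropout when validation loss worsens (a sign of overfitting),
    lower it otherwise, keeping the rate inside [lo, hi]."""
    p = model.dropout.p
    p = p + step if val_loss > prev_val_loss else p - step
    model.dropout.p = float(min(max(p, lo), hi))
```

Called once per epoch after validation, `adjust_dropout` would implement the "programmed change to the dropout rate during training" that the abstract describes, under the stated assumptions.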

MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization

Mohammed Tiouti, Mohamed Bal-Ghaoui

arXiv preprint · Sep 11, 2025
Effective model and hyperparameter selection remains a major challenge in deep learning, often requiring extensive expertise and computation. While AutoML and large language models (LLMs) promise automation, current LLM-based approaches rely on trial and error and expensive APIs, which provide limited interpretability and generalizability. We propose MetaLLMiX, a zero-shot hyperparameter optimization framework combining meta-learning, explainable AI, and efficient LLM reasoning. By leveraging historical experiment outcomes with SHAP explanations, MetaLLMiX recommends optimal hyperparameters and pretrained models without additional trials. We further employ an LLM-as-judge evaluation to control output format, accuracy, and completeness. Experiments on eight medical imaging datasets using nine open-source lightweight LLMs show that MetaLLMiX achieves competitive or superior performance to traditional HPO methods while drastically reducing computational cost. Our local deployment outperforms prior API-based approaches, achieving optimal results on 5 of 8 tasks, response time reductions of 99.6-99.9%, and the fastest training times on 6 datasets (2.4-15.7x faster), maintaining accuracy within 1-5% of best-performing baselines.
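A rough, hypothetical illustration of the zero-shot recommendation step described here: past experiment outcomes and SHAP importances are serialized into a prompt for a local LLM, which returns a hyperparameter configuration. The prompt wording, JSON schema, and `query_local_llm` helper are placeholders, not the MetaLLMiX implementation.

```python
# Hedged sketch of LLM-based zero-shot hyperparameter recommendation.
import json

def build_prompt(history, shap_importance, dataset_summary):
    """history: list of dicts with hyperparameters and scores from prior experiments."""
    return (
        "You are a hyperparameter-optimization assistant.\n"
        f"Target dataset: {json.dumps(dataset_summary)}\n"
        f"Past experiments: {json.dumps(history)}\n"
        f"SHAP importance of each hyperparameter: {json.dumps(shap_importance)}\n"
        "Return a JSON object with keys: model, learning_rate, batch_size, epochs."
    )

def query_local_llm(prompt: str) -> str:
    # Hypothetical stand-in: plug in whatever local inference client you use.
    raise NotImplementedError("connect a local LLM runtime here")

if __name__ == "__main__":
    history = [{"model": "resnet18", "lr": 1e-3, "batch": 32, "auc": 0.91}]
    shap = {"lr": 0.42, "model": 0.31, "epochs": 0.14, "batch": 0.13}
    prompt = build_prompt(history, shap, {"modality": "X-ray", "classes": 2})
    print(prompt)  # feed to query_local_llm(prompt) and parse the JSON reply
```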

Surrogate Supervision for Robust and Generalizable Deformable Image Registration

Yihao Liu, Junyu Chen, Lianrui Zuo, Shuwen Wei, Brian D. Boyd, Carmen Andreescu, Olusola Ajilore, Warren D. Taylor, Aaron Carass, Bennett A. Landman

arXiv preprint · Sep 11, 2025
Objective: Deep learning-based deformable image registration has achieved strong accuracy, but remains sensitive to variations in input image characteristics such as artifacts, field-of-view mismatch, or modality difference. We aim to develop a general training paradigm that improves the robustness and generalizability of registration networks. Methods: We introduce surrogate supervision, which decouples the input domain from the supervision domain by applying estimated spatial transformations to surrogate images. This allows training on heterogeneous inputs while ensuring supervision is computed in domains where similarity is well defined. We evaluate the framework through three representative applications: artifact-robust brain MR registration, mask-agnostic lung CT registration, and multi-modal MR registration. Results: Across tasks, surrogate supervision demonstrated strong resilience to input variations including inhomogeneity field, inconsistent field-of-view, and modality differences, while maintaining high performance on well-curated data. Conclusions: Surrogate supervision provides a principled framework for training robust and generalizable deep learning-based registration models without increasing complexity. Significance: Surrogate supervision offers a practical pathway to more robust and generalizable medical image registration, enabling broader applicability in diverse biomedical imaging scenarios.
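The core idea, computing the similarity loss in a clean surrogate domain through the predicted transformation, can be sketched as follows. The 2D setting, tensor shapes, displacement convention, and MSE similarity are assumptions for illustration, not the authors' code.

```python
# Sketch of surrogate supervision: the network sees heterogeneous inputs, but
# the loss is computed on surrogate images warped by the predicted displacement.
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp a batch of 2D images (B,1,H,W) with a displacement field (B,2,H,W)
    given in pixels; channel 0 is the x (width) shift, channel 1 the y shift."""
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(image.device)   # (2,H,W)
    coords = base.unsqueeze(0) + flow                              # absolute positions
    gx = 2 * coords[:, 0] / (W - 1) - 1                            # normalize to [-1, 1]
    gy = 2 * coords[:, 1] / (H - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)                           # (B,H,W,2)
    return F.grid_sample(image, grid, align_corners=True)

def surrogate_loss(flow, surrogate_moving, surrogate_fixed):
    """MSE between the fixed surrogate and the warped moving surrogate."""
    return F.mse_loss(warp(surrogate_moving, flow), surrogate_fixed)
```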

Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement

Jiesi Hu, Jianfeng Cao, Yanwu Yang, Chenfei Ye, Yixuan Zhang, Hanyang Peng, Ting Ma

arXiv preprint · Sep 11, 2025
In-context learning (ICL) offers a promising paradigm for universal medical image analysis, enabling models to perform diverse image processing tasks without retraining. However, current ICL models for medical imaging remain limited in two critical aspects: they cannot simultaneously achieve high-fidelity predictions and global anatomical understanding, and there is no unified model trained across diverse medical imaging tasks (e.g., segmentation and enhancement) and anatomical regions. As a result, the full potential of ICL in medical imaging remains underexplored. Thus, we present Medverse, a universal ICL model for 3D medical imaging, trained on 22 datasets covering diverse tasks in universal image segmentation, transformation, and enhancement across multiple organs, imaging modalities, and clinical centers. Medverse employs a next-scale autoregressive in-context learning framework that progressively refines predictions from coarse to fine, generating consistent, full-resolution volumetric outputs and enabling multi-scale anatomical awareness. We further propose a blockwise cross-attention module that facilitates long-range interactions between context and target inputs while preserving computational efficiency through spatial sparsity. Medverse is extensively evaluated on a broad collection of held-out datasets covering previously unseen clinical centers, organs, species, and imaging modalities. Results demonstrate that Medverse substantially outperforms existing ICL baselines and establishes a novel paradigm for in-context learning. Code and model weights will be made publicly available at https://github.com/jiesihu/Medverse.
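A toy sketch of the coarse-to-fine, next-scale autoregressive loop described above. Single-channel volumes, trilinear resampling, and a model that takes the downsampled volume plus the previous prediction as two channels are assumptions; see the released code for the actual architecture.

```python
# Toy coarse-to-fine prediction loop (structure inferred from the abstract).
import torch
import torch.nn.functional as F

def coarse_to_fine_predict(model, volume, scales=(0.25, 0.5, 1.0)):
    """Predict at progressively finer scales, feeding the upsampled previous
    prediction back in as context. volume: (B,1,D,H,W)."""
    prev = None
    for s in scales:
        size = [max(1, int(d * s)) for d in volume.shape[2:]]
        x = F.interpolate(volume, size=size, mode="trilinear", align_corners=False)
        ctx = (F.interpolate(prev, size=size, mode="trilinear", align_corners=False)
               if prev is not None else torch.zeros_like(x))
        # assumed interface: model maps 2 input channels -> 1 output channel
        prev = model(torch.cat([x, ctx], dim=1))
    return prev  # full-resolution output produced at the final scale
```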

Leveraging Large Language Models to Enhance Radiology Report Readability: A Systematic Review.

Patwardhan V, Balchander D, Fussell D, Joseph J, Joshi A, Troutt H, Ling J, Wei K, Weinberg B, Chow D

PubMed · Sep 11, 2025
Patients increasingly have direct access to their medical records, yet radiology reports are complex and difficult for patients to understand and contextualize. One proposed solution is to use large language models (LLMs) to translate reports into patient-accessible language. This review summarizes the existing literature on using LLMs to simplify radiology reports for patients and proposes guidelines for best practices in future studies. A systematic review was performed following PRISMA guidelines. Studies published and indexed in PubMed, Scopus, and Google Scholar up to February 2025 were included. Inclusion criteria comprised studies that used LLMs to simplify diagnostic or interventional radiology reports for patients and evaluated readability. Exclusion criteria included non-English manuscripts, abstracts, conference presentations, review articles, retracted articles, and studies that did not focus on report simplification. The Mixed Methods Appraisal Tool (MMAT) 2018 was used for bias assessment. Given the diversity of results, studies were categorized by reporting method, and qualitative and quantitative findings were summarized. Of 2126 citations identified, 17 were included in the qualitative analysis; 71% used a single LLM and 29% used multiple LLMs. The most prevalent LLMs were ChatGPT, Google Bard/Gemini, Bing Chat, Claude, and Microsoft Copilot. All studies that assessed quantitative readability metrics (n=12) reported improvements, while qualitative assessments of simplified reports showed varied results between physician and non-physician raters. LLMs show potential to enhance the accessibility of radiology reports for patients, but the literature is limited by heterogeneity of inputs, models, and evaluation metrics across existing studies. We propose a set of best-practice guidelines to standardize future LLM research.
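For context, the quantitative readability comparisons these studies report typically rely on standard formulas such as Flesch Reading Ease and Flesch-Kincaid Grade Level. A minimal example using the textstat package follows; the report text is invented for illustration.

```python
# Illustrative readability comparison between an original and a simplified report.
import textstat

original = ("There is a 4 mm noncalcified pulmonary nodule in the right upper lobe. "
            "No pleural effusion or pneumothorax.")
simplified = ("We found a very small spot (4 mm) in the top part of your right lung. "
              "There is no fluid or air leak around your lungs.")

for label, text in [("original", original), ("simplified", simplified)]:
    print(label,
          "| Flesch Reading Ease:", textstat.flesch_reading_ease(text),
          "| Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(text))
```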

Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models

Qiuhui Chen, Xuancheng Yao, Huping Ye, Yi Hong

arXiv preprint · Sep 11, 2025
Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport based alignment, demonstrating greater tolerance to the noise potentially introduced by LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate our state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated datasets, and pre-trained models will be available at https://github.com/Qybc/Med3DInsight.
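A hedged sketch of what a partial optimal transport alignment between 3D-slice features and 2D MLLM features could look like using the POT library; the mass budget, cost metric, and feature shapes are assumptions rather than the Med3DInsight formulation.

```python
# Partial OT alignment sketch: transporting only part of the total mass lets
# noisy LLM-generated features remain unmatched.
import numpy as np
import ot                               # Python Optimal Transport (POT)
from ot.partial import partial_wasserstein

def partial_ot_plan(feat_3d, feat_2d, mass=0.8):
    """feat_3d: (n,d) slice features, feat_2d: (m,d) MLLM features.
    Only `mass` of the total weight is transported (assumed budget)."""
    a = np.full(feat_3d.shape[0], 1.0 / feat_3d.shape[0])
    b = np.full(feat_2d.shape[0], 1.0 / feat_2d.shape[0])
    M = ot.dist(feat_3d, feat_2d, metric="euclidean")
    return partial_wasserstein(a, b, M, m=mass)

plan = partial_ot_plan(np.random.rand(16, 64), np.random.rand(16, 64))
print(plan.shape, plan.sum())           # total transported mass ≈ 0.8
```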

FlexiD-Fuse: Flexible number of inputs multi-modal medical image fusion based on diffusion model

Yushen Xu, Xiaosong Li, Yuchun Wang, Xiaoqi Cheng, Huafeng Li, Haishu Tan

arXiv preprint · Sep 11, 2025
Different modalities of medical images provide unique physiological and anatomical information for diseases. Multi-modal medical image fusion integrates useful information from complementary medical images of different modalities, producing a fused image that comprehensively and objectively reflects lesion characteristics to assist doctors in clinical diagnosis. However, existing fusion methods can only handle a fixed number of modality inputs, such as accepting only two-modal or tri-modal inputs, and cannot directly process varying input quantities, which hinders their application in clinical settings. To tackle this issue, we introduce FlexiD-Fuse, a diffusion-based image fusion network designed to accommodate flexible quantities of input modalities. It can process two-modal and tri-modal medical image fusion end-to-end with the same weights. FlexiD-Fuse transforms the diffusion fusion problem, which supports only fixed-condition inputs, into a maximum likelihood estimation problem based on the diffusion process and hierarchical Bayesian modeling. By incorporating the Expectation-Maximization algorithm into the diffusion sampling iteration, FlexiD-Fuse can generate high-quality fused images with cross-modal information from the source images, independently of the number of input images. We compared FlexiD-Fuse with the latest two-modal and tri-modal medical image fusion methods on Harvard datasets, evaluating them with nine popular metrics. The experimental results show that our method achieves the best performance in medical image fusion with varying inputs. We also conducted extensive extension experiments on infrared-visible, multi-exposure, and multi-focus image fusion tasks with arbitrary numbers of inputs, comparing against the respective SOTA methods. The results of these extension experiments consistently demonstrate the effectiveness and superiority of our method.
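As a loose illustration only (not the authors' derivation), the sketch below shows one way EM-style weights could fuse an arbitrary number of modality-conditioned estimates inside a sampling step; the residual-based responsibility update is an assumption.

```python
# Toy EM-style combination of a variable number of modality-conditioned estimates.
import torch

def em_combine(estimates, x, n_iter=3, eps=1e-6):
    """estimates: list of tensors, one per input modality, each the same shape as x.
    Weights are re-estimated from how well each modality explains the current fusion."""
    stack = torch.stack(estimates)                                # (K, ...)
    w = torch.full((len(estimates),), 1.0 / len(estimates))
    for _ in range(n_iter):
        fused = (w.view(-1, *[1] * x.dim()) * stack).sum(0)       # M-step: weighted fusion
        resid = ((stack - fused) ** 2).flatten(1).mean(1)         # per-modality residual
        w = 1.0 / (resid + eps)                                   # E-step: responsibilities
        w = w / w.sum()
    return fused
```

Because the number of estimates is just the length of a list, the same routine handles two, three, or more input modalities without retraining, which is the property the abstract emphasizes.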

Vision-Language Semantic Aggregation Leveraging Foundation Model for Generalizable Medical Image Segmentation

Wenjun Yu, Yinchen Zhou, Jia-Xuan Jiang, Shubin Zeng, Yuee Li, Zhong Wang

arXiv preprint · Sep 10, 2025
Multimodal models have achieved remarkable success in natural image segmentation, yet they often underperform when applied to the medical domain. Through extensive study, we attribute this performance gap to the challenges of multimodal fusion, primarily the significant semantic gap between abstract textual prompts and fine-grained medical visual features, as well as the resulting feature dispersion. To address these issues, we revisit the problem from the perspective of semantic aggregation. Specifically, we propose an Expectation-Maximization (EM) Aggregation mechanism and a Text-Guided Pixel Decoder. The former mitigates feature dispersion by dynamically clustering features into compact semantic centers to enhance cross-modal correspondence. The latter is designed to bridge the semantic gap by leveraging domain-invariant textual knowledge to effectively guide deep visual representations. The synergy between these two mechanisms significantly improves the model's generalization ability. Extensive experiments on public cardiac and fundus datasets demonstrate that our method consistently outperforms existing SOTA approaches across multiple domain generalization benchmarks.
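A minimal sketch of EM-style semantic aggregation, soft-clustering pixel features into a fixed number of compact centers; the cosine-similarity responsibilities and random initialization are assumptions, not the paper's exact mechanism.

```python
# EM-style aggregation of pixel features into K semantic centers.
import torch
import torch.nn.functional as F

def em_aggregate(feats, k=8, n_iter=3):
    """feats: (B, N, C) pixel features.
    Returns (B, k, C) semantic centers and (B, N, k) soft assignments."""
    B, N, C = feats.shape
    centers = feats[:, torch.randperm(N)[:k], :]                   # init from random pixels
    for _ in range(n_iter):
        sim = torch.einsum("bnc,bkc->bnk",
                           F.normalize(feats, dim=-1),
                           F.normalize(centers, dim=-1))           # cosine similarity
        resp = sim.softmax(dim=-1)                                 # E-step: soft assignments
        centers = torch.einsum("bnk,bnc->bkc", resp, feats)        # M-step: weighted means
        centers = centers / (resp.sum(dim=1).unsqueeze(-1) + 1e-6)
    return centers, resp
```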

CLAPS: A CLIP-Unified Auto-Prompt Segmentation for Multi-Modal Retinal Imaging

Zhihao Zhao, Yinzheng Zhao, Junjie Yang, Xiangtong Yao, Quanmin Liang, Shahrooz Faghihroohi, Kai Huang, Nassir Navab, M. Ali Nasseri

arXiv preprint · Sep 10, 2025
Recent advancements in foundation models, such as the Segment Anything Model (SAM), have significantly impacted medical image segmentation, especially in retinal imaging, where precise segmentation is vital for diagnosis. Despite this progress, current methods face critical challenges: 1) modality ambiguity in textual disease descriptions, 2) a continued reliance on manual prompting for SAM-based workflows, and 3) a lack of a unified framework, with most methods being modality- and task-specific. To overcome these hurdles, we propose CLIP-unified Auto-Prompt Segmentation (CLAPS), a novel method for unified segmentation across diverse tasks and modalities in retinal imaging. Our approach begins by pre-training a CLIP-based image encoder on a large, multi-modal retinal dataset to handle data scarcity and distribution imbalance. We then leverage GroundingDINO to automatically generate spatial bounding box prompts by detecting local lesions. To unify tasks and resolve ambiguity, we use text prompts enhanced with a unique "modality signature" for each imaging modality. Ultimately, these automated textual and spatial prompts guide SAM to execute precise segmentation, creating a fully automated and unified pipeline. Extensive experiments on 12 diverse datasets across 11 critical segmentation categories show that CLAPS achieves performance on par with specialized expert models while surpassing existing benchmarks across most metrics, demonstrating its broad generalizability as a foundation model.
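At a high level, the automated prompting pipeline reads as follows. The three helper functions are hypothetical wrappers (the real CLIP, GroundingDINO, and SAM APIs differ), so this is a structural sketch rather than runnable CLAPS code.

```python
# Structural sketch of an auto-prompt segmentation pipeline.
from typing import List, Tuple
import numpy as np

def encode_modality(image: np.ndarray) -> str:
    """Hypothetical CLIP-based classifier returning a modality signature, e.g. 'fundus'."""
    raise NotImplementedError

def detect_lesions(image: np.ndarray, text_prompt: str) -> List[Tuple[int, int, int, int]]:
    """Hypothetical GroundingDINO wrapper: text prompt -> bounding boxes."""
    raise NotImplementedError

def segment_with_boxes(image: np.ndarray, boxes) -> np.ndarray:
    """Hypothetical SAM wrapper: box prompts -> binary masks."""
    raise NotImplementedError

def claps_like_pipeline(image: np.ndarray, disease: str) -> np.ndarray:
    signature = encode_modality(image)              # disambiguate the imaging modality
    prompt = f"{disease} in {signature} image"      # text prompt with modality signature
    boxes = detect_lesions(image, prompt)           # automatic spatial prompts
    return segment_with_boxes(image, boxes)         # SAM produces the final masks
```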