
DBCM-Net: dual backbone cascaded multi-convolutional segmentation network for medical image segmentation.

Wang X, Li B, Ma J, Huo L, Tian X

PubMed · Sep 17, 2025
Medical image segmentation plays a vital role in diagnosis, treatment planning, and disease monitoring. However, endoscopic and dermoscopic images often exhibit blurred boundaries and low contrast, presenting a significant challenge for precise segmentation. Moreover, single encoder-decoder architectures suffer from inherent limitations, resulting in the loss of either fine-grained details or global context. Some dual-encoder models yield inaccurate results due to mismatched receptive fields and overly simplistic fusion strategies. To overcome these issues, we present the Dual Backbone Cascaded Multi-Convolutional Segmentation Network (DBCM-Net). Our approach employs a Multi-Axis Vision Transformer and a Vision Mamba encoder to extract semantic features at multiple scales, with a cascaded design that enables information sharing between the two backbones. We introduce the Global and Local Fusion Attention Block (GLFAB) to generate attention masks that seamlessly integrate global context with local detail, producing more precise feature maps. Additionally, we incorporate a Depthwise Separable Convolution Attention Module (DSCAM) within the encoders to strengthen the model's ability to capture critical features. A Feature Refinement Fusion Block (FRFB) is further applied to refine these feature maps before subsequent processing. The cascaded network architecture synergistically combines the complementary strengths of both encoders. We rigorously evaluated our model on three distinct datasets, achieving Dice coefficients of 94.93% on the CVC-ClinicDB polyp dataset, 91.93% on ISIC2018, and 92.73% on ACDC, each surpassing current state-of-the-art methods. Extensive experiments demonstrate that the proposed method excels in segmentation accuracy and preserves edge details effectively.
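
As a rough illustration of how an attention-mask fusion block like GLFAB might integrate global context with local detail (the module name is from the abstract; the internals below are our own minimal guess, in PyTorch):

```python
import torch
import torch.nn as nn

class GlobalLocalFusionAttention(nn.Module):
    """Hypothetical GLFAB-style block: gate fused features with a mask
    built from global (pooled) context plus local (depthwise) detail."""

    def __init__(self, channels: int):
        super().__init__()
        # Global branch: squeeze spatial dims to capture image-wide context.
        self.global_ctx = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
        )
        # Local branch: depthwise conv preserves fine boundary detail.
        self.local_detail = nn.Conv2d(channels, channels, kernel_size=3,
                                      padding=1, groups=channels)
        self.to_mask = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                     nn.Sigmoid())

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        fused = feat_a + feat_b  # features from the two cascaded backbones
        mask = self.to_mask(self.global_ctx(fused) + self.local_detail(fused))
        return fused * mask      # attention mask gates global + local cues

# Two (B, C, H, W) feature maps at the same scale.
block = GlobalLocalFusionAttention(64)
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```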

Diagnostic Performance of Large Language Models in Multimodal Analysis of Radiolucent Jaw Lesions.

Kim K, Kim BC

PubMed · Sep 16, 2025
Large language models (LLMs), such as ChatGPT and Gemini, are increasingly being used in medical domains, including dental diagnostics. Despite advancements in image-based deep learning systems, LLM diagnostic capabilities in oral and maxillofacial surgery (OMFS) for processing multimodal imaging inputs remain underexplored. Radiolucent jaw lesions represent a particularly challenging diagnostic category due to their varied presentations and overlapping radiographic features. This study evaluated the diagnostic performance of ChatGPT 4o and Gemini 2.5 Pro on real-world OMFS radiolucent jaw lesion cases, presented in multiple-choice (MCQ) and short-answer (SAQ) formats across 3 imaging conditions: panoramic radiography only, panoramic + CT, and panoramic + CT + pathology. Data from 100 anonymized patients at Wonkwang University Daejeon Dental Hospital were analyzed, including demographics, panoramic radiographs, CBCT images, histopathology slides, and confirmed diagnoses. Sample size was determined based on institutional case availability and statistical power requirements for comparative analysis. ChatGPT and Gemini diagnosed each case under 6 conditions, using 3 imaging modalities (P, P+C, P+C+B) in MCQ and SAQ formats. Model accuracy was scored against expert-confirmed diagnoses by 2 independent evaluators. McNemar's and Cochran's Q tests evaluated statistical differences across models and imaging modalities. For MCQ tasks, ChatGPT achieved 66%, 73%, and 82% accuracy across the P, P+C, and P+C+B conditions, respectively, while Gemini achieved 57%, 62%, and 63%. In SAQ tasks, ChatGPT achieved 34%, 45%, and 48%; Gemini achieved 15%, 24%, and 28%. Accuracy improved significantly with additional imaging data for ChatGPT, and ChatGPT consistently outperformed Gemini across all conditions (P < .001 for MCQ; P = .008 to < .001 for SAQ). The MCQ format, which incorporates a human-in-the-loop (HITL) structure, showed higher overall performance than SAQ. ChatGPT demonstrated superior diagnostic performance compared with Gemini in OMFS diagnostic tasks when provided with richer multimodal inputs. Diagnostic accuracy increased with additional imaging data, especially in MCQ formats, suggesting that LLMs can effectively synthesize radiographic and pathological data. LLMs have potential as diagnostic support tools for OMFS, especially in settings with limited specialist access. Presenting clinical cases in structured formats using curated imaging data enhances LLM accuracy and underscores the value of HITL integration. Although current LLMs show promising results, further validation using larger datasets and hybrid AI systems is necessary for broader clinical adoption.
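
For readers replicating the statistics, a minimal sketch of the paired comparison the study reports; the per-case correctness vectors here are simulated placeholders, and `mcnemar` is the standard statsmodels routine:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical 0/1 correctness vectors for 100 cases (1 = correct diagnosis);
# in the study these would come from the two evaluators' scoring.
rng = np.random.default_rng(0)
chatgpt_correct = rng.binomial(1, 0.82, size=100)
gemini_correct = rng.binomial(1, 0.63, size=100)

# Paired 2x2 table: rows = ChatGPT correct/wrong, cols = Gemini correct/wrong.
table = np.array([
    [np.sum((chatgpt_correct == 1) & (gemini_correct == 1)),
     np.sum((chatgpt_correct == 1) & (gemini_correct == 0))],
    [np.sum((chatgpt_correct == 0) & (gemini_correct == 1)),
     np.sum((chatgpt_correct == 0) & (gemini_correct == 0))],
])

result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"statistic={result.statistic}, p={result.pvalue:.4f}")
```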

MedFormer: hierarchical medical vision transformer with content-aware dual sparse selection attention.

Xia Z, Li H, Lan L

PubMed · Sep 16, 2025
Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computational load of feature maps, which is highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is designed to explicitly attend to the most relevant content. Theoretical analysis demonstrates that MedFormer outperforms existing medical vision transformers in terms of generality and efficiency. Extensive experiments across various imaging modality datasets show that MedFormer consistently enhances performance in all three medical image recognition tasks mentioned above. MedFormer provides an efficient and versatile solution for medical image recognition, with strong potential for clinical application. The code is available on GitHub.
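
The abstract does not spell out DSSA's two selection stages, but a single-stage, content-aware top-k sparse attention conveys the core idea; this sketch is our simplification, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk: int):
    """q, k, v: (B, heads, N, d). Each query attends only to its top-k keys."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, h, N, N)
    idx = scores.topk(topk, dim=-1).indices                # content-aware pick
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)                            # 0 = keep, -inf = drop
    attn = F.softmax(scores + mask, dim=-1)
    return attn @ v

q = k = v = torch.randn(2, 4, 196, 32)                     # 14x14 token grid
out = topk_sparse_attention(q, k, v, topk=16)              # (2, 4, 196, 32)
```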

Cross-modality transformer model leveraging DCE-MRI and pathological images for predicting pathological complete response and lymph node metastasis in breast cancer.

Fan M, Zhu Z, Yu Z, Du J, Xie S, Pan X, Chen S, Li L

PubMed · Sep 16, 2025
Pathological diagnosis remains the gold standard for diagnosing breast cancer and is highly accurate and sensitive, which is crucial for assessing pathological complete response (pCR) and lymph node metastasis (LNM) following neoadjuvant chemotherapy (NACT). Dynamic contrast-enhanced MRI (DCE-MRI) is a noninvasive technique that provides detailed morphological and functional insights into tumors. The complementarity of these two modalities, particularly when one is unavailable, and their integration to enhance therapeutic prediction have not been fully explored. To this end, we propose a cross-modality image transformer (CMIT) model designed for feature synthesis and fusion to predict pCR and LNM in breast cancer. This model enables interaction and integration between the two modalities via a transformer's cross-attention module. A modality information transfer module is developed to produce synthetic pathological image features (sPIFs) from DCE-MRI data and synthetic DCE-MRI features (sMRIs) from pathological images. During training, the model leverages both real and synthetic imaging features to improve predictive performance. In the prediction phase, the synthetic imaging features are fused with the corresponding real imaging features to make predictions. The experimental results demonstrate that the proposed CMIT model, which integrates DCE-MRI with sPIFs or histopathological images with sMRIs, outperforms the use of MRI or pathological images alone in predicting the pCR to NACT, with AUCs of 0.809 and 0.852, respectively. Similar improvements were observed for LNM prediction: the DCE-MRI model's performance improved from an AUC of 0.637 to 0.712, while the DCE-MRI-guided histopathological model achieved an AUC of 0.792. Notably, our proposed model can predict treatment response effectively via DCE-MRI, regardless of the availability of actual histopathological images.
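
A minimal sketch, under our own assumptions about token shapes, of the transformer cross-attention fusion the abstract describes, with MRI tokens querying (real or synthesized) pathology tokens:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 1)  # pCR / LNM logit

    def forward(self, mri_tokens: torch.Tensor,
                path_tokens: torch.Tensor) -> torch.Tensor:
        # MRI queries attend to pathology (real or synthesized sPIF) tokens.
        attended, _ = self.cross_attn(mri_tokens, path_tokens, path_tokens)
        fused = self.norm(mri_tokens + attended)  # residual fusion
        return self.head(fused.mean(dim=1))       # pool tokens, predict

model = CrossModalFusion()
logit = model(torch.randn(4, 64, 256), torch.randn(4, 100, 256))  # (4, 1)
```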

MBLEformer: Multi-Scale Bidirectional Lesion Enhancement Transformer for Cervical Cancer Image Segmentation.

Li S, Chen P, Zhang J, Wang B

PubMed · Sep 16, 2025
Accurate segmentation of lesion areas from Lugol's iodine staining images is crucial for screening pre-cancerous cervical lesions. However, in underdeveloped regions lacking skilled clinicians, this screening method may lead to misdiagnoses and missed diagnoses. In recent years, deep learning methods have been widely applied to assist in medical image segmentation. This study aims to improve the accuracy of cervical cancer lesion segmentation by addressing the limitations of Convolutional Neural Networks (CNNs) and attention mechanisms in capturing global features and refining upsampling details. This paper presents a Multi-Scale Bidirectional Lesion Enhancement Network, named MBLEformer, which employs a Swin Transformer encoder to extract image features at multiple stages and utilizes a multi-scale attention mechanism to capture semantic features from different perspectives. Additionally, a bidirectional lesion enhancement upsampling strategy is introduced to refine the edge details of lesion areas. Experimental results demonstrate that the proposed model exhibits superior segmentation performance on a proprietary cervical cancer colposcopic dataset, outperforming other medical image segmentation methods, with a mean Intersection over Union (mIoU) of 82.5%, an accuracy of 94.9%, and a specificity of 83.6%. MBLEformer significantly improves the accuracy of lesion segmentation in iodine-stained cervical cancer images, with the potential to enhance the efficiency and accuracy of pre-cancerous lesion diagnosis and help address the issue of imbalanced medical resources.
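
As a hedged sketch of what bidirectional enhancement between a fine (edge-rich) and a coarse (semantic) feature map could look like; the paper's actual upsampling strategy may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalRefine(nn.Module):
    def __init__(self, c_fine: int, c_coarse: int):
        super().__init__()
        # Bottom-up: push fine edge cues down to the coarse semantic map.
        self.bottom_up = nn.Conv2d(c_fine, c_coarse, kernel_size=3,
                                   stride=2, padding=1)
        # Top-down: project enriched semantics back to the fine scale.
        self.top_down = nn.Conv2d(c_coarse, c_fine, kernel_size=1)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        coarse = coarse + self.bottom_up(fine)            # edges enrich semantics
        up = F.interpolate(coarse, size=fine.shape[-2:],
                           mode="bilinear", align_corners=False)
        return fine + self.top_down(up)                   # semantics sharpen edges

# Fine map at 64x64 with 64 channels, coarse map at 32x32 with 128 channels.
refine = BidirectionalRefine(c_fine=64, c_coarse=128)
out = refine(torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32))
```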

Artificial Intelligence in Cardiovascular Health: Insights into Post-COVID Public Health Challenges.

Naushad Z, Malik J, Mishra AK, Singh S, Shrivastav D, Sharma CK, Verma VV, Pal RK, Roy B, Sharma VK

PubMed · Sep 16, 2025
Cardiovascular diseases (CVDs) remain the leading cause of morbidity and mortality worldwide. Risk factors such as diabetes, hypertension, obesity, and smoking significantly worsen outcomes. The COVID-19 pandemic has highlighted the strong connection between viral infections and cardiovascular health. Current literature highlights that SARS-CoV-2 contributes to myocardial injury, endothelial dysfunction, thrombosis, and systemic inflammation, increasing the severity of CVD outcomes. Long COVID has also been associated with persistent cardiovascular complications, including myocarditis, arrhythmias, thromboembolic events, and accelerated atherosclerosis. Addressing these challenges requires continued research and public health strategies to mitigate long-term risks. Artificial intelligence (AI) is transforming cardiovascular medicine and public health through machine learning (ML) and deep learning (DL) applications. AI enhances risk prediction, facilitates biomarker discovery, and improves imaging techniques such as echocardiography, CT, and MRI for the timely detection of coronary artery disease and myocardial injury. Remote monitoring and wearable devices powered by AI enable real-time cardiovascular assessment and personalized treatment. In public health, AI optimizes disease surveillance, epidemiological modeling, and healthcare resource allocation. AI-driven clinical decision support systems improve diagnostic accuracy and health equity by enabling targeted interventions. The integration of AI into cardiovascular medicine and public health offers data-driven, efficient, and patient-centered solutions to mitigate post-COVID cardiovascular complications.

FunKAN: Functional Kolmogorov-Arnold Network for Medical Image Enhancement and Segmentation

Maksim Penkin, Andrey Krylov

arXiv preprint · Sep 16, 2025
Medical image enhancement and segmentation are critical yet challenging tasks in modern clinical practice, constrained by artifacts and complex anatomical variations. Traditional deep learning approaches often rely on complex architectures with limited interpretability. While Kolmogorov-Arnold networks offer interpretable solutions, their reliance on flattened feature representations fundamentally disrupts the intrinsic spatial structure of imaging data. To address this issue, we propose the Functional Kolmogorov-Arnold Network (FunKAN) -- a novel interpretable neural framework, designed specifically for image processing, that formally generalizes the Kolmogorov-Arnold representation theorem to functional spaces and learns inner functions using a Fourier decomposition over a basis of Hermite functions. We explore FunKAN on several medical image processing tasks, including Gibbs ringing suppression in magnetic resonance images, benchmarking on the IXI dataset. We also propose U-FunKAN as a state-of-the-art binary medical segmentation model, with benchmarks on three medical datasets: BUSI (ultrasound images), GlaS (histological structures) and CVC-ClinicDB (colonoscopy videos), detecting breast cancer, glands, and polyps, respectively. Experiments on these diverse datasets demonstrate that our approach outperforms other KAN-based backbones in both medical image enhancement (PSNR, TV) and segmentation (IoU, F1). Our work bridges the gap between theoretical function approximation and medical image analysis, offering a robust, interpretable solution for clinical applications.
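
The Hermite basis itself is standard mathematics; a small numeric sketch (our construction, not the released code) of the orthonormal Hermite functions psi_n(x) = H_n(x) exp(-x^2/2) / sqrt(2^n n! sqrt(pi)) over which inner functions can be Fourier-decomposed:

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def hermite_function(n: int, x: np.ndarray) -> np.ndarray:
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select H_n in the (physicists') Hermite series
    norm = sqrt(2.0 ** n * factorial(n) * sqrt(pi))
    return hermval(x, coeffs) * np.exp(-x ** 2 / 2.0) / norm

x = np.linspace(-6, 6, 1201)
basis = np.stack([hermite_function(n, x) for n in range(8)])  # (8, 1201)

# Orthonormality check: the Gram matrix should be close to the identity.
gram = basis @ basis.T * (x[1] - x[0])
print(np.round(gram, 2))
```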

Intelligent Healthcare Imaging Platform: A VLM-Based Framework for Automated Medical Image Analysis and Clinical Report Generation

Samer Al-Hamadani

arXiv preprint · Sep 16, 2025
The rapid advancement of artificial intelligence (AI) in healthcare imaging has revolutionized diagnostic medicine and clinical decision-making processes. This work presents an intelligent multimodal framework for medical image analysis that leverages Vision-Language Models (VLMs) in healthcare diagnostics. The framework integrates Google Gemini 2.5 Flash for automated tumor detection and clinical report generation across multiple imaging modalities, including CT, MRI, X-ray, and ultrasound. The system combines visual feature extraction with natural language processing to enable contextual image interpretation, incorporating coordinate verification mechanisms and probabilistic Gaussian modeling for anomaly distribution. Multi-layered visualization techniques generate detailed medical illustrations, overlay comparisons, and statistical representations to enhance clinical confidence, with location measurement achieving an average deviation of 80 pixels. Result processing utilizes precise prompt engineering and textual analysis to extract structured clinical information while maintaining interpretability. Experimental evaluations demonstrated high performance in anomaly detection across multiple modalities. The system features a user-friendly Gradio interface for clinical workflow integration and demonstrates zero-shot learning capabilities that reduce dependence on large datasets. This framework represents a significant advancement in automated diagnostic support and radiological workflow efficiency, though clinical validation and multi-center evaluation are necessary prior to widespread adoption.
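
A minimal sketch, with invented parameters, of the probabilistic Gaussian modeling of anomaly distribution mentioned above: a predicted lesion center and spread rendered as a 2D heatmap for overlay:

```python
import numpy as np

def gaussian_heatmap(h: int, w: int, center: tuple, sigma: float) -> np.ndarray:
    """Isotropic 2D Gaussian, peak 1.0 at center=(x, y), for alpha overlay."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

heat = gaussian_heatmap(512, 512, center=(300, 220), sigma=40.0)
# Blend with the scan for display, e.g. overlay = 0.6 * image + 0.4 * cmap(heat)
```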

Data fusion of medical imaging in neurological disorders.

Mirzaei G, Gupta A, Adeli H

PubMed · Sep 16, 2025
Medical imaging plays a crucial role in the accurate diagnosis and prognosis of various medical conditions, with each modality offering unique and complementary insights into the body's structure and function. However, no single imaging technique can capture the full spectrum of necessary information. Data fusion has emerged as a powerful tool to integrate information from different perspectives, including multiple modalities, views, temporal sequences, and spatial scales. By combining data, fusion techniques provide a more comprehensive understanding, significantly enhancing the precision and reliability of clinical analyses. This paper presents an overview of data fusion approaches - covering multi-view, multi-modal, and multi-scale strategies - across imaging modalities such as MRI, CT, PET, SPECT, EEG, and MEG, with a particular emphasis on applications in neurological disorders. Furthermore, we highlight the latest advancements in data fusion methods and key studies published since 2016, illustrating the progress and growing impact of this interdisciplinary field.

MambaDiff: Mamba-Enhanced Diffusion Model for 3D Medical Image Segmentation.

Liu Y, Feng Y, Cheng J, Zhan H, Zhu Z

PubMed · Sep 15, 2025
Accurate 3D medical image segmentation is crucial for diagnosis and treatment. Diffusion models demonstrate promising performance in medical image segmentation tasks due to the progressive nature of the generation process and the explicit modeling of data distributions. However, the weak guidance of conditional information and insufficient feature extraction in diffusion models lead to the loss of fine-grained features and structural consistency in the segmentation results, thereby affecting the accuracy of medical image segmentation. To address this challenge, we propose MambaDiff, a Mamba-enhanced diffusion model for 3D medical image segmentation. We extract multilevel semantic features from the original images using an encoder and tightly integrate them with the denoising process of the diffusion model through a Semantic Hierarchical Embedding (SHE) mechanism to capture the intricate relationship between the noisy label and the image data. Meanwhile, we design a Global-Slice Perception Mamba (GSPM) layer, which integrates multi-dimensional perception mechanisms to endow the model with comprehensive spatial reasoning and feature extraction capabilities. Experimental results show that the proposed MambaDiff achieves more competitive performance than prior art with substantially fewer parameters on four public medical image segmentation datasets, including BraTS 2021, BraTS 2024, LiTS, and MSD Hippocampus. The source code of our method is available at https://github.com/yuliu316316/MambaDiff.
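
A schematic reading of SHE-style conditioning (ours, not the released code at the link above): each denoiser level receives the image encoder's semantics at the matching scale, anchoring the noisy-label prediction to image content:

```python
import torch
import torch.nn as nn

class SemanticEmbed(nn.Module):
    """One per resolution level: add projected encoder semantics to the
    denoiser's features so denoising stays anchored to image content."""

    def __init__(self, c_dec: int, c_enc: int):
        super().__init__()
        self.proj = nn.Conv3d(c_enc, c_dec, kernel_size=1)

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        return dec_feat + self.proj(enc_feat)

she = SemanticEmbed(c_dec=64, c_enc=32)
dec = torch.randn(1, 64, 16, 32, 32)  # denoiser features for a 3D volume
enc = torch.randn(1, 32, 16, 32, 32)  # image-encoder semantics, same scale
out = she(dec, enc)                   # conditioned denoiser features
```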