
A Multimodal Deep Learning Ensemble Framework for Building a Spine Surgery Triage System.

Siavashpour M, McCabe E, Nataraj A, Pareek N, Zaiane O, Gross D

PubMed · Aug 7, 2025
Spinal radiology reports and physician-completed questionnaires serve as crucial resources for medical decision-making for patients experiencing low back and neck pain. However, because this review process is time-consuming, individuals with severe conditions may experience a deterioration in their health before receiving professional care. In this work, we propose an ensemble framework built on top of pre-trained BERT-based models that classifies patients according to their need for surgery using multiple data modalities, including radiology reports and questionnaires. Our results demonstrate that our approach outperforms previous studies, effectively integrating information from multiple data modalities and serving as a valuable tool to assist physicians in decision-making.
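
To illustrate the general idea of combining modality-specific classifiers, the sketch below shows a soft-voting ensemble over per-modality classifier heads. The class names, pooled-embedding inputs, and simple probability averaging are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: soft-voting ensemble over per-modality BERT-style classifiers.
import torch
import torch.nn as nn

class ModalityClassifier(nn.Module):
    """Stand-in for a fine-tuned BERT-based classifier head for one data modality."""
    def __init__(self, hidden_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        return self.head(pooled_embedding)  # logits over surgery / no-surgery

class TriageEnsemble(nn.Module):
    """Averages class probabilities across modality-specific classifiers."""
    def __init__(self, classifiers: dict[str, nn.Module]):
        super().__init__()
        self.classifiers = nn.ModuleDict(classifiers)

    def forward(self, embeddings: dict[str, torch.Tensor]) -> torch.Tensor:
        probs = [
            torch.softmax(clf(embeddings[name]), dim=-1)
            for name, clf in self.classifiers.items()
        ]
        return torch.stack(probs).mean(dim=0)  # soft vote across modalities

ensemble = TriageEnsemble({
    "radiology_report": ModalityClassifier(),
    "questionnaire": ModalityClassifier(),
})
batch = {k: torch.randn(4, 768) for k in ("radiology_report", "questionnaire")}
print(ensemble(batch).shape)  # torch.Size([4, 2])
```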

Role of AI in Clinical Decision-Making: An Analysis of FDA Medical Device Approvals.

Fernando P, Lyell D, Wang Y, Magrabi F

PubMed · Aug 7, 2025
The U.S. Food and Drug Administration (FDA) plays an important role in ensuring the safety and effectiveness of AI/ML-enabled devices through its regulatory processes. In recent years, the number of these devices cleared by the FDA has increased. This study analyzes 104 FDA-approved ML-enabled medical devices from May 2021 to April 2023, extending previous research to provide a contemporary perspective on this evolving landscape. We examined clinical task, device task, device input and output, ML method, and level of autonomy. Most approvals (n = 103) were via the 510(k) premarket notification pathway, indicating substantial equivalence to existing devices. Devices predominantly supported diagnostic tasks (n = 81). The majority of devices used imaging data (n = 99), with CT and MRI being the most common modalities. Device autonomy levels were distributed as follows: 52% assistive (requiring users to confirm or approve AI-provided information or decisions), 27% autonomous information, and 21% autonomous decision. The prevalence of assistive devices indicates a cautious approach to integrating ML into clinical decision-making, favoring support rather than replacement of human judgment.

Enhancing Domain Generalization in Medical Image Segmentation With Global and Local Prompts.

Zhao C, Li X

PubMed · Aug 7, 2025
Enhancing domain generalization (DG) is a crucial and compelling research pursuit within the field of medical image segmentation, owing to the inherent heterogeneity observed in medical images. The recent success of large-scale pre-trained vision models (PVMs), such as the Vision Transformer (ViT), inspires us to explore their application in this specific area. While a straightforward strategy involves fine-tuning the PVM using supervised signals from the source domains, this approach overlooks the domain shift issue and neglects the rich knowledge inherent in the instances themselves. To overcome these limitations, we introduce a novel framework enhanced by global and local prompts (GLPs). Specifically, to adapt the PVM to the medical DG scenario, we explicitly separate domain-shared and domain-specific knowledge in the form of global and local prompts. Furthermore, we develop an individualized domain adapter to intricately investigate the relationship between each target domain sample and the source domains. To harness the inherent knowledge within instances, we devise two innovative regularization terms from both the consistency and anatomy perspectives, encouraging the model to preserve instance discriminability and organ position invariance. Extensive experiments and in-depth discussions in both vanilla and semi-supervised DG scenarios across five diverse medical datasets consistently demonstrate the superior segmentation performance achieved by GLPs. Our code and datasets are publicly available at https://github.com/xmed-lab/GLP.
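
As a rough sketch of the prompt-tuning idea (not the GLP implementation), the snippet below prepends learnable domain-shared "global" prompt tokens and domain-specific "local" prompt tokens to a ViT-style token sequence. The prompt counts, encoder depth, and stand-in Transformer encoder are assumptions for illustration.

```python
# Illustrative sketch: shared global prompts + per-domain local prompts for a frozen PVM.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, embed_dim=768, n_global=4, n_local=4, n_domains=3, depth=2):
        super().__init__()
        self.global_prompts = nn.Parameter(torch.zeros(n_global, embed_dim))              # domain-shared
        self.local_prompts = nn.Parameter(torch.zeros(n_domains, n_local, embed_dim))     # domain-specific
        nn.init.trunc_normal_(self.global_prompts, std=0.02)
        nn.init.trunc_normal_(self.local_prompts, std=0.02)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)  # stands in for a frozen PVM

    def forward(self, patch_tokens: torch.Tensor, domain_id: int) -> torch.Tensor:
        b = patch_tokens.size(0)
        g = self.global_prompts.unsqueeze(0).expand(b, -1, -1)
        l = self.local_prompts[domain_id].unsqueeze(0).expand(b, -1, -1)
        tokens = torch.cat([g, l, patch_tokens], dim=1)  # prompts prepended to patch tokens
        return self.blocks(tokens)

enc = PromptedEncoder()
out = enc(torch.randn(2, 196, 768), domain_id=1)
print(out.shape)  # torch.Size([2, 204, 768])
```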

Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation

Xuanru Zhou, Cheng Li, Shuqiang Wang, Ye Li, Tao Tan, Hairong Zheng, Shanshan Wang

arXiv preprint · Aug 7, 2025
Generative artificial intelligence (AI) is rapidly transforming medical imaging by enabling capabilities such as data synthesis, image enhancement, modality translation, and spatiotemporal modeling. This review presents a comprehensive and forward-looking synthesis of recent advances in generative modeling, including generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and emerging multimodal foundation architectures, and evaluates their expanding roles across the clinical imaging continuum. We systematically examine how generative AI contributes to key stages of the imaging workflow, from acquisition and reconstruction to cross-modality synthesis, diagnostic support, and treatment planning. Emphasis is placed on both retrospective and prospective clinical scenarios, where generative models help address longstanding challenges such as data scarcity, standardization, and integration across modalities. To promote rigorous benchmarking and translational readiness, we propose a three-tiered evaluation framework encompassing pixel-level fidelity, feature-level realism, and task-level clinical relevance. We also identify critical obstacles to real-world deployment, including generalization under domain shift, hallucination risk, data privacy concerns, and regulatory hurdles. Finally, we explore the convergence of generative AI with large-scale foundation models, highlighting how this synergy may enable the next generation of scalable, reliable, and clinically integrated imaging systems. By charting technical progress and translational pathways, this review aims to guide future research and foster interdisciplinary collaboration at the intersection of AI, medicine, and biomedical engineering.
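
For a concrete feel of the first two evaluation tiers named above, the toy sketch below computes a pixel-level fidelity metric (PSNR) and a feature-level Fréchet-style distance between two feature sets. The random images and features are placeholders; task-level clinical relevance would require a downstream task (e.g., segmentation or diagnosis) and is not shown.

```python
# Toy sketch of pixel-level and feature-level evaluation; inputs are synthetic placeholders.
import numpy as np
from scipy.linalg import sqrtm

def psnr(reference: np.ndarray, synthetic: np.ndarray, data_range: float = 1.0) -> float:
    mse = np.mean((reference - synthetic) ** 2)
    return float(10 * np.log10(data_range ** 2 / mse))

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Distance between Gaussian fits of two feature sets (FID-style)."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f).real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean))

rng = np.random.default_rng(0)
ref, syn = rng.random((64, 64)), rng.random((64, 64))
print("pixel-level PSNR:", psnr(ref, syn))
print("feature-level distance:",
      frechet_distance(rng.normal(size=(200, 16)), rng.normal(0.1, 1.0, size=(200, 16))))
```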

RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding

Tianchen Fang, Guiru Liu

arXiv preprint · Aug 7, 2025
Medical image understanding plays a crucial role in enabling automated diagnosis and data-driven clinical decision support. However, its progress is impeded by two primary challenges: the limited availability of high-quality annotated medical data and an overreliance on global image features, which often miss subtle but clinically significant pathological regions. To address these issues, we introduce RegionMed-CLIP, a region-aware multimodal contrastive learning framework that explicitly incorporates localized pathological signals along with holistic semantic representations. The core of our method is an innovative region-of-interest (ROI) processor that adaptively integrates fine-grained regional features with the global context, supported by a progressive training strategy that enhances hierarchical multimodal alignment. To enable large-scale region-level representation learning, we construct MedRegion-500k, a comprehensive medical image-text corpus that features extensive regional annotations and multilevel clinical descriptions. Extensive experiments on image-text retrieval, zero-shot classification, and visual question answering tasks demonstrate that RegionMed-CLIP consistently exceeds state-of-the-art vision-language models by a wide margin. Our results highlight the critical importance of region-aware contrastive pre-training and position RegionMed-CLIP as a robust foundation for advancing multimodal medical image understanding.
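
As a hedged sketch of region-aware contrastive training (not the RegionMed-CLIP code), the snippet below fuses a global image embedding with pooled ROI features and applies a standard symmetric CLIP-style contrastive loss. The fusion weight, pooling choice, and random embeddings are assumptions for illustration.

```python
# Sketch: fuse global + region features, then a symmetric image-text contrastive loss.
import torch
import torch.nn.functional as F

def fuse_image_embedding(global_feat, region_feats, alpha=0.5):
    """global_feat: (B, D); region_feats: (B, R, D) pooled from pathological ROIs."""
    region_summary = region_feats.mean(dim=1)
    fused = alpha * global_feat + (1 - alpha) * region_summary
    return F.normalize(fused, dim=-1)

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0))              # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

B, R, D = 8, 4, 512
loss = clip_contrastive_loss(
    fuse_image_embedding(torch.randn(B, D), torch.randn(B, R, D)),
    torch.randn(B, D),
)
print(loss.item())
```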

Structured Report Generation for Breast Cancer Imaging Based on Large Language Modeling: A Comparative Analysis of GPT-4 and DeepSeek.

Chen K, Hou X, Li X, Xu W, Yi H

PubMed · Aug 7, 2025
The purpose of this study is to compare the performance of the GPT-4 and DeepSeek large language models in generating structured breast cancer multimodality imaging integrated reports from free-text radiology reports, including mammography, ultrasound, MRI, and PET/CT. A retrospective analysis was conducted on 1358 free-text reports from 501 breast cancer patients across two institutions. The study design involved synthesizing multimodal imaging data into structured reports with three components: primary lesion characteristics, metastatic lesions, and TNM staging. Input prompts were standardized for both models, with GPT-4 using predesigned instructions and DeepSeek requiring manual input. Reports were evaluated based on physician satisfaction using a Likert scale; descriptive accuracy, including lesion localization, size, SUV, and metastasis assessment; and TNM staging correctness according to NCCN guidelines. Statistical analysis included McNemar tests for binary outcomes and correlation analysis for multiclass comparisons, with a significance threshold of P < .05. Physician satisfaction scores showed strong correlation between models, with r-values of 0.665 and 0.558 and P-values below .001. Both models demonstrated high accuracy in data extraction and integration. The mean accuracy for primary lesion features was 91.7% for GPT-4 and 92.1% for DeepSeek, while feature synthesis accuracy was 93.4% for GPT-4 and 93.9% for DeepSeek. Metastatic lesion identification showed comparable overall accuracy at 93.5% for GPT-4 and 94.4% for DeepSeek. GPT-4 performed better in pleural lesion detection with 94.9% accuracy compared to 79.5% for DeepSeek, whereas DeepSeek achieved higher accuracy in mesenteric metastasis identification at 87.5% vs 43.8% for GPT-4. TNM staging accuracy exceeded 92% for T-stage and 94% for M-stage, with N-stage accuracy improving beyond 90% when supplemented with physical exam data. Both GPT-4 and DeepSeek effectively generate structured breast cancer imaging reports with high accuracy in data mining, integration, and TNM staging. Integrating these models into clinical practice is expected to enhance report standardization and physician productivity.
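
To make the paired statistical comparison concrete, the sketch below runs McNemar's test on per-report correctness of two models using statsmodels. The correctness arrays are fabricated toy data, not the study's results.

```python
# Sketch: McNemar's test on paired per-report correctness of two models (toy data).
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(42)
model_a_correct = rng.random(200) < 0.92   # e.g., GPT-4 correct / incorrect per report
model_b_correct = rng.random(200) < 0.93   # e.g., DeepSeek correct / incorrect per report

# 2x2 contingency table of paired outcomes (both correct, only A, only B, neither).
table = np.array([
    [np.sum(model_a_correct & model_b_correct), np.sum(model_a_correct & ~model_b_correct)],
    [np.sum(~model_a_correct & model_b_correct), np.sum(~model_a_correct & ~model_b_correct)],
])
result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(f"statistic={result.statistic}, p-value={result.pvalue:.3f}")
```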

MedCLIP-SAMv2: Towards universal text-driven medical image segmentation.

Koleilat T, Asgariandehkordi H, Rivaz H, Xiao Y

PubMed · Aug 7, 2025
Segmentation of anatomical structures and pathologies in medical images is essential for modern disease diagnosis, clinical research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing robust segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is an active field of research. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks with SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels in a weakly supervised paradigm to enhance segmentation quality further. Extensive validation across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework. Our code is available at https://github.com/HealthX-Lab/MedCLIP-SAMv2.
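
The snippet below sketches a contrastive objective with hard-negative re-weighting, in the spirit of, but not identical to, the DHN-NCE loss named above: negatives more similar to the anchor contribute more to the denominator. The weighting scheme, beta parameter, and random embeddings are assumptions for illustration.

```python
# Sketch: image-text InfoNCE with similarity-weighted (hard) negatives.
import torch
import torch.nn.functional as F

def hard_negative_nce(img_emb, txt_emb, temperature=0.07, beta=0.5):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t() / temperature                 # (B, B) scaled similarities
    B = sim.size(0)
    mask = torch.eye(B, dtype=torch.bool)
    pos = sim.diagonal()                                      # matched image-text pairs
    # Up-weight hard negatives: weights grow with the negative's similarity to the anchor.
    neg_weights = torch.softmax(beta * sim.masked_fill(mask, float("-inf")), dim=1)
    neg_term = (neg_weights * sim.masked_fill(mask, 0.0).exp() * (B - 1)).sum(dim=1)
    return -(pos - torch.log(pos.exp() + neg_term)).mean()

print(hard_negative_nce(torch.randn(8, 512), torch.randn(8, 512)).item())
```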

MLAgg-UNet: Advancing Medical Image Segmentation with Efficient Transformer and Mamba-Inspired Multi-Scale Sequence.

Jiang J, Lei S, Li H, Sun Y

PubMed · Aug 7, 2025
Transformers and state space sequence models (SSMs) have attracted interest in biomedical image segmentation for their ability to capture long-range dependencies. However, traditional visual state space (VSS) methods suffer from the incompatibility of image tokens with the autoregressive assumption. Although Transformer attention does not require this assumption, its high computational cost limits effective channel-wise information utilization. To overcome these limitations, we propose the Mamba-Like Aggregated UNet (MLAgg-UNet), which introduces a Mamba-inspired mechanism to enrich Transformer channel representations and exploit the implicit autoregressive characteristic of the U-shaped architecture. To establish dependencies among image tokens at a single scale, the Mamba-Like Aggregated Attention (MLAgg) block is designed to balance representational ability and computational efficiency. Inspired by the human foveal vision system, the Mamba macro-structure, and differential attention, the MLAgg block can slide its focus over each image token, suppress irrelevant tokens, and simultaneously strengthen channel-wise information utilization. Moreover, leveraging causal relationships between consecutive low-level and high-level features in the U-shaped architecture, we propose the Multi-Scale Mamba Module with Implicit Causality (MSMM) to optimize complementary information across scales. Embedded within skip connections, this module enhances semantic consistency between encoder and decoder features. Extensive experiments on four benchmark datasets, including AbdomenMRI, ACDC, BTCV, and EndoVis17, which cover the MRI, CT, and endoscopy modalities, demonstrate that the proposed MLAgg-UNet consistently outperforms state-of-the-art CNN-based, Transformer-based, and Mamba-based methods. Specifically, it achieves improvements of at least 1.24%, 0.20%, 0.33%, and 0.39% in DSC scores on these datasets, respectively. These results highlight the model's ability to effectively capture feature correlations and integrate complementary multi-scale information, providing a robust solution for medical image segmentation. The implementation is publicly available at https://github.com/aticejiang/MLAgg-UNet.
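
As a simplified, hypothetical sketch of the gating idea described above (attention output modulated by a learned per-channel gate to suppress irrelevant tokens), the block below is a generic gated attention block, not the MLAgg block from the paper; dimensions and head count are placeholders.

```python
# Sketch: generic gated token-mixing block with a residual connection.
import torch
import torch.nn as nn

class GatedMixingBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.SiLU())  # learned per-channel gate
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:   # tokens: (B, N, C)
        x = self.norm(tokens)
        mixed, _ = self.attn(x, x, x, need_weights=False)
        gated = mixed * self.gate(x)          # down-weights channels of irrelevant tokens
        return tokens + self.proj(gated)      # residual connection

block = GatedMixingBlock(dim=256)
print(block(torch.randn(2, 1024, 256)).shape)  # torch.Size([2, 1024, 256])
```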

Coarse-to-Fine Joint Registration of MR and Ultrasound Images via Imaging Style Transfer

Junyi Wang, Xi Zhu, Yikun Guo, Zixi Wang, Haichuan Gao, Le Zhang, Fan Zhang

arXiv preprint · Aug 7, 2025
We developed a pipeline for registering pre-surgery Magnetic Resonance (MR) images and post-resection Ultrasound (US) images. Our approach leverages unpaired style transfer using 3D CycleGAN to generate synthetic T1 images, thereby enhancing registration performance. Additionally, our registration process employs both affine and local deformable transformations for a coarse-to-fine registration. The results demonstrate that our approach improves the consistency between MR and US image pairs in most cases.
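
The authors' pipeline additionally applies 3D CycleGAN style transfer before registration; the sketch below covers only a generic coarse-to-fine (affine, then B-spline deformable) registration using SimpleITK with a mutual-information metric. File names, optimizer settings, and mesh size are placeholder assumptions.

```python
# Sketch: classical coarse-to-fine MR/US registration with SimpleITK (placeholder inputs).
import SimpleITK as sitk

fixed = sitk.ReadImage("preop_mr.nii.gz", sitk.sitkFloat32)    # placeholder path
moving = sitk.ReadImage("postop_us.nii.gz", sitk.sitkFloat32)  # placeholder path

# Coarse stage: affine registration initialized at the geometric centers.
initial = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.AffineTransform(3),
    sitk.CenteredTransformInitializerFilter.GEOMETRY,
)
coarse = sitk.ImageRegistrationMethod()
coarse.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
coarse.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=200)
coarse.SetOptimizerScalesFromPhysicalShift()
coarse.SetInterpolator(sitk.sitkLinear)
coarse.SetInitialTransform(initial, inPlace=False)
affine = coarse.Execute(fixed, moving)

# Fine stage: B-spline deformable registration on the affinely aligned image.
moving_affine = sitk.Resample(moving, fixed, affine, sitk.sitkLinear, 0.0)
bspline = sitk.BSplineTransformInitializer(fixed, transformDomainMeshSize=[8, 8, 8])
fine = sitk.ImageRegistrationMethod()
fine.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
fine.SetOptimizerAsGradientDescent(learningRate=0.5, numberOfIterations=100)
fine.SetOptimizerScalesFromPhysicalShift()
fine.SetInterpolator(sitk.sitkLinear)
fine.SetInitialTransform(bspline, inPlace=False)
deformable = fine.Execute(fixed, moving_affine)

registered = sitk.Resample(moving_affine, fixed, deformable, sitk.sitkLinear, 0.0)
sitk.WriteImage(registered, "us_registered_to_mr.nii.gz")
```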

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Can Zhao, Pengfei Guo, Dong Yang, Yucheng Tang, Yufan He, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu

arXiv preprint · Aug 7, 2025
Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability, working only for specific body regions or voxel spacings, (2) slow inference, which is a common issue for diffusion models, and (3) weak alignment with input conditions, which is a critical issue for medical imaging. MAISI, a previously proposed framework, addresses the generalizability issues but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast, high-quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss that increases sensitivity to regions of interest. Our experiments show that MAISI-v2 can achieve state-of-the-art image quality with a 33× acceleration over the latent diffusion model. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
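
For readers unfamiliar with rectified flow, the snippet below is a minimal sketch of the generic training objective (not the MAISI-v2 code): sample a point on the straight line between noise and data and regress the constant velocity that transports noise to data. The toy velocity network and latent dimensions are placeholders.

```python
# Sketch: one rectified-flow training step on toy latent vectors.
import torch
import torch.nn as nn

velocity_net = nn.Sequential(nn.Linear(16 + 1, 128), nn.SiLU(), nn.Linear(128, 16))

def rectified_flow_loss(x_data: torch.Tensor) -> torch.Tensor:
    x_noise = torch.randn_like(x_data)                      # x_0 ~ N(0, I)
    t = torch.rand(x_data.size(0), 1)                       # uniform time in [0, 1]
    x_t = (1 - t) * x_noise + t * x_data                    # straight-line interpolation
    target_velocity = x_data - x_noise                      # constant along the line
    pred = velocity_net(torch.cat([x_t, t], dim=-1))        # condition the net on t
    return nn.functional.mse_loss(pred, target_velocity)

# Toy 16-dimensional latents stand in for 3D image latents.
print(rectified_flow_loss(torch.randn(32, 16)).item())
```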