
LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models

Zhihao Chen, Tao Chen, Chenhui Wang, Qi Gao, Huidong Xie, Chuang Niu, Ge Wang, Hongming Shan

arXiv preprint · Jul 8, 2025
Low-dose computed tomography (LDCT) reduces radiation exposure but often degrades image quality, potentially compromising diagnostic accuracy. Existing deep learning-based denoising methods focus primarily on pixel-level mappings, overlooking the potential benefits of high-level semantic guidance. Recent advances in vision-language models (VLMs) suggest that language can serve as a powerful tool for capturing structured semantic information, offering new opportunities to improve LDCT reconstruction. In this paper, we introduce LangMamba, a Language-driven Mamba framework for LDCT denoising that leverages VLM-derived representations to enhance supervision from normal-dose CT (NDCT). LangMamba follows a two-stage learning strategy. First, we pre-train a Language-guided AutoEncoder (LangAE) that leverages frozen VLMs to map NDCT images into a semantic space enriched with anatomical information. Second, we synergize LangAE with two key components to guide LDCT denoising: a Semantic-Enhanced Efficient Denoiser (SEED), which enhances NDCT-relevant local semantics while capturing global features with an efficient Mamba mechanism, and a Language-engaged Dual-space Alignment (LangDA) loss, which ensures that denoised images align with NDCT in both the perceptual and semantic spaces. Extensive experiments on two public datasets demonstrate that LangMamba outperforms conventional state-of-the-art methods, significantly improving detail preservation and visual fidelity. Remarkably, LangAE exhibits strong generalizability to unseen datasets, thereby reducing training costs. Furthermore, the LangDA loss improves explainability by integrating language-guided insights into image reconstruction and can be adopted in a plug-and-play fashion. Our findings shed new light on the potential of language as a supervisory signal to advance LDCT denoising. The code is publicly available at https://github.com/hao1635/LangMamba.
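The LangDA loss is described here only at a high level. Below is a minimal sketch of a dual-space alignment objective of this kind, assuming two frozen feature extractors: perc_enc for the perceptual space and sem_enc standing in for the LangAE semantic encoder (both hypothetical names, not the authors' released code).

```python
import torch
import torch.nn.functional as F

def langda_style_loss(denoised, ndct, perc_enc, sem_enc, w_perc=1.0, w_sem=1.0):
    """Align a denoised LDCT image with its NDCT target in two spaces.

    perc_enc and sem_enc are frozen feature extractors standing in for the
    paper's perceptual network and language-guided autoencoder (LangAE).
    """
    with torch.no_grad():
        perc_ref = perc_enc(ndct)  # perceptual features of the NDCT target
        sem_ref = sem_enc(ndct)    # semantic features from the frozen encoder
    loss_perc = F.l1_loss(perc_enc(denoised), perc_ref)  # perceptual alignment
    loss_sem = F.l1_loss(sem_enc(denoised), sem_ref)     # semantic alignment
    return w_perc * loss_perc + w_sem * loss_sem
```

Because the loss needs only the denoiser's output and two frozen encoders, it can be added to an existing training loop without architectural changes, which is what makes it plug-and-play.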

Foundation models for radiology: fundamentals, applications, opportunities, challenges, risks, and prospects.

Akinci D'Antonoli T, Bluethgen C, Cuocolo R, Klontzas ME, Ponsiglione A, Kocak B

PubMed · Jul 8, 2025
Foundation models (FMs) represent a significant evolution in artificial intelligence (AI), impacting diverse fields. Within radiology, this evolution offers greater adaptability, multimodal integration, and improved generalizability compared with traditional narrow AI. Utilizing large-scale pre-training and efficient fine-tuning, FMs can support diverse applications, including image interpretation, report generation, integrative diagnostics combining imaging with clinical/laboratory data, and synthetic data creation, holding significant promise for advancements in precision medicine. However, clinical translation of FMs faces several substantial challenges. Key concerns include the inherent opacity of model decision-making processes, environmental and social sustainability issues, risks to data privacy, complex ethical considerations, such as bias and fairness, and navigating the uncertainty of regulatory frameworks. Moreover, rigorous validation is essential to address inherent stochasticity and the risk of hallucination. This international collaborative effort provides a comprehensive overview of the fundamentals, applications, opportunities, challenges, and prospects of FMs, aiming to guide their responsible and effective adoption in radiology and healthcare.

An Institutional Large Language Model for Musculoskeletal MRI Improves Protocol Adherence and Accuracy.

Patrick Decourcy Hallinan JT, Leow NW, Low YX, Lee A, Ong W, Zhou Chan MD, Devi GK, He SS, De-Liang Loh D, Wei Lim DS, Low XZ, Teo EC, Furqan SM, Yang Tham WW, Tan JH, Kumar N, Makmur A, Yonghan T

PubMed · Jul 8, 2025
Privacy-preserving large language models (PP-LLMs) hold potential for assisting clinicians with documentation. We evaluated a PP-LLM to improve the clinical information on radiology request forms for musculoskeletal magnetic resonance imaging (MRI) and to automate protocoling, which ensures that the most appropriate imaging is performed. The present retrospective study included musculoskeletal MRI radiology request forms that had been randomly collected from June to December 2023. Studies without electronic medical record (EMR) entries were excluded. An institutional PP-LLM (Claude Sonnet 3.5) augmented the original radiology request forms by mining EMRs and, in combination with rule-based processing of the LLM outputs, suggested appropriate protocols using institutional guidelines. Clinical information on the original and PP-LLM radiology request forms was compared with use of RI-RADS (Reason for exam Imaging Reporting and Data System) grading by 2 musculoskeletal (MSK) radiologists independently (MSK1, with 13 years of experience, and MSK2, with 11 years of experience). These radiologists established a consensus reference standard for protocoling, against which the PP-LLM and 2 second-year board-certified radiologists (RAD1 and RAD2) were compared. Inter-rater reliability was assessed with use of the Gwet AC1, and the percentage agreement with the reference standard was calculated. Overall, 500 musculoskeletal MRI radiology request forms were analyzed for 407 patients (202 women and 205 men with a mean age [and standard deviation] of 50.3 ± 19.5 years) across a range of anatomical regions, including the spine/pelvis (143 MRI scans; 28.6%), upper extremity (169 scans; 33.8%), and lower extremity (188 scans; 37.6%). Two hundred and twenty-two (44.4%) of the 500 MRI scans required contrast. The clinical information provided in the PP-LLM-augmented radiology request forms was rated as superior to that in the original requests. Only 0.4% to 0.6% of PP-LLM radiology request forms were rated as limited/deficient, compared with 12.4% to 22.6% of the original requests (p < 0.001). Almost-perfect inter-rater reliability was observed for LLM-enhanced requests (AC1 = 0.99; 95% confidence interval [CI], 0.99 to 1.0), compared with substantial agreement for the original forms (AC1 = 0.62; 95% CI, 0.56 to 0.67). For protocoling, MSK1 and MSK2 showed almost-perfect agreement on the region/coverage (AC1 = 0.96; 95% CI, 0.95 to 0.98) and contrast requirement (AC1 = 0.98; 95% CI, 0.97 to 0.99). Compared with the consensus reference standard, protocoling accuracy for the PP-LLM was 95.8% (95% CI, 94.0% to 97.6%), which was significantly higher than that for both RAD1 (88.6%; 95% CI, 85.8% to 91.4%) and RAD2 (88.2%; 95% CI, 85.4% to 91.0%) (p < 0.001 for both). Musculoskeletal MRI request form augmentation with an institutional LLM provided superior clinical information and improved protocoling accuracy compared with clinician requests and non-MSK-trained radiologists. Institutional adoption of such LLMs could enhance the appropriateness of MRI utilization and patient care. Diagnostic Level III. See Instructions for Authors for a complete description of levels of evidence.
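The study's reliability statistic, Gwet's AC1, is less widely implemented than Cohen's kappa. A minimal two-rater implementation of the published statistic (a generic sketch, not the study's analysis code):

```python
from collections import Counter

def gwet_ac1(r1, r2):
    """Gwet's AC1 chance-corrected agreement for two raters, categorical data."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    if len(cats) < 2:  # trivial case: only one category ever used
        return 1.0
    pa = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pi = {c: (c1[c] + c2[c]) / (2 * n) for c in cats}  # mean marginal per category
    pe = sum(p * (1 - p) for p in pi.values()) / (len(cats) - 1)  # chance agreement
    return (pa - pe) / (1 - pe)

# Hypothetical RI-RADS grades from two readers on six request forms
print(gwet_ac1(list("AABCAD"), list("AABBAD")))  # ~0.79
```

Unlike kappa, the AC1 chance term stays stable when category prevalences are highly skewed, which suits tasks like this one where most forms receive the same grade.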

[Standardization, digitalization, and intelligentization represent the future development direction of hip arthroscopy diagnosis and treatment technology].

Li CB, Zhang J, Wang L, Wang YT, Kang XQ, Wang MX

PubMed · Jul 8, 2025
In recent years, hip arthroscopy has made great progress and has been extended to the treatment of intra-articular and periarticular diseases. However, the complex structure of the hip joint, demanding technical requirements, and a relatively long learning curve have hindered the popularization and development of hip arthroscopy in China. Therefore, on the one hand, it is necessary to promote research and training in standardized techniques for diagnosing hip diseases and performing arthroscopic surgery, so as to improve the safety, effectiveness, and accessibility of the technology. On the other hand, our organization proactively leverages cutting-edge digital and intelligent technologies, including medical image digitalization, medical big data analytics, artificial intelligence, surgical navigation and robotic control, virtual reality, telemedicine, and 5G communication. We conduct a range of innovative research and development initiatives, such as intelligent-assisted diagnosis of hip diseases, digital preoperative planning, intelligent surgical navigation and robotic procedures, and smart rehabilitation solutions. These efforts aim to drive a technological leap in digitalization and intelligentization and to continuously enhance the precision of diagnosis and treatment. In conclusion, standardization promotes the homogenization of diagnosis and treatment, while digitalization and intelligentization facilitate the precision of operations. The synergy of the two lays the foundation for personalized diagnosis and treatment and continuous innovation, ultimately driving the rapid development of hip arthroscopy technology.

The future of multimodal artificial intelligence models for integrating imaging and clinical metadata: a narrative review.

Simon BD, Ozyoruk KB, Gelikman DG, Harmon SA, Türkbey B

PubMed · Jul 8, 2025
With the ongoing revolution of artificial intelligence (AI) in medicine, the impact of AI in radiology is more pronounced than ever. An increasing number of technical and clinical AI-focused studies are published each day. As these tools inevitably affect patient care and physician practices, it is crucial that radiologists become more familiar with the leading strategies and underlying principles of AI. Multimodal AI models can combine both imaging and clinical metadata and are quickly becoming a popular approach that is being integrated into the medical ecosystem. This narrative review covers major concepts of multimodal AI through the lens of recent literature. We discuss emerging frameworks, including graph neural networks, which allow for explicit learning from non-Euclidean relationships, and transformers, which allow for parallel computation that scales, highlighting existing literature and advocating for a focus on emerging architectures. We also identify key pitfalls in current studies, including issues with taxonomy, data scarcity, and bias. By informing radiologists and biomedical AI experts about existing practices and challenges, we hope to guide the next wave of imaging-based multimodal AI research.
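As a concrete illustration of the graph-neural-network idea highlighted above (explicit learning over non-Euclidean relationships), here is a single graph-convolution step in PyTorch, where each node might hold a per-patient vector fusing imaging embeddings with clinical metadata. This is a generic sketch, not code from any reviewed study.

```python
import torch

def gcn_layer(x, adj, weight):
    """One simplified graph-convolution step: aggregate neighbors, then project.

    x: (N, F) node features, e.g. concatenated imaging + clinical embeddings
    adj: (N, N) binary adjacency encoding patient-to-patient relationships
    weight: (F, F_out) learnable projection
    """
    a_hat = adj + torch.eye(adj.size(0))             # self-loops: each node keeps itself
    a_norm = a_hat / a_hat.sum(dim=1, keepdim=True)  # row-normalized neighbor averaging
    return torch.relu(a_norm @ x @ weight)           # message passing + projection
```

Row normalization is used here for brevity; the standard GCN formulation uses the symmetric normalization D^{-1/2} A D^{-1/2} instead.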

Progress in fully automated abdominal CT interpretation-an update over the past decade.

Batheja V, Summers R

PubMed · Jul 8, 2025
This article reviews advancements in fully automated abdominal CT interpretation over the past decade, with a focus on automated image analysis techniques such as quantitative analysis, computer-aided detection, and disease classification. For each abdominal organ, we review segmentation techniques, assess clinical applications and performance, and explore methods for detecting/classifying associated pathologies. We also highlight cutting-edge AI developments, including foundation models, large language models, and multimodal image analysis. While challenges remain in integrating AI into radiology practice, recent progress underscores its growing potential to streamline workflows, reduce radiologist burnout, and enhance patient care.

Uncovering Neuroimaging Biomarkers of Brain Tumor Surgery with AI-Driven Methods

Carmen Jimenez-Mesa, Yizhou Wan, Guilio Sansone, Francisco J. Martinez-Murcia, Javier Ramirez, Pietro Lio, Juan M. Gorriz, Stephen J. Price, John Suckling, Michail Mamalakis

arXiv preprint · Jul 7, 2025
Brain tumor resection is a complex procedure with significant implications for patient survival and quality of life. Predictions of patient outcomes give clinicians and patients the opportunity to select the most suitable onco-functional balance. In this study, global features derived from structural magnetic resonance imaging in a clinical dataset of 49 pre- and post-surgery patients were used to identify potential biomarkers associated with survival outcomes. We propose a framework that integrates Explainable AI (XAI) with neuroimaging-based feature engineering for survival assessment, offering guidance for surgical decision-making. We also introduce a global explanation optimizer that refines survival-related feature attribution in deep learning models, enhancing interpretability and reliability. Our findings suggest that survival is influenced by alterations in regions associated with cognitive and sensory functions, indicating the importance of preserving areas involved in decision-making and emotional regulation during surgery to improve outcomes. The global explanation optimizer improves both the fidelity and the comprehensibility of explanations compared with state-of-the-art XAI methods. It effectively identifies survival-related variability, underscoring its relevance in precision medicine for brain tumor treatment.
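The global explanation optimizer itself is not specified in the abstract. As a rough baseline for the kind of cohort-level attribution it refines, here is a sketch of mean gradient-times-input importance over a patient cohort (a generic XAI method, not the paper's optimizer).

```python
import torch

def global_attribution(model, inputs):
    """Cohort-level feature importance via mean |gradient x input|.

    model: maps (N, F) feature batches to a per-patient survival score
    inputs: (N, F) global features for the whole cohort
    """
    inputs = inputs.clone().requires_grad_(True)
    model(inputs).sum().backward()       # gradients of the summed survival scores
    sal = (inputs.grad * inputs).abs()   # per-patient gradient-x-input saliency
    return sal.mean(dim=0)               # average over patients: global ranking
```

A global explanation optimizer in this spirit would then refine such attributions toward the two goals the authors measure: fidelity to the model and comprehensibility of the explanation.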

Leveraging Large Language Models for Accurate AO Fracture Classification from CT Text Reports.

Mergen M, Spitzl D, Ketzer C, Strenzke M, Marka AW, Makowski MR, Bressem KK, Adams LC, Gassert FT

PubMed · Jul 7, 2025
Large language models (LLMs) have shown promising potential in analyzing complex textual data, including radiological reports. These models can assist clinicians, particularly those with limited experience, by integrating and presenting diagnostic criteria within radiological classifications. However, before clinical adoption, LLMs must be rigorously validated by medical professionals to ensure accuracy, especially in the context of advanced radiological classification systems. This study evaluates the performance of four LLMs (ChatGPT-4o, AmbossGPT, Claude 3.5 Sonnet, and Gemini 2.0 Flash) in classifying fractures based on the AO classification system using CT reports. A dataset of 292 fictitious physician-generated CT reports, representing 310 fractures, was used to retrospectively assess the accuracy of each LLM in AO fracture classification. Performance was evaluated by comparing the models' classifications to ground-truth labels, with accuracy rates analyzed across different fracture types and subtypes. ChatGPT-4o and AmbossGPT achieved the highest overall accuracy (74.6% and 74.3%, respectively), outperforming Claude 3.5 Sonnet (69.5%) and Gemini 2.0 Flash (62.7%). Statistically significant differences were observed in fracture type classification, particularly between ChatGPT-4o and Gemini 2.0 Flash (Δ12%, p < 0.001). While all models demonstrated strong bone recognition rates (90-99%), their accuracy in fracture subtype classification remained lower (71-77%), indicating limitations in nuanced diagnostic categorization. LLMs show potential in assisting radiologists with initial fracture classification, particularly in high-volume or resource-limited settings. However, their performance remains inconsistent for detailed subtype classification, highlighting the need for further refinement and validation before clinical integration in advanced diagnostic workflows.
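For readers reproducing this style of evaluation, here is a sketch of a paired accuracy comparison between two models graded on the same fractures. The abstract does not state which statistical test was used; McNemar's test on discordant pairs is one reasonable choice for paired classifications.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def compare_models(y_true, pred_a, pred_b):
    """Accuracy of two models on the same cases, plus a paired significance test."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)  # model A correct per case
    b_ok = np.asarray(pred_b) == np.asarray(y_true)  # model B correct per case
    # 2x2 table of (A correct, B correct) joint counts; off-diagonals are discordant
    table = [[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
             [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]
    result = mcnemar(table, exact=True)
    return a_ok.mean(), b_ok.mean(), result.pvalue
```

Called with the 310 ground-truth AO codes and two models' predictions, this returns the two accuracies and a p-value for whether their error patterns differ beyond chance.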

Performance of GPT-4 for automated prostate biopsy decision-making based on mpMRI: a multi-center evidence study.

Shi MJ, Wang ZX, Wang SK, Li XH, Zhang YL, Yan Y, An R, Dong LN, Qiu L, Tian T, Liu JX, Song HC, Wang YF, Deng C, Cao ZB, Wang HY, Wang Z, Wei W, Song J, Lu J, Wei X, Wang ZC

PubMed · Jul 7, 2025
Multiparametric magnetic resonance imaging (mpMRI) has significantly advanced prostate cancer (PCa) detection, yet decisions on invasive biopsy with moderate prostate imaging reporting and data system (PI-RADS) scores remain ambiguous. To explore the decision-making capacity of Generative Pretrained Transformer-4 (GPT-4) for automated prostate biopsy recommendations, we included 2299 individuals who underwent prostate biopsy from 2018 to 2023 in 3 large medical centers, with available mpMRI before biopsy and documented clinical-histopathological records. GPT-4 generated structured reports with given prompts. The performance of GPT-4 was quantified using confusion matrices, and sensitivity, specificity, and area under the curve were calculated. Multiple artificial evaluation procedures were conducted. Wilcoxon's rank sum test, Fisher's exact test, and Kruskal-Wallis tests were used for comparisons. In this largest sample of its kind in the Chinese population, patients with moderate PI-RADS scores (scores 3 and 4) accounted for 39.7% (912/2299) and were defined as the subset of interest (SOI). The detection rates of clinically significant PCa corresponding to PI-RADS scores 2-5 were 9.4, 27.3, 49.2, and 80.1%, respectively. Overall, 47.5% (433/912) of SOI patients were histopathologically proven to have undergone unnecessary prostate biopsies. With the assistance of GPT-4, 20.8% (190/912) of the SOI population could avoid unnecessary biopsies, and it performed even better [28.8% (118/410)] in the most heterogeneous subgroup of PI-RADS score 3. More than 90.0% of GPT-4-generated reports were rated as comprehensive and easy to understand, but satisfaction with accuracy was lower (82.8%). GPT-4 also demonstrated cognitive potential for handling complex problems. Additionally, chain-of-thought prompting enabled us to better understand the decision-making logic behind GPT-4. Finally, we developed the ProstAIGuide platform to facilitate accessibility for both doctors and patients. This multi-center study highlights the clinical utility of GPT-4 for prostate biopsy decision-making and advances our understanding of the latest artificial intelligence implementation in various medical scenarios.
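A minimal sketch of the confusion-matrix metrics reported above, framed for biopsy recommendations (1 = recommend biopsy, 0 = avoid); a generic illustration, not the study's code.

```python
def biopsy_metrics(y_true, y_pred):
    """Sensitivity and specificity for binary biopsy recommendations."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)  # csPCa cases correctly sent to biopsy
    specificity = tn / (tn + fp)  # unnecessary biopsies correctly avoided
    return sensitivity, specificity
```

In this setting specificity maps directly onto the headline result: every true negative is an unnecessary biopsy avoided.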

MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang

arXiv preprint · Jul 7, 2025
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and as an encoder achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.
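The report points to https://goo.gle/medgemma for weights and tutorials. A minimal usage sketch follows, assuming the instruction-tuned 4B checkpoint is published on Hugging Face as google/medgemma-4b-it and is supported by the transformers image-text-to-text pipeline; both are assumptions to verify against the official tutorials.

```python
from transformers import pipeline
from PIL import Image

# Assumed checkpoint name; confirm on https://goo.gle/medgemma before use.
pipe = pipeline("image-text-to-text", model="google/medgemma-4b-it")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("chest_xray.png")},  # hypothetical file
        {"type": "text", "text": "Describe the findings on this chest X-ray."},
    ],
}]
out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```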