
Using a large language model for post-deployment monitoring of FDA approved AI: pulmonary embolism detection use case.

Sorin V, Korfiatis P, Bratt AK, Leiner T, Wald C, Butler C, Cook CJ, Kline TL, Collins JD

PubMed · Jun 30, 2025
Artificial intelligence (AI) is increasingly integrated into clinical workflows, but the performance of AI in production can diverge from initial evaluations, and post-deployment monitoring (PDM) remains a challenging component of ongoing quality assurance once AI is deployed in clinical production. We aimed to develop and evaluate a PDM framework that combines large language model (LLM)-based free-text classification of radiology reports with human oversight, and we demonstrate its application to monitoring a commercially vended pulmonary embolism (PE) detection AI (CVPED). We retrospectively analyzed 11,999 CT pulmonary angiography (CTPA) studies performed between 04/30/2023 and 06/17/2024. Ground truth was determined by combining LLM-based radiology-report classification with the CVPED outputs, with human review of discrepancies. We simulated a daily monitoring framework to track discrepancies between CVPED and the LLM. Drift was defined as the discrepancy rate exceeding a fixed 95% confidence interval (CI) for seven consecutive days; the CI and the optimal retrospective assessment period were determined from a stable dataset with consistent performance. We simulated drift by systematically altering CVPED or LLM sensitivity and specificity, and we modeled an approach to detect data shifts. We also incorporated a human-in-the-loop selective-alerting framework for continuous prospective evaluation and to investigate the potential for incremental detection. Of 11,999 CTPAs, 1,285 (10.7%) had PE. Overall, 373 (3.1%) had discrepant classifications between CVPED and the LLM. Among 111 CVPED-positive, LLM-negative cases, 29 would have triggered an alert because the radiologist did not interact with CVPED; of those, 24 were CVPED false positives, one was an LLM false negative, and the framework ultimately identified four true alerts for incremental PE cases. The optimal retrospective assessment period for drift detection was two months. A 2-3% decline in model specificity caused a 2-3-fold increase in discrepancies, whereas a 10% drop in sensitivity was required to produce a similar effect; for example, a 2.5% drop in LLM specificity led to a 1.7-fold increase in CVPED-negative, LLM-positive discrepancies, which would have taken 22 days to detect with the proposed framework. A PDM framework combining LLM-based free-text classification with a human-in-the-loop alerting system can continuously track an image-based AI's performance, alert for performance drift, and provide incremental clinical value.
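
A minimal sketch of the drift rule described above: the daily discrepancy rate between the deployed model and the LLM labeler is tracked, and an alarm fires once it sits above a fixed 95% upper bound for seven consecutive days. The baseline statistics, function names, and toy rates here are illustrative, not the authors' implementation.

```python
# Sketch of the seven-consecutive-day drift rule (all names and the
# baseline statistics are illustrative assumptions, not the paper's code).
import numpy as np

def drift_alarm(daily_discrepancy_rates, baseline_rates, consecutive_days=7):
    """Flag drift when the daily CVPED-vs-LLM discrepancy rate stays above
    the upper bound of a fixed 95% CI for `consecutive_days` in a row."""
    mu = np.mean(baseline_rates)            # stable-period mean
    se = np.std(baseline_rates, ddof=1)     # stable-period spread
    upper = mu + 1.96 * se                  # fixed 95% upper bound
    run = 0
    for day, rate in enumerate(daily_discrepancy_rates):
        run = run + 1 if rate > upper else 0
        if run >= consecutive_days:
            return day                      # first day drift is declared
    return None

# Toy usage: two stable months (~3.1% discrepancies, as in the abstract),
# then a simulated specificity drop that inflates the rate.
rng = np.random.default_rng(0)
stable = rng.normal(0.031, 0.005, 60)
drifted = np.concatenate([stable, rng.normal(0.08, 0.005, 30)])
print(drift_alarm(drifted, stable))
```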

$μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

Siyou Li, Pengyao Qin, Huanan Wu, Dong Nie, Arun J. Thirunavukarasu, Juntao Yu, Le Zhang

arXiv preprint · Jun 30, 2025
Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, to improve the accuracy and efficiency of diagnosis and the provision of management advice. RRG is complicated by two key challenges: (1) the inherent complexity of extracting relevant information from imaging data under resource constraints, and (2) the difficulty of objectively evaluating discrepancies between model-generated and expert-written reports. To address these challenges, we propose $\mu^2$LLM, a $\underline{\textbf{mu}}$ltiscale $\underline{\textbf{mu}}$ltimodal large language model for RRG tasks. The novel ${\mu}^2$Tokenizer, as an intermediate layer, integrates multi-modal features from the multiscale visual tokenizer and the text tokenizer; report generation quality is then enhanced through direct preference optimization (DPO), guided by GREEN-RedLlama. Experimental results on four large CT image-report medical datasets demonstrate that our method outperforms existing approaches, highlighting the potential of our fine-tuned $\mu^2$LLMs on limited data for RRG tasks. In addition, for prompt engineering, we introduce a five-stage, LLM-driven pipeline that converts routine CT reports into paired visual-question-answer triples and citation-linked reasoning narratives, creating a scalable, high-quality supervisory corpus for explainable multimodal radiology LLMs. All code, datasets, and models will be publicly available in our official repository: https://github.com/Siyou-Li/u2Tokenizer
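
The DPO step can be pictured with a short, generic PyTorch sketch of the standard DPO objective. That GREEN-RedLlama scores decide which report in a pair counts as "chosen" is our reading of the abstract, and all names below are assumptions.

```python
# Minimal PyTorch sketch of the standard DPO loss; inputs are summed
# token log-probs of each report under the policy and a frozen reference.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Prefer the higher-scored 'chosen' report over the 'rejected' one,
    relative to the reference model: -log sigmoid(beta * margin)."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with fake per-report log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```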

VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation

Peng Huang, Junhu Fu, Bowen Guo, Zeju Li, Yuanyuan Wang, Yi Guo

arXiv preprint · Jun 30, 2025
As the appearance of medical images is influenced by multiple underlying factors, generative models require rich attribute information beyond labels to produce realistic and diverse images. For instance, generating an image of a skin lesion with specific patterns demands descriptions that go beyond the diagnosis, such as shape, size, texture, and color. However, such detailed descriptions are not always accessible. To address this, we explore a framework, termed Visual Attribute Prompts (VAP)-Diffusion, that leverages external knowledge from pre-trained multi-modal large language models (MLLMs) to improve the quality and diversity of medical image generation. First, to derive descriptions from MLLMs without hallucination, we design a series of chain-of-thought prompts for common medical imaging tasks, including dermatologic, colorectal, and chest X-ray images. The generated descriptions are used during training and stored by category; during testing, descriptions are randomly retrieved from the corresponding category for inference. Moreover, to make the generator robust to unseen combinations of descriptions at test time, we propose a Prototype Condition Mechanism that constrains test embeddings to be similar to those seen during training. Experiments on three common types of medical imaging across four datasets verify the effectiveness of VAP-Diffusion.
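
One plausible reading of the Prototype Condition Mechanism can be sketched as follows: class prototypes are the mean training description embeddings, and a test embedding is blended toward its category's prototype. The blend weight `alpha` and all names are our assumptions; the paper's exact mechanism may differ.

```python
# Hedged sketch of a prototype-style constraint on test-time embeddings.
import numpy as np

def build_prototypes(train_embeddings, labels):
    """One prototype per category: the mean of its training embeddings."""
    return {c: train_embeddings[labels == c].mean(axis=0)
            for c in np.unique(labels)}

def constrain(test_embedding, category, prototypes, alpha=0.5):
    """Blend an unseen test embedding toward its category prototype."""
    return alpha * test_embedding + (1 - alpha) * prototypes[category]

emb = np.random.randn(100, 64)        # toy training description embeddings
lab = np.random.randint(0, 3, 100)    # three lesion categories
protos = build_prototypes(emb, lab)
print(constrain(np.random.randn(64), 0, protos).shape)
```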

Precision and Personalization: How Large Language Models Are Redefining Diagnostic Accuracy in Personalized Medicine - A Systematic Literature Review.

Aththanagoda AKNL, Kulathilake KASH, Abdullah NA

PubMed · Jun 30, 2025
Personalized medicine aims to tailor medical treatments to the unique characteristics of each patient, but its effectiveness relies on diagnostic accuracy that captures individual variability in disease response and treatment efficacy. This systematic literature review explores the role of large language models (LLMs) in enhancing diagnostic precision and supporting the advancement of personalized medicine. A comprehensive search was conducted across Web of Science, ScienceDirect, Scopus, and IEEE Xplore, targeting peer-reviewed articles published in English between January 2020 and March 2025 that applied LLMs in personalized medicine contexts. Following PRISMA guidelines, 39 relevant studies were selected and systematically analyzed. The findings indicate growing integration of LLMs across key domains such as clinical informatics, medical imaging, patient-specific diagnosis, and clinical decision support, where LLMs have shown potential to uncover subtle data patterns critical for accurate diagnosis and personalized treatment planning. This review offers insights into the performance, applications, and challenges of LLMs in personalized medicine, while acknowledging limits on generalizability due to variable model performance and dataset biases, and it underscores the importance of addressing data privacy, model interpretability, and reliability across diverse clinical scenarios. For successful clinical integration, future research must focus on refining LLM technologies, ensuring ethical standards, and continuously validating models to safeguard effective and responsible use in healthcare environments.

Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

Yang Zhou, Chrystie Wan Ning Quek, Jun Zhou, Yan Wang, Yang Bai, Yuhe Ke, Jie Yao, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Lionel Tim-Ee Cheng, Tran Nguyen Tuan Anh, Chee Leong Cheng, Tien Yin Wong, Nan Liu, Iain Beehuat Tan, Tony Kiat Hon Lim, Rick Siow Mong Goh, Yong Liu, Daniel Shu Wei Ting

arXiv preprint · Jun 30, 2025
Current artificial intelligence models for medical imaging are predominantly single-modality and single-disease, and attempts to create multimodal, multi-disease models have yielded inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images spanning more than ten specialties and seven modalities, including computed tomography (CT), chest X-rays (CXR), ultrasound (US), pathology patches, color fundus photography (CFP), optical coherence tomography (OCT), and dermatology images. MerMED-FM was evaluated across multiple diseases and compared against existing foundation models. Strong performance was achieved across all modalities, with AUROCs of 0.988 (OCT), 0.982 (pathology), 0.951 (US), 0.943 (CT), 0.931 (skin), 0.894 (CFP), and 0.858 (CXR). MerMED-FM has the potential to be a highly adaptable, versatile, cross-specialty foundation model that enables robust medical imaging interpretation across diverse medical disciplines.

Towards 3D Semantic Image Synthesis for Medical Imaging

Wenwu Tang, Khaled Seyam, Bin Yang

arXiv preprint · Jun 30, 2025
In the medical domain, acquiring large datasets is challenging due to both accessibility issues and stringent privacy regulations, so data availability and privacy protection are major obstacles to applying machine learning in medical imaging. To address this, we propose Med-LSDM (Latent Semantic Diffusion Model), which operates directly in the 3D domain and leverages de-identified semantic maps to generate synthetic data for privacy preservation and data augmentation. Unlike many existing methods that focus on generating 2D slices, Med-LSDM is designed specifically for 3D semantic image synthesis, making it well suited to applications requiring full volumetric data. Med-LSDM incorporates a guiding mechanism that controls the 3D image generation process by applying a diffusion model within the latent space of a pre-trained VQ-GAN. By operating in this compressed latent space, the model significantly reduces computational complexity while preserving critical 3D spatial detail. Our approach demonstrates strong performance in 3D semantic medical image synthesis, achieving a 3D-FID score of 0.0054 on the conditional Duke Breast dataset and Dice scores (0.70964) comparable to those of real images (0.71496). These results indicate that synthetic data from our model have a small domain gap with real data and are useful for data augmentation.
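
A toy sketch of the core idea, under stated assumptions: diffusion runs in the compressed latent space of a 3D autoencoder, conditioned on a downsampled semantic map. The modules below are untrained stand-ins (single 3D convolutions), not the paper's VQ-GAN or denoiser.

```python
# Toy latent-space diffusion step with semantic-map conditioning.
import torch
import torch.nn as nn

encoder = nn.Conv3d(1, 8, kernel_size=4, stride=4)            # image -> latent (stand-in)
decoder = nn.ConvTranspose3d(8, 1, kernel_size=4, stride=4)   # latent -> image
denoiser = nn.Conv3d(8 + 1, 8, kernel_size=3, padding=1)      # latent + map -> noise est.

volume = torch.randn(1, 1, 64, 64, 64)                 # toy 3D scan
sem_map = torch.randint(0, 2, (1, 1, 64, 64, 64)).float()

z = encoder(volume)                                    # compressed 16^3 latent
cond = nn.functional.avg_pool3d(sem_map, 4)            # semantic map at latent size
noise = torch.randn_like(z)
z_noisy = z + noise                                    # one crude forward-noising step
pred = denoiser(torch.cat([z_noisy, cond], dim=1))     # conditioned denoising
loss = ((pred - noise) ** 2).mean()                    # noise-prediction objective
print(decoder(z).shape, loss.item())
```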

MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation

Sunggu Kyung, Jinyoung Seo, Hyunseok Lim, Dongyeong Kim, Hyungbin Park, Jimin Sung, Jihyun Kim, Wooyoung Jo, Yoojin Nam, Namkug Kim

arXiv preprint · Jun 29, 2025
The recent release of RadGenome-Chest CT has significantly advanced CT-based report generation. However, existing methods primarily focus on global features, making it challenging to capture region-specific details, so certain abnormalities may go unnoticed. To address this, we propose MedRegion-CT, a region-focused multi-modal large language model (MLLM) framework featuring three key innovations. First, we introduce Region Representative ($R^2$) Token Pooling, which utilizes a pretrained 2D vision model to efficiently extract 3D CT features. This approach generates global tokens representing overall slice features and region tokens highlighting target areas, enabling the MLLM to process comprehensive information effectively. Second, a universal segmentation model generates pseudo-masks, which are then processed by a mask encoder to extract region-centric features, allowing the MLLM to focus on clinically relevant regions via six predefined region masks. Third, we leverage the segmentation results to extract patient-specific attributes, including organ size, diameter, and location, and convert them into text prompts that enrich the MLLM's understanding of patient-specific context. To ensure rigorous evaluation, we conducted benchmark experiments on report generation using RadGenome-Chest CT. MedRegion-CT achieved state-of-the-art performance, outperforming existing methods in natural language generation quality and clinical relevance while maintaining interpretability. The code for our framework is publicly available.
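
Mask-based region pooling as described here is easy to show concretely: voxel features are mean-pooled inside each pseudo-mask to give one token per region, alongside a global token. Shapes and the six-region setup follow the abstract; the implementation below is our sketch, not the authors' code.

```python
# Sketch of region-token pooling over a 3D feature map with binary masks.
import torch

def region_tokens(features, masks):
    """features: (C, D, H, W) CT features; masks: (R, D, H, W) binary masks.
    Returns (R+1, C): one global token followed by R region tokens."""
    C = features.shape[0]
    flat = features.reshape(C, -1)                      # (C, N voxels)
    global_tok = flat.mean(dim=1, keepdim=True)         # (C, 1)
    m = masks.reshape(masks.shape[0], -1).float()       # (R, N)
    denom = m.sum(dim=1, keepdim=True).clamp(min=1.0)   # guard empty masks
    region_toks = (m @ flat.t()) / denom                # (R, C) mean per region
    return torch.cat([global_tok.t(), region_toks], dim=0)

feats = torch.randn(256, 16, 32, 32)                    # toy 3D feature map
masks = torch.randint(0, 2, (6, 16, 32, 32))            # six region masks
print(region_tokens(feats, masks).shape)                # torch.Size([7, 256])
```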

Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation

Hongyi Pan, Ziliang Hong, Gorkem Durak, Ziyue Xu, Ulas Bagci

arXiv preprint · Jun 29, 2025
Federated learning (FL) has emerged as a promising paradigm for collaboratively training deep learning models across institutions without exchanging sensitive medical data. However, its effectiveness is often hindered by limited data availability and non-independent and identically distributed (non-IID) data across participating clients, which can degrade model performance and generalization. To address these challenges, we propose a generative-AI-based data augmentation framework that integrates synthetic image sharing into federated training for breast cancer diagnosis on ultrasound images. Specifically, we train two simple class-specific Deep Convolutional Generative Adversarial Networks: one for benign and one for malignant lesions. We then simulate a realistic FL setting using three publicly available breast ultrasound image datasets (BUSI, BUS-BRA, and UDIAT), with FedAvg and FedProx as baseline FL algorithms. Experimental results show that incorporating a suitable number of synthetic images improved the average AUC from 0.9206 to 0.9237 for FedAvg and from 0.9429 to 0.9538 for FedProx, while excessive use of synthetic data reduced performance, underscoring the importance of maintaining a balanced ratio of real and synthetic samples. Our findings highlight the potential of generative-AI-based data augmentation to enhance FL results in breast ultrasound image classification.
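
FedAvg, the baseline aggregation rule named above, is simple enough to show directly: each round, the server forms a data-size-weighted average of client model weights. The toy models and client sizes below are illustrative.

```python
# Minimal FedAvg aggregation step over toy per-site models.
import torch
import torch.nn as nn

def fedavg(client_states, client_sizes):
    """Weighted average of client state dicts, weights = local data sizes."""
    total = sum(client_sizes)
    return {key: sum(s[key] * (n / total)
                     for s, n in zip(client_states, client_sizes))
            for key in client_states[0]}

clients = [nn.Linear(8, 2) for _ in range(3)]          # one model per site
states = [c.state_dict() for c in clients]
global_state = fedavg(states, client_sizes=[600, 250, 150])

server = nn.Linear(8, 2)
server.load_state_dict(global_state)                   # broadcast next round
```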

Novel Artificial Intelligence-Driven Infant Meningitis Screening From High-Resolution Ultrasound Imaging.

Sial HA, Carandell F, Ajanovic S, Jiménez J, Quesada R, Santos F, Buck WC, Sidat M, Bassat Q, Jobst B, Petrone P

PubMed · Jun 28, 2025
Infant meningitis can be a life-threatening disease and requires prompt, accurate diagnosis to prevent severe outcomes or death. Gold-standard diagnosis requires a lumbar puncture (LP) to obtain and analyze cerebrospinal fluid (CSF). Despite being standard practice, LPs are invasive, pose risks for the patient, and often yield negative results, either because the sample is contaminated with red blood cells from the puncture itself or because LPs are routinely performed to rule out a life-threatening infection despite the disease's relatively low incidence. Furthermore, in low-income settings, where incidence is highest, LPs and CSF exams are rarely feasible, and suspected meningitis cases are generally treated empirically. There is a growing need for non-invasive, accurate diagnostic methods. We developed a three-stage deep learning framework using Neosonics ultrasound technology for 30 infants with suspected meningitis and a permeable fontanelle, recruited at three Spanish university hospitals (2021-2023). In the first stage, 2,194 images were processed for quality control using a vessel/non-vessel model, with a focus on vessel identification and manual removal of images exhibiting artifacts such as poor coupling and clutter. This refinement yielded a final cohort of 16 patients: 6 cases (336 images) and 10 controls (445 images), for 781 images entering the second stage. The second stage used a deep learning model to classify images into control or meningitis categories based on a white blood cell count threshold of 30 cells/mm³. The third stage integrated explainable artificial intelligence (XAI) methods, such as Grad-CAM visualizations, alongside image statistical analysis, to provide transparency and interpretability of the model's decision-making in our AI-driven screening tool. Our approach achieved 96% accuracy in quality control, and 93% precision and 92% accuracy in image-level meningitis detection, with an overall patient-level accuracy of 94%. It identified the 6 meningitis cases and 10 controls with 100% sensitivity and 90% specificity, with only a single misclassification. Grad-CAM-based XAI significantly enhanced diagnostic interpretability, and to further refine our insights we incorporated a statistics-based XAI approach: by analyzing image metrics such as entropy and standard deviation, we identified texture variations attributable to the presence of cells, which improved the interpretability of our diagnostic tool. This study supports the efficacy of a multi-stage deep learning model for non-invasive screening of infant meningitis and its potential to guide the need for LPs. It also highlights the transformative potential of artificial intelligence in medical diagnostic screening for neonatal health care, paving the way for future research and innovations.
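
The statistics-based XAI step names two concrete metrics, entropy and standard deviation, which can be computed per image as a simple texture summary. This is a generic implementation of those metrics, not the authors' code.

```python
# Per-image Shannon entropy and standard deviation as texture metrics.
import numpy as np

def texture_stats(image, bins=256):
    """image: 2D grayscale ultrasound array. Returns (entropy_bits, std)."""
    hist, _ = np.histogram(image, bins=bins)
    p = hist / hist.sum()                    # normalize to a distribution
    p = p[p > 0]                             # drop empty bins before log
    entropy = -(p * np.log2(p)).sum()        # Shannon entropy in bits
    return entropy, float(image.std())

img = np.random.rand(128, 128)               # toy ultrasound frame
print(texture_stats(img))
```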

Radio DINO: A foundation model for advanced radiomics and AI-driven medical imaging analysis.

Zedda L, Loddo A, Di Ruberto C

PubMed · Jun 28, 2025
Radiomics is transforming medical imaging by extracting complex features that enhance disease diagnosis, prognosis, and treatment evaluation. However, traditional approaches face significant challenges, such as the need for manual feature engineering, high dimensionality, and limited sample sizes. This paper presents Radio DINO, a novel family of deep learning foundation models that leverage self-supervised learning (SSL) techniques from DINO and DINOv2, pretrained on the RadImageNet dataset. The novelty of our approach lies in (1) developing Radio DINO to capture rich semantic embeddings, enabling robust feature extraction without manual intervention; (2) demonstrating superior performance across various clinical tasks on the MedMNISTv2 dataset, surpassing existing models; and (3) enhancing the interpretability of the model with visualizations that highlight its focus on clinically relevant image regions. Our results show that Radio DINO has the potential to democratize advanced radiomics tools, making them accessible to healthcare institutions with limited resources and ultimately improving diagnostic and prognostic outcomes in radiology.
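
As a sketch of SSL-backbone feature extraction for radiomics: Radio DINO weights are not published alongside this abstract, so the example pulls generic DINOv2 ViT-S/14 weights from torch.hub as a stand-in (network access required). How the resulting embedding feeds a downstream radiomics head is left open.

```python
# Extract a semantic embedding from a pretrained SSL vision backbone.
import torch

# Stand-in backbone: generic DINOv2 ViT-S/14, not Radio DINO itself.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

image = torch.randn(1, 3, 224, 224)          # toy preprocessed image
with torch.no_grad():
    embedding = model(image)                 # (1, 384) class-token features
print(embedding.shape)                       # ready for a radiomics classifier
```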
