Page 14 of 38374 results

Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report.

Li R, Mao S, Zhu C, Yang Y, Tan C, Li L, Mu X, Liu H, Yang Y

PubMed · Jun 11 2025
The rapid advancements in natural language processing, particularly the development of large language models (LLMs), have opened new avenues for managing complex clinical text data. However, the inherent complexity and specificity of medical texts present significant challenges for the practical application of prompt engineering in diagnostic tasks. This paper explores LLMs with a new prompt engineering technique to enhance model interpretability and improve pulmonary disease prediction relative to a traditional deep learning model. A retrospective dataset of 2965 chest CT radiology reports was constructed. The reports came from 4 cohorts: healthy individuals and patients with pulmonary tuberculosis, lung cancer, and pneumonia. A novel prompt engineering strategy was then proposed that integrates feature summarization (F-Sum), chain-of-thought (CoT) reasoning, and a hybrid retrieval-augmented generation (RAG) framework. The feature summarization approach, leveraging term frequency-inverse document frequency (TF-IDF) and K-means clustering, was used to extract and distill key radiological findings related to the 3 diseases. Simultaneously, the hybrid RAG framework combined dense and sparse vector representations to enhance the LLMs' comprehension of disease-related text. In total, 3 state-of-the-art LLMs, GLM-4-Plus, GLM-4-Air (Zhipu AI), and GPT-4o (OpenAI), were integrated with the prompt strategy to evaluate their efficacy in recognizing pneumonia, tuberculosis, and lung cancer. The traditional deep learning model BERT (Bidirectional Encoder Representations from Transformers) was also included for comparison. Finally, the proposed method was tested on an external validation dataset consisting of 343 chest CT reports from another hospital.
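The feature-summarization step pairs TF-IDF term weighting with K-means clustering. As a minimal sketch of the TF-IDF half only (the clustering step is omitted, and all names and example terms here are illustrative, not the authors' code):

```python
import math
from collections import Counter

def top_tfidf_terms(docs, k=3):
    """Rank terms in each document by TF-IDF and keep the top k.

    docs: list of token lists (e.g., one list per report cohort).
    Returns a list of the k highest-scoring terms per document.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    summaries = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF: terms present in every document score zero.
        scores = {
            term: (count / len(doc)) * math.log((1 + n) / (1 + df[term]))
            for term, count in tf.items()
        }
        ranked = sorted(scores.items(), key=lambda item: -item[1])
        summaries.append([term for term, _ in ranked[:k]])
    return summaries
```

Terms frequent in one cohort but rare across cohorts rise to the top, which is the property the F-Sum step relies on to distill disease-specific findings.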
Compared with the BERT-based prediction model and various other prompt engineering techniques, the proposed method with GLM-4-Plus achieved the best performance on the test dataset, attaining an F1-score of 0.89 and an accuracy of 0.89. On the external validation dataset, the proposed method with GPT-4o achieved the highest F1-score (0.86) and accuracy (0.92). Compared with the popular strategy of manually selected typical samples (few-shot) and doctor-designed CoT (F1-score=0.83 and accuracy=0.83), the proposed method, which summarized disease characteristics (F-Sum) with an LLM and automatically generated the CoT, performed better (F1-score=0.89 and accuracy=0.90). Although the BERT-based model achieved similar results on the test dataset (F1-score=0.85 and accuracy=0.88), its predictive performance decreased significantly on the external validation set (F1-score=0.48 and accuracy=0.78). These findings highlight the potential of LLMs to revolutionize pulmonary disease prediction, particularly in resource-constrained settings, by surpassing traditional models in both accuracy and flexibility. The proposed prompt engineering strategy not only improves predictive performance but also enhances the adaptability of LLMs in complex medical contexts, offering a promising tool for advancing disease diagnosis and clinical decision-making.
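The abstract does not specify how the hybrid framework fuses dense and sparse retrieval results; reciprocal rank fusion (RRF) is one common choice for such setups, sketched here as an assumption rather than the authors' method:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: e.g. [dense_ranking, sparse_ranking], each a list of doc ids
    ordered best-first. The constant k dampens the influence of low ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + position) to the doc's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only the two rank orderings, so it sidesteps the problem that dense (cosine) and sparse (BM25-style) scores live on incomparable scales.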

Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study.

Miaojiao S, Xia L, Xian Tao Z, Zhi Liang H, Sheng C, Songsong W

PubMed · Jun 11 2025
Breast ultrasound is essential for evaluating breast nodules, with Breast Imaging Reporting and Data System (BI-RADS) providing standardized classification. However, interobserver variability among radiologists can affect diagnostic accuracy. Large language models (LLMs) like ChatGPT-4 have shown potential in medical imaging interpretation. This study explores its feasibility in improving BI-RADS classification consistency and malignancy prediction compared to radiologists. This study aims to evaluate the feasibility of using LLMs, particularly ChatGPT-4, to assess the consistency and diagnostic accuracy of standardized breast ultrasound imaging reports, using pathology as the reference standard. This retrospective study analyzed breast nodule ultrasound data from 671 female patients (mean 45.82, SD 9.20 years; range 26-75 years) who underwent biopsy or surgical excision at our hospital between June 2019 and June 2024. ChatGPT-4 was used to interpret BI-RADS classifications and predict benign versus malignant nodules. The study compared the model's performance to that of two senior radiologists (≥15 years of experience) and two junior radiologists (<5 years of experience) using key diagnostic metrics, including accuracy, sensitivity, specificity, area under the receiver operating characteristic curve, P values, and odds ratios with 95% CIs. Two diagnostic models were evaluated: (1) image interpretation model, where ChatGPT-4 classified nodules based on BI-RADS features, and (2) image-to-text-LLM model, where radiologists provided textual descriptions, and ChatGPT-4 determined malignancy probability based on keywords. Radiologists were blinded to pathological outcomes, and BI-RADS classifications were finalized through consensus. ChatGPT-4 achieved an overall BI-RADS classification accuracy of 96.87%, outperforming junior radiologists (617/671, 91.95% and 604/671, 90.01%, P<.01). 
For malignancy prediction, ChatGPT-4 achieved an area under the receiver operating characteristic curve of 0.82 (95% CI 0.79-0.85), an accuracy of 80.63% (541/671 cases), a sensitivity of 90.56% (259/286 cases), and a specificity of 73.51% (283/385 cases). The image interpretation model demonstrated performance comparable to senior radiologists, while the image-to-text-LLM model further improved diagnostic accuracy for all radiologists, increasing their sensitivity and specificity significantly (P<.001). Statistical analyses, including the McNemar test and DeLong test, confirmed that ChatGPT-4 outperformed junior radiologists (P<.01) and showed noninferiority compared to senior radiologists (P>.05). Pathological diagnoses served as the reference standard, ensuring robust evaluation reliability. Integrating ChatGPT-4 into an image-to-text-LLM workflow improves BI-RADS classification accuracy and supports radiologists in breast ultrasound diagnostics. These results demonstrate its potential as a decision-support tool to enhance diagnostic consistency and reduce variability.
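The sensitivity and specificity above follow directly from the case counts the abstract reports. A minimal sketch of the computation (the confusion-matrix counts are taken from the abstract; the function name is illustrative):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate among malignant cases
    specificity = tn / (tn + fp)   # true-negative rate among benign cases
    return sensitivity, specificity

# Counts reported in the abstract: 259 of 286 malignant and
# 283 of 385 benign nodules called correctly by ChatGPT-4.
sens, spec = diagnostic_metrics(tp=259, fn=27, tn=283, fp=102)
```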

HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding

Yanzhao Shi, Xiaodan Zhang, Junzhong Ji, Haoning Jiang, Chengxin Zheng, Yinong Wang, Liangqiong Qu

arXiv preprint · Jun 11 2025
Automated 3D CT diagnosis empowers clinicians to make timely, evidence-based decisions by enhancing diagnostic accuracy and workflow efficiency. While multimodal large language models (MLLMs) exhibit promising performance in visual-language understanding, existing methods mainly focus on 2D medical images, which fundamentally limits their ability to capture complex 3D anatomical structures. This limitation often leads to misinterpretation of subtle pathologies and causes diagnostic hallucinations. In this paper, we present Hybrid Spatial Encoding Network (HSENet), a framework that exploits enriched 3D medical visual cues by effective visual perception and projection for accurate and robust vision-language understanding. Specifically, HSENet employs dual-3D vision encoders to perceive both global volumetric contexts and fine-grained anatomical details, which are pre-trained by dual-stage alignment with diagnostic reports. Furthermore, we propose Spatial Packer, an efficient multimodal projector that condenses high-resolution 3D spatial regions into a compact set of informative visual tokens via centroid-based compression. By assigning spatial packers with dual-3D vision encoders, HSENet can seamlessly perceive and transfer hybrid visual representations to LLM's semantic space, facilitating accurate diagnostic text generation. Experimental results demonstrate that our method achieves state-of-the-art performance in 3D language-visual retrieval (39.85% of R@100, +5.96% gain), 3D medical report generation (24.01% of BLEU-4, +8.01% gain), and 3D visual question answering (73.60% of Major Class Accuracy, +1.99% gain), confirming its effectiveness. Our code is available at https://github.com/YanzhaoShi/HSENet.
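The Spatial Packer's centroid-based compression can be pictured as assigning each visual token to its nearest centroid and averaging each group, so N tokens shrink to at most K. This toy sketch illustrates only that idea, not the paper's actual implementation:

```python
def compress_tokens(tokens, centroids):
    """Condense many token vectors into one averaged vector per centroid.

    tokens, centroids: lists of equal-length feature vectors.
    Returns one mean vector per non-empty centroid group.
    """
    groups = [[] for _ in centroids]
    for tok in tokens:
        # Assign the token to the nearest centroid (squared Euclidean).
        dists = [sum((a - b) ** 2 for a, b in zip(tok, c)) for c in centroids]
        groups[dists.index(min(dists))].append(tok)
    packed = []
    for group in groups:
        if group:
            # Average the group elementwise into one compact token.
            packed.append([sum(vals) / len(group) for vals in zip(*group)])
    return packed
```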

Efficacy of a large language model in classifying branch-duct intraductal papillary mucinous neoplasms.

Sato M, Yasaka K, Abe S, Kurashima J, Asari Y, Kiryu S, Abe O

PubMed · Jun 11 2025
Appropriate categorization based on magnetic resonance imaging (MRI) findings is important for managing intraductal papillary mucinous neoplasms (IPMNs). In this study, a large language model (LLM) that classifies IPMNs based on MRI findings was developed, and its performance was compared with that of less experienced human readers. The medical image management and processing systems of our hospital were searched to identify MRI reports of branch-duct IPMNs (BD-IPMNs). They were assigned to the training, validation, and testing datasets in chronological order. The model was trained on the training dataset, and the best-performing model on the validation dataset was evaluated on the test dataset. Furthermore, two radiology residents (Readers 1 and 2) and an intern (Reader 3) manually sorted the reports in the test dataset. The accuracy, sensitivity, and time required for categorizing were compared between the model and readers. The accuracy of the fine-tuned LLM for the test dataset was 0.966, which was comparable to that of Readers 1 and 2 (0.931-0.972) and significantly better than that of Reader 3 (0.907). The fine-tuned LLM had an area under the receiver operating characteristic curve of 0.982 for the classification of cyst diameter ≥ 10 mm, which was significantly superior to that of Reader 3 (0.944). Furthermore, the fine-tuned LLM (25 s) completed the test dataset faster than the readers (1,887-2,646 s). The fine-tuned LLM classified BD-IPMNs based on MRI findings with comparable performance to that of radiology residents and significantly reduced the time required.
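The study compares readers via the area under the ROC curve, which equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one (ties count half). A minimal sketch of that rank-based definition:

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs in which the positive scores higher."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5    # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))
```

Perfect separation yields 1.0 and chance-level scoring yields 0.5, which is why the fine-tuned LLM's 0.982 indicates near-perfect ranking of cysts ≥ 10 mm.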

Conditional diffusion models for guided anomaly detection in brain images using fluid-driven anomaly randomization

Ana Lawry Aguila, Peirong Liu, Oula Puonti, Juan Eugenio Iglesias

arXiv preprint · Jun 11 2025
Supervised machine learning has enabled accurate pathology detection in brain MRI, but it requires training data from diseased subjects that may not be readily available in some scenarios, for example, in the case of rare diseases. Reconstruction-based unsupervised anomaly detection, in particular using diffusion models, has gained popularity in the medical field because it allows training on healthy images alone, eliminating the need for large disease-specific cohorts. These methods assume that a model trained on normal data cannot accurately represent or reconstruct anomalies. In practice, however, this assumption often fails: models may fail to reconstruct healthy tissue faithfully, or may reconstruct abnormal regions accurately, i.e., fail to remove anomalies. In this work, we introduce a novel conditional diffusion model framework for anomaly detection and healthy image reconstruction in brain MRI. Our weakly supervised approach integrates synthetically generated pseudo-pathology images into the modeling process to better guide the reconstruction of healthy images. To generate these pseudo-pathologies, we apply fluid-driven anomaly randomization to augment real pathology segmentation maps from an auxiliary dataset, ensuring that the synthetic anomalies are both realistic and anatomically coherent. We evaluate our model's ability to detect pathology using both synthetic anomaly datasets and real pathology from the ATLAS dataset. In our extensive experiments, our model (i) consistently outperforms variational autoencoders and conditional and unconditional latent diffusion, and (ii) on most datasets, surpasses the performance of supervised inpainting methods that have access to paired diseased/healthy images.
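Reconstruction-based detection scores each location by how far the input departs from the model's "healthy" reconstruction. A toy per-pixel sketch of that scoring step (the paper's actual detector is a conditional diffusion model; this only illustrates the residual-thresholding idea):

```python
def anomaly_map(image, reconstruction, threshold):
    """Per-pixel anomaly mask from a reconstruction-based detector.

    image, reconstruction: 2D lists of intensities on the same grid.
    Pixels where |input - healthy reconstruction| exceeds the threshold
    are flagged as anomalous.
    """
    return [
        [abs(x - r) > threshold for x, r in zip(row, rec_row)]
        for row, rec_row in zip(image, reconstruction)
    ]
```

The failure modes the abstract describes map directly onto this residual: a model that reconstructs healthy tissue poorly inflates the residual everywhere, while one that reconstructs the anomaly itself drives the residual to zero where it matters.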

Slide-free surface histology enables rapid colonic polyp interpretation across specialties and foundation AI

Yong, A., Husna, N., Tan, K. H., Manek, G., Sim, R., Loi, R., Lee, O., Tang, S., Soon, G., Chan, D., Liang, K.

medRxiv preprint · Jun 11 2025
Colonoscopy is a mainstay of colorectal cancer screening and has helped to lower cancer incidence and mortality. The resection of polyps during colonoscopy is critical for tissue diagnosis and prevention of colorectal cancer, albeit at the cost of increased resource requirements and expense. Discarding resected benign polyps without sending them for histopathological processing and confirmatory diagnosis, known as the resect-and-discard strategy, could enhance efficiency but is not commonly practiced owing to endoscopists' predominant preference for pathological confirmation. The inaccessibility of histopathology from unprocessed resected tissue hampers endoscopic decisions. We show that intraprocedural fibre-optic microscopy with ultraviolet-C surface excitation (FUSE) of polyps post-resection enables rapid diagnosis, potentially complementing endoscopic interpretation and incorporating pathologist oversight. In a clinical study of 28 patients, slide-free FUSE microscopy of freshly resected polyps yielded mucosal views that greatly magnified the surface patterns observed on endoscopy and revealed previously unavailable histopathological signatures. We term this new cross-specialty readout "surface histology". In blinded interpretations of 42 polyps (19 training, 23 reading) by endoscopists and pathologists of varying experience, surface histology differentiated normal/benign, low-grade dysplasia, and high-grade dysplasia and cancer, with 100% performance in classifying high/low risk. This FUSE dataset was also successfully interpreted by foundation AI models pretrained on histopathology slides, illustrating a new potential for these models not only to expedite conventional pathology tasks but also to autonomously provide instant expert feedback during procedures that typically lack pathologists. Surface histology readouts during colonoscopy promise to empower endoscopist decisions and broadly enhance confidence and participation in resect and discard.
One-sentence summary: Rapid microscopy of resected polyps during colonoscopy yielded accurate diagnoses, promising to enhance colorectal screening.

RadGPT: A system based on a large language model that generates sets of patient-centered materials to explain radiology report information.

Herwald SE, Shah P, Johnston A, Olsen C, Delbrouck JB, Langlotz CP

PubMed · Jun 10 2025
The Cures Act Final Rule requires that patients have real-time access to their radiology reports, which contain technical language. Our objective was to use a novel system called RadGPT, which integrates concept extraction and a large language model (LLM), to help patients understand their radiology reports. RadGPT generated 150 concept explanations and 390 question-and-answer pairs from 30 radiology report impressions dated between 2012 and 2020. The extracted concepts were used to create concept-based explanations, as well as concept-based question-and-answer pairs whose questions were generated using either a fixed template or an LLM. Additionally, report-based question-and-answer pairs were generated directly from each impression using an LLM without concept extraction. One board-certified radiologist and 4 radiology residents rated the material quality using a standardized rubric. Concept-based LLM-generated questions were of significantly higher quality than concept-based template-generated questions (p < 0.001). Excluding the template-based question-and-answer pairs from further analysis, nearly all (> 95%) of RadGPT-generated materials were rated highly, with at least 50% receiving the highest possible rating from all 5 raters. No answers or explanations were rated as likely to affect the safety or effectiveness of patient care. Report-level LLM-based questions and answers were rated particularly highly, with 92% of report-level LLM-based questions and 61% of the corresponding report-level answers receiving the highest rating from all raters. The educational tool RadGPT thus generated high-quality explanations and question-and-answer pairs that were personalized for each radiology report, unlikely to produce harmful explanations, and likely to enhance patient understanding of radiology information.
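The fixed-template arm of the comparison amounts to string templating over extracted concepts. The template below is hypothetical, since the abstract does not give RadGPT's actual templates; it only illustrates why template-generated questions tend to be more uniform (and, per the study, lower-quality) than LLM-generated ones:

```python
def template_questions(concepts):
    """Fixed-template patient questions, one per extracted concept.

    The phrasing is a hypothetical stand-in for RadGPT's templates.
    """
    return [f"What does '{c}' mean in my radiology report?" for c in concepts]
```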

Foundation Models in Medical Imaging -- A Review and Outlook

Vivien van Veldhuizen, Vanessa Botha, Chunyao Lu, Melis Erdal Cesur, Kevin Groot Lipman, Edwin D. de Jong, Hugo Horlings, Clárisa I. Sanchez, Cees G. M. Snoek, Lodewyk Wessels, Ritse Mann, Eric Marcus, Jonas Teuwen

arXiv preprint · Jun 10 2025
Foundation models (FMs) are changing the way medical images are analyzed by learning from large collections of unlabeled data. Instead of relying on manually annotated examples, FMs are pre-trained to learn general-purpose visual features that can later be adapted to specific clinical tasks with little additional supervision. In this review, we examine how FMs are being developed and applied in pathology, radiology, and ophthalmology, drawing on evidence from over 150 studies. We explain the core components of FM pipelines, including model architectures, self-supervised learning methods, and strategies for downstream adaptation. We also review how FMs are being used in each imaging domain and compare design choices across applications. Finally, we discuss key challenges and open questions to guide future research.

MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding

Shivang Chopra, Lingchao Mao, Gabriela Sanchez-Rodriguez, Andrew J Feola, Jing Li, Zsolt Kira

arXiv preprint · Jun 10 2025
Different medical imaging modalities capture diagnostic information at varying spatial resolutions, from coarse global patterns to fine-grained localized structures. However, most existing vision-language frameworks in the medical domain apply a uniform strategy for local feature extraction, overlooking the modality-specific demands. In this work, we present MedMoE, a modular and extensible vision-language processing framework that dynamically adapts visual representation based on the diagnostic context. MedMoE incorporates a Mixture-of-Experts (MoE) module conditioned on the report type, which routes multi-scale image features through specialized expert branches trained to capture modality-specific visual semantics. These experts operate over feature pyramids derived from a Swin Transformer backbone, enabling spatially adaptive attention to clinically relevant regions. This framework produces localized visual representations aligned with textual descriptions, without requiring modality-specific supervision at inference. Empirical results on diverse medical benchmarks demonstrate that MedMoE improves alignment and retrieval performance across imaging modalities, underscoring the value of modality-specialized visual representations in clinical vision-language systems.
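A Mixture-of-Experts layer routes an input through expert branches and blends their outputs with gating weights. A minimal dense-gating sketch (MedMoE's router conditions on the report type; the expert functions and gate logits here are placeholders):

```python
import math

def moe_output(features, experts, gate_logits):
    """Weighted mixture of expert outputs.

    experts: list of functions mapping a feature vector to an output vector.
    gate_logits: one router score per expert (assumed here to come from a
    gate conditioned on the report type); softmaxed into mixing weights.
    """
    exp_logits = [math.exp(g) for g in gate_logits]
    total = sum(exp_logits)
    weights = [e / total for e in exp_logits]
    outputs = [expert(features) for expert in experts]
    # Blend the expert outputs elementwise by the gate weights.
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(outputs[0]))
    ]
```

In practice MoE layers often keep only the top-k gate weights (sparse routing) so that each input activates a few experts; the dense blend above is the simplest correct variant.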

