
Cross-Institutional Evaluation of Large Language Models for Radiology Diagnosis Extraction: A Prompt-Engineering Perspective.

Moassefi M, Houshmand S, Faghani S, Chang PD, Sun SH, Khosravi B, Triphati AG, Rasool G, Bhatia NK, Folio L, Andriole KP, Gichoya JW, Erickson BJ

pubmed · May 8 2025
The rapid evolution of large language models (LLMs) offers promising opportunities for radiology report annotation, aiding in determining the presence of specific findings. This study evaluates the effectiveness of a human-optimized prompt in labeling radiology reports across multiple institutions using LLMs. Six distinct institutions collected 500 radiology reports: 100 in each of 5 categories. A standardized Python script was distributed to participating sites, allowing the use of one common locally executed LLM with a standard human-optimized prompt. The script executed the LLM's analysis for each report and compared predictions to reference labels provided by local investigators. Model performance was measured as accuracy, and results were aggregated centrally. The human-optimized prompt demonstrated high consistency across sites and pathologies. Preliminary analysis indicates substantial agreement between the LLM's outputs and the investigator-provided reference labels across multiple institutions. At one site, eight LLMs were systematically compared, with Llama 3.1 70b achieving the highest accuracy in identifying the specified findings. Comparable performance with Llama 3.1 70b was observed at two additional centers, demonstrating the model's robust adaptability to variations in report structures and institutional practices. Our findings illustrate the potential of optimized prompt engineering in leveraging LLMs for cross-institutional radiology report labeling. The approach is straightforward while maintaining high accuracy and adaptability. Future work will explore model robustness to diverse report structures and further refine prompts to improve generalizability.
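
A minimal sketch of the kind of locally executed labeling loop the study describes is shown below. The prompt wording, finding names, and the query_local_llm stub are illustrative assumptions, not the study's released script; the caller supplies whatever local LLM runtime their site uses (e.g., a Llama 3.1 70b deployment).

```python
# Hedged sketch of a cross-site labeling loop: prompt a locally hosted LLM with a
# fixed, human-optimized prompt, collect its yes/no prediction per report, and
# compare against the site's reference labels to compute accuracy.
from typing import Callable

PROMPT_TEMPLATE = (
    "You are labeling a radiology report.\n"
    "Question: does the report state that '{finding}' is present?\n"
    "Answer with exactly one word, YES or NO.\n\nReport:\n{report}"
)

def label_reports(
    reports: list[str],
    reference_labels: list[bool],
    finding: str,
    query_local_llm: Callable[[str], str],  # swap in the site's local LLM call
) -> float:
    """Return accuracy of the LLM's labels against the site's reference labels."""
    correct = 0
    for report, truth in zip(reports, reference_labels):
        answer = query_local_llm(PROMPT_TEMPLATE.format(finding=finding, report=report))
        prediction = answer.strip().upper().startswith("YES")
        correct += int(prediction == truth)
    return correct / len(reports)
```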

Are Diffusion Models Effective Feature Extractors for MRI Discriminative Tasks?

Li B, Sun Z, Li C, Kamagata K, Andica C, Uchida W, Takabayashi K, Guo S, Zou R, Aoki S, Tanaka T, Zhao Q

pubmed · May 8 2025
Diffusion models (DMs) excel in pixel-level and spatial tasks and, when pretrained, are proven feature extractors for 2D image discriminative tasks. However, their capabilities in 3D MRI discriminative tasks remain largely untapped. This study assesses the effectiveness of DMs in this underexplored area. We use 59,830 T1-weighted MR images (T1WIs) from the extensive, yet unlabeled, UK Biobank dataset. Additionally, we apply 369 T1WIs from the BraTS2020 dataset for brain tumor classification and 421 T1WIs from the ADNI1 dataset for the diagnosis of Alzheimer's disease. First, a high-performing denoising diffusion probabilistic model (DDPM) with a U-Net backbone is pretrained on the UK Biobank and then fine-tuned on the BraTS2020 and ADNI1 datasets. Afterward, we assess its feature representation capabilities for discriminative tasks using linear probes. Finally, we introduce a novel fusion module, named CATS, that enhances the U-Net representations, thereby improving performance on discriminative tasks. Our DDPM produces high-quality synthetic images that match the distribution of the raw datasets. Subsequent analysis reveals that DDPM features extracted from the middle blocks and smaller timesteps are of high quality. Leveraging these features, the CATS module, with just 1.7M additional parameters, achieves average classification scores of 0.7704 and 0.9217 on the BraTS2020 and ADNI1 datasets, performance competitive with the representations extracted from the transferred DDPM model as well as a 33.23M-parameter ResNet18 trained from scratch. We find that pretraining a DM on a large-scale dataset and then fine-tuning it on limited data from discriminative datasets is a viable approach for MRI data. These well-performing DMs excel not only in generation tasks but also as feature extractors when combined with our proposed CATS module.
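
A hedged sketch of the linear-probe evaluation described above: capture activations from the U-Net's middle block at a small diffusion timestep via a forward hook, pool them, and fit a linear classifier on the frozen features. The attribute name `mid_block`, the timestep value, and the pooling choice are assumptions for illustration, not the paper's exact setup.

```python
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_midblock_features(unet, noisy_images, timestep=50):
    """Run the pretrained DDPM U-Net once and return pooled middle-block activations."""
    captured = {}

    def hook(_module, _inputs, output):
        captured["feat"] = output[0] if isinstance(output, tuple) else output

    handle = unet.mid_block.register_forward_hook(hook)  # assumes a `mid_block` attribute
    t = torch.full((noisy_images.shape[0],), timestep, device=noisy_images.device)
    unet(noisy_images, t)                                # features are captured by the hook
    handle.remove()
    # Global average pooling over spatial dims -> one vector per scan.
    return captured["feat"].flatten(start_dim=2).mean(dim=-1)

def linear_probe(train_feats, train_labels, test_feats, test_labels):
    """Frozen-feature evaluation: a logistic-regression probe on top of DDPM features."""
    probe = LogisticRegression(max_iter=1000).fit(train_feats.cpu().numpy(), train_labels)
    return probe.score(test_feats.cpu().numpy(), test_labels)
```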

A hybrid AI method for lung cancer classification using explainable AI techniques.

Shivwanshi RR, Nirala NS

pubmed · May 8 2025
The use of artificial intelligence (AI) methods for the analysis of computed tomography (CT) images has greatly contributed to the development of effective computer-assisted diagnosis (CAD) systems for lung cancer (LC). However, complex structures, multiple radiographic interrelations, and the dynamic locations of abnormalities within lung CT images make it difficult to extract the relevant information needed to build LC CAD systems. This paper addresses these problems by presenting a hybrid method for LC malignancy classification, which may help researchers and experts properly engineer the model's performance by observing how the model makes decisions. The proposed methodology is named IncCat-LCC: Explainer (Inception Net CatBoost LC Classification: Explainer). It consists of feature extraction (FE) using the handcrafted radiomic feature (HcRdF) extraction technique, InceptionNet CNN feature (INCF) extraction, and Vision Transformer feature (ViTF) extraction, followed by XGBoost (XGB)-based feature selection and GPU-based CatBoost (CB) classification. The proposed framework achieves the highest performance scores for lung nodule multiclass malignancy classification, with accuracy, precision, recall, F1 score, specificity, and area under the ROC curve of 96.74%, 93.68%, 96.74%, 95.19%, 98.47%, and 99.76%, respectively, for classifying the highly normal class. Observing the explainable artificial intelligence (XAI) explanations helps readers understand the model's performance and the statistical outcomes of the evaluation parameters. The work presented in this article may improve existing LC CAD systems and help assess the important parameters using XAI to recognize the factors contributing to enhanced performance and reliability.
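
A hedged sketch of the pipeline's final stages: concatenate the radiomic, CNN, and ViT feature sets, keep the features ranked highest by an XGBoost model, then train a CatBoost classifier on the selected subset. The feature count, thresholds, and hyperparameters are illustrative assumptions, not the tuned values reported in the paper.

```python
import numpy as np
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

def select_and_classify(radiomic, cnn_feats, vit_feats, labels, top_k=256):
    # Fuse the three feature sets column-wise.
    X = np.concatenate([radiomic, cnn_feats, vit_feats], axis=1)

    # Feature selection: rank fused features by XGBoost importance and keep the top_k.
    selector = XGBClassifier(n_estimators=200, eval_metric="mlogloss")
    selector.fit(X, labels)
    keep = np.argsort(selector.feature_importances_)[::-1][:top_k]

    # Classification: GPU-capable CatBoost on the selected features.
    clf = CatBoostClassifier(iterations=500, verbose=False)  # add task_type="GPU" if available
    clf.fit(X[:, keep], labels)
    return clf, keep
```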

From Genome to Phenome: Opportunities and Challenges of Molecular Imaging.

Tian M, Hood L, Chiti A, Schwaiger M, Minoshima S, Watanabe Y, Kang KW, Zhang H

pubmed · May 8 2025
The study of the human phenome is essential for understanding the complexities of wellness and disease and their transitions, with molecular imaging being a vital tool in this exploration. Molecular imaging embodies the 4 principles of human phenomics: precise measurement, accurate calculation or analysis, well-controlled manipulation or intervention, and innovative invention or creation. Its application has significantly enhanced the precision, individualization, and effectiveness of medical interventions. This article provides an overview of molecular imaging's technologic advancements and presents the potential use of molecular imaging in human phenomics and precision medicine. The integration of molecular imaging with multiomics data and artificial intelligence has the potential to transform health care, promoting proactive and preventive strategies. This evolving approach promises to deepen our understanding of the human phenome, lead to preclinical diagnostics and treatments, and establish quantitative frameworks for precision health management.

Weakly supervised language models for automated extraction of critical findings from radiology reports.

Das A, Talati IA, Chaves JMZ, Rubin D, Banerjee I

pubmed · May 8 2025
Critical findings in radiology reports are life-threatening conditions that need to be communicated promptly to physicians for timely patient management. Although challenging, advancements in natural language processing (NLP), particularly large language models (LLMs), now enable the automated identification of key findings in verbose reports. Given the scarcity of labeled critical-findings data, we implemented a two-phase, weakly supervised fine-tuning approach on 15,000 unlabeled Mayo Clinic reports. The fine-tuned model then automatically extracted critical terms on internal (Mayo Clinic, n = 80) and external (MIMIC-III, n = 123) test datasets, validated against expert annotations. Model performance was further assessed on 5,000 MIMIC-IV reports using the LLM-aided metrics G-eval and Prometheus. Both manual and LLM-based evaluations showed improved task alignment with weak supervision. The pipeline and model, publicly available under an academic license, can aid critical finding extraction for research and clinical use ( https://github.com/dasavisha/CriticalFindings_Extract ).
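
A hedged sketch of applying a fine-tuned extraction model to a single report. The model path, task type, prompt, and sample report are placeholders; consult the repository linked above for the actual checkpoint and interface.

```python
from transformers import pipeline

# Placeholder path, not a real model id; replace with the released checkpoint.
extractor = pipeline(
    "text2text-generation",
    model="path/to/finetuned-critical-findings-model",
)

report = "CT head: Large right MCA territory infarct with 8 mm midline shift."
result = extractor(
    f"Extract the critical findings from this report: {report}",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
```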

ChatOCT: Embedded Clinical Decision Support Systems for Optical Coherence Tomography in Offline and Resource-Limited Settings.

Liu C, Zhang H, Zheng Z, Liu W, Gu C, Lan Q, Zhang W, Yang J

pubmed · May 7 2025
Optical Coherence Tomography (OCT) is a critical imaging modality for diagnosing ocular and systemic conditions, yet its accessibility is hindered by the need for specialized expertise and high computational demands. To address these challenges, we introduce ChatOCT, an offline-capable, domain-adaptive clinical decision support system (CDSS) that integrates structured expert Q&A generation, OCT-specific knowledge injection, and activation-aware model compression. Unlike existing systems, ChatOCT functions without internet access, making it suitable for low-resource environments. ChatOCT is built upon LLaMA-2-7B, incorporating domain-specific knowledge from PubMed and OCT News through a two-stage training process: (1) knowledge injection for OCT-specific expertise and (2) Q&A instruction tuning for structured, interactive diagnostic reasoning. To ensure feasibility in offline environments, we apply activation-aware weight quantization, reducing GPU memory usage to ~4.74 GB and enabling deployment on standard OCT hardware. A novel expert answer generation framework mitigates hallucinations by structuring responses in a multi-step process, ensuring accuracy and interpretability. ChatOCT outperforms state-of-the-art baselines such as LLaMA-2, PMC-LLaMA-13B, and ChatDoctor by 10-15 points in coherence, relevance, and clinical utility, while reducing GPU memory requirements by 79% and maintaining real-time responsiveness (~20 ms inference time). Expert ophthalmologists rated ChatOCT's outputs as clinically actionable and aligned with real-world decision-making needs, confirming its potential to assist frontline healthcare providers. ChatOCT represents an innovative offline clinical decision support system for OCT that runs entirely on local embedded hardware, enabling real-time analysis in resource-limited settings without internet connectivity. By offering a scalable, generalizable pipeline that integrates knowledge injection, instruction tuning, and model compression, ChatOCT provides a blueprint for next-generation, resource-efficient clinical AI solutions across multiple medical domains.
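
A hedged sketch of memory-constrained local deployment in the spirit described above. The paper uses activation-aware weight quantization; as a stand-in, this loads LLaMA-2-7B with standard 4-bit bitsandbytes quantization, which is not the same algorithm but shows the kind of footprint reduction involved. The prompt is illustrative and model access requires Meta's license.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
# 4-bit weight quantization (stand-in for the paper's activation-aware scheme).
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

prompt = "Summarize the key OCT findings suggested by diffuse macular thickening."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```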

Interpretable MRI-Based Deep Learning for Alzheimer's Risk and Progression

Lu, B., Chen, Y.-R., Li, R.-X., Zhang, M.-K., Yan, S.-Z., Chen, G.-Q., Castellanos, F. X., Thompson, P. M., Lu, J., Han, Y., Yan, C.-G.

medrxiv preprint · May 7 2025
Timely intervention for Alzheimer's disease (AD) requires early detection. The development of immunotherapies targeting amyloid-beta and tau underscores the need for accessible, time-efficient biomarkers for early diagnosis. Here, we directly applied our previously developed MRI-based deep learning model for AD to the large Chinese SILCODE cohort (722 participants, 1,105 brain MRI scans). The model, initially trained on North American data, demonstrated robust cross-ethnic generalization without any retraining or fine-tuning, achieving an AUC of 91.3% in AD classification with a sensitivity of 95.2%. It successfully identified 86.7% of individuals at risk of AD progression more than 5 years in advance. Individuals identified as high risk exhibited significantly shorter median progression times. By integrating an interpretable deep learning brain risk map approach, we identified AD brain subtypes, including an MCI subtype associated with rapid cognitive decline. The model's risk scores showed significant correlations with cognitive measures and plasma biomarkers, such as tau proteins and neurofilament light chain (NfL). These findings underscore the exceptional generalizability and clinical utility of MRI-based deep learning models, especially in large and diverse populations, offering valuable tools for early therapeutic intervention. The model has been made open source and deployed to a free online website for AD risk prediction, to assist in early screening and intervention.
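
A hedged sketch of the kind of external-validation metrics reported above (AUC and sensitivity): given the frozen model's risk scores on a new cohort, compute AUC over all scores and sensitivity at a chosen operating threshold. Variable names and the threshold are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def external_validation(risk_scores: np.ndarray, labels: np.ndarray, threshold: float = 0.5):
    """Evaluate a frozen model's risk scores on an external cohort."""
    auc = roc_auc_score(labels, risk_scores)
    tn, fp, fn, tp = confusion_matrix(labels, risk_scores >= threshold).ravel()
    sensitivity = tp / (tp + fn)
    return {"auc": auc, "sensitivity": sensitivity}
```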

Prompt Engineering for Large Language Models in Interventional Radiology.

Dietrich N, Bradbury NC, Loh C

pubmed · May 7 2025
Prompt engineering plays a crucial role in optimizing artificial intelligence (AI) and large language model (LLM) outputs by refining input structure, a key factor in medical applications where precision and reliability are paramount. This Clinical Perspective provides an overview of prompt engineering techniques and their relevance to interventional radiology (IR). It explores key strategies, including zero-shot, one- or few-shot, chain-of-thought, tree-of-thought, self-consistency, and directional stimulus prompting, demonstrating their application in IR-specific contexts. Practical examples illustrate how these techniques can be effectively structured for workplace and clinical use. Additionally, the article discusses best practices for designing effective prompts and addresses challenges in the clinical use of generative AI, including data privacy and regulatory concerns. It concludes with an outlook on the future of generative AI in IR, highlighting advances including retrieval-augmented generation, domain-specific LLMs, and multimodal models.
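
A hedged illustration of two of the strategies listed above (few-shot and chain-of-thought) assembled into a single prompt string. The clinical wording and example reports are illustrative only and do not come from the article.

```python
FEW_SHOT_EXAMPLES = """\
Example 1
Report: "Post-procedure angiogram shows no residual filling of the pseudoaneurysm."
Answer: Technical success; no further intervention indicated.

Example 2
Report: "Persistent endoleak noted at the proximal attachment site."
Answer: Type Ia endoleak; consider proximal extension or re-ballooning.
"""

def build_prompt(report_text: str) -> str:
    """Combine few-shot examples with a chain-of-thought cue for an IR query."""
    return (
        "You are assisting an interventional radiologist.\n\n"
        f"{FEW_SHOT_EXAMPLES}\n"
        # Chain-of-thought cue: ask for stepwise reasoning before the final answer.
        "Think step by step about the anatomy, device, and any complication, "
        "then give a one-line recommendation.\n\n"
        f'Report: "{report_text}"\n'
        "Answer:"
    )
```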

Phenotype-Guided Generative Model for High-Fidelity Cardiac MRI Synthesis: Advancing Pretraining and Clinical Applications

Ziyu Li, Yujian Hu, Zhengyao Ding, Yiheng Mao, Haitao Li, Fan Yi, Hongkun Zhang, Zhengxing Huang

arxiv preprint · May 6 2025
Cardiac Magnetic Resonance (CMR) imaging is a vital non-invasive tool for diagnosing heart diseases and evaluating cardiac health. However, the limited availability of large-scale, high-quality CMR datasets poses a major challenge to the effective application of artificial intelligence (AI) in this domain. Even the amount of unlabeled data, and the range of health status it covers, falls short of what model pretraining requires, which hinders the performance of AI models on downstream tasks. In this study, we present Cardiac Phenotype-Guided CMR Generation (CPGG), a novel approach for generating diverse CMR data covering a wide spectrum of cardiac health status. The CPGG framework consists of two stages: in the first stage, a generative model is trained on cardiac phenotypes derived from CMR data; in the second stage, a masked autoregressive diffusion model, conditioned on these phenotypes, generates high-fidelity CMR cine sequences that capture both structural and functional features of the heart in a fine-grained manner. We synthesized a large amount of CMR data to expand the pretraining set. Experimental results show that CPGG generates high-quality synthetic CMR data, significantly improving performance on various downstream tasks, including diagnosis and cardiac phenotype prediction. These gains are demonstrated across both public and private datasets, highlighting the effectiveness of our approach. Code is available at https://anonymous.4open.science/r/CPGG.
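
A hedged sketch of the two-stage sampling interface described above: stage 1 samples a cardiac phenotype vector, stage 2 generates a CMR cine sequence conditioned on it. Both model objects and their methods are placeholders standing in for the paper's trained components, which the abstract does not specify at this level of detail.

```python
import torch

@torch.no_grad()
def synthesize_cmr(phenotype_generator, conditional_diffusion, n_samples: int = 4):
    """Sample phenotypes (stage 1), then condition the cine generator on them (stage 2)."""
    samples = []
    for _ in range(n_samples):
        phenotype = phenotype_generator.sample()              # stage 1: phenotype vector
        cine = conditional_diffusion.sample(cond=phenotype)   # stage 2: conditioned cine clip
        samples.append((phenotype, cine))
    return samples
```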

A Vision-Language Model for Focal Liver Lesion Classification

Song Jian, Hu Yuchang, Wang Hui, Chen Yen-Wei

arxiv preprint · May 6 2025
Accurate classification of focal liver lesions is crucial for diagnosis and treatment in hepatology. However, traditional supervised deep learning models depend on large-scale annotated datasets, which are often limited in medical imaging. Recently, vision-language models (VLMs) such as the Contrastive Language-Image Pre-training (CLIP) model have been applied to image classification. Compared to a conventional convolutional neural network (CNN), which classifies images based on visual information only, a VLM leverages multimodal learning with text and images, allowing it to learn effectively even with a limited amount of labeled data. Inspired by CLIP, we propose Liver-VLM, a model specifically designed for focal liver lesion (FLL) classification. First, Liver-VLM incorporates class information into the text encoder without introducing additional inference overhead. Second, by calculating the pairwise cosine similarities between image and text embeddings and optimizing the model with a cross-entropy loss, Liver-VLM effectively aligns image features with class-level text features. Experimental results on the MPCT-FLLs dataset demonstrate that Liver-VLM outperforms both the standard CLIP and MedCLIP models in terms of accuracy and area under the curve (AUC). Further analysis shows that using a lightweight ResNet18 backbone enhances classification performance, particularly under data-constrained conditions.
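
A hedged sketch of the CLIP-style objective described above: normalize image and class-text embeddings, take their pairwise cosine similarities as logits, and apply cross-entropy against the ground-truth lesion class. The encoder architectures and temperature value are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_embeds: torch.Tensor,       # (batch, dim) from the image encoder
                    class_text_embeds: torch.Tensor,  # (num_classes, dim), one prompt per class
                    labels: torch.Tensor,             # (batch,) ground-truth class indices
                    temperature: float = 0.07) -> torch.Tensor:
    """Cross-entropy over cosine-similarity logits between images and class-level text."""
    image_embeds = F.normalize(image_embeds, dim=-1)
    class_text_embeds = F.normalize(class_text_embeds, dim=-1)
    logits = image_embeds @ class_text_embeds.t() / temperature  # pairwise cosine similarities
    return F.cross_entropy(logits, labels)
```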