
Aiding Medical Diagnosis through Image Synthesis and Classification

Kanishk Choudhary

arXiv preprint · Jun 1, 2025
Medical professionals, especially those in training, often depend on visual reference materials to support accurate diagnosis and develop pattern recognition skills. However, existing resources may lack the diversity and accessibility needed for broad and effective clinical learning. This paper presents a system designed to generate realistic medical images from textual descriptions and validate their accuracy through a classification model. A pretrained Stable Diffusion model was fine-tuned using Low-Rank Adaptation (LoRA) on the PathMNIST dataset, which comprises nine colorectal histopathology tissue types. The generative model was trained multiple times under different parameter configurations, guided by domain-specific prompts to capture meaningful features. For quality control, a ResNet-18 classification model was trained on the same dataset, achieving 99.76% accuracy in predicting the correct label of a colorectal histopathology image. Generated images were then filtered by the trained classifier through an iterative process in which misclassified outputs were discarded and regenerated until they were classified correctly. The best-performing version of the generative model achieved an F1 score of 0.6727, with precision and recall of 0.6817 and 0.7111, respectively. Some tissue types, such as adipose tissue and lymphocytes, reached perfect classification scores, while others proved more challenging due to structural complexity. Because both the generation and classification components achieve high accuracy, this self-validating approach offers a reliable method for synthesizing domain-specific medical images, with potential applications in diagnostic support and clinical education. Future work includes improving prompt-specific accuracy and extending the system to other areas of medical imaging.
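
The filtering loop the authors describe translates naturally into code. Below is a minimal sketch of the generate-validate-regenerate cycle, assuming a LoRA-adapted Stable Diffusion checkpoint and a 9-class PathMNIST ResNet-18; the file paths, prompt wording, and retry budget are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from diffusers import StableDiffusionPipeline
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Generator: pretrained Stable Diffusion with hypothetical PathMNIST LoRA weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
pipe.load_lora_weights("path/to/pathmnist-lora")  # hypothetical adapter path

# Validator: ResNet-18 trained on the nine PathMNIST tissue classes.
classifier = models.resnet18(num_classes=9)
classifier.load_state_dict(torch.load("pathmnist_resnet18.pt", map_location=device))  # hypothetical checkpoint
classifier.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize((28, 28)),  # PathMNIST native resolution
    transforms.ToTensor(),
])

def generate_validated(prompt: str, target_class: int, max_tries: int = 10):
    """Regenerate until the classifier agrees with the intended tissue label."""
    for _ in range(max_tries):
        image = pipe(prompt).images[0]
        x = preprocess(image.convert("RGB")).unsqueeze(0).to(device)
        with torch.no_grad():
            pred = classifier(x).argmax(dim=1).item()
        if pred == target_class:
            return image  # accepted: generation matches the intended label
    return None  # discarded after exhausting the retry budget

image = generate_validated("histopathology slide of adipose tissue", target_class=0)
```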

Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation

Pimchanok Sukjai, Apiradee Boonmee

arXiv preprint · Jun 1, 2025
The escalating demand for medical image interpretation underscores the critical need for advanced artificial intelligence solutions to enhance the efficiency and accuracy of radiological diagnoses. This paper introduces CXR-PathFinder, a novel Large Language Model (LLM)-centric foundation model specifically engineered for automated chest X-ray (CXR) report generation. We propose a unique training paradigm, Clinician-Guided Adversarial Fine-Tuning (CGAFT), which meticulously integrates expert clinical feedback into an adversarial learning framework to mitigate factual inconsistencies and improve diagnostic precision. Complementing this, our Knowledge Graph Augmentation Module (KGAM) acts as an inference-time safeguard, dynamically verifying generated medical statements against authoritative knowledge bases to minimize hallucinations and ensure standardized terminology. Leveraging a comprehensive dataset of millions of paired CXR images and expert reports, our experiments demonstrate that CXR-PathFinder significantly outperforms existing state-of-the-art medical vision-language models across various quantitative metrics, including clinical accuracy (Macro F1 (14): 46.5, Micro F1 (14): 59.5). Furthermore, blinded human evaluation by board-certified radiologists confirms CXR-PathFinder's superior clinical utility, completeness, and accuracy, establishing its potential as a reliable and efficient aid for radiological practice. The developed method effectively balances high diagnostic fidelity with computational efficiency, providing a robust solution for automated medical report generation.
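
The abstract does not disclose KGAM's internals, but the inference-time safeguard it describes can be sketched schematically: each generated statement is checked against an authoritative term set, and ungrounded statements are suppressed. The toy knowledge base, the sentence-level granularity, and the fallback wording below are all assumptions.

```python
# Schematic stand-in for knowledge-graph verification of a generated report.
KNOWLEDGE_BASE = {  # toy stand-in for an authoritative ontology
    "cardiomegaly", "pleural effusion", "pneumothorax",
    "consolidation", "atelectasis", "no acute findings",
}

def verify_report(draft: str, kb: set[str]) -> str:
    """Keep only sentences whose asserted finding appears in the knowledge base."""
    verified = []
    for sentence in draft.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        if any(term in sentence.lower() for term in kb):
            verified.append(sentence)  # grounded statement: keep
        # ungrounded statements are dropped as potential hallucinations
    return ". ".join(verified) + "." if verified else "Findings require manual review."

print(verify_report(
    "Mild cardiomegaly. Diffuse hepatic steatosis. No pneumothorax.", KNOWLEDGE_BASE
))  # -> "Mild cardiomegaly. No pneumothorax."
```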

Adaptive Breast MRI Scanning Using AI.

Eskreis-Winkler S, Bhowmik A, Kelly LH, Lo Gullo R, D'Alessio D, Belen K, Hogan MP, Saphier NB, Sevilimedu V, Sung JS, Comstock CE, Sutton EJ, Pinker K

PubMed · Jun 1, 2025
Background: MRI protocols typically involve many imaging sequences and often require too much time. Purpose: To simulate artificial intelligence (AI)-directed stratified scanning for screening breast MRI with various triage thresholds and evaluate its diagnostic performance against that of the full breast MRI protocol. Materials and Methods: This retrospective reader study included consecutive contrast-enhanced screening breast MRI examinations performed between January 2013 and January 2019 at three regional cancer sites. In this simulation study, an in-house AI tool generated a suspicion score for subtraction maximum intensity projection images during a given MRI examination, and the score was used to determine whether to proceed with the full MRI protocol or end the examination early (abbreviated breast MRI [AB-MRI] protocol). Examinations with suspicion scores under the 50th percentile were read using both the AB-MRI protocol (ie, dynamic contrast-enhanced MRI scans only) and the full MRI protocol. Diagnostic performance metrics for screening with various AI triage thresholds were compared with those for screening without AI triage. Results: Of 863 women (mean age, 52 years ± 10 [SD]; 1423 MRI examinations), 51 received a cancer diagnosis within 12 months of screening. The diagnostic performance metrics for AI-directed stratified scanning that triaged 50% of examinations to AB-MRI versus full MRI protocol scanning were as follows: sensitivity, 88.2% (45 of 51; 95% CI: 79.4, 97.1) versus 86.3% (44 of 51; 95% CI: 76.8, 95.7); specificity, 80.8% (1108 of 1372; 95% CI: 78.7, 82.8) versus 81.4% (1117 of 1372; 95% CI: 79.4, 83.5); positive predictive value 3 (ie, percent of biopsies yielding cancer), 23.6% (43 of 182; 95% CI: 17.5, 29.8) versus 24.7% (42 of 170; 95% CI: 18.2, 31.2); cancer detection rate (per 1000 examinations), 31.6 (95% CI: 22.5, 40.7) versus 30.9 (95% CI: 21.9, 39.9); and interval cancer rate (per 1000 examinations), 4.2 (95% CI: 0.9, 7.6) versus 4.9 (95% CI: 1.3, 8.6). Specificity decreased by no more than 2.7 percentage points with AI triage. There were no AI-triaged examinations for which conducting the full MRI protocol would have resulted in additional cancer detection. Conclusion: AI-directed stratified MRI decreased simulated scan times while maintaining diagnostic performance.
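
The percentile-based triage is easy to simulate. The sketch below applies a 50th-percentile cutoff to synthetic suspicion scores; the score distribution and cancer outcomes are random placeholders rather than the study's data, so the printed counts are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(size=1423)              # one AI suspicion score per examination
cancer = rng.uniform(size=1423) < 51 / 1423  # ~51 cancers among 1423 exams, per the study

threshold = np.percentile(scores, 50)        # 50th-percentile triage cutoff
abbreviated = scores < threshold             # these exams stop early (AB-MRI protocol)

# The study compares AB-MRI and full-protocol reads on the triaged subset;
# here we simply count how the exams and cancers split across the two arms.
print(f"AB-MRI arm: {abbreviated.sum()} exams, {cancer[abbreviated].sum()} cancers")
print(f"Full-protocol arm: {(~abbreviated).sum()} exams, {cancer[~abbreviated].sum()} cancers")
```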

ChatGPT-4o's Performance in Brain Tumor Diagnosis and MRI Findings: A Comparative Analysis with Radiologists.

Ozenbas C, Engin D, Altinok T, Akcay E, Aktas U, Tabanli A

PubMed · Jun 1, 2025
To evaluate the accuracy of ChatGPT-4o in identifying magnetic resonance imaging (MRI) findings and diagnosing brain tumors by comparing its performance with that of experienced radiologists. This retrospective study included 46 patients with pathologically confirmed brain tumors who underwent preoperative MRI between January 2021 and October 2024. Two experienced radiologists and ChatGPT-4o independently evaluated the anonymized MRI images, answering eight questions focusing on MRI sequences, lesion characteristics, and diagnoses. ChatGPT-4o's responses were compared with those of the radiologists and the pathology outcomes. Statistical analyses included accuracy, sensitivity, specificity, and the McNemar test, with p < 0.05 considered to indicate a statistically significant difference. ChatGPT-4o successfully identified 44 of the 46 (95.7%) lesions; it achieved 88.3% accuracy in identifying MRI sequences, 81% in assessing perilesional edema, 79.5% in signal characteristics, and 82.2% in contrast enhancement. However, its accuracy was 53.6% in localizing lesions and 26.3% in distinguishing extra-axial from intra-axial lesions. ChatGPT-4o achieved success rates of 56.8% and 29.5% for differential diagnoses and most likely diagnoses, compared with 93.2-90.9% and 70.5-65.9% for the radiologists, respectively (p < 0.005). ChatGPT-4o demonstrated high accuracy in identifying certain MRI features but underperformed the radiologists in diagnostic tasks. Despite these current limitations, future updates and advancements have the potential to enable large language models to facilitate diagnosis and offer a reliable second opinion to radiologists.
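
The paired model-versus-radiologist comparison reduces to McNemar's test on per-case correctness. A minimal sketch follows, using an illustrative 2×2 table for the 46 cases rather than the study's actual counts.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Paired correctness over the same 46 cases (illustrative counts, not the study's).
# Rows: ChatGPT-4o correct / incorrect; columns: radiologist correct / incorrect.
table = np.array([
    [12,  2],  # both correct | model correct, radiologist incorrect
    [19, 13],  # model incorrect, radiologist correct | both incorrect
])

result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(f"statistic={result.statistic}, p={result.pvalue:.4f}")
```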

Deep learning driven interpretable and informed decision making model for brain tumour prediction using explainable AI.

Adnan KM, Ghazal TM, Saleem M, Farooq MS, Yeun CY, Ahmad M, Lee SW

PubMed · Jun 1, 2025
Brain tumours are highly complex, particularly when it comes to their initial and accurate diagnosis, as this determines patient prognosis. Conventional methods rely on MRI and CT scans and employ generic machine learning techniques, which are heavily dependent on feature extraction and require human intervention. These methods may fail in complex cases and do not produce human-interpretable results, making it difficult for clinicians to trust the model's predictions. Such limitations prolong the diagnostic process and can negatively impact the quality of treatment. The advent of deep learning has made it a powerful tool for complex image analysis tasks, such as detecting brain tumours, by learning advanced patterns from images. However, deep learning models are often considered "black box" systems, where the reasoning behind predictions remains unclear. To address this issue, the present study applies Explainable AI (XAI) alongside deep learning for accurate and interpretable brain tumour prediction. XAI enhances model interpretability by identifying key features such as tumour size, location, and texture, which are crucial for clinicians. This helps build their confidence in the model and enables them to make better-informed decisions. In this research, a deep learning model integrated with XAI is proposed to develop an interpretable framework for brain tumour prediction. The model is trained on an extensive dataset comprising imaging and clinical data and demonstrates a high AUC while leveraging XAI for model explainability and feature selection. The study findings indicate that this approach improves predictive performance, achieving an accuracy of 92.98% and a miss rate of 7.02%. Additionally, interpretability tools such as LIME and Grad-CAM provide clinicians with a clearer understanding of the decision-making process, supporting diagnosis and treatment. This model represents a significant advancement in brain tumour prediction, with the potential to enhance patient outcomes and contribute to the field of neuro-oncology.
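
Of the interpretability tools named, Grad-CAM is compact enough to sketch directly: pooled gradients of a class logit weight the last convolutional feature maps, and the ReLU of the weighted sum gives a heatmap. The ResNet-18 backbone below is an assumption; the paper does not specify its architecture at this level of detail.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
target_layer = model.layer4[-1]  # last convolutional block

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(v=go[0]))

def grad_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Heatmap of the regions driving the class_idx logit, normalized to [0, 1]."""
    logits = model(x)
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = gradients["v"].mean(dim=(2, 3), keepdim=True)  # pooled gradients per channel
    cam = F.relu((weights * activations["v"]).sum(dim=1))    # weighted sum, then ReLU
    cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:], mode="bilinear")
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=0)  # dummy input for shape
```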

Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists.

Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, Yang C, Wang Y, Peng J, Bashir MR, Ronot M, Song B, Jiang H

PubMed · Jun 1, 2025
Whether large language models (LLMs) could be integrated into the diagnostic workflow of focal liver lesions (FLLs) remains unclear. We aimed to investigate the diagnostic accuracy of two generic LLMs (ChatGPT-4o and Gemini) based on CT/MRI reports, compared with and combined with radiologists of different experience levels. From April 2022 to April 2024, this single-center retrospective study included consecutive adult patients who underwent contrast-enhanced CT/MRI for a single FLL and subsequent histopathologic examination. The LLMs were prompted with clinical information and the "findings" section of radiology reports three times to provide differential diagnoses in descending order of likelihood, with the first considered the final diagnosis. In the research setting, six radiologists (three junior and three middle-level) independently reviewed the CT/MRI images and clinical information in two rounds (first alone, then with LLM assistance). In the clinical setting, diagnoses were retrieved from the "impressions" section of radiology reports. Diagnostic accuracy was evaluated against histopathology. 228 patients (median age, 59 years; 155 males) with 228 FLLs (median size, 3.6 cm) were included. Regarding the final diagnosis, the accuracy of two-step ChatGPT-4o (78.9%) was higher than single-step ChatGPT-4o (68.0%, p < 0.001) and single-step Gemini (73.2%, p = 0.004), similar to real-world radiology reports (80.0%, p = 0.34) and junior radiologists (78.9%-82.0%; p-values, 0.21 to > 0.99), but lower than middle-level radiologists (84.6%-85.5%; p-values, 0.001 to 0.02). No incremental diagnostic value of ChatGPT-4o was observed for any radiologist (p-values, 0.63 to > 0.99). Two-step ChatGPT-4o matched the accuracy of real-world radiology reports and junior radiologists for diagnosing FLLs but was less accurate than middle-level radiologists and demonstrated little incremental diagnostic value.
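
The prompting protocol can be sketched against a generic chat-completion API. The model identifier, prompt wording, clinical vignette, and line-based output parsing below are assumptions, not the study's published setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rank_differential(clinical_info: str, findings: str) -> list[str]:
    """One query: ranked differential for a focal liver lesion, one diagnosis per line."""
    prompt = (
        "Given the clinical information and CT/MRI findings below, list differential "
        "diagnoses for the focal liver lesion in descending order of likelihood, "
        f"one per line.\n\nClinical information: {clinical_info}\nFindings: {findings}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the ChatGPT-4o version studied
        messages=[{"role": "user", "content": prompt}],
    )
    return [ln.strip() for ln in reply.choices[0].message.content.splitlines() if ln.strip()]

# Per the protocol, the model is queried three times; the first-listed entity of
# each run is recorded as that run's final diagnosis.
runs = [rank_differential("59-year-old man, chronic HBV infection",
                          "3.6 cm lesion with arterial hyperenhancement and washout")
        for _ in range(3)]
final_diagnoses = [run[0] for run in runs]
```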

A Large Language Model to Detect Negated Expressions in Radiology Reports.

Su Y, Babore YB, Kahn CE

PubMed · Jun 1, 2025
Natural language processing (NLP) is crucial to extract information accurately from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model to detect negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system to detect negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and a power of 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ2 = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
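
The rule-based arm of the comparison can be reproduced in a few lines with medspaCy, whose ConText component marks entities as negated. The target rules below are illustrative stand-ins for the RadLex, UMLS Metathesaurus, and Radiology Gamuts Ontology term lists used in the study.

```python
import medspacy
from medspacy.target_matcher import TargetRule

nlp = medspacy.load()  # default pipeline includes the ConText negation component
nlp.get_pipe("medspacy_target_matcher").add([
    TargetRule("pneumothorax", "FINDING"),
    TargetRule("pleural effusion", "FINDING"),
])

doc = nlp("No evidence of pneumothorax. Small left pleural effusion is present.")
for ent in doc.ents:
    print(ent.text, "NEGATED" if ent._.is_negated else "AFFIRMED")
# pneumothorax NEGATED
# pleural effusion AFFIRMED
```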

Decoding Glioblastoma Heterogeneity: Neuroimaging Meets Machine Learning.

Fares J, Wan Y, Mayrand R, Li Y, Mair R, Price SJ

PubMed · Jun 1, 2025
Recent advancements in neuroimaging and machine learning have significantly improved our ability to diagnose and categorize isocitrate dehydrogenase (IDH)-wildtype glioblastoma, a disease characterized by notable tumoral heterogeneity, which is crucial for effective treatment. Neuroimaging techniques, such as diffusion tensor imaging and magnetic resonance radiomics, provide noninvasive insights into tumor infiltration patterns and metabolic profiles, aiding in accurate diagnosis and prognostication. Machine learning algorithms further enhance glioblastoma characterization by identifying distinct imaging patterns and features, facilitating precise diagnoses and treatment planning. Integration of these technologies allows for the development of image-based biomarkers, potentially reducing the need for invasive biopsy procedures and enabling personalized therapy targeting specific pro-tumoral signaling pathways and resistance mechanisms. Although significant progress has been made, ongoing innovation is essential to address remaining challenges and further improve these methodologies. Future directions should focus on refining machine learning models, integrating emerging imaging techniques, and elucidating the complex interplay between imaging features and underlying molecular processes. This review highlights the pivotal role of neuroimaging and machine learning in glioblastoma research, offering invaluable noninvasive tools for diagnosis, prognosis prediction, and treatment planning, ultimately improving patient outcomes. These advances in the field promise to usher in a new era in the understanding and classification of IDH-wildtype glioblastoma.

Multi-modal large language models in radiology: principles, applications, and potential.

Shen Y, Xu Y, Ma J, Rui W, Zhao C, Heacock L, Huang C

PubMed · Jun 1, 2025
Large language models (LLMs) and multi-modal large language models (MLLMs) represent the cutting-edge in artificial intelligence. This review provides a comprehensive overview of their capabilities and potential impact on radiology. Unlike most existing literature reviews focusing solely on LLMs, this work examines both LLMs and MLLMs, highlighting their potential to support radiology workflows such as report generation, image interpretation, EHR summarization, differential diagnosis generation, and patient education. By streamlining these tasks, LLMs and MLLMs could reduce radiologist workload, improve diagnostic accuracy, support interdisciplinary collaboration, and ultimately enhance patient care. We also discuss key limitations, such as the limited capacity of current MLLMs to interpret 3D medical images and to integrate information from both image and text data, as well as the lack of effective evaluation methods. Ongoing efforts to address these challenges are introduced.

CNS-CLIP: Transforming a Neurosurgical Journal Into a Multimodal Medical Model.

Alyakin A, Kurland D, Alber DA, Sangwon KL, Li D, Tsirigos A, Leuthardt E, Kondziolka D, Oermann EK

PubMed · Jun 1, 2025
Classical biomedical data science models are trained on a single modality and aimed at one specific task. However, the exponential increase in the size and capabilities of foundation models inside and outside medicine shows a shift toward task-agnostic models using large-scale, often internet-based, data. Recent research into smaller foundation models trained on specific literature, such as programming textbooks, demonstrated that they can display capabilities similar or superior to those of large generalist models, suggesting a potential middle ground between small task-specific and large foundation models. This study introduces a domain-specific multimodal model, Congress of Neurological Surgeons (CNS)-Contrastive Language-Image Pretraining (CLIP), developed for neurosurgical applications and leveraging data exclusively from Neurosurgery Publications. We constructed a multimodal data set of articles from Neurosurgery Publications through PDF data collection and figure-caption extraction, using an artificial intelligence pipeline for quality control. Our final data set included 24,021 figure-caption pairs. We then developed a fine-tuning protocol for the OpenAI CLIP model. The model was evaluated on tasks including neurosurgical information retrieval, computed tomography imaging classification, and zero-shot ImageNet classification. CNS-CLIP demonstrated superior performance in neurosurgical information retrieval with a Top-1 accuracy of 24.56%, compared with 8.61% for the baseline. Its average area under the receiver operating characteristic curve across 6 neuroradiology tasks was 0.95, slightly superior to OpenAI's CLIP at 0.94 and significantly outperforming a vanilla vision transformer at 0.62. In generalist classification, CNS-CLIP reached a Top-1 accuracy of 47.55%, a decrease from the baseline of 52.37%, demonstrating catastrophic forgetting. This study presents a pioneering effort in building a domain-specific multimodal model using data from a medical society publication. The results indicate that domain-specific models, while less globally versatile, can offer advantages in specialized contexts. This emphasizes the importance of using tailored data and domain-focused development when training foundation models in neurosurgery and general medicine.
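
Contrastive fine-tuning on figure-caption pairs follows the standard CLIP recipe, sketched below with Hugging Face's CLIPModel; the base checkpoint, batch construction, and learning rate are assumptions rather than the authors' protocol.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fine_tune_step(images: list[Image.Image], captions: list[str]) -> float:
    """One contrastive step: align each figure with its own caption within the batch."""
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    loss = model(**inputs, return_loss=True).loss  # symmetric image-text InfoNCE loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Placeholder figures stand in for extracted journal images.
batch = [Image.new("RGB", (224, 224)), Image.new("RGB", (224, 224))]
print(fine_tune_step(batch, ["axial CT showing a subdural hematoma",
                             "digital subtraction angiogram of an MCA aneurysm"]))
```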