Latest Papers on Radiology AI. Sources: pubmed, Tags: GenAI, Order: Best Match, Limit: 10.

Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study.

Miaojiao S, Xia L, Xian Tao Z, Zhi Liang H, Sheng C, Songsong W

•papers•Jun 11 2025

Breast ultrasound is essential for evaluating breast nodules, with Breast Imaging Reporting and Data System (BI-RADS) providing standardized classification. However, interobserver variability among radiologists can affect diagnostic accuracy. Large language models (LLMs) like ChatGPT-4 have shown potential in medical imaging interpretation. This study explores its feasibility in improving BI-RADS classification consistency and malignancy prediction compared to radiologists. This study aims to evaluate the feasibility of using LLMs, particularly ChatGPT-4, to assess the consistency and diagnostic accuracy of standardized breast ultrasound imaging reports, using pathology as the reference standard. This retrospective study analyzed breast nodule ultrasound data from 671 female patients (mean 45.82, SD 9.20 years; range 26-75 years) who underwent biopsy or surgical excision at our hospital between June 2019 and June 2024. ChatGPT-4 was used to interpret BI-RADS classifications and predict benign versus malignant nodules. The study compared the model's performance to that of two senior radiologists (≥15 years of experience) and two junior radiologists (<5 years of experience) using key diagnostic metrics, including accuracy, sensitivity, specificity, area under the receiver operating characteristic curve, P values, and odds ratios with 95% CIs. Two diagnostic models were evaluated: (1) image interpretation model, where ChatGPT-4 classified nodules based on BI-RADS features, and (2) image-to-text-LLM model, where radiologists provided textual descriptions, and ChatGPT-4 determined malignancy probability based on keywords. Radiologists were blinded to pathological outcomes, and BI-RADS classifications were finalized through consensus. ChatGPT-4 achieved an overall BI-RADS classification accuracy of 96.87%, outperforming junior radiologists (617/671, 91.95% and 604/671, 90.01%, P<.01). 
For malignancy prediction, ChatGPT-4 achieved an area under the receiver operating characteristic curve of 0.82 (95% CI 0.79-0.85), an accuracy of 80.63% (541/671 cases), a sensitivity of 90.56% (259/286 cases), and a specificity of 73.51% (283/385 cases). The image interpretation model demonstrated performance comparable to senior radiologists, while the image-to-text-LLM model further improved diagnostic accuracy for all radiologists, increasing their sensitivity and specificity significantly (P<.001). Statistical analyses, including the McNemar test and DeLong test, confirmed that ChatGPT-4 outperformed junior radiologists (P<.01) and showed noninferiority compared to senior radiologists (P>.05). Pathological diagnoses served as the reference standard, ensuring robust evaluation reliability. Integrating ChatGPT-4 into an image-to-text-LLM workflow improves BI-RADS classification accuracy and supports radiologists in breast ultrasound diagnostics. These results demonstrate its potential as a decision-support tool to enhance diagnostic consistency and reduce variability.
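As a quick check on how the reported percentages relate to the case counts, the following minimal sketch recomputes them from the fractions quoted in the abstract. The helper name `rate` is hypothetical and is not taken from the study; the counts are copied as reported.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * numerator / denominator, 2)


# Fractions exactly as quoted in the abstract.
sensitivity = rate(259, 286)  # malignant nodules correctly predicted as malignant
specificity = rate(283, 385)  # benign nodules correctly predicted as benign
accuracy = rate(541, 671)     # all correctly predicted nodules over all nodules
junior_a = rate(617, 671)     # BI-RADS accuracy, first junior radiologist
junior_b = rate(604, 671)     # BI-RADS accuracy, second junior radiologist

if __name__ == "__main__":
    print(sensitivity, specificity, accuracy, junior_a, junior_b)
```

Sensitivity divides by the number of pathology-confirmed malignant cases and specificity by the number of benign cases, which is why the two denominators (286 and 385) sum to the 671 patients in the cohort.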

Ultrasound Classification Breast Retrospective Clinical In Silico None Academic Lab GenAI

Advancements and Applications of Hyperpolarized Xenon MRI for COPD Assessment in China.

Li H, Li H, Zhang M, Fang Y, Shen L, Liu X, Xiao S, Zeng Q, Zhou Q, Zhao X, Shi L, Han Y, Zhou X

•papers•Jun 10 2025

Chronic obstructive pulmonary disease (COPD) is one of the leading causes of morbidity and mortality in China, highlighting the importance of early diagnosis and ongoing monitoring for effective management. In recent years, hyperpolarized 129Xe MRI technology has gained significant clinical attention due to its ability to non-invasively and visually assess lung ventilation, microstructure, and gas exchange function. Its recent clinical approval in China, the United States, and several European countries represents a significant advancement in pulmonary imaging. This review provides an overview of the latest developments in hyperpolarized 129Xe MRI technology for COPD assessment in China. It covers the progress in instrument development, advanced imaging techniques, artificial intelligence-driven reconstruction methods, molecular imaging, and the application of this technology in both COPD patients and animal models. Furthermore, the review explores potential technical innovations in 129Xe MRI and discusses future directions for its clinical applications, aiming to address existing challenges and expand the technology's impact in clinical practice.

MRI Reconstruction Chest Review Clinical Pilot CE Mark Academic Lab Benchmark SOTA GenAI

Empirical evaluation of artificial intelligence distillation techniques for ascertaining cancer outcomes from electronic health records.

Riaz IB, Naqvi SAA, Ashraf N, Harris GJ, Kehl KL

•papers•Jun 10 2025

Phenotypic information for cancer research is embedded in unstructured electronic health records (EHR), requiring effort to extract. Deep learning models can automate this but face scalability issues due to privacy concerns. We evaluated techniques for applying a teacher-student framework to extract longitudinal clinical outcomes from EHRs. We focused on the challenging task of ascertaining two cancer outcomes, overall response and progression according to Response Evaluation Criteria in Solid Tumors (RECIST), from free-text radiology reports. "Teacher" models with hierarchical Transformer architecture were trained on data from Dana-Farber Cancer Institute (DFCI). These models labeled public datasets (MIMIC-IV, Wiki-text) and GPT-4-generated synthetic data. "Student" models were then trained to mimic the teachers' predictions. DFCI "teacher" models achieved high performance, and student models trained on MIMIC-IV data showed comparable results, demonstrating effective knowledge transfer. However, student models trained on Wiki-text and synthetic data performed worse, emphasizing the need for in-domain public datasets for model distillation.

Mixed Modality LLM Radiology Report Other Methodology In Silico None Academic Lab GenAI Reproducibility

RadGPT: A system based on a large language model that generates sets of patient-centered materials to explain radiology report information.

Herwald SE, Shah P, Johnston A, Olsen C, Delbrouck JB, Langlotz CP

•papers•Jun 10 2025

The Cures Act Final Rule requires that patients have real-time access to their radiology reports, which contain technical language. Our objective was to use a novel system called RadGPT, which integrates concept extraction and a large language model (LLM), to help patients understand their radiology reports. RadGPT generated 150 concept explanations and 390 question-and-answer pairs from 30 radiology report impressions dated between 2012 and 2020. The extracted concepts were used to create concept-based explanations, as well as concept-based question-and-answer pairs where questions were generated using either a fixed template or an LLM. Additionally, report-based question-and-answer pairs were generated directly from the impression using an LLM without concept extraction. One board-certified radiologist and 4 radiology residents rated the material quality using a standardized rubric. Concept-based LLM-generated questions were significantly higher quality than concept-based template-generated questions (p < 0.001). Excluding those template-based question-and-answer pairs from further analysis, nearly all (> 95%) of RadGPT-generated materials were rated highly, with at least 50% receiving the highest possible ranking from all 5 raters. No answers or explanations were rated as likely to affect the safety or effectiveness of patient care. Report-level LLM-based questions and answers were rated particularly highly, with 92% of report-level LLM-based questions and 61% of the corresponding report-level answers receiving the highest rating from all raters. The educational tool RadGPT generated high-quality explanations and question-and-answer pairs that were personalized for each radiology report, unlikely to produce harmful explanations and likely to enhance patient understanding of radiology information.

Mixed Modality LLM Radiology Report Other Methodology Prototype None Academic Lab GenAI

Improving Patient Communication by Simplifying AI-Generated Dental Radiology Reports With ChatGPT: Comparative Study.

Stephan D, Bertsch AS, Schumacher S, Puladi B, Burwinkel M, Al-Nawas B, Kämmerer PW, Thiem DG

•papers•Jun 9 2025

Medical reports, particularly radiology findings, are often written for professional communication, making them difficult for patients to understand. This communication barrier can reduce patient engagement and lead to misinterpretation. Artificial intelligence (AI), especially large language models such as ChatGPT, offers new opportunities for simplifying medical documentation to improve patient comprehension. We aimed to evaluate whether AI-generated radiology reports simplified by ChatGPT improve patient understanding, readability, and communication quality compared to original AI-generated reports. In total, 3 versions of radiology reports were created using ChatGPT: an original AI-generated version (text 1), a patient-friendly, simplified version (text 2), and a further simplified and accessibility-optimized version (text 3). A total of 300 patients (n=100, 33.3% per group), excluding patients with medical education, were randomly assigned to review one text version and complete a standardized questionnaire. Readability was assessed using the Flesch Reading Ease (FRE) score and LIX indices. Both simplified texts showed significantly higher readability scores (text 1: FRE score=51.1; text 2: FRE score=55.0; and text 3: FRE score=56.4; P<.001) and lower LIX scores, indicating enhanced clarity. Text 3 had the shortest sentences, had the fewest long words, and scored best on all patient-rated dimensions. Questionnaire results revealed significantly higher ratings for texts 2 and 3 across clarity (P<.001), tone (P<.001), structure, and patient engagement. For example, patients rated the ability to understand findings without help highest for text 3 (mean 1.5, SD 0.7) and lowest for text 1 (mean 3.1, SD 1.4). Both simplified texts significantly improved patients' ability to prepare for clinical conversations and promoted shared decision-making. AI-generated simplification of radiology reports significantly enhances patient comprehension and engagement. 
These findings highlight the potential of ChatGPT as a tool to improve patient-centered communication. While promising, future research should focus on ensuring clinical accuracy and exploring applications across diverse patient populations to support equitable and effective integration of AI in health care communication.
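The LIX index mentioned above depends only on sentence length and the share of long words, which matches the observation that text 3 had the shortest sentences and the fewest long words. The sketch below uses the standard LIX definition (words per sentence plus the percentage of words longer than six letters). The tokenization rules and the function name `lix` are simplifying assumptions for illustration, not the procedure used in the study.

```python
import re


def lix(text: str) -> float:
    """Compute the LIX readability index: higher values mean harder text.

    Assumptions (not from the study): words are runs of letters, and
    sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100.0 * len(long_words) / len(words)


if __name__ == "__main__":
    original = "Radiological examination revealed periapical radiolucency."
    simplified = "The X-ray shows a dark spot at the tip of the root."
    print(lix(original), lix(simplified))
```

With this definition, shortening sentences lowers the first term and replacing long technical words lowers the second, so a simplified report scores lower on LIX than the original.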

X-Ray LLM Radiology Report Other Prospective Clinical Pilot Academic Lab GenAI Benchmark SOTA

Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.

Su H, Sun Y, Li R, Zhang A, Yang Y, Xiao F, Duan Z, Chen J, Hu Q, Yang T, Xu B, Zhang Q, Zhao J, Li Y, Li H

•papers•Jun 9 2025

The integration of large language models (LLMs) into medical diagnostics has garnered substantial attention due to their potential to enhance diagnostic accuracy, streamline clinical workflows, and address health care disparities. However, the rapid evolution of LLM research necessitates a comprehensive synthesis of their applications, challenges, and future directions. This scoping review aimed to provide an overview of the current state of research regarding the use of LLMs in medical diagnostics. The study sought to answer four primary subquestions, as follows: (1) Which LLMs are commonly used? (2) How are LLMs assessed in diagnosis? (3) What is the current performance of LLMs in diagnosing diseases? (4) Which medical domains are investigating the application of LLMs? This scoping review was conducted according to the Joanna Briggs Institute Manual for Evidence Synthesis and adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Relevant literature was searched from the Web of Science, PubMed, Embase, IEEE Xplore, and ACM Digital Library databases from 2022 to 2025. Articles were screened and selected based on predefined inclusion and exclusion criteria. Bibliometric analysis was performed using VOSviewer to identify major research clusters and trends. Data extraction included details on LLM types, application domains, and performance metrics. The field is rapidly expanding, with a surge in publications after 2023. GPT-4 and its variants dominated research (70/95, 74% of studies), followed by GPT-3.5 (34/95, 36%). Key applications included disease classification (text or image-based), medical question answering, and diagnostic content generation. LLMs demonstrated high accuracy in specialties like radiology, psychiatry, and neurology but exhibited biases in race, gender, and cost predictions. 
Ethical concerns, including privacy risks and model hallucination, alongside regulatory fragmentation, were critical barriers to clinical adoption. LLMs hold transformative potential for medical diagnostics but require rigorous validation, bias mitigation, and multimodal integration to address real-world complexities. Future research should prioritize explainable artificial intelligence frameworks, specialty-specific optimization, and international regulatory harmonization to ensure equitable and safe clinical deployment.

Mixed Modality LLM Radiology Report Other Review Concept None Academic Lab Policy GenAI

Brain tau PET-based identification and characterization of subpopulations in patients with Alzheimer's disease using deep learning-derived saliency maps.

Li Y, Wang X, Ge Q, Graeber MB, Yan S, Li J, Li S, Gu W, Hu S, Benzinger TLS, Lu J, Zhou Y

•papers•Jun 9 2025

Alzheimer's disease (AD) is a heterogeneous neurodegenerative disorder in which tau neurofibrillary tangles are a pathological hallmark closely associated with cognitive dysfunction and neurodegeneration. In this study, we used brain tau data to investigate AD heterogeneity by identifying and characterizing the subpopulations among patients. We included 615 cognitively normal and 159 AD brain 18F-flortaucipir PET scans, along with T1-weighted MRI from the Alzheimer's Disease Neuroimaging Initiative database. A three-dimensional convolutional neural network model was employed for AD detection using standardized uptake value ratio (SUVR) images. The model-derived saliency maps were generated and employed as informative image features for clustering AD participants. Demographics, neuropsychological measures, and SUVR were compared statistically among the identified subpopulations. Correlations between neuropsychological measures and regional SUVRs were assessed. A generalized linear model was utilized to investigate the sex and APOE ε4 interaction effect on regional SUVRs. Two distinct subpopulations of AD patients were revealed, denoted as SHi and SLo. Compared to the SLo group, the SHi group exhibited a significantly higher global tau burden in the brain, but both groups showed similar cognition distribution levels. In the SHi group, the associations between the neuropsychological measurements and regional tau deposition were weakened. Moreover, a significant interaction effect of sex and APOE ε4 on tau deposition was observed in the SLo group, but no such effect was found in the SHi group. Our results suggest that tau tangles, as shown by SUVR, continue to accumulate even when cognitive function plateaus in AD patients, highlighting the advantages of PET in later disease stages. The differing relationships between cognition and tau deposition, and between gender, APOE4, and tau deposition, provide potential for subtype-specific treatments.
Targeting gender-specific and genetic factors influencing tau deposition, as well as interventions aimed at tau's impact on cognition, may be effective.

PET Classification Neurological Retrospective Clinical In Silico None Academic Lab GenAI

Diagnostic and Technological Advances in Magnetic Resonance (Focusing on Imaging Technique and the Gadolinium-Based Contrast Media), Computed Tomography (Focusing on Photon Counting CT), and Ultrasound-State of the Art.

Runge VM, Heverhagen JT

•papers•Jun 9 2025

Magnetic resonance continues to evolve and advance as a critical imaging modality for disease diagnosis and monitoring. Hardware and software advances continue to propel this modality to the forefront of the field of diagnostic imaging. Next generation MR contrast media, specifically gadolinium chelates with improved relaxivity and stability (relative to the provided contrast effect), have emerged, providing a further boost to the field. Concern regarding gadolinium deposition in the body, primarily with the weaker gadolinium chelates (which have now been removed from the market, at least in Europe), continues to be at the forefront of clinicians' minds. This has driven renewed interest in possible development of manganese-based contrast media. The development of photon counting CT and its clinical introduction have made possible a further major advance in CT image quality, along with the potential for decreasing radiation dose. The possibility of major clinical advances in thoracic, cardiac, and musculoskeletal imaging was recognized first, and its broader impact across all organ systems is now also recognized. The utility of routine acquisition (without penalty in time or radiation dose) of full spectral multi-energy data is now also being recognized as an additional major advance made possible by photon counting CT. Artificial intelligence is now being used in the background across most imaging platforms and modalities, making possible further advances in imaging technique and image quality, although this field is not yet anywhere near realizing its full potential. And last, but not least, the field of ultrasound is on the cusp of further major advances in availability (with development of very low-cost systems) and a possible new generation of microbubble contrast media.

Mixed Modality Reconstruction Whole Body Review Post Market None Academic Lab GenAI

MRI-mediated intelligent multimodal imaging system: from artificial intelligence to clinical imaging diagnosis.

Li Y, Wang J, Pan X, Shan Y, Zhang J

•papers•Jun 8 2025

Although MRI, as a mature diagnostic method in clinical application, is favored by doctors and patients, it also faces insurmountable bottleneck problems. AI strategies such as multimodal imaging integration and machine learning are used to build an intelligent multimodal imaging system based on MRI data to solve the unmet clinical needs in various medical environments. This review systematically discusses the development of MRI-guided multimodal imaging systems and the application of intelligent multimodal imaging systems integrated with artificial intelligence in the early diagnosis of brain and cardiovascular diseases. The safe and effective deployment of AI in clinical diagnostic equipment can help enhance early accurate diagnosis and personalized patient care.

MRI Neurological Review Academic Lab GenAI

Foundation versus domain-specific models for left ventricular segmentation on cardiac ultrasound.

Chao CJ, Gu YR, Kumar W, Xiang T, Appari L, Wu J, Farina JM, Wraith R, Jeong J, Arsanjani R, Kane GC, Oh JK, Langlotz CP, Banerjee I, Fei-Fei L, Adeli E

•papers•Jun 6 2025

The Segment Anything Model (SAM) was fine-tuned on the EchoNet-Dynamic dataset and evaluated on external transthoracic echocardiography (TTE) and Point-of-Care Ultrasound (POCUS) datasets from CAMUS (University Hospital of St Etienne) and Mayo Clinic (99 patients: 58 TTE, 41 POCUS). Fine-tuned SAM was superior or comparable to MedSAM. The fine-tuned SAM also outperformed EchoNet and U-Net models, demonstrating strong generalization, especially on apical 2-chamber (A2C) images (fine-tuned SAM vs. EchoNet: CAMUS-A2C: DSC 0.891 ± 0.040 vs. 0.752 ± 0.196, p < 0.0001) and POCUS (DSC 0.857 ± 0.047 vs. 0.667 ± 0.279, p < 0.0001). Additionally, SAM-enhanced workflow reduced annotation time by 50% (11.6 ± 4.5 sec vs. 5.7 ± 1.7 sec, p < 0.0001) while maintaining segmentation quality. We demonstrated an effective strategy for fine-tuning a vision foundation model for enhancing clinical workflow efficiency and supporting human-AI collaboration.
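The DSC values quoted above are Dice similarity coefficients, which measure the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. The sketch below is a minimal pure-Python illustration on tiny binary masks. The function name `dice` and the toy masks are hypothetical and are not taken from the paper or its datasets.

```python
from typing import Sequence


def dice(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice similarity coefficient for two same-shaped binary (0/1) masks."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    # Pixels labeled 1 in both masks.
    intersection = sum(p & t for rp, rt in zip(pred, truth) for p, t in zip(rp, rt))
    # Total number of foreground pixels across both masks.
    total = sum(sum(row) for row in pred) + sum(sum(row) for row in truth)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(dice(predicted, reference))
```

A DSC of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels, so the reported mean of 0.891 versus 0.752 on CAMUS-A2C reflects substantially closer agreement with the reference left ventricular contours.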

Ultrasound Segmentation Cardiac Retrospective Clinical In Silico None Academic Lab GenAI
