
Intelligent Head and Neck CTA Report Quality Detection with Large Language Models.

Tian L, Lu Y, Fei X, Lu J

PubMed · Aug 27, 2025
This study aims to identify common errors in head and neck CTA reports using GPT-4, ERNIE Bot, and SparkDesk, evaluating their potential to support quality control of Chinese radiological reports. We collected 10,000 head and neck CTA imaging reports from Xuanwu Hospital (Dataset 1) and 5,000 multi-center reports (Dataset 2). We identified six common error types and detected them using the three large language models. Overall report quality was assessed on a 5-point Likert scale. We conducted Wilcoxon rank-sum and Friedman tests to compare error detection rates and to evaluate the models' performance across error types and overall scores. For Dataset 2, after manual review, we annotated the six error types and assigned overall scores, recording the time required for manual scoring and for model detection. Model performance was evaluated using accuracy, precision, recall, and F1 score; the intraclass correlation coefficient (ICC) measured consistency between manual and model scores, and ANOVA compared evaluation times. In Dataset 1, error detection rates for final reports were significantly lower than for preliminary reports across all three models. The Friedman test indicated significant differences in error rates among the models. In Dataset 2, detection accuracy for the six error types exceeded 95% for all three LLMs. GPT-4 showed moderate consistency with manual scores (ICC = 0.517), while ERNIE Bot and SparkDesk showed slightly lower consistency (ICC = 0.431 and 0.456, respectively; P < 0.001). The models evaluated 100 radiology reports significantly faster than human reviewers. LLMs can differentiate radiology report quality and identify error types, substantially improving the efficiency of quality control reviews and offering clear research and practical value in this field.
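The scoring setup described here (binary detection of six error types, checked against manual annotation with accuracy, precision, recall, and F1) can be reproduced with standard tooling. A minimal sketch follows; the error-type names and data layout are illustrative assumptions, not the authors' code.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical error categories; the study defines six types but does not list them here.
ERROR_TYPES = ["laterality", "omission", "contradiction", "measurement", "terminology", "template"]

def score_model(manual, predicted):
    """manual/predicted: dicts mapping error type -> list of 0/1 flags, one per report."""
    results = {}
    for err in ERROR_TYPES:
        y_true, y_pred = manual[err], predicted[err]
        results[err] = {
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
        }
    return results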

Evaluating the Quality and Understandability of Radiology Report Summaries Generated by ChatGPT: Survey Study.

Sunshine A, Honce GH, Callen AL, Zander DA, Tanabe JL, Pisani Petrucci SL, Lin CT, Honce JM

PubMed · Aug 27, 2025
Radiology reports convey critical medical information to health care providers and patients. Unfortunately, they are often difficult for patients to comprehend, causing confusion and anxiety and limiting patient engagement in health care decision-making. Large language models (LLMs) like ChatGPT (OpenAI) can create simplified, patient-friendly report summaries to increase accessibility, albeit with errors. We evaluated the accuracy and clarity of ChatGPT-generated summaries against the original radiology reports as assessed by radiologists, assessed patients' understanding of and satisfaction with the summaries compared to the original reports, and compared the readability of reports and summaries using validated readability metrics. We anonymized 30 radiology reports created by neuroradiologists at our institution (6 brain magnetic resonance imaging, 6 brain computed tomography, 6 head and neck computed tomography angiography, 6 neck computed tomography, and 6 spine computed tomography). These anonymized reports were processed by ChatGPT to produce patient-centric summaries. Four board-certified neuroradiologists evaluated the ChatGPT-generated summaries on quality and accuracy relative to the original reports, and 4 patient volunteers separately rated the reports and summaries on perceived understandability and satisfaction. Readability was assessed using word count and validated readability scales. After reading the summary, patient confidence in understanding (98%, 116/118 vs 26%, 31/118) and satisfaction with the level of jargon/terminology (91%, 107/118 vs 8%, 9/118) and with the time taken to understand the content (97%, 115/118 vs 23%, 27/118) improved substantially. Ninety-two percent (108/118) of responses indicated the summary clarified patients' questions about the report, and 98% (116/118) indicated patients would use the summary if available; 67% (79/118) would want access to both the report and summary, while 26% (31/118) would want only the summary. Eighty-three percent (100/120) of radiologist responses indicated the summary represented the original report "extremely well" or "very well," with only 5% (6/120) indicating it did so "slightly well" or "not well at all." Five percent (6/120) of responses indicated relevant medical information was missing from the summary, 12% (14/120) reported instances of overemphasis of nonsignificant findings, and 18% (22/120) reported instances of underemphasis of significant findings. No fabricated findings were identified. Overall, 83% (99/120) of responses indicated the summary would definitely or probably not lead patients to incorrect conclusions about the original report, while 10% (12/120) indicated it might. ChatGPT-generated summaries can substantially improve perceived comprehension and satisfaction while accurately reflecting most key information from the original radiology reports. Minor omissions and under-/overemphasis were noted in some summaries, underscoring the need for ongoing validation and oversight. Overall, these artificial intelligence-generated, patient-centric summaries hold promise for enhancing patient-centered communication in radiology.
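The validated readability scales are not named in the abstract; Flesch Reading Ease and Flesch-Kincaid Grade Level are the usual choices for this kind of report-versus-summary comparison. A sketch under that assumption:

import textstat  # pip install textstat

def readability(text: str) -> dict:
    """Word count plus two widely used readability scales."""
    return {
        "words": len(text.split()),
        "flesch_reading_ease": textstat.flesch_reading_ease(text),  # higher = easier to read
        "fk_grade": textstat.flesch_kincaid_grade(text),            # approximate US school grade
    }

report = "Mild mucosal thickening of the ethmoid air cells without acute intracranial abnormality."
summary = "You have mild swelling in the sinuses between your eyes. Nothing urgent was found in your brain."
print(readability(report))
print(readability(summary))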

FaithfulNet: An explainable deep learning framework for autism diagnosis using structural MRI.

Sujana DS, Augustine DP

PubMed · Aug 27, 2025
Explainable artificial intelligence (XAI) can decode 'black box' models, enhancing trust in clinical decision-making by making the predictions of deep learning models interpretable, transparent, and trustworthy. This study employed XAI techniques to explain the predictions of a deep learning model for diagnosing autism and to identify the memory regions responsible for children's academic performance. The study used publicly available sMRI data from the ABIDE-II repository. First, a deep learning model, FaithfulNet, was developed to aid in the diagnosis of autism. Next, gradient-based class activation maps and the SHAP gradient explainer were employed to generate explanations for the model's predictions. These explanations were integrated to develop a novel, faithful visual explanation, Faith_CAM. Finally, this faithful explanation was quantified using the pointing game score and analyzed against cortical and subcortical structure masks to identify impaired regions in the autistic brain. The model achieved a classification accuracy of 99.74% with an AUC of 1.0. In addition to facilitating autism diagnosis, the study assesses the degree of impairment in memory regions responsible for children's academic performance, contributing to the development of personalized treatment plans.
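The abstract does not specify how the Grad-CAM and SHAP explanations are fused into Faith_CAM; one plausible construction is a normalized elementwise combination, with the pointing game then checking whether the map's peak falls inside an annotated structure mask. A sketch under those assumptions:

import numpy as np

def fuse_maps(grad_cam: np.ndarray, shap_map: np.ndarray) -> np.ndarray:
    """Normalize both attribution maps to [0, 1] and combine them elementwise."""
    def norm(m):
        m = np.maximum(m, 0)  # keep positive evidence only
        return m / (m.max() + 1e-8)
    return norm(grad_cam) * norm(shap_map)  # regions supported by both explainers survive

def pointing_game_hit(fused: np.ndarray, region_mask: np.ndarray) -> bool:
    """Pointing game: does the maximum of the fused map fall inside the annotated mask?"""
    return bool(region_mask[np.unravel_index(fused.argmax(), fused.shape)])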

Quantum integration in Swin transformer mitigates overfitting in breast cancer screening.

Xie Z, Yang X, Zhang S, Yang J, Zhu Y, Zhang A, Sun H, Dai Q, Li L, Liu H, Ming W, Dou M

PubMed · Aug 27, 2025
To explore the potential of quantum computing for advancing transformer-based deep learning models in breast cancer screening, this study introduces the Quantum-Enhanced Swin Transformer (QEST). The model integrates a Variational Quantum Circuit (VQC) that replaces the fully connected classification layer of the Swin Transformer architecture. In simulations, QEST exhibited accuracy and generalization competitive with the original Swin Transformer while also mitigating overfitting. Specifically, in 16-qubit simulations, the VQC reduced the parameter count by 62.5% relative to the replaced fully connected layer and improved Balanced Accuracy (BACC) by 3.62% in external validation. Validation experiments conducted on an actual quantum computer corroborated the effectiveness of QEST.
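Replacing a transformer's fully connected head with a VQC is commonly prototyped with PennyLane's Torch integration. The sketch below uses 4 qubits and an assumed 768-dimensional Swin feature vector for brevity (the paper's simulations use 16 qubits); it illustrates the substitution, not the authors' exact circuit.

import torch
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))             # encode features as rotations
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # trainable entangling layers
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (2, n_qubits, 3)}  # (layers, wires, 3 rotation angles)
vqc_head = qml.qnn.TorchLayer(circuit, weight_shapes)

# Drop-in replacement for the classification layer: project features down, run the VQC, read out 2 classes.
head = torch.nn.Sequential(
    torch.nn.Linear(768, n_qubits),  # 768 = assumed Swin feature dimension
    vqc_head,
    torch.nn.Linear(n_qubits, 2),
)
logits = head(torch.randn(1, 768))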

A Systematic Review on the Generative AI Applications in Human Medical Genomics

Anton Changalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov

arXiv preprint · Aug 27, 2025
Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in genetic research and in the diagnostics of both rare and common diseases. An automated keyword-based search of PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and excluding irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified, clinically robust pipelines, which remain limited in generalizability and practical clinical implementation. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to this rapidly evolving field.

Two-stage large language model approach enhancing entity classification and relationship mapping in radiology reports.

Shin C, Eom D, Lee SM, Park JE, Kim K, Lee KH

PubMed · Aug 27, 2025
Large language models (LLMs) hold transformative potential for medical image labeling in radiology, addressing challenges posed by linguistic variability in reports. We developed a two-stage natural language processing pipeline that combines Bidirectional Encoder Representations from Transformers (BERT) and an LLM to analyze radiology reports. In the first stage (Entity Key Classification), a BERT model identifies and classifies clinically relevant entities mentioned in the text. In the second stage (Relationship Mapping), the extracted entities are passed to the LLM, which infers relationships between entity pairs while accounting for whether each entity is actually present. The pipeline targets lesion-location mapping in chest CT and diagnosis-episode mapping in brain MRI, both clinically important for structuring radiologic findings and capturing temporal patterns of disease progression. Using over 400,000 reports from Seoul Asan Medical Center, the pipeline achieved a macro F1-score of 77.39 for chest CT and 70.58 for brain MRI. These results highlight the effectiveness of integrating BERT with an LLM to enhance diagnostic accuracy in radiology report analysis.
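A minimal stand-in for the two stages can be assembled from off-the-shelf parts: a token classifier for entity extraction, then a prompt asking an LLM to map relationships. The model name and prompt below are placeholders, not the authors' fine-tuned components.

from transformers import pipeline

sentence = "A 12 mm nodule is seen in the right upper lobe. No pleural effusion."

# Stage 1 (Entity Key Classification): extract entities with a token classifier.
ner = pipeline("token-classification",
               model="dslim/bert-base-NER",  # generic placeholder; the study fine-tunes its own BERT
               aggregation_strategy="simple")
entities = [e["word"] for e in ner(sentence)]

# Stage 2 (Relationship Mapping): ask an LLM to pair entities, noting actual presence (negation).
prompt = (
    "Report: " + sentence + "\n"
    "Entities: " + ", ".join(entities) + "\n"
    "For each lesion-location pair, state the mapping and whether the finding is "
    "actually present (e.g., 'pleural effusion' above is negated)."
)
# answer = llm.generate(prompt)  # pseudo-call; any chat-completion API fits here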

HONeYBEE: Enabling Scalable Multimodal AI in Oncology Through Foundation Model-Driven Embeddings

Tripathi, A. G., Waqas, A., Schabath, M. B., Yilmaz, Y., Rasool, G.

medRxiv preprint · Aug 27, 2025
HONeYBEE (Harmonized ONcologY Biomedical Embedding Encoder) is an open-source framework that integrates multimodal biomedical data for oncology applications. It processes clinical data (structured and unstructured), whole-slide images, radiology scans, and molecular profiles to generate unified patient-level embeddings using domain-specific foundation models and fusion strategies. These embeddings enable survival prediction, cancer-type classification, patient similarity retrieval, and cohort clustering. In an evaluation of 11,400+ patients across 33 cancer types from The Cancer Genome Atlas (TCGA), clinical embeddings showed the strongest single-modality performance with 98.5% classification accuracy and 96.4% precision@10 in patient retrieval. They also achieved the highest survival prediction concordance indices across most cancer types. Multimodal fusion provided complementary benefits for specific cancers, improving overall survival prediction beyond clinical features alone. Comparative evaluation of four large language models revealed that general-purpose models like Qwen3 outperformed specialized medical models for clinical text representation, though task-specific fine-tuning improved performance on heterogeneous data such as pathology reports.
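Patient retrieval with precision@10 over fused embeddings reduces to a nearest-neighbour computation; below is a small sketch with random stand-ins for the per-modality foundation-model embeddings (concatenation is just one of several possible fusion strategies).

import numpy as np

def precision_at_10(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Mean fraction of each patient's 10 nearest neighbours (cosine) that share its cancer type."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)        # exclude self-matches
    top10 = np.argsort(-sims, axis=1)[:, :10]
    return float((labels[top10] == labels[:, None]).mean())

rng = np.random.default_rng(0)
clinical = rng.normal(size=(200, 64))      # stand-in for clinical-text embeddings
imaging = rng.normal(size=(200, 64))       # stand-in for imaging embeddings
labels = rng.integers(0, 33, size=200)     # 33 cancer types, as in TCGA
fused = np.concatenate([clinical, imaging], axis=1)
print(precision_at_10(fused, labels))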

Optimized deep learning for brain tumor detection: a hybrid approach with attention mechanisms and clinical explainability.

Aiya AJ, Wani N, Ramani M, Kumar A, Pant S, Kotecha K, Kulkarni A, Al-Danakh A

PubMed · Aug 26, 2025
Brain tumor classification (BTC) from magnetic resonance imaging (MRI) is a critical diagnostic task that is highly important for treatment planning. In this study, we propose a hybrid deep learning (DL) model that integrates VGG16, an attention mechanism, and optimized hyperparameters to classify brain tumors into four categories: glioma, meningioma, pituitary tumor, and no tumor. The approach leverages state-of-the-art preprocessing techniques, transfer learning, and Gradient-weighted Class Activation Mapping (Grad-CAM) visualization on a dataset of 7,023 MRI images to enhance both performance and interpretability. The proposed model achieves 99% test accuracy with strong precision and recall, outperforming traditional approaches such as Support Vector Machines (SVM) with Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), and Principal Component Analysis (PCA) by a significant margin. Moreover, the model eliminates the need for manual feature engineering, a common challenge in this domain, by employing end-to-end learning, which allows it to derive meaningful features with reduced human input. The attention mechanism further promotes feature selection, improving classification accuracy, while Grad-CAM visualizations show which regions of the image had the greatest impact on classification decisions, increasing transparency in clinical settings. Overall, the combination of strong predictive performance, automatic feature extraction, and improved interpretability establishes the model as a valuable neural-network approach to brain tumor classification, with potential for enhancing medical imaging (MI) and clinical decision-making.
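The described backbone-plus-attention design can be sketched in a few lines of PyTorch; the attention gate below is a simple spatial variant chosen for illustration, since the abstract does not detail the exact mechanism.

import torch
import torch.nn as nn
from torchvision import models

class AttnVGG(nn.Module):
    """Sketch: VGG16 features + a learned spatial attention gate + a 4-way classifier head."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = models.vgg16(weights="IMAGENET1K_V1").features  # transfer learning
        self.attn = nn.Sequential(nn.Conv2d(512, 1, kernel_size=1), nn.Sigmoid())
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(512, n_classes))

    def forward(self, x):
        f = self.features(x)   # (B, 512, H', W') feature maps
        f = f * self.attn(f)   # re-weight spatial locations by learned attention
        return self.head(f)

logits = AttnVGG()(torch.randn(1, 3, 224, 224))  # 4 class logits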

Adverse cardiovascular events in coronary Plaques not undeRgoing pErcutaneous coronary intervention evaluateD with optIcal Coherence Tomography. The PREDICT-AI risk model.

Bruno F, Immobile Molaro M, Sperti M, Bianchini F, Chu M, Cardaci C, Wańha W, Gasior P, Zecchino S, Pavani M, Vergallo R, Biscaglia S, Cerrato E, Secco GG, Mennuni M, Mancone M, De Filippo O, Mattesini A, Canova P, Boi A, Ugo F, Scarsini R, Costa F, Fabris E, Campo G, Wojakowski W, Morbiducci U, Deriu M, Tu S, Piccolo R, D'Ascenzo F, Chiastra C, Burzotta F

PubMed · Aug 26, 2025
Most acute coronary syndromes (ACS) originate from coronary plaques that are angiographically mild and not flow limiting. These lesions, often characterised by thin-cap fibroatheroma, large lipid cores and macrophage infiltration, are termed 'vulnerable plaques' and are associated with a heightened risk of future major adverse cardiovascular events (MACE). However, current imaging modalities lack robust predictive power, and treatment strategies for such plaques remain controversial. The PREDICT-AI study aims to develop and externally validate a machine learning (ML)-based risk score that integrates optical coherence tomography (OCT) plaque features and patient-level clinical data to predict the natural history of non-flow-limiting coronary lesions not treated with percutaneous coronary intervention (PCI). This is a multicentre, prospective, observational study enrolling 500 patients with recent ACS who undergo comprehensive three-vessel OCT imaging. Lesions not treated with PCI will be characterised using artificial intelligence (AI)-based plaque analysis (OctPlus software), including quantification of fibrous cap thickness, lipid arc, macrophage presence and other microstructural features. A three-step ML pipeline will be used to derive and validate a risk score predicting MACE at follow-up. Outcomes will be adjudicated blinded to OCT findings. The primary endpoint is MACE (composite of cardiovascular death, myocardial infarction, urgent revascularisation or target vessel revascularisation). Event prediction will be assessed at both the patient level and plaque level. The PREDICT-AI study will generate a clinically applicable, AI-driven risk stratification tool based on high-resolution intracoronary imaging. By identifying high-risk, non-obstructive coronary plaques, this model may enhance personalised management strategies and support the transition towards precision medicine in coronary artery disease.
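The protocol outlines but does not specify its three-step derivation-and-validation pipeline; a generic sklearn version (feature selection, model derivation, internal cross-validation ahead of the planned external validation) might look like this, with synthetic data standing in for the OCT plaque features:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=40, random_state=0)  # placeholder for OCT + clinical features

risk_model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=20)),  # step 1: feature selection
    ("clf", GradientBoostingClassifier()),     # step 2: model derivation
])
auc = cross_val_score(risk_model, X, y, cv=5, scoring="roc_auc").mean()  # step 3: internal validation
print(f"cross-validated AUC: {auc:.3f}")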

Toward Non-Invasive Voice Restoration: A Deep Learning Approach Using Real-Time MRI

Saleh, M. W.

medRxiv preprint · Aug 26, 2025
Despite recent advances in brain-computer interfaces (BCIs) for speech restoration, existing systems remain invasive, costly, and inaccessible to individuals with congenital mutism or neurodegenerative disease. We present a proof-of-concept pipeline that synthesizes personalized speech directly from real-time magnetic resonance imaging (rtMRI) of the vocal tract, without requiring acoustic input. Segmented rtMRI frames are mapped to articulatory class representations using a Pix2Pix conditional GAN, and these are then transformed into synthetic audio waveforms by a convolutional neural network modeling the articulatory-to-acoustic relationship. The outputs are rendered into audible form and evaluated with speaker-similarity metrics derived from Resemblyzer embeddings. While preliminary, our results suggest that even silent articulatory motion encodes sufficient information to approximate a speaker's vocal characteristics, offering a non-invasive direction for future speech restoration in individuals who have lost or never developed a voice.
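The Resemblyzer-based speaker-similarity check reduces to comparing d-vector embeddings; the file names below are hypothetical.

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav  # pip install resemblyzer

encoder = VoiceEncoder()
real = encoder.embed_utterance(preprocess_wav("speaker_reference.wav"))        # hypothetical path
synth = encoder.embed_utterance(preprocess_wav("synthesized_from_rtmri.wav"))  # hypothetical path
similarity = float(np.dot(real, synth))  # embeddings are L2-normalized, so dot product = cosine similarity
print(f"speaker similarity: {similarity:.3f}")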