
Contextual structured annotations on PACS: a futuristic vision for reporting routine oncologic imaging studies and its potential to transform clinical work and research.

Wong VK, Wang MX, Bethi E, Nagarakanti S, Morani AC, Marcal LP, Rauch GM, Brown JJ, Yedururi S

PubMed | Jul 26 2025
Radiologists currently have limited, time-consuming options for annotating findings on images: on most PACS systems they are restricted to arrows, calipers, and lines, regardless of the type of finding. We propose a framework placing encoded, transferable, highly contextual structured text annotations directly on PACS images indicating the type of lesion, level of suspicion, location, lesion measurement, and TNM status for malignant lesions, along with automated integration of this information into the radiology report. This approach offers a one-stop solution to generate radiology reports that are easily understood by other radiologists, patient care providers, patients, and machines, while reducing the effort needed to dictate a detailed radiology report and minimizing speech recognition errors. It also provides a framework for automatically generating large-volume, high-quality annotated datasets for machine learning algorithms from the daily work of radiologists. Enabling voice dictation of these contextual annotations directly into PACS, similar to voice-enabled Google search, would further enhance the user experience. Wider adoption of contextualized structured annotations could facilitate studies of the temporal evolution of different tumor lesions across multiple lines of treatment and early detection of asynchronous response or areas of treatment failure. We present a futuristic vision and a solution with the potential to transform clinical work and research in oncologic imaging.
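The abstract does not specify an encoding; purely as an illustration of what a machine-readable, transferable annotation of this kind could look like, the Python sketch below uses hypothetical field names for lesion type, suspicion level, location, measurement, and TNM status, and derives a report sentence from them:

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass
class LesionAnnotation:
    """Hypothetical contextual structured annotation stored alongside a PACS image."""
    lesion_type: str            # e.g., "metastasis"
    suspicion: str              # e.g., "high", "intermediate", "low"
    location: str               # e.g., "hepatic segment VII"
    long_axis_mm: float         # caliper measurement
    tnm: Optional[str] = None   # e.g., "M1" for malignant lesions

    def to_report_sentence(self) -> str:
        # Derive a human-readable report phrase from the structured fields.
        sentence = (f"{self.suspicion.capitalize()}-suspicion {self.lesion_type} "
                    f"in the {self.location}, measuring {self.long_axis_mm:.1f} mm")
        if self.tnm:
            sentence += f" ({self.tnm})"
        return sentence + "."

ann = LesionAnnotation("metastasis", "high", "hepatic segment VII", 14.2, tnm="M1")
print(json.dumps(asdict(ann)))   # machine-readable form kept with the image
print(ann.to_report_sentence())  # text automatically integrated into the report
```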

Disease probability-enhanced follow-up chest X-ray radiology report summary generation.

Wang Z, Deng Q, So TY, Chiu WH, Lee K, Hui ES

PubMed | Jul 24 2025
A chest X-ray radiology report describes not only abnormal findings on the X-ray obtained at a given examination, but also findings on disease progression or changes in device placement relative to the X-ray from the previous examination. The majority of efforts on automatic radiology report generation address the former, but not the latter, type of findings. To the best of the authors' knowledge, only one prior work is dedicated to generating a summary of the latter findings, i.e., a follow-up radiology report summary. In this study, we propose a transformer-based framework to tackle this task. Motivated by our observations on the importance of the medical lexicon to the fidelity of report summary generation, we introduce two mechanisms that bring clinical insight to our model: disease probability soft guidance and a masked entity modeling loss. The former employs a pretrained abnormality classifier to guide the presence level of specific abnormalities, while the latter directs the model's attention toward the medical lexicon. Extensive experiments demonstrate that our model outperforms the state of the art.
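The abstract does not give the loss formulation; as a minimal sketch of one plausible reading of the masked entity modeling loss, the snippet below up-weights token-level cross-entropy on tokens flagged as medical-lexicon entities (the weighting scheme and `entity_weight` value are assumptions, not the paper's method):

```python
import torch
import torch.nn.functional as F

def masked_entity_loss(logits, targets, entity_mask, entity_weight=2.0):
    """Token-level cross-entropy that emphasizes medical-lexicon tokens (illustrative).

    logits:      (batch, seq_len, vocab) decoder outputs
    targets:     (batch, seq_len) reference token ids
    entity_mask: (batch, seq_len) 1 where the token belongs to a medical entity
    """
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    weights = 1.0 + (entity_weight - 1.0) * entity_mask.float()
    return (weights * per_token).sum() / weights.sum()

# Toy shapes only: batch of 2 reports, 5 tokens each, vocabulary of 10.
logits = torch.randn(2, 5, 10)
targets = torch.randint(0, 10, (2, 5))
entity_mask = torch.tensor([[0, 1, 1, 0, 0], [0, 0, 1, 0, 0]])
print(masked_entity_loss(logits, targets, entity_mask).item())
```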

LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning.

Che H, Jin H, Gu Z, Lin Y, Jin C, Chen H

PubMed | Jul 21 2025
Large Language Models (LLMs) have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, impeding model development and broader adoption of LLM-driven MRG models. To address this challenge, we present FedMRG, the first framework that leverages Federated Learning (FL) to enable privacy-preserving, multi-center development of LLM-driven MRG models, specifically designed to overcome the critical challenge of communication-efficient LLM training under multi-modal data heterogeneity. First, our framework tackles the fundamental challenge of communication overhead in federated LLM tuning by employing low-rank factorization to efficiently decompose parameter updates, significantly reducing gradient transmission costs and making LLM-driven MRG feasible in bandwidth-constrained FL settings. Furthermore, we observed dual heterogeneity in MRG under the FL scenario: varying image characteristics across medical centers, as well as diverse reporting styles and terminology preferences. To address this data heterogeneity, we further enhance FedMRG with (1) client-aware contrastive learning in the MRG encoder, coupled with diagnosis-driven prompts, which capture both globally generalizable and locally distinctive features while maintaining diagnostic accuracy; and (2) a dual-adapter mutual boosting mechanism in the MRG decoder that harmonizes generic and specialized adapters to address variations in reporting styles and terminology. Through extensive evaluation on our established FL-MRG benchmark, we demonstrate the generalizability and adaptability of FedMRG, underscoring its potential for harnessing multi-center data and generating clinically accurate reports while maintaining communication efficiency.
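FedMRG's exact factorization is not reproduced here; the sketch below only illustrates the general LoRA-style idea the abstract describes, in which each client communicates two small low-rank matrices instead of a dense parameter update (the dimensions and rank are arbitrary assumptions):

```python
import numpy as np

def lowrank_update(d_out=4096, d_in=4096, rank=8, seed=0):
    """Illustrative low-rank decomposition of a weight update: delta_W ~ B @ A."""
    rng = np.random.default_rng(seed)
    A = 0.01 * rng.standard_normal((rank, d_in))  # (r, d_in), trained on the client
    B = np.zeros((d_out, rank))                   # (d_out, r), initialized to zero
    return A, B

A, B = lowrank_update()
dense_values = A.shape[1] * B.shape[0]            # size of the full update, never sent
transmitted_values = A.size + B.size              # what the client actually uploads
print(f"dense: {dense_values:,} values, low-rank: {transmitted_values:,} values "
      f"({dense_values / transmitted_values:.0f}x smaller)")
```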

Medical radiology report generation: A systematic review of current deep learning methods, trends, and future directions.

Izhar A, Idris N, Japar N

PubMed | Jul 19 2025
Medical radiology reports play a crucial role in diagnosing various diseases, yet generating them manually is time-consuming and burdens clinical workflows. Medical radiology report generation (MRRG) aims to automate this process using deep learning to assist radiologists and reduce patient wait times. This study presents the most comprehensive systematic review to date of deep learning-based MRRG, encompassing recent advances that span traditional architectures to large language models. We focus on available datasets, modeling approaches, and evaluation practices. Following PRISMA guidelines, we retrieved 323 articles from major academic databases and included 78 studies after eligibility screening. We critically analyze key components such as model architectures, loss functions, datasets, evaluation metrics, and optimizers, identifying 22 widely used datasets, 14 evaluation metrics, around 20 loss functions, over 25 visual backbones, and more than 30 textual backbones. To support reproducibility and accelerate future research, we also compile links to modern models, toolkits, and pretrained resources. Our findings provide technical insights and outline future directions to address current limitations, promoting collaboration at the intersection of medical imaging, natural language processing, and deep learning to advance trustworthy AI systems in radiology.

Performance of GPT-4 for automated prostate biopsy decision-making based on mpMRI: a multi-center evidence study.

Shi MJ, Wang ZX, Wang SK, Li XH, Zhang YL, Yan Y, An R, Dong LN, Qiu L, Tian T, Liu JX, Song HC, Wang YF, Deng C, Cao ZB, Wang HY, Wang Z, Wei W, Song J, Lu J, Wei X, Wang ZC

PubMed | Jul 7 2025
Multiparametric magnetic resonance imaging (mpMRI) has significantly advanced prostate cancer (PCa) detection, yet decisions on invasive biopsy for moderate Prostate Imaging Reporting and Data System (PI-RADS) scores remain ambiguous. To explore the decision-making capacity of Generative Pretrained Transformer-4 (GPT-4) for automated prostate biopsy recommendations, we included 2299 individuals who underwent prostate biopsy from 2018 to 2023 at 3 large medical centers, with mpMRI available before biopsy and documented clinical-histopathological records. GPT-4 generated structured reports from given prompts. Its performance was quantified using confusion matrices, from which sensitivity, specificity, and area under the curve were calculated. Multiple manual evaluation procedures were also conducted. Wilcoxon's rank sum test, Fisher's exact test, and Kruskal-Wallis tests were used for comparisons. In this cohort, the largest of its kind in a Chinese population, patients with moderate PI-RADS scores (3 and 4) accounted for 39.7% (912/2299) and were defined as the subset of interest (SOI). The detection rates of clinically significant PCa for PI-RADS scores 2-5 were 9.4%, 27.3%, 49.2%, and 80.1%, respectively. Nearly half, 47.5% (433/912), of SOI patients were shown histopathologically to have undergone unnecessary prostate biopsies. With the assistance of GPT-4, 20.8% (190/912) of the SOI population could have avoided unnecessary biopsies, and performance was even better (28.8%, 118/410) in the most heterogeneous subgroup, PI-RADS score 3. More than 90.0% of GPT-4-generated reports were rated comprehensive and easy to understand, although satisfaction with accuracy was lower (82.8%). GPT-4 also demonstrated cognitive potential for handling complex problems, and the Chain of Thought method enabled us to better understand the decision-making logic behind GPT-4. Finally, we developed the ProstAIGuide platform to facilitate accessibility for both doctors and patients. This multi-center study highlights the clinical utility of GPT-4 for prostate biopsy decision-making and advances our understanding of the latest artificial intelligence implementations in various medical scenarios.
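The study's confusion matrices are not reproduced in the abstract; as a generic reminder of the metrics it reports, the snippet below computes sensitivity and specificity from a 2x2 confusion matrix using hypothetical counts (not the study's data):

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity and specificity from a 2x2 confusion matrix (standard definitions)."""
    sensitivity = tp / (tp + fn)   # true positive rate among histopathology-confirmed csPCa
    specificity = tn / (tn + fp)   # true negative rate among benign/insignificant cases
    return sensitivity, specificity

# Hypothetical counts for "GPT-4 recommends biopsy" vs. histopathology outcome.
sens, spec = binary_metrics(tp=450, fp=243, fn=29, tn=190)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```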

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation.

Jin H, Che H, He S, Chen H

PubMed | Jul 3 2025
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) their clinical efficacy is unsatisfactory, especially in describing lesion attributes; and 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model that not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named Chain of Diagnosis (CoD), which maintains a chain of diagnostic reasoning for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with the QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses to generated sentences, with the diagnoses acting as references. Moreover, a lesion grounding module locates abnormalities in the image, further improving radiologists' working efficiency. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts yield 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; and 3) extensive experiments demonstrating the effectiveness of CoD, which consistently outperforms both specialist and generalist models on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.
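The internals of the diagnosis grounding module are not detailed in the abstract; one simple reading, sketched below with hypothetical sentence embeddings, is to link each generated sentence to its best-supporting QA diagnosis by cosine similarity, leaving sentences unmatched when similarity falls below a threshold:

```python
import numpy as np

def ground_sentences(sent_embs, qa_embs, threshold=0.5):
    """Match each generated report sentence to the most similar QA diagnosis (illustrative).

    sent_embs: (n_sentences, dim) embeddings of generated sentences
    qa_embs:   (n_qa, dim) embeddings of QA diagnoses
    Returns, for each sentence, the index of its supporting QA pair or None.
    """
    s = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    q = qa_embs / np.linalg.norm(qa_embs, axis=1, keepdims=True)
    sims = s @ q.T                         # cosine similarity matrix
    best = sims.argmax(axis=1)
    return [int(j) if sims[i, j] >= threshold else None
            for i, j in enumerate(best)]

rng = np.random.default_rng(0)
print(ground_sentences(rng.standard_normal((3, 8)), rng.standard_normal((4, 8))))
```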

Radiology report generation using automatic keyword adaptation, frequency-based multi-label classification and text-to-text large language models.

He Z, Wong ANN, Yoo JS

PubMed | Jul 3 2025
Radiology reports are essential in medical imaging, providing critical insights for diagnosis, treatment, and patient management by bridging the gap between radiologists and referring physicians. However, manual generation of radiology reports is time-consuming and labor-intensive, leading to inefficiencies and delays in clinical workflows, particularly as case volumes increase. Although deep learning approaches have shown promise in automating radiology report generation, existing methods, particularly those based on the encoder-decoder framework, suffer from significant limitations: a lack of explainability due to the black-box features produced by the encoder, and limited adaptability to diverse clinical settings. In this study, we address these challenges with a novel deep learning framework for radiology report generation that enhances explainability, accuracy, and adaptability. Our approach replaces the traditional black-box visual features with transparent keyword lists, improving the interpretability of the feature extraction process. To generate these keyword lists, we apply multi-label classification, further enhanced by an automatic keyword adaptation mechanism. This adaptation dynamically configures the multi-label classifier to better fit specific clinical environments, reducing reliance on manually curated reference keyword lists and improving adaptability across diverse datasets. We also introduce a frequency-based multi-label classification strategy to address keyword imbalance, ensuring that rare but clinically significant terms are accurately identified. Finally, we leverage a pretrained text-to-text large language model (LLM) to generate human-like, clinically relevant radiology reports from the extracted keyword lists, ensuring linguistic quality and clinical coherence. We evaluate our method on two public datasets, IU-XRay and MIMIC-CXR, demonstrating superior performance over state-of-the-art methods. Our framework not only improves the accuracy and reliability of radiology report generation but also enhances the explainability of the process, fostering greater trust in and adoption of AI-driven solutions in clinical practice. Comprehensive ablation studies confirm the robustness and effectiveness of each component, highlighting the framework's contributions to advancing automated radiology reporting. In conclusion, we developed a deep learning-based method for preparing high-quality, explainable radiology reports for chest X-ray images using multi-label classification and a text-to-text large language model. Our method addresses the lack of explainability in the current workflow and provides a clear, flexible automated pipeline to reduce radiologists' workload and support further applications in human-AI interactive communication.
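The paper's exact strategy is not given in the abstract; the snippet below sketches one way a frequency-aware rule could keep rare keywords from being suppressed, by lowering per-label decision thresholds for infrequent labels before the keyword list is handed to the text-to-text LLM (the thresholds, vocabulary, and counts are all hypothetical):

```python
import numpy as np

def frequency_adjusted_thresholds(label_counts, base=0.5, floor=0.2):
    """Lower the decision threshold for rare keyword labels (illustrative heuristic)."""
    freq = label_counts / label_counts.max()
    return floor + (base - floor) * freq          # rare labels end up near `floor`

def predict_keywords(probs, thresholds, vocab):
    """Select the keyword list that would be passed to the text-to-text LLM."""
    return [kw for kw, p, t in zip(vocab, probs, thresholds) if p >= t]

vocab = ["cardiomegaly", "pneumothorax", "pleural effusion", "support device"]
counts = np.array([5000, 120, 3000, 4000])        # training-set label frequencies
thresholds = frequency_adjusted_thresholds(counts)
probs = np.array([0.55, 0.25, 0.10, 0.70])        # classifier outputs for one image
print(predict_keywords(probs, thresholds, vocab))
# -> ['cardiomegaly', 'pneumothorax', 'support device']
```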

Knowledge Graph-Based Few-Shot Learning for Label of Medical Imaging Reports.

Li T, Zhang Y, Su D, Liu M, Ge M, Chen L, Li C, Tang J

PubMed | Jul 1 2025
The application of artificial intelligence (AI) to automatic imaging report labeling faces the challenge of manually labeling large datasets. We propose a data augmentation method using a knowledge graph (KG) and few-shot learning. A KG of lumbar spine X-ray images was constructed, and 2000 records were annotated based on the KG and divided into training, validation, and test sets in a ratio of 7:2:1. The training dataset was augmented using the synonym/replacement attributes of the KG, and the augmented data were input into a BERT (Bidirectional Encoder Representations from Transformers) model for automatic annotation training. Model performance under different augmentation ratios (1:10, 1:100, 1:1000) and augmentation methods (synonyms only, replacements only, combination of synonyms and replacements) was evaluated using precision and F1 scores. In addition, with the augmentation ratio fixed, iterative experiments were performed by supplementing the data of nodes that performed poorly on the validation set to further improve the model's performance. Prior to data augmentation, precision was 0.728 and the F1 score was 0.666. Increasing the augmentation ratio raised precision from 0.912 at 1:10 to 0.932 at 1:100 (P<.05), while the F1 score improved from 0.853 at 1:10 to 0.881 at 1:100 (P<.05). Comparing augmentation methods at a 1:100 ratio, the combination of synonyms and replacements (F1=0.881) was superior to synonyms only (F1=0.815) and replacements only (F1=0.753) (P<.05). For nodes with suboptimal performance on the validation set, supplementing the training set with targeted data improved model performance, increasing the average F1 score to 0.979 (P<.05). Based on the KG, this study trained an automatic labeling model for radiology reports using a few-shot dataset. This method effectively reduces the workload of manual labeling, improves the efficiency and accuracy of image data labeling, and provides an important research strategy for applying AI to the automatic labeling of imaging reports.
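The abstract does not show the augmentation step itself; assuming the KG stores synonym and replacement attributes as simple term-to-alternatives mappings, a minimal sketch of the substitution-based augmentation could look like this (the terms and mapping are invented for illustration):

```python
import random

# Hypothetical fragment of the KG's synonym/replacement attributes.
KG_SYNONYMS = {
    "disc herniation": ["disc protrusion", "herniated disc"],
    "L4/5": ["L4-L5", "L4-5"],
}

def augment(report: str, n: int = 5, seed: int = 0):
    """Generate up to n augmented copies of a labeled report by KG-driven substitution."""
    rng = random.Random(seed)
    variants = set()
    for _ in range(n * 3):                       # oversample, then deduplicate
        text = report
        for term, alternatives in KG_SYNONYMS.items():
            if term in text and rng.random() < 0.5:
                text = text.replace(term, rng.choice(alternatives))
        variants.add(text)
        if len(variants) >= n:
            break
    return sorted(variants)

for variant in augment("Mild disc herniation at L4/5."):
    print(variant)
```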

Artificial Intelligence in Breast US Diagnosis and Report Generation.

Wang J, Tian H, Yang X, Wu H, Zhu X, Chen R, Chang A, Chen Y, Dou H, Huang R, Cheng J, Zhou Y, Gao R, Yang K, Li G, Chen J, Ni D, Dong F, Xu J, Gu N

PubMed | Jun 18 2025
<i>"Just Accepted" papers have undergone full peer review and have been accepted for publication in <i>Radiology: Artificial Intelligence</i>. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.</i> Purpose To develop and evaluate an artificial intelligence (AI) system for generating breast ultrasound (BUS) reports. Materials and Methods This retrospective study included 104,364 cases from three hospitals (January 2020-December 2022). The AI system was trained on 82,896 cases, validated on 10,385 cases, and tested on an internal set (10,383 cases) and two external sets (300 and 400 cases). Under blind review, three senior radiologists (> 10 years of experience) evaluated AI-generated reports and those written by one midlevel radiologist (7 years of experience), as well as reports from three junior radiologists (2-3 years of experience) with and without AI assistance. The primary outcomes included the acceptance rates of Breast Imaging Reporting and Data System (BI-RADS) categories and lesion characteristics. Statistical analysis included one-sided and two-sided McNemar tests for non-inferiority and significance testing. Results In external test set 1 (300 cases), the midlevel radiologist and AI system achieved BI-RADS acceptance rates of 95.00% [285/300] versus 92.33% [277/300] (<i>P</i> < .001; non-inferiority test with a prespecified margin of 10%). In external test set 2 (400 cases), three junior radiologists had BI-RADS acceptance rates of 87.00% [348/400] versus 90.75% [363/400] (<i>P</i> = .06), 86.50% [346/400] versus 92.00% [368/400] ( <i>P</i> = .007), and 84.75% [339/400] versus 90.25% [361/400] (<i>P</i> = .02) with and without AI assistance, respectively. Conclusion The AI system performed comparably to a midlevel radiologist and aided junior radiologists in BI-RADS classification. ©RSNA, 2025.