Automated generation of structured breast ultrasound reports using BreastViT and ChatGPT.

July 2, 2026

DOI: 10.1186/s12911-026-03677-w PMID: 42393650

Authors

Feng G, Xie X, Jiang J, Lee JM, Cui L

Affiliations (3)

Department of Ultrasound, Peking University Third Hospital, Beijing, 100191, China.
Department of Radiology, Seoul National University Hospital, Seoul, Korea.
Department of Ultrasound, Peking University Third Hospital, Beijing, 100191, China.

Abstract

Breast cancer is the most common malignancy in women. Ultrasound plays a critical role in imaging dense breasts, and BI-RADS provides a standardized framework for lesion assessment. However, conventional reports may suffer from variability. Deep learning and large language models (LLMs) show promise in automated report generation. We propose a workflow that integrates deep learning with GPT-4o to generate structured breast ultrasound reports. We retrospectively collected 2,243 ultrasound images from 362 patients (BI-RADS 4B, 4C, 5; 2019-2024). The proposed BreastViT model, a VisionEncoderDecoderModel (pretrained: nlpconnect/vit-gpt2-image-captioning), was compared against three baseline architectures: CNN-Transformer (R2Gen), CNN-Attention-LSTM, and CNN-RNN. Generated texts were refined by GPT-4o for language optimization and terminology standardization. An external validation set (49 cases, Oct-Dec 2024) was used to compare three outputs: GPT-4o alone, BreastViT alone, and BreastViT + GPT-4o. On the internal dataset, BreastViT achieved a best BLEU score of 0.9187 and a loss of 0.1277. GPT-4o refinement markedly improved fluency and structure. In external validation, GPT-4o alone produced natural language but was occasionally inconsistent with the images; BreastViT outputs captured key findings but lacked structure; the combined approach yielded the best accuracy, completeness, and terminological consistency. Three senior radiologists performed a blinded evaluation of the external validation reports. The intraclass correlation coefficient (ICC) demonstrated excellent inter-rater reliability (ICC = 0.8808; 95% CI: 0.86-0.90). The combined BreastViT + GPT-4o model achieved the highest scores in all four categories: Clinical Accuracy (7.31 ± 0.94), Information Completeness (7.61 ± 0.66), Structural Integrity (8.03 ± 0.93), and Terminology Standardization (8.02 ± 0.98). These scores were significantly higher than those of the standalone models (all P < 0.05). The proposed BreastViT + GPT-4o workflow automatically generates clinically compliant structured breast ultrasound reports, enhancing readability and standardization. This approach can improve report consistency and efficiency, offering a promising pathway for clinical integration.
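
For readers unfamiliar with the inter-rater statistic reported above, the following is a minimal sketch of how an ICC can be computed from a reports-by-raters score table. The abstract does not state which ICC form the authors used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater). The function name `icc2_1` and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per rated report, one column per rater.
    ASSUMPTION: the abstract does not say which ICC form was used.
    """
    n = len(scores)            # number of rated reports
    k = len(scores[0])         # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")

    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-report mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    # Raises ZeroDivisionError if every score in the table is identical.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical toy scores: 4 reports rated by 3 raters on a 10-point scale.
toy = [[7, 8, 7], [6, 6, 5], [9, 9, 8], [5, 6, 5]]
print(round(icc2_1(toy), 3))
```

Because ICC(2,1) measures absolute agreement, a rater who scores systematically higher or lower than the others reduces the coefficient. A consistency-type ICC would ignore such an offset, so the choice of form affects how a value such as 0.8808 should be read.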

Topics

Journal Article
