Reasoning Model-Assisted Second-Reader Quality Control of Chinese-Language Ultrasound Reports: A Retrospective Imaging Informatics Study.

June 29, 2026

DOI: 10.1007/s10278-026-02068-x PMID: 42371279

Authors

Zhang Z, Jiang Z, Qi Y, Hu Z, Zhuang M, Li J, Wang L

Affiliations (5)

General Affairs Office, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, 610041, China.
School of Health Sciences, College of Health and Human Sciences, Purdue University, West Lafayette, IN, 47907, USA.
Finance Department, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, 610041, China.
Ultrasound Medical Center, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, 610041, China.
Ultrasound Medical Center, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, 610041, China.

Abstract

This study evaluated whether the reasoning model DeepSeek-R1 can function as a second-reader quality control (QC) tool for Chinese-language ultrasound reports. In this retrospective diagnostic-accuracy study with a parallel blinded review design, 500 deidentified ultrasound reports were randomly sampled from 9711 eligible reports finalized in 2024 at a tertiary cancer center. DeepSeek-R1 and physician reviewer groups independently evaluated the same reports, and each review condition was blinded to the outputs of the others and to the consensus reference standard. DeepSeek-R1 achieved 69.1% sensitivity and 98.1% specificity. Its sensitivity was numerically higher than that of senior physicians (69.1% vs 47.1%), though this difference did not reach significance after Bonferroni correction (adjusted p = 0.147); specificity was identical at 98.1% for both. The model performed best for findings-impression discordance (36/42 detected) and more modestly for completeness/template/indicator violations (7/21 detected). In a post hoc exploratory OR-rule simulation, the combined workflow yielded 95.6% sensitivity (95% CI 87.8-98.5) and 96.3% specificity (95% CI 94.1-97.7). This retrospective single-center study provides workflow-level feasibility evidence that a reasoning model can serve as a high-specificity second-reader control for finalized ultrasound report text, with human review retained for local rules, exceptions, and final sign-off.
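The OR-rule simulation and the confidence intervals reported above can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the abstract: the abstract does not name the interval method (a Wilson score interval is used below), it does not state the underlying counts (65/68 and 416/432 are back-calculated values that happen to be consistent with the reported percentages), and it does not say which reviewer group was combined with the model. The function names and the toy flag lists are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%).

    Assumption: the abstract does not say which CI method was used.
    """
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def or_rule(model_flags, physician_flags):
    """Flag a report if either the model or the physician flags it."""
    return [m or p for m, p in zip(model_flags, physician_flags)]


def sensitivity_specificity(flags, truth):
    """Return (sensitivity, specificity) against a reference standard."""
    tp = sum(f and t for f, t in zip(flags, truth))
    tn = sum((not f) and (not t) for f, t in zip(flags, truth))
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives


# Toy data (hypothetical): three reports with errors, three without.
truth = [True, True, True, False, False, False]
model = [True, False, False, False, False, True]
physician = [False, True, False, False, False, False]

combined = or_rule(model, physician)
sens, spec = sensitivity_specificity(combined, truth)

# Back-calculated counts (assumption, not stated in the abstract).
sens_lo, sens_hi = wilson_interval(65, 68)
spec_lo, spec_hi = wilson_interval(416, 432)
```

In the toy data, the OR rule catches two of three errors where each reader alone catches one, at the cost of inheriting the model's single false positive. This is the same trade-off the abstract reports: higher combined sensitivity with slightly lower specificity than either reader alone.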

Topics

Journal Article
