DeepSeek-R1 for automated scoring in radiology residency examinations: an agreement and test-retest reliability study.

November 11, 2025

Authors

Niu S, Liu X, Huang L, Li Y, Wang G

Affiliations (3)

  • Department of Radiology, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong, China.
  • Department of Radiology, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong, China. [email protected].
  • Department of Radiology, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong, China. [email protected].

Abstract

This study evaluates the feasibility of employing DeepSeek-R1 for automated scoring in examinations for radiology residents, comparing its performance with that of radiologists. A cross-sectional study was undertaken to assess 504 diagnostic radiology reports produced by eighteen third-year radiology residents. The evaluations were independently conducted by Radiologist A, Radiologist B, and DeepSeek-R1 (as of June 15, 2025), utilizing standardized scoring rubrics and predefined evaluation criteria. One month after the initial evaluation, a re-assessment was performed by DeepSeek-R1 and Radiologist A. The inter-rater reliability among Radiologist A, Radiologist B, and DeepSeek-R1, in addition to the test-retest reliability, was analyzed using intraclass correlation coefficients (ICC). The ICC values between DeepSeek-R1 and Radiologist A, DeepSeek-R1 and Radiologist B, and Radiologist A and Radiologist B were found to be 0.879, 0.820, and 0.862, respectively. The test-retest ICC for DeepSeek-R1 was determined to be 0.922, whereas for Radiologist A, it was 0.952. The ICC between DeepSeek-R1 (re-test) and Radiologist A (re-test) was 0.885. The performance of DeepSeek-R1 was comparable to that of radiologists in the evaluation of radiology residents' reports. The integration of DeepSeek-R1 into medical education could effectively assist in assessment tasks, potentially alleviating faculty workload while preserving the quality of evaluations.
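For readers unfamiliar with the statistic, the following is a minimal sketch of how an intraclass correlation coefficient can be computed from a reports-by-raters score table. The abstract does not state which ICC form the authors used, so the choice here of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption. The function name `icc2_1` and the `toy_scores` matrix are illustrative only and are not data from the study.

```python
def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per scored report and one column
    per rater. The ICC form is an assumption; the abstract does not name it.
    """
    n = len(scores)        # number of reports (subjects)
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares for reports, raters, and residual error
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical toy data: 6 reports scored by 4 raters (not from the study)
toy_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2_1(toy_scores), 2))  # prints 0.29
```

In a setup like the one described, each pairwise comparison (for example DeepSeek-R1 versus Radiologist A) would correspond to a two-column table, and test-retest reliability would pair one rater's first and second scoring rounds as the two columns.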

Topics

Journal Article
