Interobserver agreement between artificial intelligence models in the thyroid imaging and reporting data system (TIRADS) assessment of thyroid nodules.

Authors

Leoncini A, Trimboli P

Affiliations (3)

  • Clinic for Radiology, Imaging Institute of Southern Switzerland, Ente Ospedaliero Cantonale (EOC), Bellinzona, Switzerland.
  • Thyroid Unit, Clinic for Endocrinology and Diabetology, Ente Ospedaliero Cantonale (EOC), Bellinzona, Switzerland.
  • Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland.

Abstract

As ultrasound (US) is the most accurate tool for assessing the risk of malignancy (RoM) of thyroid nodules (TNs), international societies have published various Thyroid Imaging and Reporting Data Systems (TIRADSs). With the recent advent of artificial intelligence (AI), clinicians and researchers should ask how AI models interpret TIRADS terminology and whether different AIs agree in the risk assessment of TNs. The aim of this study was to analyze the interobserver agreement (IOA) between AIs in assessing the RoM of TNs across the categories of various TIRADSs, using a case series created by combining TIRADS descriptors. ChatGPT, Google Gemini, and Claude were compared. ACR-TIRADS, EU-TIRADS, and K-TIRADS were used to evaluate the AI assessments. Multiple written scenarios were created for the three TIRADSs, the cases were evaluated by the three AIs, and their assessments were analyzed and compared. The IOA was estimated by comparing the kappa (κ) values. Ninety scenarios were created. With ACR-TIRADS, the IOA analysis gave κ = 0.58 between ChatGPT and Gemini, 0.53 between ChatGPT and Claude, and 0.90 between Gemini and Claude. With EU-TIRADS, κ was 0.73 between ChatGPT and Gemini, 0.62 between ChatGPT and Claude, and 0.72 between Gemini and Claude. With K-TIRADS, κ was 0.88 between ChatGPT and Gemini, 0.70 between ChatGPT and Claude, and 0.61 between Gemini and Claude. This study found non-negligible variability between the three AIs. Clinicians and patients should be aware of these findings.
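
For readers unfamiliar with the statistic, the sketch below shows how a pairwise κ can be computed from two raters' category assignments. It assumes the unweighted Cohen's kappa; the abstract does not state which kappa variant the authors used. The rater names and the ten category labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases.

    Assumption: the study's exact kappa variant is not given in the
    abstract, so this is only an illustration of the general idea.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(ratings_a)
    # Observed agreement: share of cases given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category for every case.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical TIRADS categories (1 to 5) assigned by two models to ten scenarios.
model_x = [1, 2, 3, 4, 5, 3, 4, 5, 2, 3]
model_y = [1, 2, 3, 4, 5, 4, 4, 5, 2, 2]
print(round(cohen_kappa(model_x, model_y), 2))
```

In this made-up example the two raters agree on 8 of 10 cases, and the marginal frequencies give a chance agreement of 0.20, so κ = (0.80 − 0.20) / (1 − 0.20) = 0.75.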

Topics

Journal Article