An Agentic, No Code Artificial Intelligence Workflow for Developing and Externally Validating a Thyroid Nodule Ultrasound Malignancy Classifier
Authors
Affiliations (1)
- Mercy
Abstract
Convolutional neural networks (CNNs) can classify thyroid nodules on ultrasound, yet published models are seldom available for independent testing, require machine-learning expertise to develop and deploy, and are validated mostly on papillary thyroid carcinoma.
Objective: To test whether an autonomous ("agentic"), no-code artificial intelligence (AI) agent can develop a calibrated thyroid-nodule malignancy classifier, and to validate that classifier internally and on an external cohort spanning multiple cancer histologies.
Methods: This is a retrospective, computational diagnostic study with prespecified endpoints. A no-code agent (Hugging Face ML-Intern) autonomously reviewed the data, selected and trained the model, and calibrated its probabilities, using the open-source TN5000 dataset (3500 training, 500 validation, and 1000 test images). The trained ResNet-18 model was externally validated on 232 nodules from the University of Colorado, including follicular, medullary, oncocytic, and follicular-variant papillary carcinomas.
Results: On the internal test set, the agentic AI model achieved an AUROC of 0.94 (95% CI, 0.920-0.953), a sensitivity of 0.90, and a specificity of 0.80. On external validation, it achieved an AUROC of 0.90 (95% CI, 0.850-0.936), a sensitivity of 0.92, a specificity of 0.68, a negative predictive value of 0.96, and a positive predictive value of 0.52, exceeding the performance of a previously published classifier on the same cohort (AUROC of 0.83).
Conclusions: An agentic, no-code AI workflow produced a calibrated, externally validated thyroid nodule classifier, supporting accessible, reproducible, and independently testable medical AI development. Prospective validation and local recalibration are required before clinical use.
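The predictive values reported above depend on how common malignancy is in the cohort being tested, which is one reason local recalibration matters. The following is a minimal sketch of that relationship using Bayes' rule, not the study's analysis code. The function name `predictive_values` and the prevalence values in the loop are hypothetical and chosen only for illustration. Only the sensitivity (0.92) and specificity (0.68) come from the abstract.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Expected fraction of the population falling in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # Guard against empty denominators (e.g. no predicted positives at all).
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Operating point reported for the external cohort in the abstract.
    SENSITIVITY, SPECIFICITY = 0.92, 0.68
    # Hypothetical prevalences for illustration only; these are not study data.
    for prevalence in (0.05, 0.10, 0.30):
        ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

At a fixed sensitivity and specificity, a lower prevalence lowers the positive predictive value and raises the negative predictive value, so the same classifier can behave differently across clinics with different case mixes.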