Enabling micro-assessments of skills in the simulated setting using temporal artificial intelligence-models.
Affiliations (7)
- NordSim, Center for Skills Training and Simulation, Aalborg University Hospital, Aalborg, Denmark.
- Department of Otorhinolaryngology - Head and Neck Surgery, Aalborg University Hospital, Aalborg, Denmark.
- Copenhagen Academy for Medical Education and Simulation (CAMES), Capital Region of Denmark, Copenhagen, Denmark.
- Department of Clinical Medicine, Aalborg University, Aalborg, Denmark.
- Department of Otorhinolaryngology, Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
- Department of Obstetrics and Gynecology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.
Abstract
Assessing skills in simulated settings is resource-intensive and lacks validated metrics. Advances in artificial intelligence (AI) make automated competence assessment possible, which could address these limitations. This study aimed to develop and validate a machine-learning model for automated evaluation during simulation-based thyroid ultrasound (US) training. Videos of eight experts and 21 novices performing thyroid US on a simulator were analyzed. Video frames were grouped into sequences of 1, 10, and 50 seconds. A convolutional neural network with a pre-trained ResNet-50 base, followed by a long short-term memory (LSTM) layer, analyzed these sequences. The model was trained to classify competence (competent = 1, not competent = 0) using fourfold cross-validation and was evaluated by precision, recall, F1 score, and accuracy. Bayesian updating and adaptive thresholding were used to assess performance over time. The AI model differentiated between expert and novice US performance. The 50-second sequences achieved the highest accuracy (70%) and F1 score (0.76). Experts remained above the threshold significantly longer (15.71 s) than novices (9.31 s; p = .030). An LSTM-based AI model provides near-real-time, automated assessment of competence in US training. Using temporal video data enables detailed micro-assessments of complex procedures, which may enhance interpretability and could be applied across various procedural domains.
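The abstract states only that frames were grouped into 1-, 10-, and 50-second sequences. It does not give the frame rate or say whether the windows overlap. The sketch below is hypothetical. It assumes non-overlapping windows at an assumed frame rate (`fps`) and drops any trailing frames that do not fill a whole window. The function name `split_into_sequences` is illustrative and is not taken from the study.

```python
def split_into_sequences(frames, fps, seconds):
    """Split frames into consecutive, non-overlapping sequences of `seconds` length.

    Assumptions (not stated in the study): a constant frame rate `fps`,
    no overlap between windows, and trailing frames that do not fill a
    whole window are discarded.
    """
    size = int(round(fps * seconds))  # frames per sequence
    if size < 1:
        raise ValueError("a sequence must contain at least one frame")
    # Step through the frame list one full window at a time.
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, size)]


# Example: 25 frames at an assumed 5 frames per second.
frames = list(range(25))
one_second = split_into_sequences(frames, fps=5, seconds=1)  # 5 sequences of 5 frames
two_second = split_into_sequences(frames, fps=5, seconds=2)  # 2 sequences; 5 frames dropped
```

Longer windows give the LSTM more temporal context for each prediction but yield fewer sequences per video. This trade-off is relevant to comparing the 1-, 10-, and 50-second settings.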
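The reported metrics follow the standard definitions for binary classification with "competent" (1) as the positive class. The following minimal sketch computes them from true and predicted labels. It uses only the standard definitions and is not the study's evaluation code. Guarding against zero denominators by returning 0.0 is a convention chosen here.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 score, and accuracy with label 1 (competent) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # competent, predicted competent
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not competent, predicted competent
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # competent, predicted not competent
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not competent, predicted not competent

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


scores = binary_metrics([1, 1, 1, 0], [1, 1, 0, 0])
```

F1 can exceed accuracy when the model rarely produces false positives but misses some competent sequences. This pattern is consistent with the reported combination of 70% accuracy and an F1 score of 0.76.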
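The abstract does not describe how Bayesian updating or the adaptive threshold was implemented, so the following is only one plausible reading. The sketch assumes a Beta-Bernoulli model. Each window's binary prediction updates a Beta posterior over the probability of competence, and time accumulates whenever the posterior mean exceeds a threshold. A fixed threshold (`threshold=0.5`) stands in for the study's unspecified adaptive threshold. `BetaPosterior` and `seconds_above_threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta-Bernoulli posterior over P(competent); Beta(1, 1) is an assumed uniform prior."""
    alpha: float = 1.0  # prior pseudo-count of windows predicted competent
    beta: float = 1.0   # prior pseudo-count of windows predicted not competent

    def update(self, prediction: int) -> float:
        """Add one binary window prediction (1 = competent) and return the posterior mean."""
        if prediction == 1:
            self.alpha += 1
        else:
            self.beta += 1
        return self.alpha / (self.alpha + self.beta)


def seconds_above_threshold(predictions, window_s, threshold=0.5):
    """Total time (s) during which the running posterior mean exceeds `threshold`.

    The fixed threshold is a placeholder for the study's adaptive threshold.
    """
    posterior = BetaPosterior()
    total = 0.0
    for prediction in predictions:
        if posterior.update(prediction) > threshold:
            total += window_s
    return total


mostly_competent = seconds_above_threshold([1, 1, 0, 1], window_s=1.0)
mostly_not = seconds_above_threshold([0, 0, 1], window_s=1.0)
```

Accumulating evidence across windows smooths out isolated misclassifications of single sequences. This is one reason a time-above-threshold summary may separate groups more clearly than per-sequence accuracy alone.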