Enabling micro-assessments of skills in the simulated setting using temporal artificial intelligence-models.

September 7, 2025


DOI: 10.1080/0142159X.2025.2555353 PMID: 40914879

Authors

Bang Andersen I, Søndergaard Svendsen MB, Risgaard AL, Sander Danstrup C, Todsen T, Tolsgaard MG, Friis ML

Affiliations (7)

NordSim, Center for Skills Training and Simulation, Aalborg University Hospital, Aalborg, Denmark.
Department of Otorhinolaryngology - Head and Neck Surgery, Aalborg University Hospital, Aalborg, Denmark.
Copenhagen Academy for Medical Education and Simulation (CAMES), Capital Region of Denmark, Copenhagen, Denmark.
Department of Clinical Medicine, Aalborg University, Aalborg, Denmark.
Department of Otorhinolaryngology, Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.
Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
Department of Obstetrics and Gynecology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.

Abstract

Assessing skills in simulated settings is resource-intensive, and validated metrics are lacking. Advances in artificial intelligence (AI) offer the potential for automated competence assessment that addresses these limitations. This study aimed to develop and validate a machine learning model for automated assessment during simulation-based thyroid ultrasound (US) training. Videos of eight experts and 21 novices performing thyroid US on a simulator were analyzed. Video frames were grouped into sequences of 1, 10, and 50 seconds. A convolutional neural network with a pre-trained ResNet-50 backbone followed by a long short-term memory (LSTM) layer analyzed these sequences. The model was trained to classify competence (competent = 1, not competent = 0) using fourfold cross-validation, and performance was reported as precision, recall, F1 score, and accuracy. Bayesian updating and adaptive thresholding were used to track performance over time. The AI model effectively differentiated expert from novice US performance. The 50-second sequences achieved the highest accuracy (70%) and F1 score (0.76). Experts spent significantly longer above the threshold than novices (15.71 s vs. 9.31 s; p = .030). An LSTM-based AI model provides near real-time, automated assessment of competence in US training. Using temporal video data enables detailed micro-assessments of complex procedures, which may enhance interpretability and could be applied across various procedural domains.
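The abstract states that video frames were grouped into 1-, 10-, and 50-second sequences before being passed to the ResNet-50/LSTM model. A minimal Python sketch of that windowing step is shown here, assuming a constant frame rate, non-overlapping windows, and that a trailing partial window is discarded; none of these details are given in the abstract, and `make_sequences` and its `fps` parameter are hypothetical names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_sequences(frames: Sequence[T], seconds: float, fps: float) -> List[List[T]]:
    """Split a frame stream into non-overlapping windows of `seconds` length.

    Assumptions (not from the paper): constant frame rate `fps`,
    no overlap between windows, and any trailing partial window is dropped.
    """
    window = int(round(seconds * fps))  # frames per sequence
    if window <= 0:
        raise ValueError("window must contain at least one frame")
    n_full = len(frames) // window  # number of complete windows
    return [list(frames[i * window:(i + 1) * window]) for i in range(n_full)]


if __name__ == "__main__":
    frames = list(range(130))  # stand-in for 130 decoded frames
    for seconds in (1, 10, 50):
        sequences = make_sequences(frames, seconds, fps=1.0)  # assumed 1 frame/s
        print(seconds, len(sequences))
```

With 130 frames at an assumed 1 frame per second, this yields 130, 13, and 2 sequences for the three window lengths: longer windows give the LSTM more temporal context per example but fewer examples per video.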
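The model was evaluated with fourfold cross-validation, but the abstract does not say how the folds were formed. A common precaution, assumed here, is to split by participant so that sequences from one person never appear in both training and test data, and to stratify so that every fold contains both experts and novices. The sketch uses hypothetical participant IDs and the hypothetical helpers `participant_folds` and `cv_splits`.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def participant_folds(
    experts: Sequence[str], novices: Sequence[str], k: int = 4, seed: int = 0
) -> List[List[str]]:
    """Assign participants to k folds, stratified by expertise group.

    Assumption: folds are formed per participant (not per sequence)
    so that no participant contributes to both training and test data.
    """
    rng = random.Random(seed)
    folds: List[List[str]] = [[] for _ in range(k)]
    for group in (list(experts), list(novices)):
        rng.shuffle(group)  # shuffle within a group, then deal round-robin
        for i, pid in enumerate(group):
            folds[i % k].append(pid)
    return folds


def cv_splits(folds: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_ids, test_ids), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, test


if __name__ == "__main__":
    experts = [f"E{i}" for i in range(1, 9)]   # 8 experts (hypothetical IDs)
    novices = [f"N{i}" for i in range(1, 22)]  # 21 novices (hypothetical IDs)
    for train, test in cv_splits(participant_folds(experts, novices)):
        print(len(train), len(test))
```

With eight experts and 21 novices, each test fold holds two experts and five or six novices.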
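Performance was reported as precision, recall, F1 score, and accuracy, with competent = 1 as the positive class. The standard definitions can be computed from the confusion-matrix counts as in the sketch below; how the study aggregated scores across the four folds is not stated in the abstract, so the hypothetical `binary_metrics` function simply scores one set of predictions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy, treating competent = 1 as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # competent, predicted competent
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not competent, predicted competent
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # competent, predicted not competent
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Because F1 is the harmonic mean of precision and recall, it can exceed accuracy when the positive class is predicted well, which is consistent with an F1 of 0.76 alongside an accuracy of 70%.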
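The abstract mentions Bayesian updating and adaptive thresholding for tracking performance over time, and reports time above the threshold (15.71 s for experts vs. 9.31 s for novices), but it does not give the update rule or how the threshold adapts. The sketch below is one hypothetical reading, not the authors' method: each per-sequence model probability is treated as a soft count in a Beta-Bernoulli update, a fixed `threshold` stands in for the unspecified adaptive rule, and time above threshold is reported both as a total and as the longest contiguous run, since the abstract does not say which was measured. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def bayesian_competence_trace(
    probs: Sequence[float],
    prior_a: float = 1.0,
    prior_b: float = 1.0,
    forget: float = 1.0,
) -> List[float]:
    """Posterior mean of a Beta competence estimate after each model output.

    Hypothetical soft-count scheme: output p adds p to the 'competent' count
    and 1 - p to the 'not competent' count; `forget` < 1 decays old evidence.
    """
    a, b = prior_a, prior_b
    trace: List[float] = []
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        a = forget * a + p
        b = forget * b + (1.0 - p)
        trace.append(a / (a + b))  # posterior mean of Beta(a, b)
    return trace


def time_above_threshold(
    trace: Sequence[float], threshold: float, step_seconds: float
) -> Tuple[float, float]:
    """Return (total seconds above threshold, longest contiguous run in seconds)."""
    total = longest = current = 0
    for value in trace:
        if value > threshold:
            total += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return total * step_seconds, longest * step_seconds


if __name__ == "__main__":
    probs = [0.9, 0.9, 0.9, 0.1]  # illustrative per-sequence model outputs
    trace = bayesian_competence_trace(probs)
    print(time_above_threshold(trace, threshold=0.65, step_seconds=1.0))
```

With a smaller `forget` value the posterior reacts faster to a change in performance, at the cost of a noisier trace.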


Topics

Journal Article
