Enabling micro-assessments of skills in the simulated setting using temporal artificial intelligence-models.

Authors

Bang Andersen I, Søndergaard Svendsen MB, Risgaard AL, Sander Danstrup C, Todsen T, Tolsgaard MG, Friis ML

Affiliations (7)

  • NordSim, Center for Skills Training and Simulation, Aalborg University Hospital, Aalborg, Denmark.
  • Department of Otorhinolaryngology - Head and Neck Surgery, Aalborg University Hospital, Aalborg, Denmark.
  • Copenhagen Academy for Medical Education and Simulation (CAMES), Capital Region of Denmark, Copenhagen, Denmark.
  • Department of Clinical Medicine, Aalborg University, Aalborg, Denmark.
  • Department of Otorhinolaryngology, Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.
  • Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
  • Department of Obstetrics and Gynecology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.

Abstract

Assessing skills in simulated settings is resource-intensive and lacks validated metrics. Advances in AI could enable automated competence assessment and address these limitations. This study aimed to develop and validate a machine learning model for automated evaluation during simulation-based thyroid ultrasound (US) training. Videos of eight experts and 21 novices performing thyroid US on a simulator were analyzed. Frames were grouped into sequences of 1, 10, and 50 seconds. A convolutional neural network with a pre-trained ResNet-50 base and a long short-term memory (LSTM) layer analyzed these sequences. The model was trained to distinguish competence levels (competent = 1, not competent = 0) using fourfold cross-validation, and performance was measured by precision, recall, F1 score, and accuracy. Bayesian updating and adaptive thresholding were used to assess performance over time. The model differentiated expert from novice US performance. The 50-second sequences achieved the highest accuracy (70%) and F1 score (0.76). Experts spent significantly longer above the threshold than novices (15.71 s vs. 9.31 s, p = .030). An LSTM-based AI model provides near real-time, automated assessments of competence in US training. Using temporal video data enables detailed micro-assessments of complex procedures, which may enhance interpretability and be applied across various procedural domains.
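
The abstract does not say how the Bayesian updating or the thresholding was implemented, so the following is only a minimal sketch of one plausible reading: each sequence's classifier output is treated as evidence that updates a running belief that the operator is competent, and the time spent above a cutoff is then summed. The function names, the 0.5 prior, the fixed threshold (standing in for the adaptive one), and the one-second step are all assumptions made for illustration, not the authors' method.

```python
def bayesian_update(prior, p_competent, eps=1e-6):
    """Combine the current belief with one classifier output via Bayes' rule.

    Assumption: the classifier probability is used directly as the
    likelihood of "competent" (and 1 - p as that of "not competent").
    """
    # Clamp to avoid a belief of exactly 0 or 1, which could never change again.
    prior = min(max(prior, eps), 1 - eps)
    p = min(max(p_competent, eps), 1 - eps)
    numerator = prior * p
    return numerator / (numerator + (1 - prior) * (1 - p))


def running_posterior(probs, prior=0.5):
    """Return the belief after each sequence, starting from a neutral prior."""
    beliefs = []
    belief = prior
    for p in probs:
        belief = bayesian_update(belief, p)
        beliefs.append(belief)
    return beliefs


def seconds_above(beliefs, threshold, step_seconds):
    """Total time (in seconds) during which the belief exceeds the threshold."""
    return sum(step_seconds for b in beliefs if b > threshold)


if __name__ == "__main__":
    # Hypothetical per-sequence outputs for one performance (not study data).
    outputs = [0.6, 0.7, 0.4, 0.8, 0.3]
    beliefs = running_posterior(outputs)
    print([round(b, 3) for b in beliefs])
    print(seconds_above(beliefs, threshold=0.5, step_seconds=1.0))
```

A plain update like this saturates quickly when many sequences agree, which is one reason a real system might discount older evidence or adapt the threshold over time; how the study handled this is not stated in the abstract.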

Topics

Journal Article
