From Consensus to Standardization: Evaluating Deep Learning for Nerve Block Segmentation in Ultrasound Imaging.
Authors
Affiliations (3)
- From the Department of Experimental Surgery, McGill University Health Center, Montreal, Quebec, Canada.
- Department of Anesthesia, McGill University, Montreal, Quebec, Canada.
- Department of Psychiatry, Integrated Program in Neuroscience (IPN), McGill University, Montreal, Quebec, Canada.
Abstract
Deep learning can automate nerve identification by learning from expert-labeled examples to detect and highlight nerves in ultrasound images. This study evaluates the performance of deep-learning models in identifying nerves for ultrasound-guided nerve blocks. A total of 3594 raw ultrasound images were collected from public sources (an open GitHub repository and publicly available YouTube videos), covering 9 nerve block regions: Transversus Abdominis Plane (TAP), Femoral Nerve, Posterior Rectus Sheath, Median and Ulnar Nerves, Pectoralis Plane, Sciatic Nerve, Infraclavicular Brachial Plexus, Supraclavicular Brachial Plexus, and Interscalene Brachial Plexus. Of these, 10 images per nerve region (90 in total) were held out for testing, with each image labeled by 10 expert anesthesiologists. The remaining 3504 images were labeled by an anesthesia resident and augmented to create a diverse training dataset of 25,000 images per nerve region. Additionally, 908 negative ultrasound images, which do not contain the targeted nerve structures, were included to improve model robustness. Ten convolutional neural network-based deep-learning architectures were selected to identify nerve structures. Models were trained using 5-fold cross-validation on an EVGA GeForce RTX 3090 GPU, with batch size, number of epochs, and Adam optimizer settings adjusted to improve model performance. After training, models were evaluated on the held-out set of 10 images per nerve region, using the Dice score (range: 0 to 1, where 1 indicates perfect agreement and 0 indicates no overlap) to compare model predictions with expert-labeled images. Further validation was conducted by 10 medical experts, who assessed whether they would insert a needle at the location indicated by each model prediction. Statistical analyses were performed to explore the relationship between Dice scores and expert responses.
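As an illustration of the evaluation metric described above, the following is a minimal sketch of a Dice score computed between two binary masks. The function name `dice_score`, the use of NumPy, and the convention of returning 1.0 when both masks are empty (which could arise for negative images) are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks: 2*|A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)


# Toy masks: each has 3 foreground pixels, 2 of which overlap.
prediction = [[0, 1, 1],
              [0, 1, 0]]
expert_label = [[0, 1, 0],
                [0, 1, 1]]
print(round(dice_score(prediction, expert_label), 3))  # 0.667
```

In this toy case the score is 2 × 2 / (3 + 3), which is about 0.667. Identical masks score 1 and disjoint masks score 0, matching the range stated above.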
The R2U-Net model achieved the highest average Dice score (0.7619) across all nerve regions; average Dice scores across the 10 models ranged from 0.7123 to 0.7619. However, differences in model performance were statistically significant only for the TAP nerve region (χ² = 26.4, df = 9, P = .002, ε² = 0.267). Expert evaluations indicated high accuracy in the model predictions, particularly for the Popliteal nerve region, where experts agreed to insert a needle for all 100 model-generated predictions. Logistic modeling suggested that higher Dice overlap might increase the odds of expert acceptance in the Supraclavicular region, although the association did not reach statistical significance and the confidence interval was very wide (odds ratio [OR] = 8.59 × 10⁴, 95% confidence interval [CI], 0.33-2.25 × 10¹⁰; P = .073). These findings indicate the potential of deep-learning models, such as R2U-Net, to deliver consistent segmentation results in ultrasound-guided nerve block procedures.
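The reported effect size for the TAP region can be checked with a short calculation. The sketch below assumes that the χ² value is a Kruskal-Wallis H statistic (df = 9 is consistent with comparing 10 models) and that the comparison used 100 observations (10 models × 10 test images). Neither assumption is stated explicitly in the abstract, and the function name `epsilon_squared` is hypothetical.

```python
def epsilon_squared(h_statistic, n_observations):
    """Epsilon-squared effect size for a Kruskal-Wallis test: H / (n - 1)."""
    if n_observations < 2:
        raise ValueError("need at least two observations")
    return h_statistic / (n_observations - 1)


# Assumed inputs: H = 26.4 from the abstract, n = 10 models x 10 images.
eps2 = epsilon_squared(26.4, 100)
print(round(eps2, 3))  # 0.267
```

Under these assumptions the result, 26.4 / 99 ≈ 0.267, agrees with the reported ε².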