Optimizing and Evaluating Robustness of AI for Brain Metastasis Detection and Segmentation via Loss Functions and Multi-dataset Training
Authors
Affiliations (1)
- Baylor St Luke's Medical Center
Abstract
Purpose: Accurate detection and segmentation of brain metastases (BM) on MRI are critical for the appropriate management of cancer patients. This study investigates strategies to enhance the robustness of artificial intelligence (AI)-based BM detection and segmentation models. Method: A DeepMedic-based network with a loss function tunable through a sensitivity/specificity trade-off weighting factor, α, was trained on T1 post-contrast MRI datasets from two institutions (514 patients, 4520 lesions). Robustness was evaluated on an external dataset from a third institution (91 patients, 397 lesions) with ground truth annotations from two physicians. We investigated the impact of the loss function weighting factor α and of training dataset combinations. Detection performance (sensitivity, precision, F1 score) and segmentation accuracy (Dice similarity and 95% Hausdorff distance (HD95)) were evaluated using one physician's contours as the reference standard. The optimal AI model was then directly compared with the second physician. Results: Varying α demonstrated a trade-off between sensitivity (higher α) and precision (lower α), with α=0.5 yielding the best F1 score (0.80 ± 0.04 vs. 0.78 ± 0.04 for α=0.95 and 0.72 ± 0.03 for α=0.99) on the external dataset. The optimally trained model achieved detection performance comparable to the physician (F1: AI=0.83 ± 0.04, Physician=0.83 ± 0.04), but slightly underperformed in segmentation (Dice: Physician=0.79 ± 0.04 vs. AI=0.74 ± 0.03; HD95: Physician=2.8 ± 0.14 mm vs. AI=3.18 ± 0.16 mm, p<0.05). Conclusion: The derived optimal model achieves detection and segmentation performance comparable to an expert physician in a parallel comparison.
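The quantities named in the abstract can be illustrated with a minimal Python sketch. The detection scores and the Dice coefficient follow their standard definitions. The loss function is only an assumption: the abstract does not give its form, so the sketch uses one common sensitivity/specificity-weighted squared-error loss in which the weighting factor α scales the error on lesion voxels and (1 - α) scales the error on background voxels. The function names, the lesion counts, and the toy masks are hypothetical, and the rule for matching predicted lesions to reference lesions is not specified in the abstract, so the sketch starts from counts. HD95 is not covered here.

```python
def detection_scores(tp, fp, fn):
    """Lesion-level sensitivity, precision, and F1 from raw counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return sensitivity, precision, f1


def dice(pred, ref):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    overlap = sum(p * r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    # Two empty masks agree perfectly by convention.
    return 2 * overlap / total if total else 1.0


def sens_spec_loss(probs, labels, alpha):
    """ASSUMED loss form, not taken from the study.

    alpha weights the mean squared error on lesion voxels (sensitivity term);
    (1 - alpha) weights the mean squared error on background voxels
    (specificity term).
    """
    fg = [(1 - p) ** 2 for p, y in zip(probs, labels) if y == 1]
    bg = [p ** 2 for p, y in zip(probs, labels) if y == 0]
    fg_err = sum(fg) / len(fg) if fg else 0.0
    bg_err = sum(bg) / len(bg) if bg else 0.0
    return alpha * fg_err + (1 - alpha) * bg_err


if __name__ == "__main__":
    # Hypothetical counts: 8 lesions found, 2 false alarms, 2 lesions missed.
    print(detection_scores(tp=8, fp=2, fn=2))
    # Toy masks: the prediction covers the reference voxel plus one extra.
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
    # An under-confident lesion voxel costs more as alpha grows.
    print(sens_spec_loss([0.4, 0.4], [1, 0], alpha=0.5))
    print(sens_spec_loss([0.4, 0.4], [1, 0], alpha=0.99))
```

Under this assumed form, raising α makes a missed lesion voxel more expensive than a false alarm, which is consistent with the reported pattern of higher sensitivity at higher α and higher precision at lower α.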