
Impact of automated and manual segmentation errors on knee osteoarthritis classification using MRI-registered data on CT scans.

April 13, 2026

Authors

Fox S, Ciliberti FK, Jónsson H, Gargiulo P, Recenti M

Affiliations (3)

  • Institute of Biomedical and Neural Engineering, Reykjavik University, Menntavegur 1, 102, Reykjavik, Iceland.
  • Institute of Biomedical and Neural Engineering, Reykjavik University, Menntavegur 1, 102, Reykjavik, Iceland. Electronic address: [email protected].
  • Institute of Biomedical and Neural Engineering, Reykjavik University, Menntavegur 1, 102, Reykjavik, Iceland; Department of Science, Landspitali University Hospital, Skaftahlíð 24, 105, Reykjavik, Iceland.

Abstract

Knee osteoarthritis (OA) is a prevalent, disabling disease for which early, accurate diagnosis is essential to guide treatment and reduce long-term burden. Machine learning (ML) approaches using quantitative imaging biomarkers show promise for automated OA classification, but their reliability under imperfect image segmentation remains unclear. This study evaluated the robustness of cartilage-based radiodensity and morphological features derived from MRI-registered CT scans against simulated segmentation errors. Manual segmentations of femoral, patellar, lateral tibial, and medial tibial cartilages were modified using morphological operations (erosion, dilation, and closing) to imitate automated and manual segmentation inaccuracies. A total of 130 knee scans (79 control, 51 degenerative) were analyzed. Several ML models, including tree-based models and support vector classifiers (SVC) with different kernels, were trained and tested using nested cross-validation. Statistical analyses confirmed that cartilage density variation and medial tibial cartilage volume and surface area remained significant discriminators despite pixel perturbations. Among ML models, SVC with radial basis kernel (RBF SVC) achieved the highest performance on the original dataset (F1 0.86, ROC AUC 0.91), with Linear and RBF SVC and Logistic Regression performing comparably under error-modified datasets (F1 between 0.80 and 0.86, ROC AUC between 0.86 and 0.92). While tree-based models were more sensitive to dilation errors, most models maintained weighted F1 scores ≥ 0.75. These findings demonstrate that ML classifiers can robustly distinguish degenerative from control knees even when cartilage masks are imprecisely segmented. Thus, highly precise manual segmentations may not be strictly required for reliable OA classification, suggesting potential for scalable and cost-effective deployment of ML-based diagnostic tools in clinical imaging workflows.
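
To make the mask-perturbation step concrete, the following is a minimal sketch of how a binary cartilage mask could be eroded, dilated, or closed to mimic segmentation error. It is not the authors' pipeline: the abstract does not state the structuring element, the number of iterations, or the software used, so the 6-connected element, the single iteration, the toy slab-shaped mask, and the helper names `perturb_mask` and `dice` are all assumptions chosen for illustration. The Dice overlap is included only as one common way to quantify how far a perturbed mask departs from the original.

```python
import numpy as np
from scipy import ndimage


def perturb_mask(mask, operation, iterations=1):
    """Return a copy of a binary mask altered by one morphological operation.

    Hypothetical helper: the structuring element and iteration count are
    illustrative assumptions, not values taken from the study.
    """
    mask = np.asarray(mask, dtype=bool)
    # Face-connected neighbourhood (6-connected in 3D).
    structure = ndimage.generate_binary_structure(mask.ndim, 1)
    operations = {
        "erosion": ndimage.binary_erosion,    # shrinks the mask (under-segmentation)
        "dilation": ndimage.binary_dilation,  # grows the mask (over-segmentation)
        "closing": ndimage.binary_closing,    # dilation then erosion; fills small gaps
    }
    if operation not in operations:
        raise ValueError(f"unknown operation: {operation!r}")
    return operations[operation](mask, structure=structure, iterations=iterations)


def dice(a, b):
    """Dice overlap between two binary masks (1.0 means identical)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


# Toy stand-in for a cartilage mask: a thin 10 x 10 x 4 voxel slab.
mask = np.zeros((20, 20, 20), dtype=bool)
mask[5:15, 5:15, 8:12] = True

for name in ("erosion", "dilation", "closing"):
    altered = perturb_mask(mask, name)
    print(f"{name:8s} voxels={int(altered.sum()):4d} dice={dice(mask, altered):.3f}")
```

For a structure as thin as cartilage, even a one-voxel erosion or dilation changes the mask volume substantially, which is why the robustness of volume, surface area, and density features to such perturbations is a meaningful test.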

Topics

Journal Article
