A machine learning approach for non-invasive PCOS diagnosis from ultrasound and clinical features.

Authors

Agirsoy M, Oehlschlaeger MA

Affiliations (2)

  • Department of Mechanical, Aerospace, and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, USA.
  • Department of Mechanical, Aerospace, and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, USA.

Abstract

This study investigates the use of machine learning (ML) algorithms to support faster and more accurate diagnosis of polycystic ovary syndrome (PCOS), with a focus on both predictive performance and clinical applicability. Multiple algorithms were evaluated, including Artificial Neural Networks (ANN), Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost). XGBoost consistently outperformed the other models and was selected for final development and validation. To align with the Rotterdam criteria, the dataset was structured into three feature categories: clinical, biochemical, and ultrasound (USG) data. The study explored various combinations of these feature subsets to identify the most efficient diagnostic pathways. Feature selection using the chi-square-based SelectKBest method identified the top 10 predictive features, which were further validated through XGBoost's internal feature importance, SHAP analysis, and expert clinical assessment.

The final XGBoost model demonstrated robust performance across multiple feature combinations:

  • Clinical + USG + AMH: AUC = 0.9947, Precision = 0.9553, F1 Score = 0.9553, Accuracy = 0.9553.
  • Clinical + USG: AUC = 0.9852, Precision = 0.9583, F1 Score = 0.9388, Accuracy = 0.9384.

The most influential features included follicle count on both ovaries, weight gain, Anti-Müllerian Hormone (AMH), hair growth, menstrual irregularity, fast food consumption, pimples, and hair loss. External validation was performed using a publicly available dataset containing 320 instances and 18 diagnostic features. The XGBoost model trained on the top-ranked features achieved perfect performance on the test set (AUC = 1.0, Precision = 1.0, F1 Score = 1.0, Accuracy = 1.0), though further validation is necessary to rule out overfitting or data leakage.
These findings suggest that combining clinical and ultrasound features enables highly accurate, non-invasive, and cost-effective PCOS diagnosis. This study demonstrates the potential of ML-driven tools to streamline clinical workflows, reduce reliance on invasive diagnostics, and support early intervention in women's health.
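
The chi-square ranking step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the feature names, the toy rows, and the helper names `chi2_scores` and `select_k_best` are hypothetical, and the data are invented purely to show the mechanics. The score follows the usual chi-square statistic for non-negative features (observed per-class feature totals compared against totals expected from class frequencies), which is the kind of score a SelectKBest-style selector ranks by.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-square score per feature, assuming non-negative feature values.

    X is a list of rows (lists of non-negative numbers); y is a list of labels.
    """
    n_samples = len(X)
    n_features = len(X[0])

    # Fraction of samples belonging to each class.
    class_counts = defaultdict(int)
    for label in y:
        class_counts[label] += 1
    class_prob = {c: n / n_samples for c, n in class_counts.items()}

    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)  # feature total over all samples
        observed = defaultdict(float)     # feature total within each class
        for row, label in zip(X, y):
            observed[label] += row[j]
        score = 0.0
        for c, p in class_prob.items():
            expected = p * total          # total expected if feature ignored class
            if expected > 0:
                score += (observed[c] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, names, k):
    """Return the k (name, score) pairs with the highest chi-square scores."""
    ranked = sorted(zip(names, chi2_scores(X, y)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Invented toy data (NOT from the study): label 1 = PCOS, 0 = no PCOS.
names = ["follicle_count", "weight_gain", "noise"]
X = [
    [12, 1, 1],
    [15, 1, 0],
    [14, 1, 1],
    [4, 0, 1],
    [5, 0, 0],
    [3, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

top = select_k_best(X, y, names, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

In practice this step would more likely be done with scikit-learn's `SelectKBest(chi2, k=10)` and the selected columns passed to the classifier; the sketch only makes the ranking criterion explicit.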

Topics

Polycystic Ovary Syndrome, Machine Learning, Journal Article
