An artificial intelligence platform for automated measurement and count estimation of ovarian follicles during ovarian stimulation and IVF: a multicenter study.

January 3, 2026

Authors

Wygocki P, Zapała A, Ulfig M, Zieleń M, Zieliński K, Gajewska N, Drzyzga D, Wrochna M, Sankowski P, Letterie G

Affiliations (7)

  • MIM Fertility, Ul. Świeradowska 47, 02-662, Warsaw, Poland.
  • Institute of Informatics, University of Warsaw, Warsaw, Poland.
  • MIM Fertility, Ul. Świeradowska 47, 02-662, Warsaw, Poland.
  • Institute of Informatics, University of Warsaw, Warsaw, Poland.
  • INVICTA Research and Development Center, Sopot, Poland.
  • Department of Biomedical Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland.
  • Seattle Reproductive Medicine, Seattle, WA, USA.

Abstract

Ultrasound measurement of follicle diameter is essential in IVF monitoring. This study evaluates the analytical performance of follicle counts and size measurements from two-dimensional images using an AI-based platform, compared with assessments by certified sonographers. A total of 5508 transvaginal ultrasound (TVUS) scans from 1689 patients undergoing controlled ovarian stimulation across four IVF centers (Poland, Argentina, Colombia, and the USA) were retrospectively analyzed. All visible follicles were marked using bounding boxes. The dataset included three subsets: a training/validation set for model development, an independent test set for evaluating performance across ultrasound systems, and a consensus test set (102 scans from 27 patients) annotated by three expert sonographers. Model performance was assessed using precision, recall, and F1 score. Annotation efficiency was measured by comparing manual and AI-assisted annotation times. Real-world performance was evaluated on a prospective cohort of 904 scans from 269 patients, based on expert adjustments to AI annotations. For follicles ≥ 10 mm, the model achieved 98.2% precision (95% CI, 96.5-99.2), 88.9% recall (85.0-91.8), and 93.3% F1 score (90.7-95.1). For all follicles, precision and recall were 94.2% (92.8-95.4) and 68.9% (65.9-71.9). Annotation time was reduced 2.5-fold (p < 0.01), with an average of 0.54 expert adjustments per scan (CI, 0.47-0.62). Model performance was stable across ultrasound platforms. This AI platform enables accurate, automated follicle counting and measurement during ovarian stimulation. It matches expert-level performance, improves efficiency, and supports scalable, cost-effective fertility care without compromising quality.
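
The F1 score reported above is the harmonic mean of precision and recall, so the headline figures can be cross-checked from the abstract alone. The following is a minimal Python sketch of that arithmetic, not the authors' evaluation code: the function names are hypothetical, and the abstract does not state how predicted and annotated bounding boxes were matched, so the count-based helper simply assumes that true positives, false positives, and false negatives have already been tallied.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from detection counts (hypothetical helper).

    tp: predicted follicles matched to an expert annotation
    fp: predicted follicles with no matching annotation
    fn: annotated follicles the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for follicles >= 10 mm.
f1_large = f1_score(0.982, 0.889)
print(f"F1 (follicles >= 10 mm): {100 * f1_large:.1f}%")

# The abstract gives no F1 for all follicles; this value is derived here
# from the reported precision and recall, not taken from the paper.
f1_all = f1_score(0.942, 0.689)
print(f"F1 (all follicles, derived): {100 * f1_all:.1f}%")
```

With the reported inputs, the first value reproduces the abstract's 93.3% F1 for follicles ≥ 10 mm. The second, roughly 79.6%, is a derived figure that reflects the lower recall when small follicles are included.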

Topics

Journal Article
