Multicenter Evaluation of an Artificial Intelligence System for Automatic Recognition of Fetal Ultrasound Findings Suggestive of Congenital Malformations.

January 8, 2026

Authors

Morisset C, Logé-Munerel F, Debavelaere V, Besson R, Turan S, Fries N, Stirnemann J, Oyelese Y, Ville Y

Affiliations (1)

  • Sonio SAS, and Obstetrics, Paris Descartes University, Necker-Enfants Malades Hospital, Paris, and the IMAGYN'ECHO Medical Center, Montpellier, France; the Department of Obstetrics, Gynecology and Reproductive Sciences, University of Maryland, Baltimore, Maryland; and the Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts.

Abstract

We aimed to evaluate the diagnostic performance of an artificial intelligence (AI) system for detecting eight abnormal fetal ultrasound findings across cephalic, thoracic, and abdominal regions in routine, unfiltered, multicenter images.

We performed a multicenter, retrospective evaluation of AI software that detects eight abnormal ultrasound findings on still images. Ground truth was established by a multidisciplinary panel (board-certified reviewers with 5 or more years of experience) using a three-step process (view identification, structure visibility, sign presence or absence) with majority consensus. The software evaluated eight findings on six standard views: absence of the cavum septum pellucidum, absence of the corpus callosum, malposition of the great vessels, absence or unusual size of one of the three vessels, disequilibrium or absence of at least one of the two ventricles, thoracic situs inversus, abdominal situs inversus, and nonvisibility of a single stomach bubble or an abnormally large stomach. For thoracic and abdominal situs, an evaluability step preceded classification. Primary end points were sensitivity and specificity per finding on evaluable images, with subgroup analyses by geography, device manufacturer, trimester, body mass index (BMI), demographics, anatomy, indication, and finding status. Cluster bootstrap accounted for within-patient clustering; multiplicity was controlled with Bonferroni or Hochberg correction.

We analyzed 6,452 images from 1,115 examinations (11-41 weeks of gestation) from approximately 1,000 pregnancies in 942 patients across 75 international sites in five countries; 6,094 images contributed to performance estimates. Mean sensitivity for AI detection was 93.2% (95% CI, 91.6-94.6%) and mean specificity was 90.8% (95% CI, 89.5-92.0%) across the eight findings. Sensitivity exceeded 87% and specificity exceeded 81% for all findings. Abdominal situs inversus had the highest performance (sensitivity 99.3%, 95% CI, 97.6-100%; specificity 99.3%, 95% CI, 98.4-100%). Among thoracic findings, sensitivity was lowest for malposition of the great vessels (87.7%), and specificity was lowest for absence or unusual size of at least one of the three vessels (81.5%). Subgroup performance was generally consistent across manufacturers, regions, BMI categories, and trimesters.

In a heterogeneous, multicenter dataset, the software reliably identified predefined ultrasound findings suggestive of congenital malformations. These results support its potential as a real-time assistant to standardize interpretation and to flag suspicious findings.
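
The abstract reports per-finding sensitivity and specificity with confidence intervals from a cluster bootstrap that accounts for several images coming from the same patient. As a rough illustration of that idea, the Python sketch below resamples whole patients with replacement and reads a percentile interval off the resampled estimates. The abstract does not state the bootstrap variant, the number of resamples, or the interval method, so the percentile interval, the default of 2,000 resamples, and the names `sens_spec`, `percentile_interval`, and `cluster_bootstrap_ci` are all assumptions made for this example, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def sens_spec(records):
    """Return (sensitivity, specificity) for (patient_id, truth, pred) tuples."""
    tp = fn = tn = fp = 0
    for _pid, truth, pred in records:
        if truth:
            if pred:
                tp += 1
            else:
                fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    # A resample may contain no positives (or no negatives); report NaN then.
    sens = tp / (tp + fn) if tp + fn else math.nan
    spec = tn / (tn + fp) if tn + fp else math.nan
    return sens, spec


def percentile_interval(draws, alpha):
    """Nearest-rank percentile interval over a non-empty list of draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    lower = ordered[round(alpha / 2 * last)]
    upper = ordered[round((1 - alpha / 2) * last)]
    return lower, upper


def cluster_bootstrap_ci(records, n_boot=2000, alpha=0.05, seed=0):
    """Resample patients (clusters), not images, and return two intervals.

    Assumed design: percentile intervals; n_boot and alpha are illustrative.
    Returns ((sens_low, sens_high), (spec_low, spec_high)).
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec[0]].append(rec)
    patients = list(by_patient)

    sens_draws, spec_draws = [], []
    for _ in range(n_boot):
        resample = []
        # Draw as many patients as in the data, with replacement, and keep
        # every image of each drawn patient together.
        for pid in rng.choices(patients, k=len(patients)):
            resample.extend(by_patient[pid])
        sens, spec = sens_spec(resample)
        if not math.isnan(sens):
            sens_draws.append(sens)
        if not math.isnan(spec):
            spec_draws.append(spec)
    return (percentile_interval(sens_draws, alpha),
            percentile_interval(spec_draws, alpha))
```

Calling `cluster_bootstrap_ci(records)` on a list of `(patient_id, truth, prediction)` tuples for one finding returns the two intervals. Resampling at the patient level keeps correlated images together, which typically gives wider intervals than treating each image as independent.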

Topics

Journal Article
