Transforming a clinical study database into a structured database adapted to artificial intelligence applications.

February 16, 2026

Authors

Sauron T, Lazarus C, Kurtz C, Cloppet F, Thomassin Naggara I, Fournier L

Affiliations (4)

  • LIPADE, Université Paris Cité, Paris, France.
  • Philips Research France, Paris, France.
  • Sorbonne Université, AP-HP, Department of Diagnostic and Interventional Imaging, Hôpital Tenon, Paris, France.
  • Université Paris Cité, AP-HP, Department of Radiology, Hôpital Européen Georges Pompidou, PARCC UMRS 970, INSERM, Paris, France.

Abstract

Medical imaging databases suitable for training machine learning and computer vision algorithms are scarce, limiting the potential for development and generalisation of clinical tools. Clinical trial databases are one source of such data, known for their high quality and reliable annotations. However, they are not tailored to the needs of machine learning or deep learning models. Our objective was to develop a methodology and tools that enable the curation of these databases specifically for the training or testing of artificial intelligence tools.

MRIs from the French centres of the EURAD clinical trial (MRI of women with pelvic adnexal lesions) were used to constitute the database. We developed the steps required to curate a clinical trial database: definition of inclusion and exclusion criteria, removal of unnecessary data according to the principle of parsimony, quality control, and harmonisation.

A total of 713 patients were included in our study. The directory structure was simplified, and the number of files and folders decreased by 44% and 95%, respectively. Only 62 DICOM fields were considered necessary for artificial intelligence (AI) model applications. Quality control was implemented in repeated cycles of automatic checks, followed by a final manual random inspection. Finally, sequence names were harmonised for easy identification when developing models.

Using a clinical trial database, we propose a methodology to build a database suitable for training or testing AI algorithms. This study underlines the need for a more global and systematic framework for the secondary use of health data to develop AI imaging tools for patient care.

We propose and detail a framework and tools to curate a clinical trial database, allowing secondary use of the high-quality annotated data generated in clinical trials for the training and testing of artificial intelligence models.

Clinical trial imaging databases are not adapted for AI model development. A curation process for MRI databases was developed for machine learning applications. We share the open-source tools and methodology developed in this study.
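
The curation steps named in the abstract (parsimony over DICOM fields, automatic quality checks, and sequence-name harmonisation) could be sketched roughly as follows. This is an illustration only and not the authors' open-source tooling: the abstract does not list the 62 retained fields or the naming rules, so the whitelist `KEPT_FIELDS`, the patterns in `SEQUENCE_RULES`, and all function names are hypothetical, and headers are represented as plain dictionaries rather than real DICOM files.

```python
import re

# Hypothetical whitelist: the study retained 62 DICOM fields, which the
# abstract does not enumerate. These standard attribute names are placeholders.
KEPT_FIELDS = {
    "PatientID",
    "Modality",
    "SeriesDescription",
    "PixelSpacing",
    "SliceThickness",
    "RepetitionTime",
    "EchoTime",
}

# Hypothetical harmonisation rules (pattern -> canonical name), tried in order.
SEQUENCE_RULES = [
    (re.compile(r"dwi|diff", re.IGNORECASE), "DWI"),
    (re.compile(r"dce|dyn|perf", re.IGNORECASE), "DCE"),
    (re.compile(r"t2", re.IGNORECASE), "T2"),
    (re.compile(r"t1", re.IGNORECASE), "T1"),
]


def prune_header(header):
    """Keep only whitelisted fields (principle of parsimony)."""
    return {k: v for k, v in header.items() if k in KEPT_FIELDS}


def harmonise_sequence(description):
    """Map a free-text series description to a canonical name, or None."""
    for pattern, name in SEQUENCE_RULES:
        if pattern.search(description or ""):
            return name
    return None


def quality_check(header):
    """Return the list of problems found by the automatic checks."""
    problems = []
    if header.get("Modality") != "MR":
        problems.append("modality is not MR")
    if harmonise_sequence(header.get("SeriesDescription")) is None:
        problems.append("unrecognised sequence name")
    return problems


def curate(headers):
    """Prune every header, then split series into accepted and rejected."""
    accepted, rejected = [], []
    for raw in headers:
        header = prune_header(raw)
        problems = quality_check(header)
        if problems:
            rejected.append((header, problems))
        else:
            header["Sequence"] = harmonise_sequence(header["SeriesDescription"])
            accepted.append(header)
    return accepted, rejected
```

In a workflow like the one described, the rejected list would feed the next cycle of automatic checks, and a random sample of the accepted series would go to the final manual inspection.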

Topics

Journal Article
