A Case Study on Colposcopy-Based Cervical Cancer Staging Reveals an Alarming Lack of Data Sharing Hindering the Adoption of Machine Learning in Clinical Practice
Authors
Affiliations (1)
Affiliations (1)
- Department of Medical Statistics, University Medical Center Goettingen
Abstract
BackgroundThe inbuilt ability to adapt existing models to new applications has been one of the key drivers of the success of deep learning models. Thereby, sharing trained models is crucial for their adaptation to different populations and domains. Not sharing models prohibits validation and potentially following translation into clinical practice, and hinders scientific progress. In this paper we examine the current state of data and model sharing in the medical field using cervical cancer staging on colposcopy images as a case example. MethodsWe conducted a comprehensive literature search in PubMed to identify studies employing machine learning techniques in the analysis of colposcopy images. For studies where raw data was not directly accessible, we systematically inquired about accessing the pre-trained model weights and/or raw colposcopy image data by contacting the authors using various channels. ResultsWe included 46 studies and one publicly available dataset in our study. We retrieved data of the latter and inquired about data access for the 46 studies by contacting a total of 92 authors. We received 15 responses related to 14 studies (30%). The remaining 32 studies remained unresponsive (70%). Of the 15 responses received, two responses redirected our inquiry to other authors, two responses were initially pending, and 11 declined data sharing. Despite our follow-up efforts on all responses received, none of the inquiries led to actual data sharing (0%). The only available data source remained the publicly available dataset. ConclusionsDespite the long-standing demands for reproducible research and efforts to incentivize data sharing, such as the requirement of data availability statements, our case study reveals a persistent lack of data sharing culture. Reasons identified in this case study include a lack of resources to provide the data, data privacy concerns, ongoing trial registrations and low response rates to inquiries. Potential routes for improvement could include comprehensive data availability statements required by journals, data preparation and deposition in a repository as part of the publication process, an automatic maximal embargo time after which data will become openly accessible and data sharing rules set by funders.