In the last decade, it has become increasingly common for researchers to make their data available to others when they complete a study. This is usually referred to as data sharing or data publishing. The growth in data sharing is driven largely by recent data policies from journals and funders.
Other Considerations
How can I maximize my data's reuse?
- Share data and code in open trusted repositories
- Use persistent links from publication to data and code
- Cite data and code as standard practice
- Document data, code, workflows, and computational environment
- Use an open license for your code and data
- Make use of a data provenance tool
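One lightweight way to act on the documentation and provenance points above is to ship a small manifest alongside the shared files. The sketch below is only an illustration using the Python standard library: the function names (`sha256_of`, `build_manifest`, `write_manifest`) and the `MANIFEST.json` file name are assumptions made for this example, not part of any repository's requirements, and a dedicated provenance tool will capture far more detail.

```python
import hashlib
import json
import platform
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Describe the Python environment and the given data/code files."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "files": {
            Path(p).name: {
                "sha256": sha256_of(p),
                "bytes": Path(p).stat().st_size,
            }
            for p in paths
        },
    }


def write_manifest(paths, out_path="MANIFEST.json"):
    """Write the manifest as JSON (hypothetical file name) and return it."""
    manifest = build_manifest(paths)
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


# Example call with hypothetical file names:
# write_manifest(["survey_data.csv", "analysis.py"])
```

Depositing such a file with the data lets a later user confirm that the files they downloaded match the ones that were analyzed, and see which interpreter and platform produced the results.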
What is reproducibility and why does it matter?
- Reproducibility and Replication (National Science Foundation) (see Reproducibility)
  - Reproducibility: the ability of a researcher to duplicate the results of a prior study using the same materials and procedures used by the original investigator
  - Replication: the same procedures are followed but new data are collected
- Empirical, Computational, Statistical Reproducibility (Stodden, 2014)
  - Empirical: data and collection details are made freely available
  - Computational: code, software, hardware, and implementation details are provided
  - Statistical: details on the choice of statistical tests and model parameters are provided
How do I handle sensitive data?
- You may find yourself in a situation where your ideal sharing method or repository is at odds with data sharing requirements
- For example, the data you’ve collected may contain sensitive information (see Data Security), which could prevent you from publishing in journals where open data is required
- To share such data, plan to make a de-identified version or a subset of it available (see Clinical Data Management)
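As a rough illustration of what preparing a de-identified copy can involve, the sketch below drops direct identifiers and replaces the participant ID with a keyed hash. Everything in it is an assumption made for the example: the field names, the records (which are made up), and the helper names `pseudonym` and `deidentify`. Removing direct identifiers is only a first step; quasi-identifiers such as age or location can still re-identify people, so any real dataset should be reviewed against your institution's and funder's requirements before release.

```python
import hashlib
import hmac


def pseudonym(value, secret_key):
    """Map an identifier to a stable keyed hash (HMAC-SHA256), truncated."""
    mac = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def deidentify(rows, drop_fields, id_field, secret_key):
    """Return copies of rows without drop_fields and with id_field pseudonymized."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_fields}
        kept[id_field] = pseudonym(row[id_field], secret_key)
        cleaned.append(kept)
    return cleaned


# Made-up records, for illustration only.
records = [
    {"participant_id": "P001", "name": "Ada Example",
     "email": "ada@example.org", "age": 34, "score": 0.82},
    {"participant_id": "P002", "name": "Ben Example",
     "email": "ben@example.org", "age": 51, "score": 0.67},
]

shared = deidentify(
    records,
    drop_fields={"name", "email"},
    id_field="participant_id",
    secret_key=b"keep-this-key-private",  # hypothetical key; never publish it
)
```

A keyed hash is used here rather than a plain hash so that someone who knows the original ID format cannot simply recompute the pseudonyms; the key stays with the research team and is not deposited with the data.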
Learn more about data sharing in this webinar.