Collecting and Organizing Data

In the data lifecycle, the Collect & Create stage includes the substeps Collaborative Tools & Software, Electronic Lab Notebooks, Documentation & Metadata, and Reproducibility. The Analyze & Collaborate stage includes Analysis Ready Datasets, Image Management, and Version Control.

When it comes to managing data and documents within a team or project, put processes in place that work for everyone involved. Proper data documentation provides the information necessary to fully understand and interpret the data, now and in the future.

Capturing appropriate metadata facilitates:

  • discovery
  • reuse
  • reproducibility
  • preservation
  • archiving of data

Tips for Getting Started

  • Keep sufficient documentation: There are many “levels” at which Documentation & Metadata are written, and they come in many forms. Record all information necessary to understand the content and context of the data, and store it alongside your research data, for example in lab notebooks, databases, or README files.
  • Use Version Control: A version control system for active projects can keep track of all sorts of files, including text documents and analysis code, and help you avoid Final_v2_rev3_final_FINAL.docx headaches!
  • Plan for reproducibility: Before starting a research project, write a plan and set up the research space. Reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials and procedures as were used by the original investigator.
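
As one way to start the README files mentioned above, the minimal Python sketch below writes a plain-text README.txt from a template. The section headings in SECTIONS and the function name write_readme are hypothetical; replace them with your team's or your repository's own template. Sections you have not filled in are marked TODO so that gaps stay visible, and the files already in the folder are listed as a starting inventory.

```python
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical section headings; adapt them to your team's own README template.
SECTIONS = [
    "Title",
    "Creator and contact",
    "Date created",
    "Description of the data",
    "Methods and instruments",
    "Software and versions",
    "License and access conditions",
]


def write_readme(folder: Path, known: dict[str, str]) -> Path:
    """Write README.txt in `folder`, filling known sections and marking the rest TODO."""
    folder.mkdir(parents=True, exist_ok=True)
    # Default the creation date to today unless the caller supplies one.
    values = {"Date created": date.today().isoformat(), **known}
    parts = [f"{section}\n{values.get(section, 'TODO')}\n" for section in SECTIONS]
    # Inventory the files already in the folder (excluding the README itself).
    files = sorted(
        p.name for p in folder.iterdir() if p.is_file() and p.name != "README.txt"
    )
    parts.append("File list\n" + ("\n".join(files) if files else "TODO") + "\n")
    readme = folder / "README.txt"
    readme.write_text("\n".join(parts), encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Demonstrate in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "site_a_readings.csv").write_text("hour,temp_c\n0,12.5\n")
        path = write_readme(folder, {"Title": "Example soil temperature survey"})
        print(path.read_text(encoding="utf-8"))
```

Creating the README when data collection starts, rather than at the end, follows the advice to describe your data as you capture it.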

Considerations for Data Analysis

Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.
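
To make the four activities in this definition concrete, here is a minimal Python sketch on a small hypothetical dataset of hourly temperature readings recorded as text in degrees Fahrenheit. The function names, the unit conversion, and the choice of a straight-line least-squares fit as the “model” are illustrative assumptions, not a prescribed workflow.

```python
from statistics import mean
from typing import Optional


def parse(value: str) -> Optional[float]:
    """Return the reading as a float, or None if it is blank or not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def analyze(rows: list[tuple[float, str]]) -> dict[str, float]:
    """Inspect, cleanse, transform, and model (hour, raw reading) pairs."""
    parsed = [(hour, parse(raw)) for hour, raw in rows]
    # Inspect: count unusable readings before anything is dropped.
    n_dropped = sum(1 for _, value in parsed if value is None)
    # Cleanse: keep only readings that parsed as numbers.
    clean = [(hour, value) for hour, value in parsed if value is not None]
    # Transform: convert the (assumed) Fahrenheit readings to Celsius.
    data = [(hour, (value - 32) * 5 / 9) for hour, value in clean]
    # Model: least-squares line, temp_c = slope * hour + intercept.
    # Needs at least two distinct hours, otherwise the denominator is zero.
    hours = [h for h, _ in data]
    temps = [t for _, t in data]
    h_bar, t_bar = mean(hours), mean(temps)
    slope = sum((h - h_bar) * (t - t_bar) for h, t in data) / sum(
        (h - h_bar) ** 2 for h in hours
    )
    return {
        "n_total": len(rows),
        "n_dropped": n_dropped,
        "slope": slope,
        "intercept": t_bar - slope * h_bar,
    }


if __name__ == "__main__":
    readings = [(0, "32"), (1, "41"), (2, ""), (3, "59")]
    print(analyze(readings))
```

Returning n_dropped alongside the fitted values means the cleansing step is reported rather than hidden, which is the kind of process information worth recording with the results.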

The choices you make while analyzing your data can also contribute to effectively managing your research data:

  • Document your steps: Consider the software you use for analysis, and whether those applications automatically generate information about your data files and process steps. Keeping track of your steps can save you time when you want to recreate your work, or share your methodology with others! Use Electronic Lab Notebooks, Collaborative Tools & Software, and Image Management platforms.
  • Keep your data safe: Describe your data as you capture it, organize your files, and make smart choices about where you store your data. Because some software programs save files in proprietary formats that only their own application can open, consider saving data in formats that different software programs can open. Ensure you are working with Analysis Ready Datasets.
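
Both suggestions above can be combined in a few lines. The Python sketch below saves a processed table as CSV, a plain-text format that many different programs can open, and writes a small JSON “sidecar” file next to it recording the source file's SHA-256 checksum, the processing steps, the Python version, and a UTC timestamp. The sidecar's file name, its fields, and the function names are a hypothetical layout; some analysis applications may capture similar information for you automatically.

```python
import csv
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file, so a later reader can confirm they have the same input."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def save_with_provenance(
    rows: list[list], header: list[str], out_csv: Path, source: Path, steps: list[str]
) -> Path:
    """Save rows as CSV and write a JSON sidecar describing how they were produced."""
    # CSV is plain text, so many different programs can open it.
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    record = {
        "output": out_csv.name,
        "source": source.name,
        "source_sha256": sha256_of(source),
        "steps": steps,
        "python_version": platform.python_version(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical sidecar naming: clean.csv -> clean.provenance.json
    sidecar = out_csv.with_name(out_csv.stem + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        raw = folder / "raw_readings.txt"
        raw.write_text("0,32\n1,41\n2,\n3,59\n", encoding="utf-8")
        sidecar = save_with_provenance(
            [[0, 0.0], [1, 5.0], [3, 15.0]],
            ["hour", "temp_c"],
            folder / "clean.csv",
            raw,
            ["dropped blank readings", "converted Fahrenheit to Celsius"],
        )
        print(sidecar.read_text(encoding="utf-8"))
```

Keeping the sidecar next to the CSV means the description travels with the data when the folder is shared or archived.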

Scenario Examples

  • File evolution and migration

    As the project and its data grow more complex, you may need to update the conventions set up at the start of the project. For example, a new project with different needs may be spawned from a long multi-year project, or you may inherit data and files from another institution that used different naming conventions than yours. If a structure is already in place, see what fits your team and modify it as needed. If your project already has hundreds of unmanaged files, or files in a structure that does not work for your team, you'll need to weigh the cost and time of implementing the above strategies.

    There are three main options to tackle this issue, from least to most time intensive to implement:

    1. Only new files will fully follow the team's collect & create policies (old files will remain in the old system)
    2. Pull forward old files from the old system into the new as they are used
    3. All files will be moved into the team's system
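
    Option 2 can be partly automated. The minimal Python sketch below copies a legacy file into the new folder structure under a name that follows the team's convention, then appends the old-to-new mapping to a README.txt in the new folder. The naming convention (YYYY-MM-DD_project_description.ext), the function names new_name and pull_forward, and the log format are hypothetical; substitute your team's own. The `when` argument is whatever date your convention calls for, such as the collection date. The file is copied rather than moved, so the old system stays intact.

```python
import re
import shutil
import tempfile
from datetime import date
from pathlib import Path


def new_name(old: Path, project: str, when: date) -> str:
    """Hypothetical convention: YYYY-MM-DD_project_description.ext in lowercase."""
    # Collapse each run of spaces, punctuation, or underscores into one underscore.
    description = re.sub(r"[^a-z0-9]+", "_", old.stem.lower()).strip("_")
    return f"{when.isoformat()}_{project}_{description}{old.suffix.lower()}"


def pull_forward(old: Path, new_root: Path, project: str, when: date) -> Path:
    """Copy a legacy file into the new structure and log the mapping in README.txt."""
    new_root.mkdir(parents=True, exist_ok=True)
    target = new_root / new_name(old, project, when)
    if target.exists():
        raise FileExistsError(target)  # never silently overwrite a migrated file
    shutil.copy2(old, target)  # copy2 also preserves timestamps where possible
    with (new_root / "README.txt").open("a", encoding="utf-8") as log:
        log.write(f"{date.today().isoformat()}: {old} -> {target.name}\n")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        legacy = Path(tmp) / "legacy" / "Final_v2 rev3 FINAL.docx"
        legacy.parent.mkdir()
        legacy.write_text("draft")
        copied = pull_forward(legacy, Path(tmp) / "project", "soil", date(2024, 3, 1))
        print(copied.name)  # 2024-03-01_soil_final_v2_rev3_final.docx
```

    Because every copy is logged, the README doubles as a record of which files have already been pulled forward from the old system.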

    When you inherit project files or get a handle on unmanaged files, use README files to document the choices you make, not only for the team but also for yourself when you need to review them months or years later.