Defining concepts, services, and tools for managing data


archiving

The transfer of records to a facility authorized to appraise, preserve, and provide access to them.

big data

Large data sets that may be analyzed computationally to reveal patterns or relationships.

cloud computing

Cloud computing allows users to take advantage of a network of servers either in a single location or distributed across the country or world. Cloud computing is easier to scale than owning local servers and therefore allows researchers to increase computing power on an as-needed basis.

common data elements (CDE)

Common Data Elements (CDEs) are standardized, precisely defined questions paired with a set of specific allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection. CDEs may consist of a single data element, such as height, gender, or date of birth, or a collection of connected questions, such as a survey instrument used as a depression index or a quality of life scale.
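As a rough illustration of the idea, a CDE can be modeled as a question paired with a fixed set of allowable responses, which makes it possible to check that every site recorded answers in the same form. The sketch below is hypothetical: the `CommonDataElement` class, the `smoking_status` element, and its response values are invented for illustration and are not drawn from any published CDE catalog.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """A standardized question and its allowable responses (hypothetical structure)."""
    name: str
    question: str
    allowed_responses: Tuple[str, ...]

    def validate(self, response: str) -> bool:
        # A response is valid only if it matches an allowable value exactly.
        return response in self.allowed_responses


# Hypothetical element, invented for this example.
SMOKING_STATUS = CommonDataElement(
    name="smoking_status",
    question="What is the participant's smoking status?",
    allowed_responses=("never", "former", "current", "unknown"),
)


def invalid_responses(element: CommonDataElement, responses: List[str]) -> List[str]:
    """Return the responses that fall outside the element's allowable set."""
    return [r for r in responses if not element.validate(r)]


site_a = ["never", "current", "Former", "unknown"]
print(invalid_responses(SMOKING_STATUS, site_a))  # ['Former']
```

The exact-match check is deliberately strict: "Former" is rejected because of its capitalization, which is the kind of inconsistency between sites that CDEs are meant to prevent.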


data

Recorded factual material commonly accepted in the scientific community as required to validate research findings. The term "data" does not have one clear definition; it is often interpreted differently depending on the field of study.

data lifecycle

The data lifecycle represents all of the stages of data throughout its life from creation to distribution and reuse.

data management plan (DMP)

A Data Management Plan (DMP) describes how data will be collected, normalized, processed, analyzed, preserved, used, and re-used over its lifetime. A DMP associated with a research study can cover the types of data, the metadata standards used, policies for access and sharing, and plans for archiving and preserving the data so that it remains accessible over time. DMPs help ensure that data will be properly documented and available for use by researchers in the future, and they are often required by grant funding agencies such as the National Science Foundation. To learn more, visit Data Management Plan.

data repository

A place to hold data, make data available for use, and organize data in a logical manner. An appropriate, subject-specific location where researchers can submit their data. Data repositories may have specific requirements concerning subject or research domain; data re-use and access; file format and data structure; and the types of metadata that can be used.

data security

Data security refers to ways data is kept safe from harm, alteration, or unauthorized access during gathering, analysis, storage, and transmission. Computer systems used to store data should have security measures such as firewalls, virus protection, and strong password protection.

data sharing

Data sharing makes scholarly research data available to other investigators. Many funding agencies, institutions, and publication venues have policies regarding data sharing because transparency and openness are considered important parts of scientific discovery. Currently, in the biomedical field, the National Science Foundation and the National Institutes of Health have implemented data sharing policies that either expect or require scientific researchers to share their data.

data use agreement

An expression of conditions under which a data set may be used. May be formal, as in a license or contract, or an informal expression of the preferences of the data owner(s).

electronic lab notebook

A software tool that in its most basic form replicates an interface much like a page in a paper lab notebook. In an electronic notebook, you can enter protocols, observations, notes, and other data using your computer or mobile device. This offers several advantages over the traditional paper notebook.

knowledge transfer file

A file that assists with the transfer of knowledge from one part of the organization to another. Knowledge transfer seeks to organize, create, capture, or distribute knowledge and ensure its availability for future users. The file should contain the essential information about projects and datasets that future users need to work with them successfully.


metadata

Structured information about a resource that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage that resource. It ensures that the context for how your data was created, analyzed, or stored is clear, detailed, and therefore reproducible. For additional information about how and when to record metadata, visit the Metadata Overview.
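To make "structured information" concrete, a metadata record can be pictured as a small set of named fields stored alongside the data. The sketch below is only an assumption about what such a record might look like: the field names are loosely modeled on common descriptive elements (title, creator, date, description, format), and the `REQUIRED_FIELDS` list and sample values are hypothetical rather than taken from any particular metadata standard.

```python
import json

# Hypothetical minimum set of fields for this example.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "format")


def missing_fields(record):
    """Return the required fields that are absent or left empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Sample record with invented values; the description was left blank.
record = {
    "title": "Example survey responses",
    "creator": "Example Lab",
    "date": "2024-01-15",
    "description": "",
    "format": "text/csv",
}

print(missing_fields(record))        # ['description']
print(json.dumps(record, indent=2))  # the record as it might be saved next to the data
```

Checking for empty fields at the time the record is written is one simple way to catch missing context before the data is shared or archived.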

open access

Open Access is a movement to make research, including articles, data, and software, freely available online to the public without financial or technical barriers. Researchers may be mandated by funders or institutions to make their work open access. While open access works are free to view, they may not be free to reuse, depending on the license. Creative Commons and Public Domain licenses are the ones most commonly attached to open access works.


README

A plain text file that contains information about other files in a folder. It is a best practice to create a README document for each distinct dataset at the beginning of a project.
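A README of this kind can be drafted by hand, but a starting point can also be generated from the folder contents. The following is a minimal sketch under stated assumptions: the `build_readme` helper, the `README.txt` file name, and the layout of the generated text are all hypothetical choices for illustration, not a required format.

```python
import tempfile
from pathlib import Path


def build_readme(folder, title, description):
    """Return README text listing every file in `folder` except the README itself."""
    lines = [title, "=" * len(title), "", description, "", "Files in this folder:"]
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name != "README.txt":
            lines.append(f"- {path.name} ({path.stat().st_size} bytes)")
    return "\n".join(lines) + "\n"


# Demo in a throwaway folder so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "measurements.csv").write_text("id,value\n1,3.2\n")
    readme_text = build_readme(folder, "Example dataset", "Hypothetical pilot measurements.")
    (folder / "README.txt").write_text(readme_text)
    print(readme_text)
```

The generated file list is only a skeleton; the description of how each file was produced and what its columns mean still has to be written by the people who created the data.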

research data management (RDM)

Research Data Management (RDM) is a concept used to describe the managing, sharing, and archiving of research data to make it more accessible to the broader research community. Research data management provides an opportunity for researchers to create a plan ensuring data will be organized and shared with other researchers, or archived for long-term preservation.

restricted data

Data made available under stringent, secure conditions. Typically confidential or sensitive data.

This list was compiled in part using terms from the Data Research Glossary (Cornell), ICPSR, and NNLM.