Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. It ensures that the context in which your data was created, analyzed, and stored is clear and detailed, and that the work is therefore reproducible.
Subject-Specific Guidance
Clinical Data
Researchers are urged to use standardized terminology wherever possible.
Clinical metadata may include elements that pose a risk of patient identification. It is the institutions' responsibility to provide the appropriate privacy training for handling this information. Explore the NIAID GSCID/BRC Clinical Metadata Standard.
Consider consulting and using the following codes, identifiers, and terminologies in your data, as applicable:
Common Data Elements and Standards
- CDEs provide a way to standardize data collection so that related data can be pooled and analyzed across multiple studies or to investigate relationships between data in unrelated datasets
- NIH Common Data Elements (CDE) Repository
Patient Identifiers
- Social Security Number
- Taxpayer Identification Number
- National Provider Identifier (NPI)
Diagnosis and procedure codes
- International Classification of Diseases (ICD)
- Current Procedural Terminology (CPT)
Drug and device codes
Experimental Biomedical Research
Metadata about scientific experiments are essential for finding, retrieving, and reusing data.
There are several current projects at LMA (many NIH-funded) that are focused on developing recommended metadata standards for describing experimental biomedical research data. In addition to the guidelines listed here, biomedical researchers are encouraged to consult FAIRsharing.org, an educational resource and portal to metadata standards, databases, and data policies for a variety of disciplines.
Please Note: When describing experimental biomedical research data, it is important to identify not only the canonical reagents but also the actual batches of those reagents that were used to create your data.
- A canonical reagent is the ideal of the reagent, and its definition and description are true for all examples of that reagent
- Batches are the physical lots (daughters) of the canonical reagent, and there is often slight variation between batches
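One way to keep the canonical reagent and its batches distinct in a metadata record is to model them as two linked objects. The following is a minimal sketch only; the class names, field names, identifiers, and supplier values are hypothetical placeholders and are not drawn from any of the standards named in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalReagent:
    """The idealized reagent; its description holds for every batch."""
    reagent_id: str  # hypothetical identifier scheme
    name: str


@dataclass(frozen=True)
class Batch:
    """A physical lot of a canonical reagent, with lot-specific details."""
    canonical: CanonicalReagent
    lot_number: str
    supplier: str
    notes: str = ""  # free text for batch-to-batch variation


def describe(batch: Batch) -> dict:
    """Flatten a batch into one record that names both levels."""
    return {
        "reagent_id": batch.canonical.reagent_id,
        "reagent_name": batch.canonical.name,
        "lot_number": batch.lot_number,
        "supplier": batch.supplier,
        "notes": batch.notes,
    }


# Placeholder values for illustration only.
compound = CanonicalReagent(reagent_id="EX-0001", name="Example Compound A")
lot_1 = Batch(compound, lot_number="LOT-2023-01", supplier="Example Supplier")
lot_2 = Batch(compound, lot_number="LOT-2024-07", supplier="Example Supplier")

record = describe(lot_2)
```

Both batches point to the same canonical reagent, so the shared description is recorded once while each dataset can still cite the exact lot that produced it.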
NIH LINCS
“The LINCS project is based on the premise that disrupting any one of the many steps of a given biological process will cause related changes in the molecular and cellular characteristics, behavior, and/or function of the cell – the observable composite of which is known as the cellular phenotype. Observing how and when a cell’s phenotype is altered by specific stressors can provide clues about the underlying mechanisms involved in perturbation and, ultimately, disease.”
- LINCS metadata standards
- These metadata standards were developed to describe LINCS reagents, assays, and experiments. They provide guidance for required, required if applicable, and optional elements.
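The three levels of obligation (required, required if applicable, and optional) can be checked mechanically before a record is deposited. The sketch below assumes a record is a plain dictionary; the field names and the "commercial source" condition are invented for illustration and are not the actual LINCS element names.

```python
# Hypothetical rules; the field names are illustrative, not taken from LINCS.
REQUIRED = {"reagent_name", "reagent_id"}
REQUIRED_IF = {
    # field -> predicate deciding whether the field applies to this record
    "lot_number": lambda rec: rec.get("source") == "commercial",
}
OPTIONAL = {"source", "comments"}


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in sorted(REQUIRED):
        if not record.get(name):
            problems.append(f"missing required field: {name}")
    for name in sorted(REQUIRED_IF):
        applies = REQUIRED_IF[name]
        if applies(record) and not record.get(name):
            problems.append(f"missing field required in this case: {name}")
    known = REQUIRED | set(REQUIRED_IF) | OPTIONAL
    for name in sorted(set(record) - known):
        problems.append(f"unrecognized field: {name}")
    return problems


complete = check_record(
    {"reagent_name": "Example Compound A", "reagent_id": "EX-0001", "source": "in-house"}
)
incomplete = check_record(
    {"reagent_name": "Example Compound A", "source": "commercial"}
)
```

Here `complete` is an empty list, while `incomplete` reports both the missing identifier and the lot number that becomes required once the source is commercial.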
Illuminating the Druggable Genome (IDG) Consortium
“The goal of the Illuminating the Druggable Genome (IDG) program is to identify and provide information on proteins that are currently not well studied within commonly drug-targeted protein families.”
Proteomics Data
Metadata is crucial for interpreting and reanalyzing deposited datasets.
The following resources provide metadata recommendations for proteomics, interactomics, and metabolomics research:
HUPO Proteomics Standards Initiative
"The HUPO Proteomics Standards Initiative defines community standards for data representation in proteomics and interactomics to facilitate data comparison, exchange and verification."
- Dai C, Füllgrabe A, Pfeuffer J, Solovyeva EM, Deng J, Moreno P, Kamatchinathan S, Kundu DJ, George N, Fexova S, Grüning B. "A proteomics sample metadata representation for multiomics integration and big data analysis." Nature Communications. 2021. 12(1): 1-8. https://doi.org/10.1038/s41467-021-26111-3
- Snyder M, Mias G, Stanberry L, Kolker E. "Metadata Checklist for the Integrated Personal OMICS Study: Proteomics and Metabolomics Experiments." Omics. 2014. 18(1): 81-85. https://doi.org/10.1089/omi.2013.0148
Additional Medical Metadata Standards
Metadata Schemas
Labeling, tagging, or coding systems used for recording cataloging information or structuring descriptive records
- MIBBI – Minimum Information for Biological and Biomedical Investigations (portal to over 40 biomedical data standards)
- OME – Open Microscopy Environment – Data Model and File Formats 6.2.0 (microscopy data)
- Protocol Data Elements Definitions (clinical trials data)
- ISA-Tab (“omics-based” biological data)
- Digital Curation Centre’s list of Disciplinary Metadata Standards
- Data Documentation Initiative (social sciences data)
- Dublin Core (general)
- Darwin Core (biological data)
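As an illustration of what a schema-based record looks like, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The element names (title, creator, date, type, format, identifier) come from the Dublin Core element set; the values, the `metadata` wrapper element, and the function name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Placeholder values describing a hypothetical dataset.
fields = {
    "title": "Example proteomics dataset",
    "creator": "Example Lab",
    "date": "2024-01-15",
    "type": "Dataset",
    "format": "text/csv",
    "identifier": "example-dataset-001",
}


def to_dublin_core_xml(fields: dict) -> str:
    """Serialize a flat dict of Dublin Core elements to an XML string."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core_xml(fields)
```

A repository or discipline-specific schema will usually define its own wrapper and required elements, so treat this only as a picture of the general shape.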
Controlled Vocabularies
Lists of terms predefined by a community or research group
Medical research and biomedical professional communities may employ controlled vocabulary standards such as:
- CMS coding and other coding and billing standards
- NLM’s Unified Medical Language System (UMLS), which consolidates several health and biomedical vocabularies and standards to enable interoperability between computer systems.
Ontologies
A variety of controlled vocabulary that defines components and describes the relationships among them
Most ontologies are used for interoperability among databases; some are expressed in the Web Ontology Language (OWL) or the Resource Description Framework (RDF). Here are some examples of ontologies used in biomedical research:
- BioPortal: the portal for the U.S. National Center for Biomedical Ontology, hosted at Stanford.
- Gene Ontology: a bioinformatics initiative that aims to standardize the representation of gene and gene product attributes across species and databases.
- Medical Subject Headings (MeSH): a controlled vocabulary used for indexing articles for PubMed.
- Chemical Entities of Biological Interest (ChEBI): an ontology of small chemical compounds.
- Microarray Gene Expression Data (MGED) Society Ontology: an ontology designed to describe microarray experiments.
Technical Standards
Established norm or requirement for a repeatable technical task
Technical standards establish uniform criteria, methods, processes, and practices for a repeatable technical task. ISO standards are internationally agreed upon by experts and can be thought of as a formula that describes the best way of doing something.
- International Organization for Standardization (ISO): develops and publishes international standards (e.g., the ISO 8601 date and time format, written YYYY-MM-DD or, in basic form, YYYYMMDD)
- National Institute of Standards and Technology (NIST): promotes measurement science, standards, and technology to enhance productivity, facilitate trade, and improve the quality of life (e.g., Mass Spectrometry Data Center)
- IEEE Xplore: search for technical standards related to electrical engineering, computer engineering, and electronics
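For the ISO 8601 example above, most programming languages can produce and parse conforming dates directly. The following sketch uses Python's standard `datetime` module; the specific date and variable names are arbitrary placeholders.

```python
from datetime import date, datetime, timezone

collected = date(2024, 3, 5)  # placeholder collection date

# Extended format (with hyphens) and basic format (without).
extended = collected.isoformat()
basic = collected.strftime("%Y%m%d")

# A full timestamp with an explicit UTC offset avoids time zone ambiguity.
stamp = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc).isoformat()

# Parsing the extended form returns the same date object.
parsed = date.fromisoformat(extended)
```

Writing dates this way in file names and metadata fields keeps them unambiguous across regions and makes them sort chronologically as plain text.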