Harvard Dataverse

Harvard Dataverse is a general-purpose data repository built on open-source software, intended for sharing research data and facilitating its citation. It is under continuous development by Harvard Library, Harvard University IT, and IQSS. Several other institutions have used this open-source software project to run independent Dataverse installations at other locations.

 

Compare Harvard Dataverse to other options in the Repository Matrix.

Please contact us if you have any questions or suggestions about the content of this page. Last updated: 2020-03-10

Features & Specifications

  • Data Size and Format

    File Size Limit:  Files uploaded through the browser-based upload function cannot exceed 2.5 GB. However, Harvard Dataverse is willing to work with Harvard researchers who have larger files.

    Dataset Size Limit:  1 TB, but Harvard Dataverse will work with Harvard researchers who have larger datasets (>1 TB).

    Data Types and Formats Hosted:  All file formats are accepted (tabular, non-tabular, and compressed zip file bundles; a file hierarchy feature preserves the directory structure).
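As a rough illustration of the size limits above, the following Python sketch checks a planned deposit against the 2.5 GB browser upload limit and the 1 TB dataset limit. The function name `check_upload_plan` and the use of decimal units (1 GB = 10^9 bytes) are assumptions made for this example; the page does not say how the limits are measured.

```python
# Limits quoted on this page. Decimal units (1 GB = 10**9 bytes) are an
# assumption; the page does not state whether binary units are meant.
BROWSER_UPLOAD_LIMIT_BYTES = int(2.5 * 10**9)
DATASET_LIMIT_BYTES = 10**12


def check_upload_plan(file_sizes):
    """Check a planned deposit against the documented limits.

    file_sizes maps a file name to its size in bytes.
    Returns (names of files too large for browser upload,
             whether the total exceeds the dataset limit).
    """
    oversized = sorted(
        name for name, size in file_sizes.items()
        if size > BROWSER_UPLOAD_LIMIT_BYTES
    )
    dataset_too_large = sum(file_sizes.values()) > DATASET_LIMIT_BYTES
    return oversized, dataset_too_large
```

A file flagged here is not necessarily excluded from Harvard Dataverse; per the text above, Harvard researchers with larger files or datasets can contact the repository.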

  • Data Licensing

    Waiver:  Harvard Dataverse strongly encourages use of a Creative Commons Zero (CC0) waiver for all public datasets, but dataset owners can specify other terms of use and restrict access to data.

  • Data Attribution and Citation Tools

    • Within Harvard Dataverse, specific programs or projects can create nested dataverses (collections), and each nested dataverse (collection) can itself contain further nested dataverses (collections) or one or more datasets. Harvard Dataverse assigns a DOI to each dataset and datafile within a dataverse.
    • When substantive changes are made to the metadata and files associated with a published dataset, a new version number is assigned to the existing dataset citation; the DOI remains constant. Users can decide whether substantial metadata changes should result in a “major version” change.
    • Minor version changes do not affect the version number in the existing citation, but the minor version number appears in the “versions” tab of the dataset page (*.*). Any deletion, addition, or replacement of data files results in a major version change, which is displayed in the citation (v1 → v2) and in the “versions” tab of the dataset and file landing pages.
    • Whenever a dataset is edited (metadata or files), the resulting draft version must be published for the changes to appear in the published dataset. Researchers can export dataset citation files in several formats (EndNote XML, BibTeX, RIS) to manage citations in LaTeX, EndNote, Zotero, and more. Web browser plugins (e.g. Zotero and EndNote plugins) can also extract dataset citation info from dataset pages.
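The versioning rules in this section can be summarized in a small Python sketch. It is an illustration of the rules as described here, not Dataverse's actual implementation: the function names are hypothetical, and resetting the minor number to 0 after a major change is an assumption.

```python
def next_version(major, minor, files_changed, metadata_is_major=False):
    """Return the (major, minor) version after publishing a draft.

    files_changed: True if any data file was added, deleted, or replaced.
    metadata_is_major: True if the user chose to treat metadata edits
    as a major change.
    """
    if files_changed or metadata_is_major:
        # File changes always produce a major version change.
        # Resetting minor to 0 is an assumption of this sketch.
        return major + 1, 0
    # Metadata-only edits default to a minor version change.
    return major, minor + 1


def citation_label(major, minor):
    """Version shown in the citation: only the major number changes it."""
    return f"v{major}"
```

For example, a metadata-only edit to version 1.0 yields 1.1 while the citation still reads v1; replacing a file then yields 2.0 and the citation reads v2. The DOI is unchanged in both cases.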
  • User Access Controls

    Option to Share:  Harvard Dataverse allows draft, unpublished, and published (public) datasets. For draft and unpublished datasets, a variety of tiers of access can be assigned to different registered users.

  • Data Access Tools

    Search: 

    • Data descriptors and metadata: At the dataset level, Harvard Dataverse offers several different metadata templates appropriate for datasets from different disciplines, and the life sciences metadata template adheres to the ISA-TAB specification. 
    • Additional free-form keyword fields are provided. These dataset-level metadata are searchable, but depositors cannot add their own detailed file-level metadata.
    • Dataverse extracts variable-level metadata from ingested tabular files, extracts metadata from FITS files, and makes that file-level metadata searchable.

    Download:  In addition to individual file downloads, Harvard Dataverse has multiple APIs for programmatic data and metadata access, as described in the Dataverse API Guide.

    Proprietary File Format Access:  Ingested tabular files are converted to a tabular format, which allows some proprietary files to be downloaded in that tabular format as well as in other formats. See Tabular Data File Ingest.

    Data Analysis:  Harvard Dataverse includes Data Explorer and File Preview. The Data Explorer provides a UI that displays the variables of tabular data files and allows users to search, chart, and conduct cross-tabulation analysis. The File Preview is a set of tools that display the content of files (including audio, HTML, images, PDF, text, and video), allowing them to be viewed without downloading. Additional features can be found on Dataverse’s features page.
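To make the programmatic access mentioned under Download more concrete, here is a minimal Python sketch that builds request URLs for three endpoints described in the Dataverse API Guide: search, dataset metadata by persistent identifier, and datafile download. The host `https://dataverse.harvard.edu`, the helper names, and the example DOI are assumptions for illustration and do not come from this page; consult the API Guide for the authoritative parameters.

```python
from urllib.parse import urlencode

# Assumed host for Harvard Dataverse; other installations use other hosts.
BASE_URL = "https://dataverse.harvard.edu"


def search_url(query, item_type="dataset"):
    """URL for the Search API; item_type narrows results (e.g. 'dataset', 'file')."""
    return f"{BASE_URL}/api/search?" + urlencode({"q": query, "type": item_type})


def dataset_metadata_url(doi):
    """URL for dataset metadata, looked up by persistent identifier (DOI)."""
    return f"{BASE_URL}/api/datasets/:persistentId/?" + urlencode({"persistentId": doi})


def datafile_url(file_id, original=False):
    """URL for downloading a datafile by its numeric database ID.

    For an ingested tabular file, original=True requests the file as it
    was uploaded instead of the converted tabular version.
    """
    url = f"{BASE_URL}/api/access/datafile/{file_id}"
    return url + "?format=original" if original else url


if __name__ == "__main__":
    # "doi:10.7910/DVN/EXAMPLE" is a hypothetical identifier.
    print(search_url("climate"))
    print(dataset_metadata_url("doi:10.7910/DVN/EXAMPLE"))
    print(datafile_url(42, original=True))
```

The sketch only constructs URLs and makes no network calls; the resulting strings can be passed to any HTTP client. Restricted or unpublished data would additionally require an API token, which is not shown here.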

  • Cost

    Free

  • Other Features

    Pros:

    • Can share data with collaborators or the public
    • Provides a DOI for a dataset or group of related datasets
    • Provides a mechanism by which a journal's editors and reviewers can have anonymous access to a dataset or dataverse before it is made public. See Private URLs.
       

    Cons:

    • Users of proprietary file types must have the necessary software to open such files once they are downloaded