Versioning

Version tracking, or version control, is the management of changes to any file or document. Version control works like a savings account for your project: each saved version is kept so you can go back to it later. It is a way to keep track of a project across time, locations, collaborators, and computer systems.

Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.

Complex research projects will inevitably produce multiple versions of files. Implementing a file versioning strategy at the beginning of your research project will help avoid confusion among collaborators and save the time and effort otherwise lost trying to recover the “right” version of a file.

The Simple Way

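File versioning can be as simple as integrating version numbers (e.g., v1, v1_2, v2) or dates (for example, in YYYY-MM-DD format, which sorts chronologically) into your file naming convention. Avoid ambiguous terms such as ‘final’ or ‘revision’, which stop being accurate as soon as the file is edited again.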
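
A naming convention is easiest to follow when it is applied mechanically. The Python sketch below is one hypothetical way to do that: it assumes names of the form `stem_v1.ext` for major versions and `stem_v1_2.ext` for minor ones (reading `v1_2` as major version 1, minor version 2), and every function name in it is made up for illustration.

```python
import re
from datetime import date

# Assumed convention: "report_v2.docx" (major only) or "report_v2_3.docx" (major_minor).
VERSION_RE = re.compile(
    r"^(?P<stem>.+)_v(?P<major>\d+)(?:_(?P<minor>\d+))?(?P<ext>\.[^.]+)$"
)


def parse_version(filename):
    """Return (stem, (major, minor), ext), or None if the name has no version tag."""
    m = VERSION_RE.match(filename)
    if m is None:
        return None
    minor = int(m["minor"]) if m["minor"] else 0
    return m["stem"], (int(m["major"]), minor), m["ext"]


def format_version(stem, major, minor, ext):
    """Build 'stem_v1.ext' for a major version or 'stem_v1_2.ext' for a minor one."""
    tag = f"v{major}" if minor == 0 else f"v{major}_{minor}"
    return f"{stem}_{tag}{ext}"


def next_version(filenames, stem, ext, major=False):
    """Suggest the name of the next version of stem+ext, given existing file names."""
    versions = [
        v
        for s, v, e in filter(None, map(parse_version, filenames))
        if s == stem and e == ext
    ]
    if not versions:
        return format_version(stem, 1, 0, ext)
    cur_major, cur_minor = max(versions)
    if major:
        return format_version(stem, cur_major + 1, 0, ext)
    return format_version(stem, cur_major, cur_minor + 1, ext)


def dated_name(stem, ext, day=None):
    """Date-based alternative: ISO 8601 dates (YYYY-MM-DD) sort chronologically."""
    if day is None:
        day = date.today()
    return f"{stem}_{day.isoformat()}{ext}"
```

For example, given `report_v1.docx` and `report_v1_2.docx`, `next_version(files, "report", ".docx")` suggests `report_v1_3.docx`, and passing `major=True` suggests `report_v2.docx`. The date-based variant uses ISO 8601 dates because they sort in chronological order when file names are sorted alphabetically.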

Best practices include:

  • decide how many versions of a file to keep, which versions to keep, for how long, and how to organize them
  • identify milestone versions to keep, for example major versions rather than minor versions
  • uniquely identify different versions of files using a systematic naming convention
  • record changes made to a file when a new version is created
  • record relationships between items where needed, for example between code and the data file it is run against
  • track the location of files if they are stored in a variety of locations
  • identify a single location for the storage of milestone and master versions
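
Two of these practices, recording what changed in each new version and recording which data file a piece of code was run against, can be partly automated. The sketch below is a hypothetical helper, not part of any tool named in this guide: it appends one JSON line per new version to a log file and stores a SHA-256 checksum of each file so the exact contents can be verified later. The log file name used in the example is an arbitrary choice.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(path):
    """Checksum identifying the exact contents of one file version."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_version(log_path, new_file, note, related=()):
    """Append one JSON line describing a new version and the files it relates to."""
    entry = {
        "file": str(new_file),
        "sha256": sha256_of(new_file),
        "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "note": note,  # what changed in this version
        # e.g. the data file this version of a script was run against
        "related": {str(p): sha256_of(p) for p in related},
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

For example, `record_version("CHANGELOG.jsonl", "clean_v2.py", "drop empty rows", related=["survey_v1.csv"])` records that version 2 of a cleaning script was run against `survey_v1.csv`. A checksum is stored alongside the file name because a file can be edited without being renamed.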

Software Tools

Dedicated software tools can enforce version control more strictly than naming conventions alone.

Git

Git is a distributed version control tool that manages the history of a development project's source code. It is the most widely used version control software, and you can run it locally on your computer.
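
Git identifies each version of a file's contents by a hash of those contents, so any change, however small, produces a distinct stored version. As an illustration of that idea, the Python function below reproduces how Git computes the ID of a file's contents (a "blob") in a repository using the default SHA-1 object format; repositories created with Git's newer SHA-256 object format use a different hash. It is not needed to use Git, and the function names are invented here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to `content` (SHA-1 object format).

    Git hashes a short header, "blob <size in bytes>" followed by a NUL byte,
    and then the raw file contents.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def git_blob_id_of_file(path: str) -> str:
    """Hash a file's bytes exactly as stored on disk (no line-ending conversion)."""
    with open(path, "rb") as f:
        return git_blob_id(f.read())
```

Assuming no line-ending conversion or other filters are configured, running `git hash-object <file>` should print the same ID as `git_blob_id_of_file`. Because the ID depends only on the contents, identical files share one stored object, and editing a file always yields a new ID.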

GitHub

GitHub is a web-based hosting service for Git repositories (i.e., collections of tracked files and their history). GitHub is commonly used for managing and sharing different versions of code for programming projects, but it can be used just as effectively for version control of other types of files, such as text documents. GitHub has a large open-source community.

GitLab

GitLab is open-source software that provides Git repository hosting and collaborative revision control. GitLab also offers project management, issue tracking, and free private repository hosting.

Bitbucket

Bitbucket is a web-based repository hosting service, owned by Atlassian, for source code and development projects that use Git. Its users tend to be mostly enterprises and businesses.

Apache Subversion

Apache Subversion (SVN) is a centralized, client-server versioning and revision control system. Software developers use Subversion to maintain current and historical versions of files such as source code, web pages, and documentation.

File Sharing Platforms

Certain storage and file sharing platforms also have built-in version tracking with the ability to get back to earlier file versions.
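
To make the idea of built-in version history concrete, the following Python sketch keeps a full copy of a file each time a snapshot is taken, retains only a set number of recent versions, and can restore an earlier one. It is a hypothetical toy (the class and method names are invented here) and does not describe how Dropbox, OneDrive, SharePoint, OSF, or Google Drive store versions internally. It also keys versions by file name only, so two files with the same name in different folders would share one history.

```python
import shutil
from pathlib import Path


class VersionStore:
    """Toy local version history: each snapshot is a full copy of the file."""

    def __init__(self, root, keep=10):
        self.root = Path(root)
        self.keep = keep  # how many versions of each file to retain (must be >= 1)

    def _dir(self, path):
        # One sub-folder per file name holds that file's numbered copies.
        return self.root / Path(path).name

    def versions(self, path):
        """Version numbers currently retained for `path`, oldest first."""
        d = self._dir(path)
        if not d.exists():
            return []
        return sorted(int(p.name) for p in d.iterdir())

    def snapshot(self, path):
        """Copy the current contents of `path` in as the newest version."""
        d = self._dir(path)
        d.mkdir(parents=True, exist_ok=True)
        existing = self.versions(path)
        number = existing[-1] + 1 if existing else 1
        shutil.copy2(path, d / f"{number:06d}")
        # Prune the oldest copies beyond the retention limit.
        for old in self.versions(path)[:-self.keep]:
            (d / f"{old:06d}").unlink()
        return number

    def restore(self, path, number):
        """Overwrite `path` with a retained earlier version."""
        shutil.copy2(self._dir(path) / f"{number:06d}", path)
```

The `keep` parameter reflects the best practice, listed earlier, of deciding how many versions of a file to keep: a larger value allows restoring further back at the cost of more storage.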

Dropbox

Dropbox is a file hosting service that offers cloud storage, file synchronization, personal cloud, and client software. Dropbox keeps a version history that lets you recover deleted files and restore older versions of files, within a retention period that depends on the plan. Consumer Dropbox is available for SPH, and Dropbox for Business for HMS.

O365 OneDrive & SharePoint 

OneDrive provides personal file storage for individual workspace productivity, and SharePoint provides organizational file storage for managing departmental document libraries and files. Both offer storage for up to level 3 data, external and internal file sharing, co-authoring, and version control. See the HUIT Service Catalog for more on OneDrive and SharePoint.

Open Science Framework

Open Science Framework (OSF) provides free, open-source project management support for researchers across the entire research lifecycle. As a flexible repository, it can store and archive research data, protocols, and materials. OSF has built-in version control: it retains every version of a file added to OSF, and it also provides access to versions of files stored with third-party storage providers.

Google Drive

Google Drive is a file storage and synchronization service that lets users store files on Google's servers, synchronize files across devices, and share files. Google Drive keeps track of each revision to a file, with built-in version tracking and the ability to restore earlier file versions. Google Apps for Harvard are supported at some schools.

Last Updated: 2020-04-24