Invest in Research Data Management: Get a Data Manager

“If you were struck by lightning, would other people be able to access and understand your data?”

Sarah Arena shares her experience joining the Laboratory of Systems Pharmacology (LSP) as a Data Manager during the COVID pandemic, and how a commitment to data management can lead to creating more accessible and reproducible research.


“If you were struck by lightning, would other people be able to access and understand your data?” As the Data Manager for the Harvard Program in Therapeutic Science (HiTS) and the Laboratory of Systems Pharmacology (LSP), I use this question as one of my more dire justifications for good research data management practices. As someone hired in 2020 during the COVID pandemic, I also know that current events can serve as a painful but relevant pressure test, prompting us to examine and update existing ways of doing things so that we are planning for the long term.

My first six months in the LSP have involved navigating how to support the lab and its researchers remotely. I am learning how to apply my background in Library and Information Science to the day-to-day reality of working in a lab. Building relationships via Zoom has helped familiarize me with lab norms, the work of colleagues, and areas where individuals, platforms, and the lab as a whole can manage data more effectively. The range of tasks I participate in (recommending metadata schemas, documenting platform workflows, and revamping publication tracking) shows that research data management is relevant throughout the research lifecycle.

Notably, in my first months, I often quoted two of Dr. Peter Sorger’s favorite phrases about data management: he refers to uncovering documents and the history of lab practices as “performing archeology” and advocates for an “80-20 rule” in developing functional solutions that concentrate energy to achieve the most effective results. These maxims demonstrate important lessons I’ve internalized about data management in the LSP—finding answers and designing solutions are both iterative processes that involve collaboration, question-asking, lots of research, and recognizing when the perfect is becoming the enemy of the good.

One round of archeology I embarked upon was sifting through many versions of Dropbox folders to compile an up-to-date lab handbook. In fact, the process of starting a new lab handbook prompted broader discussions around file organization, creating and updating documentation, and the most appropriate ways to make information available to lab members and collaborators at different institutions. In an effort to create a handbook that can be updated regularly and used widely, we pivoted away from a more traditional PDF document and are instead creating a lab SharePoint site. SharePoint allows us to maintain version control, manage permissions for each page, and retain an organized asset library, which means the site can be dynamic while also maintaining lab records. My involvement in planning and creating the site has allowed me to build best practices into its content and structure along the way.

For a large lab with many collaborators, data management can also serve as a tool and a justification for communication. While different institutions and fields will follow different standards, it is important to clearly communicate which storage locations, naming conventions, and code will be used to ensure consistency and effective data hand-off. For one multi-lab grant, I am promoting the deposition of project data in a centralized repository to facilitate transparency and access. A centralized storage location allows researchers to access files when they need them, keeps participants informed of project progress, and prepares data for future analysis and archiving. Through discussions with researchers, we identified the need for project-wide sample naming conventions and for a way to track samples across assays. I am also working with researchers to develop standards for the data and metadata stored in the repository. Having these conversations from the project’s outset means these standards are planned with long-term archiving and reproducibility in mind. They include documenting important information in README files along the way and following minimum information standards and subject repository standards.
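To make the idea of a project-wide naming convention concrete, here is a minimal Python sketch of how file names could be checked against a convention and how samples could be tracked across assays. The pattern, field names, and example file names are hypothetical and chosen only for illustration; they are not the conventions the project adopted.

```python
import re
from collections import defaultdict

# Hypothetical convention (an assumption for illustration only):
#   <PROJECT>_<LAB>_S<3-digit sample number>_<ASSAY>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Z0-9]+)_(?P<lab>[A-Z0-9]+)"
    r"_S(?P<sample>\d{3})_(?P<assay>[A-Z0-9]+)\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the fields of a file name as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def samples_by_assay(filenames):
    """Group valid file names by sample number and collect the invalid ones.

    Returns a tuple: (dict mapping sample number to a sorted list of assays,
    list of file names that do not follow the convention).
    """
    assays = defaultdict(set)
    invalid = []
    for name in filenames:
        fields = parse_name(name)
        if fields is None:
            invalid.append(name)
        else:
            assays[fields["sample"]].add(fields["assay"])
    return {sample: sorted(found) for sample, found in assays.items()}, invalid


if __name__ == "__main__":
    tracked, invalid = samples_by_assay([
        "PRJ1_LABA_S001_IMG.tif",
        "PRJ1_LABB_S001_RNASEQ.csv",
        "final data v2.xlsx",
    ])
    print("Assays per sample:", tracked)
    print("Names to fix:", invalid)
```

A check like this could be run before files are deposited in a shared repository, so that naming problems surface at hand-off rather than at archiving time.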

Overall, I have been impressed with HiTS's commitment to being a leader in supporting open access, creating metadata standards, and pushing others in the field to follow suit. I appreciate its vision in hiring a data manager and fostering a culture in which many people are involved and invested in creating accessible and reproducible research. I am also fortunate to have many colleagues dedicated to data management, including Jeremy Muhlich, Madison Tyler, and Cat Luria in the LSP, and the members of the Harvard Longwood Medical Area Research Data Management Working Group. Data management is ongoing throughout the research process, and these individuals exemplify the importance of having many knowledgeable advocates to take on the task together.

Written by Sarah Arena, Data Manager, Harvard Program in Therapeutic Science