Research Data Management

This guide offers recommendations and resources for managing research data in any discipline.

Advantages of Open Research

The terms "open science" and "open data", and the idea of open research more broadly, are becoming ubiquitous in academic discourse. Far from being merely a trendy concept, though, open research is an important ideal and set of principles and practices for the scholarly community. But what exactly do we mean when we say open research, open science, and open data?

  • The UNESCO Recommendation on Open Science defines Open Science as "an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community."
  • The FOSTER Open Science Training Handbook defines it as "the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods."
  • The Open Knowledge Foundation's Open Definition succinctly states "Open data and content can be freely used, modified, and shared by anyone for any purpose."

Some benefits of open research:

  • Improve transparency and elevate research quality: Research transparency allows for wider, more effective peer evaluation and stronger scrutiny of research, promoting rigorous & high-quality scholarship.
  • Enhance efficiency: Greater access to research data and outputs can reduce the effort and costs of duplicating research results and recreating data, and enables more research to be done using the same data.
  • Facilitate collaboration and interdisciplinary projects: By breaking down paywalls and boundaries between disciplines, open research fosters collaborative and interdisciplinary projects to tackle complex contemporary problems.
  • Increase the impact and value of research: Open research is more widely available, broadening the audience and greatly expanding the opportunities for research and data to be consulted, reused, synthesized, or followed up.
  • Build greater trust in scholarly work: By being available to the public and engaging stakeholders outside of the academy, open research may help rebuild trust among those who see academic research as a cloistered activity.

How do research data management best practices contribute to open research? The ideals of open research go beyond merely making research and associated data available, however important that aspect may be. Shared research and data should also be well-organized, well-documented, readily understandable and interpretable, and reusable.

We believe in shifting attitudes to go "beyond compliance" toward an ethos of stewardship and an expanded sense of responsibility for producing and sharing research that has been conducted and managed with care. The principles and practices of research data management serve as recommendations and guidelines for how research data should be handled so that they reflect and embody open ideals.

Designing and Using Reproducible Workflows

Computational analysis -- code and other automated processing of digital data -- is becoming ubiquitous in modern research. The affordances of computation create an opportunity to make such research as reproducible as possible. While the word 'reproducible' can mean different things in the context of research, here we focus on computational reproducibility, defined by the National Academies of Sciences, Engineering, and Medicine (NASEM) as "obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis."
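As a small illustration of this definition, consider an analysis that involves randomness. Fixing the random seed as an explicit "condition of analysis" means the same input data and code always produce the same result. The function below is a hypothetical example written for this guide, not taken from any particular study:

```python
import random

def run_analysis(data, seed=42):
    """Draw a bootstrap resample of `data` and return its mean.

    Passing an explicit seed makes the randomness a recorded condition
    of analysis: the same inputs always yield the same output.
    """
    rng = random.Random(seed)                  # seeded generator, not global state
    sample = [rng.choice(data) for _ in data]  # bootstrap resample with replacement
    return sum(sample) / len(sample)

data = [2.0, 4.0, 6.0, 8.0]
# Two runs with the same data and seed give identical results:
assert run_analysis(data, seed=7) == run_analysis(data, seed=7)
```

Recording the seed alongside the code and data is what lets another researcher obtain "consistent computational results" rather than merely similar ones.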

Some recommended practices for designing and using reproducible workflows are ones we have discussed elsewhere in this guide, such as organizing your files into a clear directory structure and creating detailed documentation. There are also practices specific to this topic, including:

  • Document operations: In addition to other forms of documentation, it is crucial to document what operations are performed during an automated workflow -- processing, transformations, conversions, creation of new files, etc. This can often be done via comments in the code.
  • Automate as much as possible: Highly automated workflows are more reproducible because steps requiring manual intervention are more prone to error and/or deviation. Thus, scripts/code are preferable to GUI tools.
  • Use relative filepaths: Absolute filepaths will almost always need to be fixed if the project is ported to a new machine or used by another person.
  • Clearly conceptualize the workflow design: Workflows should be designed as a series of logically related and sequential steps "glued together" where the output of one step feeds directly into the next step as input.
  • Use a main script: If your workflow uses more than one script, bundle the execution of each into a "main" or controller script. A workflow is more reproducible if another user needs to execute only a single script to run it end to end.
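The practices above can be combined in a single sketch: a controller script that uses relative filepaths, documents each operation in comments, and chains steps so that one step's output is the next step's input. The filenames and processing steps here are hypothetical, invented for illustration:

```python
from pathlib import Path

# Relative paths: the project runs unchanged on another machine,
# as long as the script is launched from the project root.
RAW = Path("data/raw/measurements.csv")
CLEAN = Path("data/clean/measurements_clean.csv")
RESULTS = Path("results/summary.txt")

def clean_data(src: Path, dst: Path) -> Path:
    """Step 1: drop blank lines from the raw file (documented operation)."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    lines = [ln for ln in src.read_text().splitlines() if ln.strip()]
    dst.write_text("\n".join(lines) + "\n")
    return dst

def summarize(src: Path, dst: Path) -> Path:
    """Step 2: count data records; its input is exactly step 1's output."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    n = len(src.read_text().splitlines()) - 1  # subtract the header row
    dst.write_text(f"records: {n}\n")
    return dst

# Create a tiny demo input so the sketch is runnable as-is.
RAW.parent.mkdir(parents=True, exist_ok=True)
RAW.write_text("id,value\n1,2\n\n2,3\n")

def main():
    # The controller "glues" the steps together: each output
    # feeds directly into the next step as input.
    cleaned = clean_data(RAW, CLEAN)
    summarize(cleaned, RESULTS)

if __name__ == "__main__":
    main()
```

Another user reproduces the entire workflow by running this one script; no manual intervention or path editing is required.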

One highly useful concept for creating reproducible workflows in your research is literate programming. Literate programming is an approach to writing code that embeds snippets or chunks of executable code in a document that contains natural language explanations of the operations and analysis.
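A minimal sketch of this style is the "percent" cell format, which several tools (including Jupytext and some editors) can open as a notebook while the file remains an ordinary script. The analysis below is invented for illustration; the point is the interleaving of narrative and executable chunks:

```python
# %% [markdown]
# # Growth-rate analysis
# We load yearly counts and compute the mean annual growth rate.
# The narrative above each chunk explains *why* the step exists.

# %%
counts = {2019: 120, 2020: 150, 2021: 180}  # illustrative input data

# %% [markdown]
# Compute the growth rate between consecutive years,
# then take the mean of those rates.

# %%
years = sorted(counts)
rates = [counts[b] / counts[a] - 1 for a, b in zip(years, years[1:])]
mean_rate = sum(rates) / len(rates)
print(f"mean annual growth rate: {mean_rate:.1%}")  # prints "mean annual growth rate: 22.5%"
```

Because explanation and code live in one document, a reader can follow the reasoning and re-execute the analysis in a single pass.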

The Jupyter Notebook is one popular application for writing "executable documents" in this style; it supports Python, R, and Julia, among other languages. RStudio offers R Markdown, which works similarly to Jupyter notebooks and supports several languages. Quarto is an outgrowth of R Markdown that bills itself as an "open-source scientific and technical publishing system" and combines support for literate programming with interactive/dynamic features and additional integrations.

If you use R for your research, one powerful and flexible combination that may help you design reproducible workflows is RStudio (with the R Markdown capabilities mentioned previously) together with Git and GitHub. The R package 'usethis' allows you to execute Git commands from the R console and view commits in the RStudio window, so you can create a repository and perform version control from the same program in which you write your analytical code. For more information, see the RStudio & Version Control link below.

Another available option for reproducible workflow development is workflow capture software (also sometimes referred to as integrated workflow programs or other similar monikers). These programs were created to allow researchers to design workflows in a systematic way. Some examples of such software include Kepler and Taverna. This software is highly specialized and beyond the scope of this guide.

Open Science Framework

The Open Science Framework (OSF) is an open-source, web-based project management tool created and maintained by the Center for Open Science (COS), a non-profit dedicated to "proactively reform the norms and reward system in science and elevate rigor, transparency, sharing, and reproducibility."

The functionality of OSF is centered around projects, which are collaborative workspaces that offer various features including uploading files, version control, setting permissions for collaborators, and integrations with various external tools.

[Image from the OSF website, licensed under CC BY-SA 4.0]

If your research project is collaborative and you want a platform that will support your work, or if you have an interest in sharing parts of your research (such as methodology, plan for analysis, etc.) publicly while the project is ongoing, OSF may be a good choice for you. Some useful features of OSF include:

  • Project registration: Registering a project with OSF allows for the sharing of a public "snapshot" of your research, which contributes to transparency.
  • Storage and version control for files: OSF provides generous storage for files and automatically tracks versions of files uploaded with the same filename.
  • Built-in wiki functionality: Use the wiki tool to write documentation, meeting minutes, lab procedures, and more.
  • Granular permissions for collaborators: Adjust permissions (read only, read/write, administrator) for all collaborators on each component of your project.
  • Integrations with other popular tools/apps: OSF integrates with storage options such as Google Drive and Box, code repository platforms like GitHub and GitLab, and citation managers Zotero and Mendeley, among others.
    • OSF also integrates with instances of Dataverse repositories like LibraData to enable publishing of your project at various stages of the research lifecycle.
  • Project analytics and identifiers: OSF offers analytics and unique persistent URLs for each component of your project. Each project can also be given a DOI if it is made public.
  • File previews: Common file formats like Word documents, PDFs, Excel spreadsheets, and PowerPoint slides can be previewed in OSF.
  • Choose what to share publicly: OSF allows for different components of a project to have different visibility, meaning you can make your protocols and documentation public while keeping your data private.

The University of Virginia is an OSF Institutions member, meaning you can sign in using your UVA credentials and affiliate your projects with UVA.

Interested in using OSF to manage your project or as a collaboration platform? Please reach out to us for more information.

Additional Resources