Research Data Management

This guide offers recommendations and resources for managing research data in any discipline.

Advantages of Open Research

The terms "open science" and "open data", and the idea of open research more broadly, are becoming ubiquitous in academic discourse. Far from being merely a trendy concept, though, open research is an important ideal and set of principles and practices for the scholarly community. But what exactly do we mean when we say open research, open science, and open data?

  • The UNESCO Recommendation on Open Science defines Open Science as "an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community."
  • The FOSTER Open Science Training Handbook defines it as "the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods."
  • The Open Knowledge Foundation's Open Definition succinctly states "Open data and content can be freely used, modified, and shared by anyone for any purpose."

Some benefits of open research:

  • Improve transparency and elevate research quality: Research transparency allows for wider, more effective peer evaluation and stronger scrutiny of research, promoting rigorous & high-quality scholarship.
  • Enhance efficiency: Greater access to research data and outputs can reduce the effort and costs of duplicating research results and recreating data, and enables more research to be done using the same data.
  • Facilitate collaboration and interdisciplinary projects: By breaking down paywalls and boundaries between disciplines, open research fosters collaborative and interdisciplinary projects to tackle complex contemporary problems.
  • Increase the impact and value of research: Open research is more widely available, broadening the audience and greatly expanding the opportunities for research and data to be consulted, reused, synthesized, or followed up.
  • Build greater trust in scholarly work: By being available to the public and engaging stakeholders outside of the academy, open research may help rebuild trust among those who see academic research as a cloistered activity.

How do research data management best practices contribute to open research? The ideals of open research go beyond merely making research and associated data available, however important that aspect may be. Shared research and data should also be well-organized, well-documented, readily understandable and interpretable, and reusable.

We believe in shifting attitudes to go "beyond compliance" toward an ethos of stewardship and an expanded sense of responsibility for producing and sharing research that has been conducted and managed with care. The principles and practices of research data management serve as recommendations and guidelines for how research data should be handled so that they reflect and embody open ideals.

Designing and Using Reproducible Workflows

Computational analysis -- code and other automated processing of digital data -- is becoming ubiquitous in modern research. The affordances of computation create an opportunity to make such research as reproducible as possible. While the word 'reproducible' can mean different things in the context of research, here we focus on computational reproducibility, defined by the National Academies of Sciences, Engineering, and Medicine (NASEM) as "obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis."
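As a small illustration of this definition, consider an analysis that involves randomness. Fixing the random seed as an explicit "condition of analysis" means the same input data and code always produce the same result. The function below is a hypothetical example written for this guide, not taken from any particular study:

```python
import random

def run_analysis(data, seed=42):
    """Draw a bootstrap resample of `data` and return its mean.

    Passing an explicit seed makes the randomness a recorded condition
    of analysis: the same inputs always yield the same output.
    """
    rng = random.Random(seed)                  # seeded generator, not global state
    sample = [rng.choice(data) for _ in data]  # bootstrap resample with replacement
    return sum(sample) / len(sample)

data = [2.0, 4.0, 6.0, 8.0]
# Two runs with the same data and seed give identical results:
assert run_analysis(data, seed=7) == run_analysis(data, seed=7)
```

Recording the seed alongside the code and data is what lets another researcher obtain "consistent computational results" rather than merely similar ones.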

Some recommended practices for designing and using reproducible workflows are ones we have discussed elsewhere in this guide, such as organizing your files into a clear directory structure and creating detailed documentation. There are also practices specific to this topic, including:

  • Document operations: In addition to other forms of documentation, it is crucial to document what operations are performed during an automated workflow -- processing, transformations, conversions, creation of new files, etc. This can often be done via comments in the code.
  • Automate as much as possible: Highly automated workflows are more reproducible because steps requiring manual intervention are more prone to error and/or deviation. Thus, scripts/code are preferable to GUI tools.
  • Use relative filepaths: Absolute filepaths will almost always need to be fixed if the project is ported to a new machine or used by another person.
  • Clearly conceptualize the workflow design: Workflows should be designed as a series of logically related and sequential steps "glued together" where the output of one step feeds directly into the next step as input.
  • Use a main script: If your workflow uses more than one script, bundle the execution of each into a "main" or controller script. A workflow is more reproducible if another user needs to execute only a single script to run it end to end.
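The practices above can be combined in a single sketch: a controller script that uses relative filepaths, documents each operation in comments, and chains steps so that one step's output is the next step's input. The filenames and processing steps here are hypothetical, invented for illustration:

```python
from pathlib import Path

# Relative paths: the project runs unchanged on another machine,
# as long as the script is launched from the project root.
RAW = Path("data/raw/measurements.csv")
CLEAN = Path("data/clean/measurements_clean.csv")
RESULTS = Path("results/summary.txt")

def clean_data(src: Path, dst: Path) -> Path:
    """Step 1: drop blank lines from the raw file (documented operation)."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    lines = [ln for ln in src.read_text().splitlines() if ln.strip()]
    dst.write_text("\n".join(lines) + "\n")
    return dst

def summarize(src: Path, dst: Path) -> Path:
    """Step 2: count data records; its input is exactly step 1's output."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    n = len(src.read_text().splitlines()) - 1  # subtract the header row
    dst.write_text(f"records: {n}\n")
    return dst

# Create a tiny demo input so the sketch is runnable as-is.
RAW.parent.mkdir(parents=True, exist_ok=True)
RAW.write_text("id,value\n1,2\n\n2,3\n")

def main():
    # The controller "glues" the steps together: each output
    # feeds directly into the next step as input.
    cleaned = clean_data(RAW, CLEAN)
    summarize(cleaned, RESULTS)

if __name__ == "__main__":
    main()
```

Another user reproduces the entire workflow by running this one script; no manual intervention or path editing is required.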

One highly useful concept for creating reproducible workflows in your research is literate programming. Literate programming is an approach to writing code that embeds snippets or chunks of executable code in a document that contains natural language explanations of the operations and analysis.
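A minimal sketch of this style is the "percent" cell format, which several tools (including Jupytext and some editors) can open as a notebook while the file remains an ordinary script. The analysis below is invented for illustration; the point is the interleaving of narrative and executable chunks:

```python
# %% [markdown]
# # Growth-rate analysis
# We load yearly counts and compute the mean annual growth rate.
# The narrative above each chunk explains *why* the step exists.

# %%
counts = {2019: 120, 2020: 150, 2021: 180}  # illustrative input data

# %% [markdown]
# Compute the growth rate between consecutive years,
# then take the mean of those rates.

# %%
years = sorted(counts)
rates = [counts[b] / counts[a] - 1 for a, b in zip(years, years[1:])]
mean_rate = sum(rates) / len(rates)
print(f"mean annual growth rate: {mean_rate:.1%}")  # prints "mean annual growth rate: 22.5%"
```

Because explanation and code live in one document, a reader can follow the reasoning and re-execute the analysis in a single pass.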

The Jupyter Notebook is one popular application for writing "executable documents" in this style; it supports Python, R, and Julia, among other languages. RStudio offers R Markdown, which works similarly to Jupyter notebooks and supports several languages. Quarto is an outgrowth of R Markdown that bills itself as an "open-source scientific and technical publishing system" and combines support for literate programming with interactive/dynamic features and additional integrations.

If you use R for your research, one powerful and flexible combination that may help you design reproducible workflows is RStudio (with the R Markdown capabilities mentioned previously) together with Git and GitHub. The R package 'usethis' allows you to execute Git commands from the R console and view commits in the RStudio window, so you can create a repository and perform version control from the same program in which you write your analytical code. For more information, see the RStudio & Version Control link below.

Another available option for reproducible workflow development is workflow capture software (also sometimes referred to as integrated workflow programs or other similar monikers). These programs were created to allow researchers to design workflows in a systematic way. Some examples of such software include Kepler and Taverna. This software is highly specialized and beyond the scope of this guide.

Open Science Framework

The Open Science Framework (OSF) is an open-source, web-based project management tool created and maintained by the Center for Open Science (COS), a non-profit dedicated to "proactively reform the norms and reward system in science and elevate rigor, transparency, sharing, and reproducibility."

The functionality of OSF is centered around projects, which are collaborative workspaces that offer various features including uploading files, version control, setting permissions for collaborators, and integrations with various external tools.

[Image from the OSF website, licensed under CC BY-SA 4.0]

If your research project is collaborative and you want a platform that will support your work, or if you have an interest in sharing parts of your research (such as methodology, plan for analysis, etc.) publicly while the project is ongoing, OSF may be a good choice for you. Some useful features of OSF include:

  • Project registration: Registering a project with OSF allows for the sharing of a public "snapshot" of your research, which contributes to transparency.
  • Storage and version control for files: OSF provides generous storage for files and automatically tracks versions of files uploaded with the same filename.
  • Built-in wiki functionality: Use the wiki tool to write documentation, meeting minutes, lab procedures, and more.
  • Granular permissions for collaborators: Adjust permissions (read only, read/write, administrator) for all collaborators on each component of your project.
  • Integrations with other popular tools/apps: OSF integrates with storage options such as Google Drive and Box, code repository platforms like GitHub and GitLab, and citation managers Zotero and Mendeley, among others.
    • OSF also integrates with instances of Dataverse repositories like LibraData to enable publishing of your project at various stages of the research lifecycle.
  • Project analytics and identifiers: OSF offers analytics and unique persistent URLs for each component of your project. Each project can also be given a DOI if it is made public.
  • File previews: Common file formats like Word documents, PDFs, Excel spreadsheets, and PowerPoint slides can be previewed in OSF.
  • Choose what to share publicly: OSF allows for different components of a project to have different visibility, meaning you can make your protocols and documentation public while keeping your data private.

The University of Virginia is an OSF Institutions member, meaning you can sign in using your UVA credentials and affiliate your projects with UVA.

Interested in using OSF to manage your project or as a collaboration platform? Please reach out to us for more information.

Additional Resources