
Research Data Management

This guide offers advice and resources for managing research data in any discipline.

What is Research Data Management?

Research Data Management refers to a set of principles, processes, and best practices for handling research data both during a research project and after its completion. This guide is intended to help researchers better understand and implement these principles and practices.

Following research data management recommendations helps make the researcher an effective steward of their research data, ensuring it is accessible, comprehensible, and fit for use and reuse.

How is research data defined? One commonly cited definition, used in federal regulations, is "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings." Data is a concept that extends beyond the scientific fields, however, so we might instead say that research data consists of the materials or other sources that form the basis for your conclusions and serve to justify your scholarly claims and findings.

Why should you work to improve your data management? One reason is compliance: the norms and expectations for research are shifting, and requirements that data be stored and shared for later reuse are increasingly common and will eventually apply to most (if not all) federally funded research. Additionally, data management plans are now a required component of many grant applications.

Beyond compliance, however, it is our view that good research data management simply makes for better research. Data is now commonly viewed as a first-class research output alongside the journal article, and well-managed data both facilitates and reflects well-conducted studies. Some specific benefits of research data management are listed below.

The Research Data Lifecycle

Researchers will be familiar with the "research lifecycle": the broadly accepted outline of steps that constitute the research process, from devising the research questions and securing funding to disseminating the findings via publication. The research data lifecycle is closely related to the research lifecycle. Different models include or emphasize different steps, but they typically cover similar ground, including:

  • Project planning
  • Data collection and/or generation
  • Data processing and analysis
  • Data storage and security
  • Preparing data for preservation
  • Data sharing
  • Data reuse

The various activities of effective research data management play their part in different stages of the model; for instance, writing a data management plan and preparing a dataset for deposit into a data repository occur at different points in the research process. Other practices, such as writing and updating documentation and consistently following an organization scheme, extend across multiple stages.

(UVA Library Research Data Lifecycle model courtesy of Sherry Lake)

We introduce the research data lifecycle to highlight that research data management is a set of activities that spans the entire research project, from inception to completion (and potentially beyond). We recommend looking at several research data lifecycle models and, as you go through this guide, considering where each practice or activity sits in the data lifecycle. We hope this additional context helps research data management seem like a coherent process of caring for and attending to your data throughout a project, rather than merely a set of additional tasks to complete.

Benefits of Good Data Management

Some of the many benefits of practicing good research data management include:

  • Compliance: As noted above, at least some aspects of research data management such as data sharing are increasingly required by funders and peer-reviewed journals. Other practices may be recommended or required by your institution.
  • Credit: Collecting, generating, and managing data is a historically undervalued part of scholarship that is now gaining more recognition. Shared data is itself cited in the literature, and sharing it also increases the visibility of the associated journal articles.
  • Protection from loss: Data can be easily lost due to a multitude of factors, including natural disaster, hardware failure, software failure, obsolescence, human error, or simply neglect. Good practices around storage and backups can safeguard against losing critical files.
  • Efficiency: Good data management practices around files, organization, and documentation result in greater efficiency for workflows and other research processes. Following good data management recommendations may also save time in the long run by making it easier to locate and understand data and files from earlier in the project.
  • Transparency and reproducibility: Well-documented and planned research is both more transparent and more reproducible, qualities we should strive for as scholarly ideals that uphold research integrity. Reproducible research also helps eliminate wasteful duplicate data collection.
  • Improved research and data quality: Effective research data management may reduce the chance of errors or other blemishes on your research caused by inattentive habits or practices around handling files, data, and code.
  • Easier collaboration: Careful planning and consistent organization, agreed on by all investigators, help smooth out the occasionally rocky experience of conducting collaborative and interdisciplinary studies.
  • Enables preservation, sharing, and reuse: Preparing a dataset for preservation, sharing, and reuse takes considerable effort, including data cleaning, file transformations, reorganization, writing documentation, and generating metadata. Following data management best practices throughout a study simplifies this process when it comes time to deposit, helping to ensure your data is actually fit for long-term archiving and has the potential for fruitful reuse down the line.