(Adapted from Rourk, Will, 2020, "150- Architectural Detail - Pavilion V parlor", https://doi.org/10.18130/V3/CVKEMV, University of Virginia Dataverse, V1; pav_v_parlor_bucrania_STL.stl [fileName] with a CC0 1.0 Public Domain License.)
Data sharing isn't new. Researchers have been sending each other data files for years. What is new is that many funders require data sharing as a key component of their research funding strategy. Publishers may also require you to share the data that supports the articles that they publish.
Research data is a valuable resource that usually requires significant time and money to produce. Many data collections have value well beyond their use in the original research. Because digital data can be stored, disseminated, and made accessible online with relative ease, many institutions are keen to share research data to increase the impact and visibility of their research.
Why should you share your research data?
How do I share my data?
Data repositories are not just placeholders - many of them also preserve and curate the data. Funders may specify repositories for the research data produced by projects they fund. Publishers may require that the data supporting research they publish be deposited in a specific location.
Advantages to using a repository:
About Research Data Repositories
Research data repositories are storage locations and services designed and intended for long-term data archiving and preservation. Data repositories generally offer support for data sharing, enable access controls for datasets, provide persistent and citable identifiers, and create landing pages for deposited datasets that display descriptive metadata. While most data repositories share some core features, they vary considerably in the level of support offered beyond the basics.
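Two of the core features named above, persistent identifiers and descriptive metadata on a landing page, are what make a deposited dataset citable. The sketch below illustrates this with a minimal metadata record whose field names loosely follow the DataCite metadata schema (the `format_citation` helper is hypothetical, not a repository API); the example values come from the dataset credited at the top of this page.

```python
# A minimal sketch of the descriptive metadata a repository landing page
# typically exposes for a deposited dataset. Field names loosely follow
# the DataCite metadata schema; format_citation is a hypothetical helper.

def format_citation(record):
    """Build a human-readable data citation from a metadata record."""
    authors = "; ".join(record["creators"])
    return (f"{authors} ({record['publicationYear']}). {record['title']}. "
            f"{record['publisher']}. https://doi.org/{record['doi']}")

record = {
    "doi": "10.18130/V3/CVKEMV",   # persistent, citable identifier
    "creators": ["Rourk, Will"],
    "title": "150- Architectural Detail - Pavilion V parlor",
    "publisher": "University of Virginia Dataverse",
    "publicationYear": 2020,
}

print(format_citation(record))
```

Because the identifier is persistent, the citation remains resolvable even if the repository's internal URLs change.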
Research data repositories are frequently separated into three broad categories:
Beyond these categories, one other major factor in choosing a repository is whether or not it offers curation services. The NLM defines data curation as "the ongoing processing and maintenance of data throughout its lifecycle to ensure long term accessibility, sharing, and preservation." Some repositories provide curation services for hosted datasets, or expect deposited datasets to meet some minimum level of curation. Generalist repositories do not curate deposited data. Guidance that aligns with data curation is given in a later section.
Data Repository Characteristics
What makes a data repository effective and trustworthy? Federal agencies have devised a consensus set of "desirable characteristics" of data repositories for federally funded data, including but not limited to:
We have linked to this document below.
We have the following recommendations for choosing a repository for your dataset:
Choice of repository may also depend on details such as data formats and size, a repository's submission requirements and preservation policies, and whether or not there are associated fees. If you have questions or concerns about choice of repository please reach out to us at dmconsult@virginia.edu. One of the topics we cover in consultations with faculty is where and when to deposit and share your data.
Many recommendations and practices covered in this guide apply when it comes time to prepare a dataset for deposit. Below is a checklist that should help you think over what steps you have taken and what is left to be done before your dataset is deposit-ready:
(This FAIR graphic is licensed under a CC BY-SA 4.0 International license.)
Defining FAIR
FAIR stands for Findable, Accessible, Interoperable, and Reusable, and has become recognized as an ideal for archived and shared research data since its introduction as a framework. While these terms may seem clear, it is helpful to explain their specific meaning in the context of the FAIR guiding principles:
Please consult the links below on the FAIR principles for a more thorough exposition as well as examples.
Machine Actionability
The main feature that distinguishes the FAIR guiding principles from other principles or recommendations regarding data is their strong emphasis on machine actionability. For (meta)data to be machine actionable, it must be discoverable, accessible, parseable, and, in general, usefully actionable by a computational system or agent with little or no human intervention.
This capability is enabled by use of standards and protocols that are defined and structured in ways that programs and algorithms can "understand" and follow. Machine actionability is a desirable quality for hosted datasets due to our increasing reliance on computer systems to automate, expedite, or otherwise facilitate handling large volumes of data.
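As a small illustration of the idea, the sketch below shows how a program can act on dataset metadata expressed in a shared, structured vocabulary (here, an illustrative schema.org "Dataset" record in JSON-LD) without any human reading the landing page. The record itself is an assumption constructed for this example, not a real repository response.

```python
import json

# A minimal sketch of machine actionability: an agent extracts a dataset's
# persistent identifier and license by parsing structured schema.org
# "Dataset" metadata (JSON-LD). The record below is illustrative only.

jsonld = """
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Pavilion V parlor architectural detail",
  "identifier": "https://doi.org/10.18130/V3/CVKEMV",
  "license": "https://creativecommons.org/publicdomain/zero/1.0/"
}
"""

record = json.loads(jsonld)

# Because the metadata uses a shared vocabulary, a program can check the
# type, pull the persistent identifier, and test reusability conditions.
assert record["@type"] == "Dataset"
print(record["identifier"])  # persistent identifier, usable for citation
print(record["license"])     # machine-readable license enables reuse checks
```

The same pattern scales: harvesters and search indexes rely on exactly this kind of structured metadata to discover and aggregate datasets across repositories.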