Research Data Management

This guide provides best practices and resources for managing your research data for any discipline.

What are Documentation and Metadata?

Documentation and metadata provide the descriptive information about a dataset that explains its meaning. Metadata and documentation should be sufficient to enable research data to be found, understood, reused, and managed throughout their lifecycle. Good metadata provides context for the data, and it:

  • Enables efficient organization of the research data
  • Facilitates discovery
  • Facilitates research data sharing
  • Identifies the creator(s) of the data
  • Provides permanent identifiers for the data
  • Links the data to other related products, such as articles and other data
  • Supports archiving and preservation

Do you have questions about the metadata for your dataset?  Need assistance? Contact me at dmconsult@virginia.edu. 

Why Document Data?

Clear and detailed documentation is required to ensure that data can be understood, interpreted, and used. Without documentation (also known as metadata), sharing data for long-lasting usability would be impossible.

It is important to start documenting your data at the very beginning of your research project and to continue throughout the project. Doing so makes documentation easier and reduces the likelihood that you will forget aspects of your data later in the project. Don’t wait until the end to start documenting your research project and its data!

What to Document

Research Project Documentation:

  • Context of data collection
  • Data collection methods
  • Structure and organization of data files
  • Data sources used
  • Data validation and quality assurance
  • Transformations of data from the raw data through analysis
  • Information on confidentiality, access & use conditions

Dataset Documentation:

  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data
  • File format 
  • Software used, including version and operating system
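A data dictionary is one common way to capture the variable-level items in this list. Below is a minimal sketch in Python, assuming a simple CSV layout; the column names (`variable`, `description`, `type`, `units`, `codes`) and the survey variables are hypothetical examples, not a required standard.

```python
import csv
import io

# Hypothetical variables for an illustrative survey dataset.
DATA_DICTIONARY = [
    {"variable": "resp_id", "description": "Unique respondent identifier",
     "type": "integer", "units": "", "codes": ""},
    {"variable": "age", "description": "Age of respondent at interview",
     "type": "integer", "units": "years", "codes": "-99 = missing"},
    {"variable": "smoker", "description": "Current smoking status",
     "type": "categorical", "units": "", "codes": "0 = no; 1 = yes; 9 = refused"},
]

# Hypothetical column layout; adapt it to your discipline's conventions.
FIELDS = ["variable", "description", "type", "units", "codes"]


def write_data_dictionary(rows, out):
    """Write data dictionary rows as CSV (with a header row) to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_data_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
```

Keeping the dictionary as a plain CSV file next to the data means it can be read without special software, and the `codes` column doubles as the explanation of codes used.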

Types of Documentation and Metadata Standards

Documentation:

  • Data dictionaries
  • Permanent identifiers (e.g., DOI)
  • Codebooks
  • File directories
  • Methodologies
  • File naming conventions
  • Data definition files
  • Glossary
  • ReadMe files
  • Lab notebooks

Metadata:

  • Schema
  • Standards (general) - Dublin Core
  • Standards (discipline-specific)

There are hundreds of metadata standards available. Dublin Core was created to describe any type of resource, including a dataset. Its 15 elements cover the basics; I've listed them, with a brief description of each, in the next section. Other attributes are also important for properly identifying your dataset:

  • Method: how the data were generated/created, software used, algorithms, protocols, equipment.
  • Funder: which organization funded the research project
  • Directory: listing of all the files in a data package (supplemental documents, codebook, data dictionary, etc.)
  • Access: information about accessing the dataset, including repository information. 
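The Directory attribute can be generated rather than typed by hand. A minimal sketch in Python that walks a data package folder and records each file's relative path, size in bytes, and SHA-256 checksum; the checksum column is an assumption (a common practice, not something Dublin Core requires), and the demo package built in the example is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_directory_listing(package_dir: str) -> list[dict]:
    """List every file under package_dir with its relative path, size, and checksum."""
    root = Path(package_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [
        {
            "path": p.relative_to(root).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in files
    ]


if __name__ == "__main__":
    # Build a small throwaway package so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "data").mkdir()
        Path(tmp, "README.txt").write_bytes(b"Project overview\n")
        Path(tmp, "data", "survey.csv").write_bytes(b"resp_id,age\n1,34\n")
        for entry in build_directory_listing(tmp):
            print(entry["path"], entry["bytes"], entry["sha256"], sep="\t")
```

Saving this listing as a text file inside the package gives users a directory of its contents that they can also use to confirm each file arrived intact.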

Many disciplines have standards and schemas designed specifically for their types of data. Funders may require that the data you share use a specific standard, and repositories may require that all data deposited with them use a specific standard. It is best practice to choose a repository early in a project so you can identify which standard you will be required to use.

There are several services that list many of the standards.

  • Digital Curation Centre (DCC) lists metadata standards by subject area
  • RDA Metadata Standards Catalog: an open directory of metadata standards applicable to research data, developed by the Research Data Alliance and hosted by the University of Bath.
  • The Research Data Alliance (RDA) lists standards by broad discipline categories
  • OpenGeoportal provides a guide to geospatial metadata tools, standards and information

Would you like to try Dublin Core? The Dublin Core Generator is an online tool that you can use to generate fully formed Dublin Core metadata code.
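If you prefer to produce the record yourself, the sketch below serializes one as XML using only Python's standard library. It assumes the DCMI element namespace `http://purl.org/dc/elements/1.1/`; the wrapper element name `metadata` and all example values are hypothetical, and a repository may require its own wrapper element or schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_xml(record: dict[str, list[str]]) -> str:
    """Serialize {element: [values]} as simple Dublin Core XML.

    Every element is optional and repeatable, so each value gets its own element.
    """
    unknown = sorted(set(record) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {unknown}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical dataset; identifier omitted because repositories often assign one.
    example = {
        "title": ["Campus Water Use Survey, 2023"],
        "creator": ["Doe, Jane", "Roe, Richard"],
        "subject": ["Water conservation"],
        "date": ["2024-05-31"],
        "type": ["Dataset"],
        "format": ["text/csv"],
        "language": ["en"],
        "rights": ["CC BY 4.0"],
    }
    print(dublin_core_xml(example))
```

Storing each element as a list of values mirrors the rule that every element is optional and repeatable: an absent key simply produces no element, and two creators produce two `dc:creator` elements.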

Dublin Core Element Set

The Simple Dublin Core Metadata Element Set is a set of 15 vocabulary terms that can be used to describe resources. It is a standard for cross-domain resource description: ANSI/NISO Z39.85-2012. Each element is optional and can be repeated. The elements are listed below, each followed by a short description of how it could be used for a dataset. This element set is also appropriate for related resources, such as supplemental documentation.

  • Title: name of the dataset. This could be the name of the research project.
  • Creator: name(s) of researcher, group, or organization that created the dataset. Best practice for personal names is surname first. Include ORCID IDs if available.
  • Subject: keywords or keyphrases describing the content of the data. Best practice is to use a controlled vocabulary.
  • Description: an account of the dataset. This could be an abstract, a table of contents, or a free-text account.
  • Publisher: name of the person or organization that made the dataset available. This could be the PI, the institution, or an organization.
  • Contributor: name(s) of those who contributed to the creation of the dataset. This could be co-PIs, research staff, or research scientists. Include ORCID IDs if available.
  • Date: point or period of time associated with an event in the lifecycle of the data. Usually indicates when the dataset was completed. Best practice is to use a standard like ISO 8601 (YYYY-MM-DD). 
  • Type: the nature or genre of the dataset. Should be 'Dataset'. Best practice is to use a controlled vocabulary: DCMI Type Vocabulary [DCMITYPE]. 
  • Format: file format, physical medium, or size of the resource. Best practice is to use a controlled vocabulary: Internet Media Types [MIME].
  • Identifier: a unique reference to the dataset. Best practice is to use a DOI or similar permanent identifier. This is often supplied by the repository to which you submit the dataset.
  • Source: a related resource from which the dataset is derived.  If you reused another dataset, you would list it here.
  • Language: language of the dataset. Best practice is to use a controlled vocabulary: IANA Language Subtag Registry.
  • Relation: a related resource. For a dataset this might be an article about the dataset. Best practice is to use a formal identification system (ISBN for a book, ISSN & issue/volume info for a journal, DOI, PURL, etc.) 
  • Coverage: spatial or temporal topic of the dataset. This is used for a place name or GIS coordinates (spatial) or a named period or date range (temporal). Best practice is to use a controlled vocabulary: Getty Thesaurus of Geographic Names.
  • Rights: a statement about any known rights information. This could include intellectual property rights, licenses (such as Creative Commons), data use restrictions, and statutory rights.

This section was created using information from the National Information Standards Organization. ANSI/NISO Z39.85-2012
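Several of the best practices in the element list can be checked automatically before you submit a dataset. A minimal sketch in Python, assuming a record stored as a dictionary of element names to lists of values; it checks only element names, full YYYY-MM-DD dates, the 'Dataset' type, and the general type/subtype shape of a media type. It does not consult the actual DCMI, IANA, or Getty vocabularies, and the draft record in the example is hypothetical.

```python
import re
from datetime import date

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")          # YYYY-MM-DD only
MIME_PATTERN = re.compile(r"[a-z]+/[A-Za-z0-9.+-]+")     # simplified type/subtype shape


def check_record(record: dict[str, list[str]]) -> list[str]:
    """Return human-readable warnings; an empty list means no problems were found."""
    warnings = []
    for name in sorted(set(record) - DC_ELEMENTS):
        warnings.append(f"'{name}' is not a Dublin Core element")
    for value in record.get("date", []):
        if not DATE_PATTERN.fullmatch(value):
            warnings.append(f"date '{value}' is not in YYYY-MM-DD form")
            continue
        try:
            date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
        except ValueError:
            warnings.append(f"date '{value}' is not a valid calendar date")
    for value in record.get("type", []):
        if value != "Dataset":
            warnings.append(f"type '{value}' should be 'Dataset' (DCMI Type Vocabulary)")
    for value in record.get("format", []):
        if not MIME_PATTERN.fullmatch(value):
            warnings.append(f"format '{value}' does not look like a MIME type")
    return warnings


if __name__ == "__main__":
    draft = {"title": ["Campus Water Use Survey, 2023"], "date": ["31 May 2024"],
             "type": ["Text"], "format": ["text/csv"]}
    for warning in check_record(draft):
        print("WARNING:", warning)
```

Returning a list of warnings rather than raising on the first problem lets you fix everything in one pass; ISO 8601 also allows forms such as YYYY or YYYY-MM, so loosen `DATE_PATTERN` if your repository accepts them.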