LibraData User Guide

UVa's Local Instance of the Dataverse Software

File Handling

Certain file types in the Dataverse software are supported by additional functionality, which can include downloading in different formats, subsets, file-level metadata preservation, file-level data citation, and exploration through data visualization and analysis. See the sections below for information about special functionality for specific file types.

File Previews

UVa’s Dataverse Repository has installed previewers for several common file types: ASCII text (txt, html), audio (mp3, mpeg, wav, ogg, X-m4a), image (gif, jpeg, png), pdf, video (mp4, ogg, quicktime), csv, tsv, tab, Stata syntax, R syntax, Hypothesis, and geo+json. The preview appears on the file page. If a previewer is installed for the file's type, the preview is created and displayed automatically once any required terms have been agreed to or a guestbook entry has been made. File previews are not available for restricted files unless they are accessed through a Preview URL. See also Reviewing an Unpublished Dataset. When the dataset license is not the default license, users are prompted to accept the license/data use agreement before the preview is shown. See also Terms.

Tabular Data Files

Files in certain formats - Stata, SPSS, R, CSV, and TSV - may be ingested as tabular data (see “Tabular Data Ingest” section for details). UVa's Dataverse has turned off automatic ingestion of Excel (xlsx) files.

Additional download options are available for tabular data (found in the same drop-down menu under the “Download” button):

The original file uploaded by the user;
As tab-delimited data (with the variable names in the first row);
Saved as R data (if the original file was not in R format);
Variable Metadata (as a DDI Codebook XML file);
Data File Citation (currently in RIS, EndNote XML, or BibTeX format).
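
As an illustration of the tab-delimited download option, the sketch below reads such a file with Python's standard csv module, treating the first row as the variable names. The sample contents and the function name read_tab_delimited are hypothetical and stand in for a real downloaded file.

```python
import csv
import io


def read_tab_delimited(handle):
    """Return (variable_names, rows) from an open tab-delimited file."""
    # The first row holds the variable names, so DictReader uses it as the header.
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


# Hypothetical sample standing in for a downloaded .tab file.
sample = io.StringIO("id\tage\tstate\n1\t34\tVA\n2\t51\tMD\n")
names, rows = read_tab_delimited(sample)
print(names)
print(rows[0]["age"])
```

For a real download, pass an open file instead of the in-memory sample, for example open("survey.tab", newline="", encoding="utf-8"), where the file name is again only a placeholder. All values are read as strings, so numeric variables need to be converted explicitly.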

Research Code

Code files - such as Stata, R, MATLAB, or Python files or scripts - have become a frequent addition to the research data deposited in Dataverse repositories. Research code is typically developed by a few researchers with the primary goal of obtaining results, while its reproducibility and reuse aspects are sometimes overlooked. Because several independent studies have reported problems when trying to rerun research code, please consider the following guidelines if your dataset contains code.

The following are general guidelines applicable to all programming languages.

Create a README text file in the top-level directory to introduce your project. It should answer questions that reviewers or reusers would likely have, such as how to install and use your code. If in doubt, consider using existing templates such as our README template.
Depending on the number of files in your dataset, consider having data and code in distinct directories, each of which should have some documentation like a README.
Consider adding a license to your source code. You can do that by creating a LICENSE file in the dataset or by specifying the license(s) in the README or directly in the code. Find out more about code licenses at the Open Source Initiative webpage.
If possible, use free and open-source file formats and software to make your research outputs more reusable and accessible.
Consider testing your code in a clean environment before sharing it, as it could help you identify missing files or other errors. For example, your code should use relative file paths instead of absolute (or full) file paths, because absolute paths usually do not exist on another person's machine and can cause an execution error.
Consider providing notes (in the README) on the expected code outputs or adding tests in the code, which would ensure that its functionality is intact.
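
The guidelines on relative file paths and on adding tests can be sketched in Python as follows. This is a minimal illustration, not a required pattern: the helper name resolve_in_project and the directory and file names are hypothetical.

```python
from pathlib import Path


def resolve_in_project(base_dir, relative_path):
    """Join a relative path onto the project directory.

    Absolute paths are rejected because they rarely exist on another machine.
    """
    relative = Path(relative_path)
    if relative.is_absolute():
        raise ValueError(f"use a relative path, not {relative_path!r}")
    return Path(base_dir) / relative


# Hypothetical layout: a project directory containing data/survey.tab.
data_path = resolve_in_project("project", "data/survey.tab")

# A small check of the kind that could live in the code or in a test file.
assert data_path == Path("project") / "data" / "survey.tab"
print(data_path)
```

In a script, the project directory is often taken from the script's own location, for example Path(__file__).resolve().parent, so the code keeps working wherever the dataset is unpacked.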

Capturing code dependencies will help other researchers recreate the necessary runtime environment. Without that information, your code may not run correctly (or at all) on another system. Please contact libra@virginia.edu or your subject liaison for help with packaging your data and software for reproducibility on LibraData.
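
For Python code, one possible way to capture dependencies is to record the interpreter version and the installed packages with their versions. The sketch below uses only the standard library (importlib.metadata, available in Python 3.8 and later); the function name environment_snapshot and the output file name are hypothetical, and this is one approach among several rather than a LibraData requirement.

```python
import sys
from importlib import metadata


def environment_snapshot():
    """Return lines describing the Python version and installed packages."""
    header = f"# Python {sys.version.split()[0]}"
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with incomplete metadata
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the listing is stable between runs.
    return [header] + sorted(pins, key=str.lower)


snapshot = environment_snapshot()

# Hypothetical usage: save the listing next to the code before depositing.
# with open("requirements-snapshot.txt", "w", encoding="utf-8") as out:
#     out.write("\n".join(snapshot) + "\n")
print(snapshot[0])
```

The listing covers every package installed in the environment, not only those the code imports, so it is best run in a clean environment created for the project. The pip freeze command produces a comparable listing from the command line.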

Compressed Files

Compressed files in .zip format are not unpacked automatically on the UVa Dataverse repository. A "zipped" file stays zipped (.zip) after upload.

Other File Types

There are several advanced options available for certain file types.

Image files: JPG, PNG, and TIFF files can be selected as the default thumbnail for a dataset. The selected thumbnail will appear on the search result card for that dataset.
SPSS files: SPSS files can be tagged with the language they were originally coded in. To set the tag, click Advanced Options and select the language from the list provided.