LibraData User Guide

UVa's Local Instance of the Dataverse Software

File Handling

Certain file types in the Dataverse software are supported by additional functionality, which can include downloading in different formats or as subsets, preserving file-level metadata, file-level data citation, and exploration through data visualization and analysis. See the sections below for information about special functionality for specific file types.

File Previews

UVa’s Dataverse Repository has installed Previewers for several common file types: ASCII text (txt, html), audio (mp3, wav, ogg), image (gif, jpeg, png), PDF, video (mp4, ogg, quicktime), CSV, TSV, Stata syntax, and R syntax. The preview appears on the file page. If the file is one of the types listed above, the preview is created and displayed automatically. File previews are not available for restricted files.

Tabular Data Files

Files in certain formats (Stata, SPSS, R, and CSV) may be ingested as tabular data (see the “Tabular Data Ingest” section for details). UVa's Dataverse has turned off automatic ingestion of Excel (xlsx) files.

Additional download options are available for tabular data (found in the drop-down menu under the “Download” button):

  • The original file uploaded by the user
  • Tab-delimited data (with the variable names in the first row)
  • R data (if the original file was not in R format)
  • Variable Metadata (as a DDI Codebook XML file)
  • Data File Citation (currently in RIS, EndNote XML, or BibTeX format)

Research Code

Code files (such as Stata, R, MATLAB, or Python files or scripts) have become a frequent addition to the research data deposited in Dataverse repositories. Research code is typically developed by a few researchers with the primary goal of obtaining results, while its reproducibility and reuse are sometimes overlooked. Because several independent studies have reported problems rerunning research code, please consider the following guidelines if your dataset contains code.

The following are general guidelines applicable to all programming languages.

  • Create a README text file in the top-level directory to introduce your project. It should answer questions that reviewers or reusers would likely have, such as how to install and use your code. If in doubt, consider using existing templates such as our README template.
  • Depending on the number of files in your dataset, consider having data and code in distinct directories, each of which should have some documentation like a README.
  • Consider adding a license to your source code. You can do that by creating a LICENSE file in the dataset or by specifying the license(s) in the README or directly in the code. Find out more about code licenses at the Open Source Initiative webpage.
  • If possible, use free and open-source file formats and software to make your research outputs more reusable and accessible.
  • Consider testing your code in a clean environment before sharing it, as this can help you identify missing files or other errors. For example, your code should use relative file paths instead of absolute (or full) file paths, because absolute paths usually do not exist on another machine and can cause an execution error.
  • Consider providing notes (in the README) on the expected code outputs or adding tests in the code, which would ensure that its functionality is intact.
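
As an illustration of the relative-path guideline above, here is a minimal Python sketch. It assumes a hypothetical project layout with a code/ directory and a data/ directory side by side; the file name survey.csv and both function names are made up for the example.

```python
from pathlib import Path


def project_root(script_path: str) -> Path:
    """Return the directory one level above the folder that holds the script.

    Assumes the hypothetical layout <project>/code/<script>.py.
    """
    return Path(script_path).resolve().parent.parent


def data_file(script_path: str, name: str) -> Path:
    """Build the path to a file in the project's data/ directory."""
    return project_root(script_path) / "data" / name


# Typical use inside a script stored in <project>/code/:
#     survey = data_file(__file__, "survey.csv")
# The path is derived from the script's own location, so it still works
# after the dataset is downloaded to a different machine or folder.
```

Because the path is built from the location of the script rather than from a fixed drive or home directory, the code does not depend on where the dataset was unpacked.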

Capturing code dependencies will help other researchers recreate the necessary runtime environment. Without this information, your code may not run correctly (or at all) on another machine. Please contact libra@virginia.edu or your subject liaison for help with packaging your data and software for reproducibility on LibraData.
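
One possible way to record dependencies for a Python project is sketched below. It is only an illustration, not a LibraData requirement: the function name describe_environment and the idea of pasting its output into the README are assumptions made for this example. It uses only the Python standard library.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return one line per dependency, in 'name==version' form.

    `packages` is a list of package names that the research code imports.
    The first line records the Python version itself.
    """
    lines = [f"python=={platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag packages that are named but not installed here.
            lines.append(f"# {name}: not installed")
    return lines


# Example use: print the lines and copy them into the README
# or into a requirements file.
#     print("\n".join(describe_environment(["numpy", "pandas"])))
```

Other languages have their own equivalents, so treat this as a sketch of the idea rather than a prescribed workflow.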

Astronomy (FITS)

Metadata found in the header section of Flexible Image Transport System (FITS) files is automatically extracted by the Dataverse software, aggregated, and displayed in the Astronomy Domain-Specific metadata of the dataset that the file belongs to. This FITS file metadata is therefore searchable and browsable (through facets) at the dataset level.

Compressed Files

Compressed files in .zip format are unpacked automatically. If a .zip file fails to unpack for any reason, it will upload as is. If the number of files inside exceeds a set limit (1,000 by default, configurable by the Administrator), you will get an error message and the .zip file will upload as is.
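
If you want to check a .zip file against the limit before uploading, a short Python sketch like the following may help. The function names are hypothetical, and the value 1000 is the default mentioned above; the limit actually configured on a given installation may differ.

```python
import zipfile

# Default limit stated in this guide; an assumption for any specific installation.
UNPACK_LIMIT = 1000


def count_files(zip_path):
    """Count the files in a .zip archive, ignoring directory entries."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(1 for info in zf.infolist() if not info.is_dir())


def will_unpack(zip_path, limit=UNPACK_LIMIT):
    """Return True if the archive holds no more files than the limit."""
    return count_files(zip_path) <= limit
```

If will_unpack returns False, consider splitting the archive into several smaller .zip files before uploading.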

If the uploaded .zip file contains a folder structure, the Dataverse software will keep track of this structure. A file’s location within this folder structure is displayed in the file metadata as the File Path. When you download the contents of the dataset, this folder structure will be preserved and files will appear in their original locations.

These folder names are subject to strict validation rules. Only the following characters are allowed: alphanumerics, ‘_’, ‘-’, ‘.’ and ‘ ’ (white space). When a zip archive is uploaded, the folder names are automatically sanitized, with any invalid characters replaced by the ‘.’ character. Any sequence of dots is then replaced with a single dot. For example, the folder name data&info/code=@137 will be converted to data.info/code.137. When uploading through the Web UI, you can change the values further on the edit form presented, before clicking the ‘Save’ button.
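
The sanitization rule described above can be approximated in a few lines of Python, which is useful for previewing how folder names will look after upload. This is a sketch of the rule as stated in this guide, not the Dataverse software's own implementation; the function name is hypothetical, and "alphanumerics" is assumed here to mean ASCII letters and digits.

```python
import re


def sanitize_folder_path(path: str) -> str:
    """Approximate the folder-name sanitization described in this guide."""
    # Replace every character that is not a letter, digit, '_', '-', '.',
    # a space, or the '/' folder separator with a dot.
    cleaned = re.sub(r"[^A-Za-z0-9_\-. /]", ".", path)
    # Collapse any run of two or more dots into a single dot.
    return re.sub(r"\.{2,}", ".", cleaned)


print(sanitize_folder_path("data&info/code=@137"))  # data.info/code.137
```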

Note
If you upload multiple .zip files to one dataset, any subdirectories that are identical across multiple .zips will be merged together when the user downloads the full dataset.

Note
To keep a “zip” file zipped on upload, you need to “zip” the .zip file (double zip). Contact libra@virginia.edu if you need help with zip files.
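
For example, the double zip can be created with Python's standard zipfile module. This is a minimal sketch; the function name double_zip and the file names in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def double_zip(inner_zip: str, outer_zip: str) -> None:
    """Wrap an existing .zip file inside a new .zip file.

    Only the outer archive is unpacked on upload, so the inner
    .zip file is stored in the dataset as a single file.
    """
    # ZIP_STORED avoids recompressing data that is already compressed.
    with zipfile.ZipFile(outer_zip, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.write(inner_zip, arcname=Path(inner_zip).name)


# Example use:
#     double_zip("results.zip", "results_to_upload.zip")
# Upload results_to_upload.zip; results.zip should then appear in the dataset.
```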

Other File Types

There are several advanced options available for certain file types.

  • Image files: JPG, PNG, and TIFF files can be selected as the default thumbnail for a dataset. The selected thumbnail will appear on the search result card for that dataset.
  • SPSS files: SPSS files can be tagged with the language they were originally coded in. To do so, click Advanced Options and select the language from the list provided.