Certain file types in the Dataverse software are supported by additional functionality, which can include downloading in different formats, downloading subsets, file-level metadata preservation, file-level data citation, and exploration through data visualization and analysis. See the sections below for information about special functionality for specific file types.
UVa’s Dataverse Repository has installed Previewers for several common file types: ASCII text (txt, html), audio (mp3, wav, ogg), image (gif, jpeg, png), PDF, video (mp4, ogg, quicktime), CSV, TSV, Stata syntax, and R syntax. The preview appears on the file page. If the file is one of the types listed above, the preview is generated and displayed automatically. File previews are not available for restricted files.
Files in certain formats - Stata, SPSS, R, and CSV - may be ingested as tabular data (see “Tabular Data Ingest” section for details). UVa’s Dataverse has turned off automatic ingestion of Excel (xlsx) files.
Additional download options available for tabular data (found in the same drop-down menu under the “Download” button):
Code files - such as Stata, R, MATLAB, or Python files or scripts - have become a frequent addition to the research data deposited in Dataverse repositories. Research code is typically developed by a few researchers with the primary goal of obtaining results, while its reproducibility and reuse aspects are sometimes overlooked. Because several independent studies have reported problems when trying to rerun research code, please consider the following guidelines if your dataset contains code.
The following are general guidelines applicable to all programming languages.
Capturing code dependencies will help other researchers recreate the necessary runtime environment. Without this information, your code may not run correctly (or at all) on another machine. Please contact libra@virginia.edu or your subject liaison for help with packaging your data and software for reproducibility on LibraData.
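As one illustration of capturing dependencies, the sketch below records the Python version and the installed package versions in a pinned, requirements-style listing. It is a minimal example for Python projects only, not a LibraData requirement; the function name snapshot_environment is hypothetical, and other languages have their own equivalents.

```python
import sys
from importlib import metadata


def snapshot_environment() -> str:
    """Return a requirements-style listing of the current environment.

    Hypothetical helper: it only reports what is installed in the
    interpreter that runs it, not what the code actually imports.
    """
    # Record the interpreter version as a comment line.
    lines = [f"# Python {sys.version.split()[0]}"]

    # Collect "name==version" pins; a set removes duplicate entries.
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            pins.add(f"{name}=={dist.version}")

    lines.extend(sorted(pins, key=str.lower))
    return "\n".join(lines) + "\n"


print(snapshot_environment(), end="")
```

Running the script inside the environment used for the analysis and saving its output (for example to a file named requirements.txt deposited alongside the code) gives other researchers a starting point for rebuilding that environment.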
Metadata found in the header section of Flexible Image Transport System (FITS) files is automatically extracted by the Dataverse software, aggregated, and displayed in the Astronomy Domain-Specific metadata of the dataset that the file belongs to. This FITS file metadata is therefore searchable and browsable (via facets) at the dataset level.
Compressed files in .zip format are unpacked automatically. If a .zip file fails to unpack for whatever reason, it will upload as is. If the number of files inside exceeds a set limit (1,000 by default, configurable by the Administrator), you will get an error message and the .zip file will upload as is.
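To check before uploading whether an archive is likely to be unpacked, you can count its files locally. The sketch below is a minimal example using Python's standard zipfile module; the names count_zip_files, fits_unpack_limit, and ASSUMED_FILE_LIMIT are hypothetical, and the limit of 1,000 is only the default mentioned above, so the value on a given installation may differ.

```python
import zipfile

# Default limit quoted in this guide; an administrator may have changed it.
ASSUMED_FILE_LIMIT = 1000


def count_zip_files(zip_path: str) -> int:
    """Count the file entries in a .zip archive, ignoring directory entries."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(1 for info in archive.infolist() if not info.is_dir())


def fits_unpack_limit(zip_path: str, limit: int = ASSUMED_FILE_LIMIT) -> bool:
    """Return True if the archive holds no more files than the limit."""
    return count_zip_files(zip_path) <= limit
```

If fits_unpack_limit returns False for your archive, consider splitting it into several smaller .zip files so that each one can be unpacked on upload.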
If the uploaded .zip file contains a folder structure, the Dataverse software will keep track of this structure. A file’s location within this folder structure is displayed in the file metadata as the File Path. When you download the contents of the dataset, this folder structure will be preserved and files will appear in their original locations.
These folder names are subject to strict validation rules. Only the following characters are allowed: the alphanumerics, ‘_’, ‘-’, ‘.’ and ‘ ’ (white space). When a zip archive is uploaded, the folder names are automatically sanitized, with any invalid characters replaced by the ‘.’ character. Any sequences of dots are further replaced with a single dot. For example, the folder name data&info/code=@137 will be converted to data.info/code.137. When uploading through the Web UI, the user can change the values further on the edit form presented, before clicking the ‘Save’ button.
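The sanitization rule described above can be approximated locally to preview how folder names will look after upload. The following is a sketch, not the Dataverse implementation: the function name sanitize_folder_path is hypothetical, and it assumes that "alphanumerics" means ASCII letters and digits and that each folder name in the path is sanitized separately.

```python
import re

# Anything other than ASCII letters, digits, '_', '-', '.', and space
# is treated as invalid (assumption: ASCII-only alphanumerics).
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9_\-. ]")
# Two or more consecutive dots.
_DOT_RUNS = re.compile(r"\.{2,}")


def sanitize_folder_path(path: str) -> str:
    """Approximate the folder-name sanitization applied to uploaded zips."""
    cleaned = []
    for segment in path.split("/"):
        # Step 1: replace every invalid character with a dot.
        segment = _INVALID_CHARS.sub(".", segment)
        # Step 2: collapse sequences of dots into a single dot.
        segment = _DOT_RUNS.sub(".", segment)
        cleaned.append(segment)
    return "/".join(cleaned)


print(sanitize_folder_path("data&info/code=@137"))  # data.info/code.137
```

Renaming folders to use only the allowed characters before zipping avoids any surprises from this conversion.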
There are several advanced options available for certain file types.