
LibraData User Guide

UVa's Local Instance of the Dataverse Software

File Upload

You can upload files to a dataset while you are first creating it. You can also upload files to an existing dataset in either of two ways: click the “Edit Dataset” button at the top of the dataset page and select “Files (Upload)” from the dropdown list, or click the “Upload Files” button above the files table in the Files tab. Either option takes you to the Upload Files page for that dataset. NOTE: Compressed (zip) files are not unpacked on upload; they remain zipped.

Once you have uploaded files, you can edit each file's name, add a description, restrict access [1], and add tags. Click “Save Changes” to complete the upload. If you uploaded a file by mistake, you can delete it before saving: select the file's checkbox, click the “Edit Files” button above the files table, and then select “Delete”.

All file formats are accepted. The limit for each individual self-deposited file is 6GB.

Please contact libra@virginia.edu if you need to upload a file larger than 6GB. Archiving large datasets may incur a fee; contact us if you have questions.
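Before uploading, it can help to find out which files exceed the per-file limit so you know in advance which ones to raise with libra@virginia.edu. The sketch below is a minimal, hypothetical helper (find_oversized_files is not part of LibraData or the Dataverse software). It assumes “6GB” means 6 × 1024³ bytes; the exact cutoff the repository enforces may differ.

```python
from pathlib import Path

# Assumption: "6GB" is read as 6 * 1024**3 bytes; the exact cutoff
# enforced by LibraData may differ.
DEFAULT_LIMIT_BYTES = 6 * 1024**3


def find_oversized_files(folder, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (relative path, size in bytes) for every file under `folder`
    that is larger than `limit_bytes`, in sorted path order."""
    root = Path(folder)
    oversized = []
    for path in sorted(root.rglob("*")):
        # rglob also yields directories; only regular files count.
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((path.relative_to(root).as_posix(), size))
    return oversized
```

For example, find_oversized_files("my_dataset_folder") returns an empty list when every file is within the assumed limit.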

In a Dataverse installation, certain file types receive additional support, which can include downloading in different formats, previews, preservation of file-level metadata, file-level data citation with UNFs (Universal Numerical Fingerprints), and exploration through data visualization and analysis. See the File Handling section of this page for more information.

[1] LibraData is an open repository; restricting access to datasets is STRONGLY discouraged.


Folder Upload

In LibraData, UVA's Dataverse repository, you can upload files from a local folder and its subfolders and keep the folder structure. To do this, you must first create and save a dataset (you can create one without any files). On the new dataset's page, click “Upload Files”, then click the “Upload a Folder” button. In the new webpage that opens, click “Select a Directory” and choose the folder you want to upload. By default, all files and subfolders will be uploaded, but you can select or deselect specific files. Finally, click “Start Uploads”.

Refresh your dataset page to see the uploaded files.
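To check ahead of time which folder paths a folder upload would preserve, you can list each file with its directory relative to the selected folder. This is a hypothetical local preview (plan_folder_upload is an illustrative name, not a LibraData feature), and it assumes paths are recorded relative to the folder you select; the upload page itself is the authority on how paths appear. The skip_suffixes option mimics deselecting files before clicking “Start Uploads”.

```python
from pathlib import Path, PurePosixPath


def plan_folder_upload(folder, skip_suffixes=()):
    """List (directory, filename) pairs for the files under `folder`.

    `directory` is the file's folder relative to `folder`, with "/" as the
    separator ("" for files at the top level). Files whose lowercase suffix
    appears in `skip_suffixes` (e.g. (".tmp",)) are left out.
    """
    root = Path(folder)
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() in skip_suffixes:
            continue
        rel = PurePosixPath(path.relative_to(root).as_posix())
        # A top-level file has "." as its parent; report it as "".
        directory = "" if rel.parent == PurePosixPath(".") else str(rel.parent)
        plan.append((directory, rel.name))
    return plan
```

For example, plan_folder_upload("survey_data", skip_suffixes=(".tmp",)) previews the subfolder paths that would be kept, leaving out temporary files.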

Command-line DVUploader

The open-source DVUploader tool is a stand-alone command-line Java application that uses the Dataverse software API to upload files to a specified dataset. Because users can install it themselves and it requires no server-side configuration, it can be used with any Dataverse installation. It is intended as an alternative to the web interface when uploading through the web interface is inconvenient because of the number of files, their locations (for example, spread across multiple directories, or mixed with files that have already been uploaded or with file types that should be excluded), or the need to automate uploads. Because it uses the Dataverse software API, its transfers are subject to the same size and performance limits as HTTP uploads through the Dataverse web interface. The DVUploader logs its activity and can be killed and restarted as needed; when restarted, it continues processing from where it left off.

Usage

The DVUploader is open source; its source code, a Java jar, and documentation are available at https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader. The DVUploader requires Java 1.8 or later. Users who do not already have Java will need to install it and then download the latest release of the jar file. Users will also need the URL of the Dataverse repository, the DOI of their existing Dataverse dataset, and a Dataverse API key, which they can generate from an option in their user profile menu.

Basic usage is to run the command:

java -jar DVUploader-*.jar -server=<Dataverse server URL> -did=<Dataset DOI> -key=<User's API Key> <file or directory list>
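If you automate uploads from a script, you can assemble the basic command above programmatically. The Python sketch below is hypothetical (build_dvuploader_command and run_dvuploader are illustrative names, not part of DVUploader) and uses only the -server, -did and -key options shown in the basic command; check any other option against the DVUploader documentation.

```python
import subprocess


def build_dvuploader_command(jar_path, server_url, dataset_doi, api_key, paths):
    """Assemble the basic DVUploader command as an argument list.

    Only the -server, -did and -key options from the basic usage line
    are used here.
    """
    return [
        "java", "-jar", str(jar_path),
        f"-server={server_url}",
        f"-did={dataset_doi}",
        f"-key={api_key}",
        *[str(p) for p in paths],  # files and/or directories to upload
    ]


def run_dvuploader(*args, **kwargs):
    """Run the assembled command; check=True raises CalledProcessError
    if the process exits with a non-zero status."""
    return subprocess.run(build_dvuploader_command(*args, **kwargs), check=True)
```

For example, build_dvuploader_command("DVUploader.jar", "https://dataverse.example.edu", "doi:10.xxxx/EXAMPLE", "YOUR-API-KEY", ["results"]) builds the command for uploading a results directory; the jar name, server URL, DOI and key are placeholders. Passing arguments as a list avoids shell quoting problems, but an API key given on the command line may be visible to other users of the same machine through process listings.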

Additional command-line arguments let the DVUploader list what it would do without uploading anything, limit the number of files it uploads, recurse through subdirectories, verify fixity, exclude files with specific extensions or name patterns, and wait longer than 60 seconds for a Dataverse ingest lock to clear (for example, while a previously uploaded file is being processed, as discussed in the File Handling section below).
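Waiting for an ingest lock to clear amounts to polling until the dataset is unlocked or a timeout expires. The sketch below shows that general pattern in isolation; it is not DVUploader's code. The is_locked argument is a hypothetical stand-in for whatever check you use (for example, a request to your installation's API; see the Dataverse API Guide for its dataset locks endpoint), and the 60-second default mirrors the default wait mentioned above.

```python
import time


def wait_for_unlock(is_locked, timeout=60.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_locked() until it returns False or `timeout` seconds pass.

    Returns True if the lock cleared and False if the timeout expired.
    `clock` and `sleep` can be replaced so the logic is testable without
    real waiting.
    """
    deadline = clock() + timeout
    while is_locked():
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
    return True
```

Using a monotonic clock keeps the deadline correct even if the system time changes while waiting.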

DVUploader is a community-developed tool. For more information, see the project’s GitHub repository or email libra@virginia.edu.

Duplicate Files

The Dataverse software handles duplicate files (matching filenames or checksums) as follows:

  • Files with the same checksum can be included in a dataset, even if the files are in the same directory.
  • Files with the same filename can be included in a dataset as long as the files are in different directories.
  • If a user uploads a file to a directory that already contains a file with the same directory/filename combination, the Dataverse software will adjust the new file's name by adding “-1”, “-2”, and so on, as applicable. This change will be visible in the list of files being uploaded.
  • If the directory or name of an existing or newly uploaded file is edited in such a way that would create a directory/filename combination that already exists, the Dataverse software will display an error.
  • If a user attempts to replace a file with another file that has the same checksum, an error message is displayed and the file cannot be replaced.
  • If a user attempts to replace a file with a file that has the same checksum as a different file in the dataset, a warning will be displayed.
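The two duplicate checks above can be sketched as follows. This is a simplified illustration, not the Dataverse implementation: unique_path assumes the suffix is inserted before the file extension (so data.csv becomes data-1.csv), which you should confirm in the upload list, and group_by_checksum uses MD5 as an assumed algorithm, since an installation may be configured to use a different one. Both function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def unique_path(existing, directory, filename):
    """Return a directory/filename path not already in `existing`,
    adding -1, -2, ... before the extension when needed (assumed rule)."""
    candidate = str(PurePosixPath(directory, filename))
    if candidate not in existing:
        return candidate
    stem = PurePosixPath(filename).stem
    suffix = PurePosixPath(filename).suffix
    n = 1
    while True:
        candidate = str(PurePosixPath(directory, f"{stem}-{n}{suffix}"))
        if candidate not in existing:
            return candidate
        n += 1


def group_by_checksum(files):
    """Group paths whose contents share an MD5 checksum.

    `files` maps path -> file contents (bytes); only checksums shared by
    more than one path are returned.
    """
    groups = defaultdict(list)
    for path, content in files.items():
        groups[hashlib.md5(content).hexdigest()].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

For example, if raw/data.csv and raw/data-1.csv already exist, unique_path returns raw/data-2.csv for another upload of data.csv to raw, while the same file uploaded to a different directory keeps its name.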

BagIt Support

BagIt is a set of hierarchical file system conventions designed to support disk-based storage and network transfer of arbitrary digital content. It offers several benefits, such as integration with digital libraries, easy implementation, and transfer validation. See the Wikipedia article for more information.

LibraData, UVA's Dataverse repository, has enabled BagIt file handling. When you upload BagIt files, the repository validates the checksum values listed in each bag’s manifest file against the uploaded files and reports an error for any mismatch. The repository may report only a limited number of errors, such as the first five errors in each BagIt file. If errors are reported, fix them and re-upload the BagIt files.
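To catch checksum mismatches before uploading, you can run a similar check locally. The sketch below is a hypothetical pre-check (validate_bag_manifest is an illustrative name), not the repository's validator. It follows the BagIt convention (RFC 8493) that each line of a manifest such as manifest-sha256.txt holds a checksum, whitespace, and a file path relative to the bag's top directory, and it stops after max_errors errors to mirror the “first five errors” behavior described above. It does not decode percent-encoded paths and checks only the one manifest you name.

```python
import hashlib
from pathlib import Path


def _file_digest(path, algorithm):
    """Hash a file in 1 MiB chunks so large bag files are not read at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_bag_manifest(bag_dir, algorithm="sha256", max_errors=5):
    """Compare files in a bag against manifest-<algorithm>.txt.

    Returns a list of error messages, stopping once max_errors are found.
    """
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    errors = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # "<checksum> <relative path>"; the path itself may contain spaces.
        expected, rel_path = line.split(maxsplit=1)
        target = bag / rel_path
        if not target.is_file():
            errors.append(f"missing file: {rel_path}")
        elif _file_digest(target, algorithm) != expected.lower():
            errors.append(f"checksum mismatch: {rel_path}")
        if len(errors) >= max_errors:
            break
    return errors
```

An empty list means every file listed in the manifest is present and matches its checksum.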