File Management Principles
How you structure project directories and organize folders is an underappreciated but important aspect of research data management. Sensible file organization schemes are highly context-dependent and vary across projects and domains. That said, some general principles and guidelines apply broadly:
Example Directory Structure
Below is a diagram of an example directory structure. Again, there is no right answer, and you may choose to organize your project folders differently (for instance, you might keep code and results together, or you may choose to keep documentation specific to each experiment or sub-study).
(Example file directory image created by Michael and licensed CC-BY)
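To make the idea concrete, here is a minimal Python sketch that creates a starter folder layout. The folder names (data/raw, data/processed, code, results, docs) and the function name scaffold_project are assumptions chosen for illustration; they are not taken from the diagram, and you should adapt them to your own project.

```python
from pathlib import Path

# Hypothetical folder names for illustration; adapt them to your project.
PROJECT_FOLDERS = [
    "data/raw",
    "data/processed",
    "code",
    "results",
    "docs",
]


def scaffold_project(root):
    """Create the project folders under root and return the created paths."""
    root = Path(root)
    created = []
    for folder in PROJECT_FOLDERS:
        path = root / folder
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling scaffold_project("my_project") a second time leaves existing folders and their contents untouched, so the same script can be shared with collaborators to set up a consistent layout.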
Best practice for file naming is that the names are descriptive of the contents of the file. The goal is to be able to understand and recall at a glance what is in any given file. Some potential attributes and information to include are:
Some technical formatting guidelines for file naming:
About File Formats
File formats should be chosen to enable sharing, long-term access, and preservation of your data. Ideally, this means standard and open (non-proprietary) formats, but this may not always be possible depending on the file type and project needs. Researchers must weigh which formats are best suited to data creation, collection, and analysis against which are most easily preserved and shared.
When open formats are an option, however, openly available documentation and continued community support for these formats increase the likelihood that such files will be successfully preserved and able to be (re)used down the road by a wider audience.
If you use a program with a proprietary file format as part of your research, we recommend exporting a copy of that data in an open format if possible (e.g. exporting tabular data from Excel as a comma-separated values (CSV) file), especially when it comes time to deposit and share your data. Be aware that such format conversions can result in the loss of data, metadata, formatting, or other information. For this reason, we also recommend that you keep the original data files, as they may contain the most complete version of your dataset.
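One way to see what a conversion changes is to write the data out and read it back. The sketch below uses only Python's standard csv module and assumes the table has already been loaded into memory as a list of rows (reading the proprietary file itself usually needs a third-party library and is not shown). The function names export_csv_copy and find_changed_cells are hypothetical.

```python
import csv
from pathlib import Path


def export_csv_copy(rows, csv_path):
    """Write rows (a list of lists) to csv_path and return the rows read back."""
    csv_path = Path(csv_path)
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    with csv_path.open("r", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


def find_changed_cells(original_rows, exported_rows):
    """List (row, column) positions whose value changed in the export."""
    changed = []
    for r, (before, after) in enumerate(zip(original_rows, exported_rows)):
        for c, (old, new) in enumerate(zip(before, after)):
            # CSV stores everything as text, so compare against the text form.
            if str(old) != new:
                changed.append((r, c))
    return changed
```

For example, a missing value stored as None comes back from the CSV as an empty string, so that cell is reported as changed. This check only compares cell text; it does not detect lost formatting, formulas, or dropped rows, which is one more reason to keep the original file.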
For certain file types such as images, audio, and video, you will have a choice between lossy and lossless formats. Lossy formats use irreversible compression to reduce file size at the cost of fidelity. Lossless formats are generally preferred unless storage space is at a premium.
Recommended Digital Formats Overview (for preservation):
The above are just common suggestions. If you have a repository in mind, check its site to see which formats it recommends. If you do convert your files or export copies of them in standard/open formats, consider adjusting the filenames to make this clear, as the differences between these files will not necessarily be obvious from the file extensions/types alone.
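As a sketch of what adjusting filenames could look like, the helper below adds the source format to the name of an exported copy. The "_from-<format>" pattern and the function name converted_copy_name are assumptions for illustration, not an established convention.

```python
from pathlib import Path


def converted_copy_name(original, new_extension):
    """Name an exported copy so its source format stays visible."""
    original = Path(original)
    source_format = original.suffix.lstrip(".").lower()
    # Hypothetical convention: <stem>_from-<source format>.<new extension>
    new_name = f"{original.stem}_from-{source_format}.{new_extension.lstrip('.')}"
    return original.with_name(new_name)
```

With this pattern, converted_copy_name("survey_results.xlsx", "csv") gives survey_results_from-xlsx.csv, so the copy sorts next to the original and its origin is clear without opening it.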
If you have questions about file types/formats and are wondering which are best for your project, reach out to us.
When working with research files, it is often important to be able to track changes and revert to an earlier version of a file. Version control refers both to the process of tracking multiple versions of a file and to the software that implements this functionality. Many folks might be familiar with version control as something that programmers use and need, but it has benefits for various types of research and data files besides code. These benefits may be particularly useful for datasets that require complex processing and for collaborative research projects where many people edit the same files.
If you want to employ version control in your project's file management, there are a few ways to do it:
Some best practices for manual version control:
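As one illustration of manual version control, the Python sketch below copies a working file into a versions folder under a dated, numbered name and refuses to overwrite an existing version. The naming pattern (stem, date as YYYYMMDD, then a two-digit version number) and the function name save_new_version are assumptions made for this example, not rules from this guide.

```python
import shutil
from datetime import date
from pathlib import Path


def save_new_version(working_file, versions_dir, today=None):
    """Copy working_file into versions_dir under the next free version number."""
    working_file = Path(working_file)
    versions_dir = Path(versions_dir)
    versions_dir.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).strftime("%Y%m%d")

    # Count existing versions of this file to pick the next number.
    pattern = f"{working_file.stem}_*_v*{working_file.suffix}"
    next_version = len(list(versions_dir.glob(pattern))) + 1

    target = versions_dir / (
        f"{working_file.stem}_{stamp}_v{next_version:02d}{working_file.suffix}"
    )
    if target.exists():
        # Never overwrite an earlier version.
        raise FileExistsError(target)
    # copy2 also preserves timestamps where the platform allows it.
    shutil.copy2(working_file, target)
    return target
```

Each call leaves the working file in place and adds a new snapshot, so reverting is a matter of copying an earlier snapshot back. The simple counting approach assumes no other files in the versions folder share the same name prefix.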