README files are created for a variety of reasons:
1. to document changes to files or file names within a folder;
2. to explain file naming conventions and practices for future reference; and
3. to specifically accompany files/data being deposited in a repository.
It is best practice to create a README document for each dataset regardless of whether it is being deposited in a repository because the document might become necessary at a later point.
A good data practice is to store a readme.txt with each distinct dataset that explains your file naming convention along with any abbreviations or codes you have used. Write your README document as a plain text file, and avoid proprietary formats, such as Microsoft Word, whenever possible. A README template is below, with recommendations for information you may want to include.
Note: If you deposit your final datasets in a data repository, the repository may ask you to provide a README document with additional details about your datasets, such as methodological information or sharing/access information. Creating a README document at the beginning of your research process, and updating it consistently throughout your research, will help you to compile a final README document when your data is ready for deposit.
README Template (Recommended minimum content is in bold)
Title of dataset
Name/institution/contact information for:
Principal Investigator (or person responsible for collecting the data)
File name structure:
Structure: Provide the template you are using for your filenames.
Attributes: Describe the attributes used to name the files.
Codes: Provide a complete list of any codes/abbreviations used.
Provide examples of above items.
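One way to keep file names consistent is to generate them from a single template instead of typing them by hand. The following is a minimal sketch, not a prescribed method: the template, the attribute names (project, site, collected, sample), and the site codes are all hypothetical and should be replaced with the ones documented in your own README.

```python
from datetime import date

# Hypothetical template: Project_SiteCode_YYYYMMDD_SampleNumber.ext
TEMPLATE = "{project}_{site}_{collected:%Y%m%d}_{sample:03d}.{ext}"

# Hypothetical code list; the same list belongs in the README.
SITE_CODES = {"North Field": "NF", "South Field": "SF"}


def build_filename(project, site_name, collected, sample, ext="csv"):
    """Return a file name that follows TEMPLATE."""
    site = SITE_CODES[site_name]  # raises KeyError for an undocumented site
    return TEMPLATE.format(
        project=project, site=site, collected=collected, sample=sample, ext=ext
    )


print(build_filename("SoilSurvey", "North Field", date(2023, 5, 4), 7))
```

Keeping the template and the code list in one place means the README and the actual file names are less likely to drift apart.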
File formats: Provide a list of all file formats present in this dataset. If you need to convert or migrate your data files from one format to another, be aware of the risk of data loss or corruption, and take appropriate steps to avoid or minimize it.
File Format Examples:
- Databases: XML, CSV
- Geospatial: SHP, DBF, GeoTIFF, NetCDF
- Moving Images: MOV, MPEG, AVI, MXF
- Audio: WAVE, AIFF, MP3, MXF
- Numbers/statistics: ASCII, DTA, POR, SAS, SAV
- Images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
- Text: PDF/A, HTML, ASCII, XML, UTF-8
- Graphs: JSON, YAML, XML
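To compile the list of file formats for a README, it can help to take an inventory of the extensions actually present in the dataset folder. The sketch below is one possible approach using only the Python standard library; the function name count_file_formats is hypothetical, and an extension is only a hint about a file's true format.

```python
from collections import Counter
from pathlib import Path


def count_file_formats(dataset_dir):
    """Count files under dataset_dir by lowercase extension."""
    counts = Counter()
    for path in Path(dataset_dir).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # "." is a placeholder; point this at the dataset folder.
    for ext, n in sorted(count_file_formats(".").items()):
        print(f"{ext}\t{n}")
```

The printed table can be pasted into the README and checked against the formats you intended to include.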
Column headings for tabular data: For any tabular data, list and define column headings, including:
Units of measurement
Data formats, such as YYYYMMDD
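Once column headings, units, and data formats are written down, they can also be checked automatically. The sketch below assumes a hypothetical table with columns sample_date and depth_cm; the COLUMNS dictionary and the is_yyyymmdd helper are illustrative names, not part of any standard.

```python
from datetime import datetime

# Hypothetical column definitions; the same information belongs in the README.
COLUMNS = {
    "sample_date": {"definition": "Date the sample was taken", "format": "YYYYMMDD"},
    "depth_cm": {"definition": "Sampling depth", "units": "centimeters"},
}


def is_yyyymmdd(value):
    """Return True if value is a real calendar date written as YYYYMMDD."""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False  # eight digits, but not a valid date (e.g. February 30)
    return True
```

For example, is_yyyymmdd("20230504") is accepted, while "2023-05-04" and "20230230" are rejected.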
Versioning: Establish a procedure for documenting changes to files. One option is to keep a changelog in this README document that lists every step that changes the output files.
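A changelog is easier to maintain when every entry has the same shape. The sketch below shows one possible convention, a dated line with an author and a description, appended to the end of the README; the entry format and both function names are assumptions for illustration.

```python
from datetime import date


def changelog_entry(description, author, when=None):
    """Format one changelog line as 'YYYY-MM-DD (author): description'."""
    when = when or date.today()
    return f"{when.isoformat()} ({author}): {description}"


def append_to_readme(readme_path, entry):
    """Append an entry to the end of a plain-text README."""
    with open(readme_path, "a", encoding="utf-8") as readme:
        readme.write(entry + "\n")
```

A call such as append_to_readme("readme.txt", changelog_entry("Re-exported images at full resolution", "J. Smith")) would add one dated line per change.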
Example README Document
1. Dataset title: Raw Images for Experiment A, Smith Lab
2. Principal Investigator: John Smith, PI, 555-555-5555, email@example.com
3. File name structure:
The base file name is composed of the name of the experiment, the ID number of the instrument used, the date and time that the image was captured, and the unique identifier of the image.
ExperimentName = Name of the experiment.
InstrumentID = Five-digit code assigned to the lab instrument. See the Codes section for a list of instruments and their ID numbers.
CaptureDateTime = Date and time at which the image was captured, in YYYYMMDDThhmm format.
ImageID = Three-digit unique identifier for the image, such as 001, 002, 003.
Codes: [List of instruments and IDs]
4. File formats: tif
5. Versioning: All changes to this dataset will be documented in a changelog in this README document.
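A documented file name structure like the one in the example above can be checked by a short script. The sketch below is a hedged illustration: the example README does not state how the four attributes are separated, so the underscore separator and the letters-and-digits rule for the experiment name are assumptions, as are the names FILENAME_PATTERN and parse_filename.

```python
import re
from datetime import datetime

# Assumed layout: ExperimentName_InstrumentID_CaptureDateTime_ImageID.tif
# The underscore separator is an assumption; the example README does not specify one.
FILENAME_PATTERN = re.compile(
    r"(?P<experiment>[A-Za-z0-9]+)_"
    r"(?P<instrument_id>\d{5})_"          # five-digit instrument code
    r"(?P<captured>\d{8}T\d{4})_"         # YYYYMMDDThhmm
    r"(?P<image_id>\d{3})\.tif"           # three-digit image identifier
)


def parse_filename(name):
    """Split a file name into its attributes, or return None if it does not match."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        fields["captured"] = datetime.strptime(fields["captured"], "%Y%m%dT%H%M")
    except ValueError:
        return None  # digits in the right places, but not a real date and time
    return fields
```

Running parse_filename over every file in the dataset folder and listing the names that return None is a quick way to find files that do not follow the documented convention.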