README files are a common way to document the contents and structure of a folder or dataset so that researchers can locate the information they need. A README is one of several forms that data documentation can take; explore additional Documentation & Metadata practices.

README Resources

  • README Template

    Title of dataset

    Name/institution/contact information for:

    • Principal Investigator (or person responsible for collecting the data)
    • Data manager or custodian

    File name structure

    • Structure: Provide the template you are using for your filenames
    • Attributes: Describe the attributes used to name the files
    • Codes: Provide a complete list of any codes/abbreviations used
    • Provide examples of the above items
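
    One way to keep file names consistent with the documented structure, attributes, and codes is to generate them rather than type them by hand. The Python sketch below assumes a hypothetical template of underscore-separated attributes and a hypothetical code list; `TEMPLATE`, `SITE_CODES`, and `make_filename` are illustrative names, not part of any standard.

```python
# Hypothetical filename template: attributes joined with underscores.
TEMPLATE = "{project}_{site}_{date}_{seq:03d}.{ext}"

# Hypothetical code list: every abbreviation allowed in file names, with its meaning.
SITE_CODES = {
    "BOS": "Boston field site",
    "NYC": "New York field site",
}


def make_filename(project: str, site: str, date: str, seq: int, ext: str) -> str:
    """Fill TEMPLATE, refusing site codes that are not documented in SITE_CODES."""
    if site not in SITE_CODES:
        raise ValueError(f"undocumented site code: {site!r}")
    # Underscores separate attributes, so they must not appear inside one.
    for value in (project, date):
        if "_" in value:
            raise ValueError(f"attribute contains the separator '_': {value!r}")
    return TEMPLATE.format(project=project, site=site, date=date, seq=seq, ext=ext)


print(make_filename("soilcore", "BOS", "20200615", 7, "csv"))
# soilcore_BOS_20200615_007.csv
```

    Rejecting undocumented codes at creation time keeps the Codes list in the README complete.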

    File formats

    • Provide a list of all file formats present in this dataset. If you need to convert or migrate data files from one format to another, be aware that conversion risks data loss or corruption, and take appropriate steps to avoid or minimize it.
    • File format examples:
      • Databases: XML, CSV
      • Geospatial: SHP, DBF, GeoTIFF, NetCDF
      • Moving Images: MOV, MPEG, AVI, MXF
      • Audio: WAVE, AIFF, MP3, MXF
      • Numbers/statistics: ASCII, DTA, POR, SAS, SAV
      • Images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
      • Text: PDF/A, HTML, XML, plain text (ASCII, UTF-8)
      • Graphs: JSON, YAML, XML
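
    To compile the list of file formats, a short script can tally the file extensions present in a dataset folder. The Python sketch below (`list_file_formats` is a hypothetical helper) counts files recursively by extension. An extension is only a hint, not proof of a file's actual format, and multi-part extensions such as .tar.gz are counted by their last part only.

```python
from collections import Counter
from pathlib import Path


def list_file_formats(root: str) -> Counter:
    """Count the files under `root` (recursively) by lower-cased file extension."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group files that have no extension under a placeholder key.
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

    For example, `list_file_formats("path/to/dataset")` returns a `Counter` whose keys can be copied into the File formats section.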

    Column headings for tabular data

    • For tabular data, list and define column headings, including:
      • Units of measurement
      • Data formats, such as YYYY/MM/DD
      • Calculations

    Versioning

    • Establish a procedure for documenting changes in files. One option is to create a changelog in this README file that lists every step that changes the output files.
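
    The column definitions described under Column headings for tabular data can also serve as a machine-readable data dictionary. The Python sketch below assumes a hypothetical table with three columns, including a date column documented as YYYY/MM/DD; it reports columns that are undocumented or missing and dates that do not match the documented format. `DATA_DICTIONARY`, `is_documented_date`, and `check_table` are illustrative names.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical data dictionary: column heading -> (unit of measurement, data format).
DATA_DICTIONARY = {
    "sample_id": ("none", "text"),
    "collected_on": ("none", "date, YYYY/MM/DD"),
    "mass_g": ("grams", "decimal number"),
}


def is_documented_date(value: str) -> bool:
    """True if value is a real calendar date written exactly as YYYY/MM/DD."""
    if not re.fullmatch(r"\d{4}/\d{2}/\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y/%m/%d")  # rejects e.g. month 13
    except ValueError:
        return False
    return True


def check_table(csv_text: str) -> list[str]:
    """Compare a CSV table with DATA_DICTIONARY and return any problems found."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    for name in header:
        if name not in DATA_DICTIONARY:
            problems.append(f"undocumented column: {name}")
    for name in DATA_DICTIONARY:
        if name not in header:
            problems.append(f"documented column missing: {name}")
    # Check every value in the date column against the documented format.
    for row_no, row in enumerate(reader, start=1):
        value = row.get("collected_on")
        if value is not None and not is_documented_date(value):
            problems.append(f"data row {row_no}: collected_on {value!r} is not YYYY/MM/DD")
    return problems
```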
  • Example README File

    Dataset Title: Raw Images for Experiment A, Smith Lab

    Principal Investigator: John Smith, PI, 555-555-5555, jsmith@hms.harvard.edu

    File Naming Convention: ExperimentName_InstrumentID_CaptureDateTime_ImageID.tif
    The base file name is composed of the name of the experiment, the ID number of the instrument used, the date and time that the image was captured, and the unique identifier of the image.

    Attributes (see also the Codes section for a list of instruments and their ID numbers):

    1. ExperimentName = Name of the experiment
    2. InstrumentID = Five-digit code assigned to the lab instrument
    3. CaptureDateTime = Date and time at which the image was captured, in YYYYMMDDThhmm format, such as 20150412T0515
    4. ImageID = Three-digit unique identifier for the image, such as 001, 002, 003
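
    Because the attributes are separated by underscores and the IDs have fixed widths, a file name can be checked against this convention and split back into its attributes. The Python sketch below uses a regular expression written for this example convention; it assumes ExperimentName contains no underscores and that CaptureDateTime follows the pattern of the example file name (YYYYMMDDThhmm). `FILENAME_RE` and `parse_filename` are hypothetical names.

```python
import re
from datetime import datetime

# Pattern for ExperimentName_InstrumentID_CaptureDateTime_ImageID.tif, assuming:
#   ExperimentName: any text without underscores
#   InstrumentID: five digits
#   CaptureDateTime: YYYYMMDDThhmm, as in the example 20150412T0515
#   ImageID: three digits
FILENAME_RE = re.compile(
    r"^(?P<experiment>[^_]+)_(?P<instrument>\d{5})_"
    r"(?P<captured>\d{8}T\d{4})_(?P<image>\d{3})\.tif$"
)


def parse_filename(name: str) -> dict:
    """Split a file name into its documented attributes, or raise ValueError."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"does not follow the naming convention: {name!r}")
    fields = match.groupdict()
    # Convert the timestamp; this also rejects impossible dates such as month 13.
    fields["captured"] = datetime.strptime(fields["captured"], "%Y%m%dT%H%M")
    return fields


print(parse_filename("daf2-age1_14052_20150412T0515_005.tif"))
```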

    Codes:

    1. [List of instruments and IDs]

    Examples:

    1. File name: daf2-age1_14052_20150412T0515_005.tif
    2. Versioning: All changes to this dataset will be documented in a changelog in this README file
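
    A changelog kept in the README can be appended to with a small helper so that every entry is dated the same way. The Python sketch below assumes the README is a plain-text file whose changelog is its last section, under a heading line reading "Changelog"; the heading text and `log_change` are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional

CHANGELOG_HEADING = "Changelog"  # hypothetical heading of the last README section


def log_change(readme: Path, description: str, when: Optional[date] = None) -> None:
    """Append a dated changelog line, adding the Changelog heading if it is missing."""
    when = when or date.today()
    text = readme.read_text(encoding="utf-8") if readme.exists() else ""
    parts = []
    if text and not text.endswith("\n"):
        parts.append("\n")  # keep the new entry on its own line
    if CHANGELOG_HEADING not in text.splitlines():
        parts.append(f"\n{CHANGELOG_HEADING}\n" if text else f"{CHANGELOG_HEADING}\n")
    # ISO dates (YYYY-MM-DD) sort chronologically as plain text.
    parts.append(f"{when.isoformat()}: {description}\n")
    with readme.open("a", encoding="utf-8") as fh:
        fh.writelines(parts)
```

    Appending rather than rewriting the file means earlier entries are never touched, which suits a record of changes.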
  • Additional Guidance

What's in a README?
RDM Seminar, recorded Summer 2020
Watch the seminar recording to get started with README files.