Section 11 Code Documentation Best Practices

Repository, code, and function documentation are all integral parts of making code understandable to those reviewing your code and other end users.

11.1 Repository documentation

When you create a repository, you create a README.md file, but what goes in there?! Here are a few basics that you can use to populate this README file so that others know what your repo does and who to ask questions about it!

A few sentences about what the repo does.
- Include where the data originate (data provenance) and indicate whether or not it uses a symlink folder.
- If there is an end product (say, a Shiny app), state what that is - and if it is deployed, link it to the README.
A primary contact for the repo. If you include your email, spell out “dot” and “at” so that you don’t get spammed.
A brief overview of the repo architecture. An onlooker might not know what src is, so an overview for a repo that has a src and a data folder might look like this:
Folder Descriptions:
- src: this folder contains all scripts for this repo
- data: this folder is untracked in git, and must be set up using a symlink for the code to function. Contact XXX for the link to the folder.

11.2 Code Documentation

Code documentation takes two primary forms when you are working in an .Rmd, .qmd, or .ipynb:

Text outside of code chunks that gives an overview of the script (which may appear at the top of a script as a chunk that might be called ‘Purpose’ or ‘Overview’). In general, any person should be able to read the text (or headers!) that are outside of the code chunks and know what the script does without ever looking at the code chunks themselves.
Commented text inside the code chunks that gives explicit, in-depth descriptions of what you are doing. This is information that might be specific to the workflow, but may not be necessary to understand the function of a chunk, like:

# using a for-loop here instead of a function because… blah blah

Code documentation in .R and .py files is limited to commented text only. If there is a workflow that uses .R and .py files (like, a targets workflow), it is expected that the code is commented AND there is a bookdown or markdown-type file that is rendered alongside the workflow that describes it.
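
As a minimal sketch of what comment-only documentation might look like in a .py file, the hypothetical script below opens with an overview block and uses inline comments to explain why a choice was made, not just what the code does. The site names, readings, and threshold are invented for illustration.

```python
# Purpose: flag sites whose mean temperature exceeds a threshold.
# Overview: (1) define example data, (2) compute per-site means,
#           (3) collect the sites that exceed the threshold.
# NOTE: all names and values here are made up for illustration.

from statistics import mean

# Step 1: example readings in degrees Celsius, keyed by site name.
readings = {
    "site_a": [18.2, 19.1, 20.4],
    "site_b": [24.8, 25.3, 26.0],
    "site_c": [],
}

# Hypothetical threshold; a real workflow would note where it comes from.
THRESHOLD_C = 22.0

# Steps 2 and 3: using an explicit loop rather than a comprehension so
# that each decision can carry its own comment.
warm_sites = []
for site, values in readings.items():
    # Skip sites with no readings instead of raising, so one empty
    # site does not stop the whole run.
    if not values:
        continue
    if mean(values) > THRESHOLD_C:
        warm_sites.append(site)

print(warm_sites)
```

Someone skimming only the comments should be able to follow the purpose and the three steps without reading the code itself.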

11.3 Function Documentation

Functions should be documented using formal roxygen-style tags (for R scripts) and docstring-style tags (for Python). This is particularly important for {targets} workflows, but should become commonplace in any non-literate code we create. This type of documentation is not required for functions that are in a linear workflow, like those in a literate script (.Rmd, .qmd, .ipynb).

11.3.1 Roxygen-style documentation

Roxygen is a formal way of describing a function. It is often used in package development to generate the help page for a function (the page you see when you run ?function_name). We use the style of Roxygen to add documentation to our code. Here is an example of using Roxygen style to add documentation:

#' Function to check for and install (but not load) packages. 
#' This is likely to be used for most {targets} pipelines. 
#' 
#' @param x a package name, in quotations
#' @returns text string to indicate whether a package was
#' installed, or if it was already installed.
#' 
#' @seealso [package_loader()]
#' 
#' 
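package_installer <- function(x) {
  # Compare against package names only (the row names of the matrix
  # returned by installed.packages()), not every cell of the matrix
  if (x %in% rownames(installed.packages())) {
    print(paste0('{', x, '} package is already installed.'))
  } else {
    install.packages(x)
    print(paste0('{', x, '} package has been installed.'))
  }
}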

Read more about Roxygen documentation in the roxygen2 package documentation.

11.3.2 docstring Documentation

Docstrings are a simple, no-frills way to add documentation to Python scripts. While there are many other documentation methods that are more similar to Roxygen, this is the one that is the most straightforward for our purposes.

import ee  # Earth Engine Python API; assumes it is already authenticated and initialized


def csv_to_eeFeat(df, proj):
    """Function to create an ee.FeatureCollection from location info

    Args:
        df: data frame of point locations (read from a .csv file) with
            id, Latitude, and Longitude columns
        proj: CRS projection of the points

    Returns:
        ee.FeatureCollection of the points
    """
    features = []
    for i in range(df.shape[0]):
        # Earth Engine expects coordinates in (longitude, latitude) order
        x, y = df.Longitude[i], df.Latitude[i]
        latlong = [x, y]
        loc_properties = {'system:index': str(df.id[i]), 'id': str(df.id[i])}
        g = ee.Geometry.Point(latlong, proj)
        feature = ee.Feature(g, loc_properties)
        features.append(feature)
    ee_object = ee.FeatureCollection(features)
    return ee_object

Docstring documentation is particularly easy to create in the VS Code editor using the ‘autoDocstring’ extension. This will automatically populate the ‘Args’ and ‘Returns’ fields, including the ‘Args’ variable names.
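
One practical benefit of docstrings is that Python exposes them at runtime. The sketch below uses a hypothetical helper, c_to_f, written with the same Args/Returns layout, and reads its documentation back with the standard-library inspect.getdoc. Running help(c_to_f) in an interactive session would show the same text.

```python
import inspect


def c_to_f(temp_c):
    """Convert a temperature from Celsius to Fahrenheit.

    Args:
        temp_c: temperature in degrees Celsius

    Returns:
        temperature in degrees Fahrenheit, as a float
    """
    return temp_c * 9 / 5 + 32


# inspect.getdoc returns the docstring with its indentation cleaned up
doc = inspect.getdoc(c_to_f)

# Print only the one-line summary at the top of the docstring
print(doc.splitlines()[0])
```

Because the summary line is what most tools show first, it is worth making it a complete, standalone description of the function.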

11.4 Bookdown Documentation

We will often use {bookdown} to document large projects, especially when using {targets}. {bookdown} is a package that allows you to create a book-like document and, if desired, deploy it as a github.io website. You can also render to .pdf.

11.4.1 General style guidance

Keep your repository tidy by putting your bookdown .Rmd files, index.Rmd, and associated .yml files into their own directory called “bookdown”. The output directory should be set to “docs” in the _bookdown.yml file. See this example of .yml content:

delete_merged_file: true
output_dir: "../docs"
language:
  ui:
    chapter_name: "Section "

Note that the output_dir is set in reference to where the bookdown is being rendered (in the “bookdown/” directory).
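
To see why the relative path works, the short sketch below resolves output_dir against the “bookdown/” directory using Python's standard-library posixpath module. The directory names come from the example .yml content; nothing here is part of {bookdown} itself.

```python
import posixpath

# Directory the book is rendered from, relative to the repository root
render_dir = "bookdown"

# Value of output_dir in _bookdown.yml
output_dir = "../docs"

# Joining and normalizing the two shows where the rendered book lands
resolved = posixpath.normpath(posixpath.join(render_dir, output_dir))
print(resolved)
```

The result is a docs folder at the top level of the repository, alongside the bookdown folder rather than inside it.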

Keep your headers to 3 subheadings (“###”) or fewer.