Testing file-based R pipelines using testthat

I have been working on a set of parameterized R Markdown scripts for various transcriptomic workflows. These are broken into sequences of scripts with defined inputs and outputs that are chained together manually, or with good ol’ make. Despite the somewhat low-tech nature of the tooling here, integration testing across the scripts is still helpful, nay, mandatory!

I struggled with trying to develop automated testing for this until I:

  1. Recalled seeing Jenny Bryan’s excellent blog post on self-cleaning test fixtures on Twitter some months back,
  2. Still had it open as a tab waiting for the appropriate mood to strike me.

GeneseeSC, the pipeline package we’re writing, sets up a project workspace and copies down some templated and parameterized R Markdown scripts. It’s then up to the user to call these scripts, or to modify them further to solve the analytic task at hand. To test this, I needed to:

  1. Create a project in a temporary directory
  2. Copy down a representative sample of data
  3. Run some scripts with various sets of parameters
  4. Then clean up everything once done.

To solve 1, we use withr::defer(), essentially following Jenny’s recipe for create_local_package:

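create_local_project = function(dir = tempdir(), env = parent.frame(), ...){
  oldwd = getwd()
  # geneseesc_skeleton() creates the project and returns its path
  project_dir = geneseesc_skeleton(geneseesc_root = dir, ...)
  # Register cleanup on the caller's frame only once the project exists,
  # so a failed skeleton call does not leave a broken cleanup handler behind
  withr::defer({
    setwd(oldwd)
    unlink(project_dir, recursive = TRUE)
  }, envir = env)

  setwd(project_dir)
  # setwd() returns the *previous* directory, so return the project path explicitly
  invisible(project_dir)
}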

The function in 1. is already pretty handy: we can use it to create example projects with different parameters (passed through the ...) and test the behaviors.
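
To see the deferred cleanup in isolation, here is a minimal sketch that does not depend on GeneseeSC. The helper local_scratch_dir() is a hypothetical stand-in for create_local_project, and it assumes the withr package is installed.

```r
library(withr)

# Hypothetical stand-in for create_local_project(): makes a scratch
# directory, moves into it, and schedules cleanup on the caller's frame.
local_scratch_dir = function(env = parent.frame()){
  oldwd = getwd()
  scratch = tempfile('scratch_')
  dir.create(scratch)
  defer({
    setwd(oldwd)
    unlink(scratch, recursive = TRUE)
  }, envir = env)
  setwd(scratch)
  invisible(scratch)
}

start_wd = getwd()
seen = local({
  scratch = local_scratch_dir()
  writeLines('hello', 'note.txt')  # written inside the scratch directory
  list(path = scratch,
       existed = file.exists(file.path(scratch, 'note.txt')))
})

# Leaving local() ran the deferred cleanup: the directory is gone
# and the working directory is back where it started.
stopifnot(seen$existed, !dir.exists(seen$path), identical(getwd(), start_wd))
```

The key design choice is envir = env: the cleanup is attached to whoever called the helper, not to the helper itself, so the directory survives until the calling block finishes.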

And it doesn’t take much more to solve 2. We merely need to wrap create_local_project in another function that handles the rest of the project setup.

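create_exampleproject = function(dir = tempdir(), env = parent.frame(), ...){
  # Pass `env` along so cleanup is tied to *our* caller, not to this function
  project_dir = create_local_project(dir, env, ...)
  # Example data are installed with the package, one directory per project type
  pkg_dir = sprintf('%s_projectexample/', list(...)$project_type)
  # The trailing '.' is there so the directory's contents land directly in project_dir
  pkg_contents = file.path(system.file(pkg_dir, package = 'GeneseeSC'), '.')
  file.copy(pkg_contents, project_dir, recursive = TRUE)
  invisible(project_dir)
}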
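
If the trailing '.' idiom feels too implicit, the copy step can also be written by listing the top-level entries explicitly. This is a sketch of an alternative, not what GeneseeSC does; the directory and file names below are made up for illustration.

```r
# Hypothetical stand-ins for the installed example data and the new project
src = tempfile('example_')
dest = tempfile('project_')
dir.create(file.path(src, 'scratch'), recursive = TRUE)
dir.create(dest)
writeLines('qc', file.path(src, '01qc.Rmd'))
writeLines('data', file.path(src, 'scratch', 'counts.txt'))

# Top-level entries of src, including dotfiles but not '.' and '..'
entries = list.files(src, full.names = TRUE, all.files = TRUE, no.. = TRUE)
# Each entry (file or directory) is copied into dest under its own name
copied = file.copy(entries, dest, recursive = TRUE)

copy_ok = all(copied) &&
  file.exists(file.path(dest, '01qc.Rmd')) &&
  file.exists(file.path(dest, 'scratch', 'counts.txt'))
stopifnot(copy_ok)

unlink(c(src, dest), recursive = TRUE)
```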

For 3, we call create_exampleproject inside a local({ or test_that({ block and then run the various tests with testthat.

local({ # begin exampleproject
  create_exampleproject(authors = 'you and me', project_type = 'scRNA',
    ...) # etc, arguments passed to `geneseesc_skeleton`
  test_that('Can run QC with citeseq', {
    rmarkdown::render("01qc.Rmd",
      params = list(tenx_h5 = 'scratch/AGG1/raw_feature_bc_matrix.h5',
        auto_filter = FALSE, ...)) # etc, various combinations of parameters
  })
  # More tests
  # ...
})
## destroy exampleproject

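The local block defines the scope of the deferred actions in create_local_project. Because env defaults to parent.frame() and is passed down from create_exampleproject, the cleanup is registered on the environment of the local block itself. Once we leave this block, we change back to the original working directory and unlink the project, handling task 4 as well.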
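
The same scoping applies when the fixture is created inside a single test_that({ block: cleanup then happens per test rather than once for the whole group. Here is a sketch using withr's built-in local_tempdir() and local_dir() in place of the GeneseeSC helpers; it assumes testthat and withr are installed, and the file name is made up.

```r
library(testthat)
library(withr)

paths = new.env()  # used only to inspect the directory after the test

test_that('files written in a test live in a throwaway directory', {
  tmp = local_tempdir()  # deleted when this test_that() block exits
  local_dir(tmp)         # working directory is restored at the same point
  writeLines('x', 'out.txt')
  paths$tmp = tmp
  expect_true(file.exists(file.path(tmp, 'out.txt')))
})

# The directory was removed as soon as the test block finished
stopifnot(!dir.exists(paths$tmp))
```

Sharing one project across several tests, as in the local({ version, saves setup time when the skeleton is expensive to build; a fixture per test gives stronger isolation.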

Thanks to Jenny for the excellent blogpost explaining the process!