Sep 17, 2021 - Testthat With Side Effects

Testing file-based R pipelines using testthat

I have been working on a set of parametrized markdown scripts for various transcriptomic workflows. These are broken into sequences of scripts with defined inputs and outputs that are chained together manually, or with good ol’ make. However, despite the somewhat low-tech nature of the tooling here, integration testing across the scripts is still helpful, nay, mandatory!

I struggled with trying to develop automated testing for this until I:

  1. Recalled seeing Jenny Bryan’s excellent blog post on self-cleaning test fixtures on Twitter some months back,
  2. Still had it open as a tab waiting for the appropriate mood to strike me.

GeneseeSC, the pipeline package we’re writing, sets up a project workspace and copies down some templated and parameterized markdown scripts. It’s then up to the user to call these scripts, or further modify them, to solve the analytic task at hand. To test this, I needed to:

  1. Create a project in a temporary directory
  2. Copy down a representative sample of data
  3. Run some scripts with various sets of parameters
  4. Then clean up everything once done.

To solve 1. we use withr::defer(), essentially following Jenny’s recipe for create_local_package:

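create_local_project = function(dir = tempdir(), env = parent.frame(), ...){
  oldwd = getwd()
  # Deferred cleanup runs when `env` (the caller's scope) exits
  withr::defer({
    setwd(oldwd)
    unlink(project_dir, recursive = TRUE)
  }, envir = env)

  project_dir = geneseesc_skeleton(geneseesc_root = dir, ...)
  setwd(project_dir)
  # setwd() returns the previous directory, so return the project path explicitly
  invisible(project_dir)
}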

The function in 1. is already pretty handy: we can use it to create example projects with different parameters (hidden in the …) and test the behaviors.

And it doesn’t take much more to solve 2. We merely need to wrap create_local_project in another function that handles the rest of the project setup.

create_exampleproject = function(dir = tempdir(), env = parent.frame(), ...){
  project_dir = create_local_project(dir, env, ...)
  # Example data ships with the package, e.g. in scRNA_projectexample/
  pkg_dir = sprintf('%s_projectexample/', list(...)$project_type)
  pkg_contents = file.path(system.file(pkg_dir, package = 'GeneseeSC'), '.')
  file.copy(pkg_contents, project_dir, recursive = TRUE)
}

For 3. we call create_exampleproject inside a local({ or test_that({ block and then run the various tests with testthat.

local({ # begin exampleproject
  create_exampleproject(authors = 'you and me', project_type = 'scRNA', ...) # etc, arguments passed to `geneseesc_skeleton`
  test_that('Can run QC with citeseq', {
    rmarkdown::render("01qc.Rmd",
      params = list(tenx_h5 = 'scratch/AGG1/raw_feature_bc_matrix.h5',
                    auto_filter = FALSE, ...)) # etc, various combinations of parameters
  })
  # More tests
  # ...
})
## destroy exampleproject

The local block defines the scope of the deferred actions in create_local_project. Once we leave this block, we change back to the original working directory and unlink the project directory, handling task 4 as well.
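To see the scoping in isolation, here is a minimal, self-contained sketch. It assumes withr is installed; demo_skeleton() and create_local_demo() are hypothetical stand-ins (GeneseeSC itself is not needed), and the cleanup removes the whole temporary root rather than just the project directory.

```r
library(withr)

# Hypothetical stand-in for geneseesc_skeleton(): creates a project
# directory under `root` and returns its path.
demo_skeleton <- function(root) {
  project_dir <- file.path(root, "demo_project")
  dir.create(project_dir, recursive = TRUE)
  project_dir
}

# Same shape as create_local_project(), but with no package dependencies.
create_local_demo <- function(dir = tempfile("demo_root_"), env = parent.frame()) {
  oldwd <- getwd()
  project_dir <- demo_skeleton(dir)
  # The cleanup is registered on `env` (the caller's scope), so it runs
  # when that scope exits, not when this function returns.
  withr::defer({
    setwd(oldwd)
    unlink(dir, recursive = TRUE)
  }, envir = env)
  setwd(project_dir)
  invisible(project_dir)
}

start_wd <- getwd()

seen <- local({
  project_dir <- create_local_demo()
  writeLines("hello", "scratch.txt")  # written inside the temporary project
  list(
    dir = project_dir,
    existed = file.exists(file.path(project_dir, "scratch.txt"))
  )
})

# Compare the state after the block with the state recorded inside it.
back_home <- identical(getwd(), start_wd)
still_there <- dir.exists(seen$dir)
```

Passing env = parent.frame() is what lets the helper hand its cleanup to whoever called it; if the deferred actions were registered on the helper's own frame, the project would be deleted as soon as the helper returned.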

Thanks to Jenny for the excellent blogpost explaining the process!

Dec 18, 2019 - Compiling R From Source

Compiling R from source on a Mac

I recently needed to run R devel on a Mac and couldn’t get the precompiled binaries to work (ggplot2 would segfault when I tried to install it – probably a compiler issue). In any case, R is pretty quick to compile from source (at least in terms of wall clock), though there ended up being quite a few Mac-specific issues to tackle.

  1. Install clang8 or greater, eg here or perhaps with homebrew.
  2. export SDKROOT="$(xcrun --sdk macosx --show-sdk-path)" so clang can find headers that Apple puts in a weird place. Consider adding this to .bash_profile, and potentially .Renviron (I need to log out and back in to test if this is necessary). This was the weirdest and hardest problem to solve.
  3. Download R sources, unpack.
  4. Edit config.site following the instructions in the R Installation and Administration guide, especially those about setting the variables CC, CXX and R_LD_LIBRARY_PATH to point to where your upgraded version of clang lives. Somehow, after installation, R will magically know to continue to use these compilation variables for all libraries that are installed!
  5. Run ./configure setting at least --enable-memory-profiling --enable-R-framework --x-libraries=/opt/X11/lib if you want to be able to run RStudio against your compiled R. I also set --with-blas="-framework Accelerate", and you could try to set --enable-lto if you have clang10 (?) or higher for a potential speed-up.
  6. Run make; make install if you set --enable-R-framework. You will probably need sudo for make install, and might need to set SDKROOT again if you get complaints about missing headers. If you didn’t set --enable-R-framework, then consider make install rhome=/usr/local/lib/R-4.0.0-devel, since setting rhome to something non-default lets multiple versions coexist.
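
The "magic" in step 4 is less mysterious than it looks: the compiler settings chosen at configure time are written to a Makeconf file in R's etc directory, which R consults when installing packages from source. A small sketch to inspect them from the compiled R, assuming a Unix-alike layout where Makeconf sits directly in that directory:

```r
# Locate the Makeconf written at configure time
# (assumption: Unix-alike layout, i.e. <R home>/etc/Makeconf).
makeconf_path <- file.path(R.home("etc"), "Makeconf")
makeconf <- readLines(makeconf_path)

# Keep only the assignments for the C and C++ compilers.
compiler_lines <- grep("^(CC|CXX) *=", makeconf, value = TRUE)
print(compiler_lines)
```

If the lines shown still point at Apple's default compiler rather than your upgraded clang, the config.site edits probably did not take effect.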

Oct 9, 2019 - Mkl Without Root

Using Intel MKL in R without root (or local installation)

Optimized versions of linear algebra libraries can offer substantial time savings for numeric computing. But getting them installed and your programs linked against them can be a real odyssey.

In R, the BLAS library is left undefined until runtime, when R looks for a library called libRblas.so under R RHOME. In principle, you can drop in any replacement for BLAS by swapping out this file (or symlink).

If you don’t have write access to RHOME, you could beg your sysadmin to make a custom install. Or you could compile and manage your own installation. A third option, however, is to make use of R_LD_LIBRARY_PATH, a library path variable that R consults at startup.

By pointing this variable at a directory that you can write to and that contains symlinks to your desired alternatives, you can choose the BLAS on a per-instance basis. For instance, I made a directory 3.9-blas-mkl containing the following:

> [3.9-blas-mkl]$ ls -l
-rwxrwx--x 1  7936 Oct  9 12:02 libRblas.so
lrwxrwxrwx 1  11 Oct  9 12:02 libRlapack.so -> libRblas.so
lrwxrwxrwx 1   40 Oct  9 12:15 libR.so -> /software/r/3.6.1/b2/lib64/R/lib/libR.so
-rw-rw---- 1  245 Oct  9 12:03 README
-rw-rw---- 1  39 Oct  9 12:01 shim.c

Here libRblas.so is a library that contains links to all the stuff needed for Intel MKL, and was generated by compiling shim.c using icc following directions here.
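
The layout of that directory can be sketched from within R. This is only an illustration under assumptions: the directory lives in a temporary location, an empty placeholder file stands in for the compiled shim, and libR.so is assumed to sit in the lib directory of the running R (it may not exist if R was built without a shared library, in which case the link dangles).

```r
# Hypothetical scratch location for the demonstration.
blas_dir <- tempfile("blas-mkl-")
dir.create(blas_dir)

# Placeholder only: the real libRblas.so comes from compiling shim.c.
file.create(file.path(blas_dir, "libRblas.so"))

# libRlapack.so -> libRblas.so (a relative link, as in the listing above)
file.symlink("libRblas.so", file.path(blas_dir, "libRlapack.so"))

# libR.so -> the libR.so of the running R installation (assumed location)
file.symlink(file.path(R.home(), "lib", "libR.so"),
             file.path(blas_dir, "libR.so"))

# Show where each entry points; regular files report an empty target.
links <- Sys.readlink(list.files(blas_dir, full.names = TRUE))
print(links)
```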

Detailed instructions

  1. Install or find a version of Intel MKL. I was able to load one from my HPC with module load intel. This puts icc and a bunch of other stuff on the path.
  2. Make a directory to hold symlinks you are going to drop into R and change to it.
  3. Download and compile shim.c following directions here. This is probably optional, but it elegantly handles the fact that I didn’t want to hard-code symlinks to the mysterious locations of the Intel MKL libraries in the module that I loaded, and it may protect against other subtle issues. This should give you a libRblas.so linked against MKL.
  4. Make a symlink from libRlapack.so -> libRblas.so
  5. Make a symlink named libR.so that points to the libR.so under R RHOME. I did this because otherwise R complained that it couldn’t find libR.so with the new value of R_LD_LIBRARY_PATH. This may also be possible to avoid, but was an easy fix for me.
  6. export R_LD_LIBRARY_PATH="/home/R/3.9-blas-mkl". Note I got tripped up trying to use the shell expansion ~ here and had to use an absolute path.
  7. Start R and verify success by examining sessionInfo().
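
For the last step, a short sketch of what to look at, assuming R 3.4 or later (where sessionInfo() records the BLAS and LAPACK libraries in use). The timing at the end is only a rough sanity check with an arbitrary matrix size, not a benchmark.

```r
si <- sessionInfo()

# Paths of the BLAS and LAPACK libraries R actually loaded.
blas_path <- si$BLAS
lapack_path <- si$LAPACK
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Rough sanity check: time a BLAS-heavy operation so it can be compared
# between a session with and without R_LD_LIBRARY_PATH set.
n <- 500
x <- matrix(rnorm(n * n), n, n)
elapsed <- system.time(xtx <- crossprod(x))[["elapsed"]]
cat("crossprod elapsed (s):", elapsed, "\n")
```

If the swap worked, the BLAS path should point into the directory named in R_LD_LIBRARY_PATH rather than into the default R installation.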