Automatically determine run-time dependencies for R packages on Linux

If you are distributing binary R packages (or any other binary) for Linux, it is important that you check and declare the run-time dependencies for your binaries. This can easily be automated, and it prevents many problems and conflicts. Currently RSPM (RStudio Package Manager) leaves the client guessing which system libraries the binaries are linked to, so users end up installing unnecessary build-time dependencies, and sometimes even the wrong ones.

Once you distinguish between build-time and run-time system libraries in Linux distributions, the solution is obvious, and the system will become much simpler and more robust.

This is not a hack: Linux package managers are designed to automatically determine dependencies between system libraries. You should use the same tools when providing binaries for R packages, even if they are not distributed in rpm or deb format.

Dynamic linking to system libraries on Linux

Many R packages on Linux require external system libraries. When you build the package from source, you need the build-time system library, which includes header files and has many additional dependencies needed at build-time. These build-time system libraries are conventionally named with a -dev or -devel suffix, for example libcurl4-openssl-dev on Debian/Ubuntu, and libcurl-devel on Fedora/RHEL.

But here is the crucial part: once the R package has been compiled, you only need the run-time system library to use it! This is a different, much lighter package: the build-time package depends on the run-time package, but not the other way around.

Run-time system libraries are:

  • Much lighter than build-time: no headers, fewer dependencies
  • Never conflict with each other (because: no headers)
  • Versioned: they have a different package name for different ABI versions of the library
  • Can automatically be determined using ldd on the R package .so file

For example, if you build an R package against libcurl4-openssl-dev, then the run-time dependency is libcurl4.

When you provide users with pre-compiled binaries on Linux, you really need to provide the metadata about the run-time dependencies of those binaries. You can easily automate this, and it would make RSPM dependency management much simpler and more reliable.

Automatically determine run-time system dependencies

In a nutshell: After you have successfully built an R package on your Linux server, run ldd on the package .so file to list the shared libraries it links to. The operating system package manager (e.g. yum or dpkg) can tell you which system package each file belongs to. Simply add this information to the binary package DESCRIPTION file that you are shipping. That’s it!
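As a rough illustration of the first step, the sketch below filters ldd output down to the resolved library paths. The function name ldd_paths and the .so path in the usage comment are placeholders made up for this example; they are not part of any existing tool.

```shell
# Read `ldd` output on stdin and print the resolved path of every shared
# library that was found on disk, one per line.
# Lines without "=> /path" (linux-vdso, the dynamic loader, "not found")
# are skipped.
ldd_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Usage on a real system (placeholder path, adjust to your R library):
#   ldd /path/to/R/library/curl/libs/curl.so | ldd_paths
```

Each printed path can then be handed to the package manager's file lookup, such as dpkg -S on Debian/Ubuntu or rpm -qf on Fedora/RHEL.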

To make it even easier: the maketools package has an example function that shows the system dependencies for installed R packages on Linux. For example, let’s have a look at the dependencies of the sf CRAN package. On Ubuntu 20.04 we see:

> maketools::package_sysdeps("sf")
                shlib      package     headers source              version
1   libproj.so.15.3.1    libproj15 libproj-dev   proj              6.3.1-1
2   libgdal.so.26.0.4    libgdal26 libgdal-dev   gdal   3.0.4+dfsg-1build3
3 libgeos_c.so.1.13.1 libgeos-c1v5 libgeos-dev   geos        3.8.0-1build1
4 libstdc++.so.6.0.28   libstdc++6        <NA>    gcc 10-20200411-0ubuntu1

And on Fedora 32 we get:

> maketools::package_sysdeps("sf")
                shlib   package    headers source version
1   libproj.so.15.3.2      proj proj-devel   proj   6.3.2
2   libgdal.so.26.0.4 gdal-libs gdal-devel   gdal   3.0.4
3 libgeos_c.so.1.13.3      geos geos-devel   geos   3.8.1
4 libstdc++.so.6.0.28 libstdc++       <NA>    gcc  10.2.1

The first column shlib tells you which shared libraries the R package is linked to, i.e. the filenames of the .so files. The second column shows which system package this file belongs to. This is the (only) relevant piece of information when you are distributing the binary, because these are exactly the system packages the client needs to have installed for the binary R package to work. Nothing more, nothing less!
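For illustration, on Debian or Ubuntu one way to get from a shared library file to its owning system package by hand is dpkg -S. The sketch below assumes the usual "package:arch: /path" output format and uses a made-up helper name, dpkg_owners. It deliberately ignores diversion lines and paths reported as shared by several packages.

```shell
# Read `dpkg -S <file>` output on stdin, for example
#   libssl1.1:amd64: /usr/lib/x86_64-linux-gnu/libssl.so.1.1
# and print the unique owning package names without the ":arch" qualifier.
# Lines whose first field contains a space (diversions, "pkg1, pkg2: /path")
# are skipped in this simplified sketch.
dpkg_owners() {
  awk -F': ' '$1 !~ / / && $2 ~ /^\// { sub(/:[^:]*$/, "", $1); print $1 }' | sort -u
}

# Usage on a real Debian/Ubuntu system:
#   dpkg -S /usr/lib/x86_64-linux-gnu/libssl.so.1.1 | dpkg_owners
```

Because several shared libraries often come from the same system package, the output is de-duplicated.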

A suggested workflow

A simple way to build R binary packages is to use a server or container that has all build-time libraries pre-installed (which build-time dependencies each individual package needs is not relevant here). For example, you can use the cranlike cran/debian or cran/ubuntu Docker images for the latest versions of Debian and Ubuntu.

docker run -it cran/ubuntu

After building and installing an R package, you can check the package's run-time dependencies, for example:

> install.packages("openssl")
## ...
## ...
## ** checking absolute paths in shared objects and dynamic libraries
## ** testing if installed package can be loaded from final location
## ** testing if installed package keeps a record of temporary installation path
## * DONE (openssl)
> maketools::package_sysdeps("openssl")
              shlib   package    headers  source         version
1    libssl.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2
2 libcrypto.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2

For every R binary package you distribute, you should provide, at a minimum, the information from the package column. The best way would be to add this to the DESCRIPTION file of the binary R package, and ideally also expose it in the PACKAGES repository index. That way, clients can look up the system dependencies required by the binary R package reliably, without guessing or conflicts.
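A minimal sketch of what recording this could look like: the helper below turns a list of system package names into a single DESCRIPTION-style line. Both the helper name sysdeps_field and the field name SystemRuntimeDependencies are hypothetical, chosen only for illustration; no such field is currently defined by R or RSPM.

```shell
# Read system package names on stdin (one per line) and print one
# "Field: value, value" line in the style of a DESCRIPTION file.
# "SystemRuntimeDependencies" is a hypothetical field name, not an
# established R convention.
sysdeps_field() {
  sort -u | paste -sd, - | sed 's/,/, /g; s/^/SystemRuntimeDependencies: /'
}

# Usage, with the package column from the openssl example:
#   printf 'libssl1.1\nlibssl1.1\n' | sysdeps_field
```

The names are sorted and de-duplicated first, so two shared libraries from the same system package produce a single entry.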