--- title: "Automatically determine run-time dependencies for R packages on Linux" output: html_document vignette: > %\VignetteIndexEntry{Automatically determine run-time dependencies for R packages on Linux} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` If you are distributing binary R packages (or any other binary) for Linux, it is important that you check and declare the __run-time dependencies__ for your binaries. This can easily be automated, and prevents many problems and conflicts. Currently RSPM leaves the client guessing which system libraries the binaries are linked to, which results in users installing unnecessary build-time dependencies, sometimes even the wrong ones. Once you distinguish between build-time and run-time system libraries in Linux distributions, the solution is obvious, and the system will become much simpler and more robust. This is not a hack, Linux package managers have been designed to [automatically determine dependencies](http://ftp.rpm.org/max-rpm/s1-rpm-depend-auto-depend.html) between system libraries. You should use the same tools when providing binaries for R packages, even if they are not distributed in a `rpm` or `deb` format. ## Dynamic linking to system libraries on Linux Many R packages on Linux require external system libraries. When you build the package from source, you need the build-time system library, which includes header files and has many additional dependencies needed at build-time. These build-time system libraries are always named with a `-dev` or `-devel` postfix, for example [`libcurl4-openssl-dev`](https://packages.debian.org/bullseye/libcurl4-openssl-dev) on Debian/Ubuntu, and `curl-devel` on Fedora/RHEL. But, here is the crucial part: __once the R package has been compiled, you only need the run-time system library to use it!__ This is __a different package__ which is much lighter, because the build-time package always depends on the run-time package, but not the other way around. Run-time system libraries are: - Much lighter than build-time: no headers, less dependencies - Never conflict with each other (because: no headers) - Versioned: they have a different package name for different ABI versions of the library - Can automatically be determined using `ldd` on the R package `.so` file For example, if you build an R package against [`libcurl4-openssl-dev`](https://packages.debian.org/bullseye/libcurl4-openssl-dev), then the run-time dependency is [`libcurl4`](https://packages.debian.org/bullseye/libcurl4). When you provide users with pre-compiled binaries on Linux, you really need to provide the metadata about the run-time dependencies of those binaries. You can easily automate this, and it would make RSPM dependency management much simpler and more reliable. ## Automatically determine runtime system-dependencies In a nutshell: After you have successfully built an R package on your Linux server, run `ldd` on the package `.so` file to list the shared libraries it links to. The operating system package manager (e.g. `yum` or `dpkg`) can tell you which system package each file belongs to. Simply add this information to the binary package DESCRIPTION file that you are shipping. That's it! To make it even easier: the `maketools` package has an example function that shows the system dependencies for installed R packages on Linux. For example, let's have a look at the dependencies of the `sf` CRAN package. On Ubuntu 20.04 we see: ```r > maketools::package_sysdeps("sf") shlib package headers source version 1 libproj.so.15.3.1 libproj15 libproj-dev proj 6.3.1-1 2 libgdal.so.26.0.4 libgdal26 libgdal-dev gdal 3.0.4+dfsg-1build3 3 libgeos_c.so.1.13.1 libgeos-c1v5 libgeos-dev geos 3.8.0-1build1 4 libstdc++.so.6.0.28 libstdc++6 gcc 10-20200411-0ubuntu1 ``` And on Fedora 32 we get: ```r > maketools::package_sysdeps("sf") shlib package headers source version 1 libproj.so.15.3.2 proj proj-devel proj 6.3.2 2 libgdal.so.26.0.4 gdal-libs gdal-devel gdal 3.0.4 3 libgeos_c.so.1.13.3 geos geos-devel geos 3.8.1 4 libstdc++.so.6.0.28 libstdc++ gcc 10.2.1 ``` The first column `shlib` tells you which shared libraries the R package is linked to, i.e. the filenames of the `.so` files. The second column shows which system package this file belongs to. This is the (only) relevant piece of information when you are distributing the binary, because these are exactly the system packages the client needs to have installed for the binary R package to work. Nothing more, nothing less! ## A suggested workflow A simple way to build R binary packages is on a server or container that has all build-time libraries pre-installed (the per-package build-time dependencies are really not relevant). For example you can use the cranlike [`cran/debian`](https://hub.docker.com/repository/docker/cran/debian) or [`cran/ubuntu`](https://hub.docker.com/repository/docker/cran/ubuntu) docker images for the latest version of Debian and Ubuntu. ```r docker run -it cran/ubuntu ``` After building and installing an R package, you check the package run-time dependencies, for example: ```r > install.packages("openssl") ## ... ## ... ## ** checking absolute paths in shared objects and dynamic libraries ## ** testing if installed package can be loaded from final location ## ** testing if installed package keeps a record of temporary installation path ## * DONE (openssl) > maketools::package_sysdeps("openssl") shlib package headers source version 1 libssl.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2 2 libcrypto.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2 ``` For every R binary package you distribute, you should provide, at a minimum, the information from the `package` column. The best way would be to add this to the DESCRIPTION file of the binary R package, and ideally also expose this in the [PACKAGES](https://packagemanager.rstudio.com/cran/__linux__/xenial/latest/src/contrib/PACKAGES) repository index. Thereby clients can lookup the required system dependencies needed for this binary R package, 100% reliably, without guessing or conflicts.