Imagine a little ARM board with a 64-bit processor on it. Something like a Raspberry Pi 3, or a newer revision of the Pi 2. These things aren’t exactly the first to come to mind when anyone asks for “a fast computer”.
So what should we do if we want to run NixOS, which routinely uses a gigabyte of memory or more to do system updates, on one of these? We could add a lot of swap space and wait a while, sure, but that gets annoying pretty quickly and doesn’t exactly speed things up either.
Now what could we do to make this whole process a little nicer?
There are two approaches to making a slow system run NixOS maintenance operations faster: do them Somewhere else, or do them somewhere Else.
capital S somewhere else
One of these is to use remote builds.
Remote builds are a feature of nix that allow one machine to offload work to another machine, in any fraction from none at all to everything—including the tasks nixos-rebuild
runs to turn a configuration.nix
file into a system configuration.
When a nix task is to be run (and any remote builder configured on the system has resources to run that task) nix may run the task on a remote builder instead of running it locally.
(There are many variables that determine whether or not a task will be run locally or remotely.
The manual has a much better explanation of these than can be provided here, but we can summarize it as “as much as possible as quickly as possible”.)
Offloading work to another machine of course requires that other machine to be able to do that work. For nix this usually requires that the machine doing the offloading and the machine being offloaded to share the same architecture (we won’t go much into why that is, only note that it is). If you have a small ARM board and a very powerful ARM board this works perfectly fine, but if you have a small ARM board and a more powerful x86 laptop it works a lot less well.
The way to make it work at all requires the x86 machine to emulate an ARM CPU, and run all tasks within that emulation.
NixOS conveniently provides a few options to enable such emulation; for the case of supporting a 64-bit ARM machine it’s as easy as setting boot.binfmt.emulatedSystems = [ "aarch64-linux" ];
on the systems tasks are offloaded to.
For the ARM machine to use this new emulated sibling it must be told where to look for help, and what help that help can actually provide.
Since requests for help are sent via ssh
commands, the machine doing the helping must also be configured to accept those commands:
# slow machine
nix.distributedBuilds = true;
nix.buildMachines = [ {
  hostName = "fast-machine";
  sshUser = "builder";
  systems = [ "x86_64-linux" "aarch64-linux" ];
  supportedFeatures = [ "big-parallel" ]; # allow big compile tasks
} ];
# fast machine
users.users.builder = {
  isNormalUser = true;
  openssh.authorizedKeys.keys = [
    "..." # ssh public key of root on the slow machine
  ];
};
nix.trustedUsers = [ "builder" ];
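A couple of details are left implicit above: remote builds are started by the nix daemon, so it is root on the slow machine whose SSH key has to match the authorized key on the fast machine, and nix offers per-builder knobs that influence how eagerly it offloads. A hedged sketch of a slightly more explicit builder entry (the key path and the numbers are placeholders, not recommendations):
# slow machine — sketch of a more explicit nix.buildMachines entry;
# the sshKey path and the numbers below are placeholders
nix.buildMachines = [ {
  hostName = "fast-machine";
  sshUser = "builder";
  sshKey = "/root/.ssh/id_builder"; # must be readable by root (the nix daemon)
  systems = [ "x86_64-linux" "aarch64-linux" ];
  maxJobs = 4;     # how many derivations may build on this builder at once
  speedFactor = 2; # relative speed; nix prefers builders with higher values
  supportedFeatures = [ "big-parallel" ];
} ];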
Unfortunately emulating a different CPU architecture is not a very efficient thing to do, so offloaded tasks may not run much faster than they did originally on the machine that did the offloading. But they do run and they often have more memory to work with, which in itself can speed up a task a lot even when running under emulation—just from not having to swap constantly.
This approach works well if only a few small packages have to be compiled from source; otherwise it will still take a pretty long time due to all the emulation overhead.
somewhere capital E else
The other option is to cross-compile everything. Where before we were running lots of small tasks on the architecture of the slow machine and generating results for the architecture of the slow machine, now we simply skip that first bit: we run all tasks on a fast machine (whose architecture no longer matters) and generate results for the architecture of the slow machine. This can give us a massive speed boost on large tasks.
The easiest way to achieve this kind of thing is setting a few configuration.nix
options on the slow machine and using remote builders as before, just without the binfmt
trick:
nixpkgs.localSystem = { system = "x86_64-linux"; };
nixpkgs.crossSystem = { system = "aarch64-linux"; };
Setting these two will run all tasks on an x86 machine, but produce results for an aarch64 machine (eg, a Pi 3).
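As an aside (an alternative spelling, not something the snippet above requires): nixpkgs ships predefined platform descriptions in lib.systems.examples, which can be used instead of a bare system string. A sketch, assuming the usual { config, pkgs, lib, ... } module arguments:
# alternative spelling of the crossSystem above, using the platform
# descriptions that nixpkgs ships; lib here is the module argument
nixpkgs.localSystem = { system = "x86_64-linux"; };
nixpkgs.crossSystem = lib.systems.examples.aarch64-multiplatform;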
Sounds perfect, right?
not so fast
There are two big problems with cross-compiling: build system bugs, and not being able to use binary caches.
Build system bugs cause cross-compile tasks to fail where they shouldn’t, usually because the possibility of being cross-compiled was ignored or forgotten in some corner of a piece of software. These bugs can be fixed, but fixing them is extremely tedious and best avoided if at all possible.
Not being able to use binary caches is perhaps more important for the “small pi running a webserver from the corner of the room” use case. Since cross-compiling uses a different compiler and a different set of build instructions than building “normally” does, and nix folds everything that goes into creating a package into the identity of that package, a cross-compiled package has a different identity than a natively compiled package. The binary caches provided by the nixos project only contain natively compiled packages. Taken together this means that using this method requires a lot of time spent compiling packages that already exist in compiled form, just in a form we can’t use when cross-compiling, because what goes into creating those packages differs between us and the cache.
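To make the identity point concrete, here is a small sketch (not part of any configuration above) that evaluates “the same” package twice; both results are aarch64 binaries, but they are distinct derivations with distinct store paths, so a cache hit for one is of no use for the other:
# sketch: the natively compiled and the cross-compiled hello both target
# aarch64, but their toolchains differ, so their derivation hashes differ
# and the cache entry for the native one cannot substitute the cross one
let
  native = (import <nixpkgs> { localSystem = "aarch64-linux"; }).hello;
  crossed = (import <nixpkgs> {
    localSystem = "x86_64-linux";
    crossSystem = "aarch64-linux";
  }).hello;
in { inherit native crossed; }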
best of both worlds?
We can however combine both approaches if most of the packages we’re using can be pulled from binary caches. To do this we’ll once more enable emulation on a fast machine:
boot.binfmt.emulatedSystems = [ "aarch64-linux" ];
users.users.builder = {
  isNormalUser = true;
  openssh.authorizedKeys.keys = [
    "..." # ssh public key of root on the slow machine
  ];
};
nix.trustedUsers = [ "builder" ];
and enable remote builds on the slow ARM board:
nix.distributedBuilds = true;
nix.buildMachines = [ {
  hostName = "fast-machine";
  sshUser = "builder";
  systems = [ "x86_64-linux" "aarch64-linux" ];
  supportedFeatures = [ "big-parallel" ]; # allow big compile tasks
} ];
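One optional refinement that is not shown above: by default the slow board copies any missing dependencies of an offloaded build to the builder itself, over its own slow network link. Nix’s builders-use-substitutes setting tells the builder to fetch those dependencies from the binary cache directly instead. A sketch for the slow machine:
# slow machine — optional: let the fast builder pull dependencies straight
# from the binary cache instead of receiving them from the slow board
nix.extraOptions = ''
  builders-use-substitutes = true
'';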
So far there is no cross-compilation to be found: everything is either pulled from binary caches or compiled “natively” (in emulation on the fast machine, if necessary).
Once we find a package that takes too long to compile like this (we will use the linux kernel as an example), we can use the cross-compilation support of nixpkgs to selectively cross-compile only that one package—although we will also have to compile (possibly cross-compile) every package that depends on anything we’ve cross-compiled so far.
Since we want to use a fast machine to assist a slow board it is not enough to specify which architecture a compile task should produce results for (using crossSystem); we must also specify which architecture the task should run on (using localSystem).
The architecture we’ll compile for is chosen later, by taking it from pkgs.pkgsCross.<architecture>
instead of pkgs
as we normally would.
In the case of cross-compiling the kernel, we’d use this configuration snippet:
boot.kernelPackages = # was: pkgs.linuxPackages_latest
  let p = (import <nixpkgs> { localSystem = "x86_64-linux"; });
  in p.pkgsCross.aarch64-multiplatform.linuxPackages_latest;
If this seems rather contrived (since we can just use the kernel from a binary cache instead), here’s the snippet for cross-compiling a pinebook pro kernel using the pinebook pro overlay by samueldr:
boot.kernelPackages = # was: pkgs.linuxPackages_pinebookpro_lts
  let
    p = (import <nixpkgs> {
      localSystem = "x86_64-linux";
      crossSystem = "aarch64-linux";
      # linuxPackages_pinebookpro_lts is defined by this overlay
      overlays = [ (import ./modules/wip-pinebook-pro/overlay.nix) ];
    });
  in p.linuxPackages_pinebookpro_lts;
Another more general (and perhaps easier) way to specify which packages to cross-compile and which to take from a cache or compile natively is to create an overlay that takes specific packages from the cross-compiled set, but leaves the rest intact.
boot.kernelPackages = pkgs.linuxPackages_pinebookpro_lts;
nixpkgs.overlays = lib.mkAfter [
  (self: super:
    let
      p = (import <nixpkgs> {
        localSystem = "x86_64-linux";
        crossSystem = "aarch64-linux";
        overlays = [ (import ./modules/wip-pinebook-pro/overlay.nix) ];
      });
    in {
      inherit (p)
        linuxPackages_pinebookpro_lts;
    })
];
Note the lib.mkAfter
in the definition of overlays.
This is necessary because including the pinebook pro overlay as a module per its instructions already adds the overlay that defines linuxPackages_pinebookpro_lts; since we want to replace this package, our overlay must be ordered after it.
If we merely wanted to replace a package that already exists in nixpkgs (eg firefox) ordering would not be as important.
Replacing one package like this triggers a rebuild of all packages that depend on the replaced package (or on packages that depend on that package, etc), but it doesn’t automatically cross-compile those dependent packages.
Each package that is to be cross-compiled has to be listed in the overlay; whatever isn’t listed in this manner will be compiled natively (or using emulation).
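As a sketch of what that looks like in practice, the overlay from above simply grows its inherit list; the second package name below is a placeholder for whatever else turns out to be too slow to build natively, not a real package:
nixpkgs.overlays = lib.mkAfter [
  (self: super:
    let
      p = (import <nixpkgs> {
        localSystem = "x86_64-linux";
        crossSystem = "aarch64-linux";
        overlays = [ (import ./modules/wip-pinebook-pro/overlay.nix) ];
      });
    in {
      # every package named here is taken from the cross-compiled set `p`;
      # someOtherExpensivePackage is a hypothetical placeholder
      inherit (p)
        linuxPackages_pinebookpro_lts
        someOtherExpensivePackage;
    })
];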
but nixops..?
All of these tricks also work within machine configurations in nixops!
When deploying across architectures with nixops it is necessary to ensure that all packages are compiled for the architecture being deployed to, which means that for such deployments either nixpkgs.localSystem
or nixpkgs.crossSystem
must be set.
The combined approach—in this case, setting nixpkgs.localSystem = { system = "aarch64-linux"; };
and using overlays to cross-compile specific packages—is probably the easiest way to get started and uses binary caches to save a lot of compilation time.
For nixops we don’t need remote builders either and can instead run everything on just a single machine (using binfmt
emulation where required), making things easier still.
The easiest case, where all machines nixops manages are aarch64 ARM boards, can be as simple as setting
defaults = {
  nixpkgs.localSystem = { system = "aarch64-linux"; };
};
in nixops.nix
!
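For completeness, a minimal sketch of such a nixops.nix (the network description, machine name, and target host are placeholders; per-machine options, including cross-compilation overlays like the one above, go into the machine bodies):
# nixops.nix — minimal sketch for an all-aarch64 deployment
{
  network.description = "arm boards"; # placeholder

  defaults = {
    nixpkgs.localSystem = { system = "aarch64-linux"; };
  };

  pi3 = { config, pkgs, lib, ... }: {
    deployment.targetHost = "pi3.example.org"; # placeholder
    # per-machine configuration goes here, including any cross-compilation
    # overlays like the one shown above
  };
}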