Nix State of the SBOM
A ‘Software Bill of Materials’ (SBOM) lists the ‘ingredients’ of a software artifact. The idea is that you may need technology-specific tools to build an SBOM, but can then use technology-agnostic tools to analyze it. The canonical example is ‘vulnerability scanners’, which aim to give you an overview of which security advisories exist for the artifact.
In this post I give an overview of the current state of the available tools, with actual examples of their output. My plan is to use this example to drive improvements to the tools, and to update this post as they improve. Do drop me a note if you see anything wrong or missing here!
Nix SBOM tools
In Nix, a bunch of .nix files defines a build. These .nix expressions can be ‘instantiated’/‘evaluated’ into the .drv store derivation format, which can then in turn be ‘realised’/‘built’ into store entries containing the build results. Currently, the ‘meta’ block is only visible at the .nix level and is lost at the .drv level.
SBOM tools that work at the .nix level typically don’t evaluate the nix expressions themselves, but are partly implemented in the nix language, which is mostly flexible enough to introspect the packages. The advantage is that they can also see the ‘meta’ section. The big limitation of this approach is that they cannot discover dependency relations defined via string interpolation. The workaround is to fall back to parsing .drvs for these dependencies, which loses the advantages of being at the nix level. SBOM tools in this category are bombon and Genealogos (which uses nixtract).
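To make the string-interpolation limitation concrete, here is a toy Python model. It is not how bombon or Genealogos are implemented. In Nix, every string carries a ‘context’ recording which store objects it refers to, and interpolating `${dep}` merges that context in. A tool that only walks dependency-carrying attributes such as buildInputs therefore never sees paths that were only spliced into a build script. The `NixString`, `package` and `interpolate` helpers and the placeholder store paths are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NixString:
    """A string plus its 'context': the store objects it refers to."""
    text: str
    context: frozenset = frozenset()


def interpolate(*parts) -> NixString:
    """Concatenate str/NixString parts, merging their contexts,
    roughly what "${dep}/share" does in the Nix language."""
    text, context = "", set()
    for part in parts:
        if isinstance(part, NixString):
            text += part.text
            context |= part.context
        else:
            text += part
    return NixString(text, frozenset(context))


def package(name: str) -> NixString:
    # Placeholder store path; a real one starts with a hash.
    return NixString(f"/nix/store/<hash>-{name}", frozenset({name}))


ncurses, libpcap, fonts, assets = map(
    package, ["ncurses", "libpcap", "google-fonts", "assets"]
)

# Heavily simplified attributes of the nethogs example derivation.
attrs = {
    "buildInputs": [ncurses, libpcap],
    "postInstall": interpolate(
        "cp ", fonts, "/share/fonts/truetype/Akatab-Medium.ttf $out/fonts\n",
        "cp ", assets, "/mobius-bw-dithered.png $out/share",
    ),
}


def walk_dependency_attrs(attrs: dict) -> set:
    """.nix-level view: only follow attributes known to list dependencies."""
    return {name for dep in attrs.get("buildInputs", []) for name in dep.context}


def drv_inputs(attrs: dict) -> set:
    """.drv-level view: everything in the context of any attribute value."""
    inputs = set()
    for value in attrs.values():
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, NixString):
                inputs |= item.context
    return inputs


missed = drv_inputs(attrs) - walk_dependency_attrs(attrs)
print(sorted(missed))  # ['assets', 'google-fonts']
```

The two references that the attribute walk misses are the same two that the .nix-level tools miss in the real example below.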
SBOM tools that work on the .drv representation cannot access the meta sections, but a surprising amount of information can still be reconstructed there. There are some initiatives towards lifting the restriction that the meta section does not make it into the .drv representation.
sbomnix AFAICT is a tool in this category. It can also attempt to enrich its SBOM with metadata extracted from the nixpkgs top-level attributes (via nix-env), but a typical system contains many packages that are not in that set, so this enrichment is of limited use. There is also an experiment to partly adopt the same approach that bombon uses.
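At the .drv level, a tool mostly sees derivation names like `nethogs-0.8.8`. One way to enrich those is sketched below. This is an assumption about the general approach, not sbomnix's actual code. The name is split the way `builtins.parseDrvName` does it, where the version starts at the first dash not followed by a letter. The package name is then looked up in metadata exported from the top-level attributes. The `TOP_LEVEL_META` table is hypothetical. Anything missing from it stays un-enriched, which is exactly the limitation described above.

```python
import string


def parse_drv_name(name: str) -> tuple:
    """Mimic builtins.parseDrvName: the version starts after the first
    dash that is not followed by an ASCII letter."""
    for i, ch in enumerate(name):
        if ch == "-" and i + 1 < len(name) and name[i + 1] not in string.ascii_letters:
            return name[:i], name[i + 1:]
    return name, ""


# Hypothetical metadata exported from the nixpkgs top-level attributes,
# keyed by package name. Only nethogs is included here.
TOP_LEVEL_META = {
    "nethogs": {
        "license": "GPL-2.0-or-later",
        "homepage": "https://github.com/raboof/nethogs#readme",
    },
}


def enrich(drv_names):
    """Attach top-level metadata where the package name matches."""
    for drv_name in drv_names:
        pname, version = parse_drv_name(drv_name)
        yield {
            "name": pname,
            "version": version,
            "meta": TOP_LEVEL_META.get(pname),  # None: no top-level match
        }


components = list(enrich(["nethogs-0.8.8", "source"]))
for component in components:
    print(component)
```

The anonymous `source` derivation gets neither a version nor metadata. That is the same kind of gap visible in the sbomnix output below.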
In practice
Let’s see what all this looks like in practice: given the derivation:
```nix
{
  fetchFromGitea,
  fetchFromGitHub,
  google-fonts,
  lib,
  libpcap,
  ncurses,
  stdenv,
}:
let
  assets = fetchFromGitea {
    domain = "codeberg.org";
    owner = "raboof";
    repo = "mobius";
    rev = "5c8876a4e886d39c072c63945a9d3e351aa6e668";
    hash = "sha256-+m1T+1La+v2XC4TiROdRpfO8GXzxbCqMuUDboffdXuc=";
  };
in
stdenv.mkDerivation rec {
  pname = "nethogs";
  version = "0.8.8";
  src = fetchFromGitHub {
    owner = "raboof";
    repo = "nethogs";
    rev = "v${version}";
    sha256 = "sha256-+yVMyGSBIBWYjA9jaGWvrcsNPbJ6S4ax9H1BhWHYUUU=";
  };
  buildInputs = [
    ncurses
    libpcap
  ];
  makeFlags = [
    "VERSION=${version}"
    "nethogs"
  ];
  installFlags = [
    "PREFIX=$(out)"
    "sbin=$(out)/bin"
  ];
  postInstall = ''
    mkdir $out/fonts
    cp ${google-fonts}/share/fonts/truetype/Akatab-Medium.ttf $out/fonts
    cp ${assets}/mobius-bw-dithered.png $out/share
  '';
  meta = {
    description = "Small 'net top' tool, grouping bandwidth by process";
    longDescription = ''
      NetHogs is a small 'net top' tool. Instead of breaking the traffic down
      per protocol or per subnet, like most tools do, it groups bandwidth by
      process. NetHogs does not rely on a special kernel module to be loaded.
      If there's suddenly a lot of network traffic, you can fire up NetHogs
      and immediately see which PID is causing this. This makes it easy to
      identify programs that have gone wild and are suddenly taking up your
      bandwidth.
    '';
    license = lib.licenses.gpl2Plus;
    homepage = "https://github.com/raboof/nethogs#readme";
    platforms = lib.platforms.linux;
    maintainers = [ lib.maintainers.rycee ];
    mainProgram = "nethogs";
  };
}
```
I generated SBOMs and formatted them so you first get an overview of the components (in a hierarchy if the SBOM records it). You can click any element to see the details of the component in the SBOM:
- bombon:
  - indeed it has information from the ‘meta’ section, such as licenses
  - seems to be missing the ‘assets’ and ‘google-fonts’ references, due to the string-interpolation limitation described above
  - produces a flat list rather than a tree/graph
- genealogos:
  - shows more hierarchy
  - seems to be missing the ‘assets’ and ‘google-fonts’ references, due to the same string-interpolation limitation
- sbomnix:
  - shows more hierarchy
  - has the google-fonts reference
  - while it technically has the ‘assets’ codeberg reference, it is an unrecognizable ‘source.drv’ reference without further metadata. This seems fixable, though: the metadata required for this is right there in the .drv.
  - notably does have license and reference metadata - I should double-check how it collects those.
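The point that the metadata for the anonymous ‘source’ derivation is ‘right there in the .drv’ can be illustrated with a small Python sketch. It reads JSON in the style of `nix derivation show` and recovers the URL a fetcher derivation downloads from. Several things here are assumptions: the exact JSON shape (which has changed between Nix versions), fetchers exposing their download location in an `env` field called `urls` or `url`, and the sample data, which is hand-written and trimmed rather than real tool output.

```python
import json
from urllib.parse import urlparse

# Hand-written, trimmed sample in the style of `nix derivation show`
# for the 'assets' fetcher; store paths are placeholders.
SAMPLE = json.loads("""
{
  "/nix/store/<hash>-source.drv": {
    "env": {
      "name": "source",
      "urls": "https://codeberg.org/raboof/mobius/archive/5c8876a4e886d39c072c63945a9d3e351aa6e668.tar.gz"
    }
  }
}
""")


def describe_fetch(drvs: dict) -> list:
    """Recover where an otherwise anonymous 'source' derivation came from."""
    found = []
    for drv_path, drv in drvs.items():
        env = drv.get("env", {})
        # Assumption: fetchers expose their URL(s) as 'urls' or 'url'.
        urls = (env.get("urls") or env.get("url") or "").split()
        if not urls:
            continue  # not a fetcher, or a field name this sketch doesn't know
        found.append({
            "drv": drv_path,
            "name": env.get("name"),
            "host": urlparse(urls[0]).hostname,
            "url": urls[0],
        })
    return found


for entry in describe_fetch(SAMPLE):
    print(entry)
```

Even this much would turn the unrecognizable ‘source.drv’ into ‘something downloaded from codeberg.org/raboof/mobius at a specific revision’.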
Pruning the Tree
You may have noticed the SBOMs above are pretty huge. This is because they are ‘build-time’ SBOMs, recording the full build requirements of the artifact (though, as we’ve seen, even those may miss references…). When used for vulnerability scanning, this will give a huge number of false positives: most advisories describe problems along the lines of ‘when untrusted input is passed in here, bad things may happen’. However, in all but the most paranoid situations you trust your build inputs anyway, so these advisories in build-time dependencies are useless to you.
To improve your signal-to-noise ratio, most SBOM tools allow pruning the tree to only the run-time references:
- bombon
  - while smaller than the build-time report, it lists suspiciously many components
- genealogos
- sbomnix
Perhaps unsurprisingly, these all miss the fact that resources from google-fonts and assets were actually copied into this artifact.
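Why copied files go unnoticed follows from how Nix determines run-time dependencies. After a build, it scans the output for the hash parts of the store paths in the build's input closure, and only paths whose hash literally appears become references. The Python sketch below imitates this on a toy scale. Real Nix scans the raw bytes of every file in the output. All store paths, hashes and file contents here are made up.

```python
# Toy reference scanner; all store paths and file contents are made up.
def hash_part(store_path: str) -> str:
    """'/nix/store/<hash>-<name>' -> '<hash>'."""
    return store_path.removeprefix("/nix/store/").split("-", 1)[0]


# Candidate dependencies from the build closure (fake 32-char hashes).
CANDIDATES = {
    "ncurses": "/nix/store/" + "a" * 32 + "-ncurses",
    "libpcap": "/nix/store/" + "b" * 32 + "-libpcap",
    "google-fonts": "/nix/store/" + "c" * 32 + "-google-fonts",
    "assets": "/nix/store/" + "d" * 32 + "-source",
}

# Files in the nethogs output. The binary's RPATH mentions its libraries;
# the copied font and image are just opaque data.
rpath = f"{CANDIDATES['ncurses']}/lib:{CANDIDATES['libpcap']}/lib"
OUTPUT_FILES = {
    "bin/nethogs": b"\x7fELF\x00" + rpath.encode() + b"\x00",
    "fonts/Akatab-Medium.ttf": b"\x00\x01\x00\x00(font tables)",
    "share/mobius-bw-dithered.png": b"\x89PNG\r\n\x1a\n(image data)",
}


def scan_references(files: dict, candidates: dict) -> set:
    """Report every candidate whose hash appears in some output file."""
    found = set()
    for content in files.values():
        for name, path in candidates.items():
            if hash_part(path).encode() in content:
                found.add(name)
    return found


runtime = scan_references(OUTPUT_FILES, CANDIDATES)
print(sorted(runtime))  # ['libpcap', 'ncurses']
```

The font and the image came from google-fonts and assets, but their bytes contain no store path, so nothing links them back to where they came from.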
Software Identification
If you want to actually associate security advisories with artifacts, you need to identify them somehow. In Nix, we have the luxury of extremely precise descriptions of what exactly was built, so in theory we could identify things like “X was built without feature Y so it is not affected” or “Z was built with patch P so it is not affected”. We should definitely keep sufficient information in the SBOM to make such precise statements.
Making such precise statements is also a lot of work. For components that are in nixpkgs, hopefully we can crowdsource this work in the security tracker.
Many components will not be in nixpkgs, so for those we cannot leverage this crowdsourced work, and it is up to the person looking at the SBOMs to associate components with advisories themselves. That can start with some general fuzzy heuristics for the initial matching of components to the identifiers found in advisories: for example Package URLs, but also whatever else you can get your hands on, such as CPEs (even if they’re somewhat weak, as they’re typically only registered after a piece of software has had its first security advisory), project websites, you name it.
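A first-pass matcher along these lines might score each component/advisory pair by whichever identifiers both sides carry. The Python sketch below is purely illustrative. The `Identifiers` shape, the weights and the advisory data are assumptions, and real advisory formats carry these identifiers in their own structures. PURLs are compared without their version, CPEs by vendor and product, and websites by host and path.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Identifiers:
    """Whatever identifying information a component or advisory carries."""
    purls: set = field(default_factory=set)
    cpes: set = field(default_factory=set)
    homepages: set = field(default_factory=set)


def purl_base(purl: str) -> str:
    # Drop subpath, qualifiers and version: 'pkg:github/a/b@v1' -> 'pkg:github/a/b'
    return purl.split("#", 1)[0].split("?", 1)[0].split("@", 1)[0].lower()


def cpe_product(cpe: str) -> tuple:
    # CPE 2.3 formatted string: cpe:2.3:part:vendor:product:version:...
    return tuple(cpe.lower().split(":")[3:5])


def site(url: str) -> str:
    # Host plus path, ignoring scheme, fragment and trailing slash.
    parsed = urlparse(url)
    return (parsed.netloc + parsed.path.rstrip("/")).lower()


def match_score(component: Identifiers, advisory: Identifiers) -> int:
    """Heuristic score; the weights are arbitrary assumptions."""
    score = 0
    if {purl_base(p) for p in component.purls} & {purl_base(p) for p in advisory.purls}:
        score += 3
    if {cpe_product(c) for c in component.cpes} & {cpe_product(c) for c in advisory.cpes}:
        score += 2
    if {site(h) for h in component.homepages} & {site(h) for h in advisory.homepages}:
        score += 1
    return score


nethogs = Identifiers(
    purls={"pkg:github/raboof/nethogs@v0.8.8"},
    homepages={"https://github.com/raboof/nethogs#readme"},
)
advisory = Identifiers(  # hypothetical advisory
    purls={"pkg:github/raboof/nethogs"},
    homepages={"https://github.com/raboof/nethogs/"},
)
print(match_score(nethogs, advisory))  # 4
```

A non-zero score only means ‘worth a human look’. Deciding whether the advisory actually applies still needs the precise information discussed above.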
Ideally, then, the SBOM should include both precise Nix derivation/output paths (which can potentially be matched precisely to information found in the security tracker) and the less-precise identifiers needed for fuzzy 3rd-party matching.
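As a sketch of what carrying both kinds of identification could look like, the Python snippet below builds a single CycloneDX-style component for the example. `type`, `bom-ref`, `name`, `version`, `purl`, `licenses`, `externalReferences` and `properties` are standard CycloneDX component fields. The `nix:drv_path`/`nix:out_path` property names and the placeholder store paths are made up for this example, and actual tools use their own naming.

```python
import json


def nethogs_component(drv_path: str, out_path: str) -> dict:
    """One SBOM component with both fuzzy and precise identifiers."""
    return {
        "type": "application",
        "bom-ref": drv_path,
        "name": "nethogs",
        "version": "0.8.8",
        # Less precise, ecosystem-agnostic identifiers for 3rd-party matching
        "purl": "pkg:github/raboof/nethogs@v0.8.8",
        "licenses": [{"license": {"id": "GPL-2.0-or-later"}}],
        "externalReferences": [
            {"type": "website", "url": "https://github.com/raboof/nethogs#readme"}
        ],
        # Precise Nix identifiers (hypothetical property names)
        "properties": [
            {"name": "nix:drv_path", "value": drv_path},
            {"name": "nix:out_path", "value": out_path},
        ],
    }


component = nethogs_component(
    "/nix/store/<hash>-nethogs-0.8.8.drv", "/nix/store/<hash>-nethogs-0.8.8"
)
print(json.dumps(component, indent=2))
```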
How Nix compares
How does the Nix SBOM landscape compare to ‘traditional’ Linux distributions? It’s a mixed bag:
What we do badly
In traditional Linux distributions, it’s relatively easy to observe which packages are installed and what their metadata is. In Nix, as we’ve seen in ‘Pruning the Tree’, it’s less obvious which derivations are ‘actually installed’ and which are ‘merely build inputs’: in Nix it is much more common for data to be copied from a build-input derivation into the output than it is, on a traditional distro, for data from a build dependency to end up in an installed package.
Also, while we do record a lot of metadata, it is often still incomplete or not easily accessible by our SBOM tools.
Many builds that use a language-specific dependency management tool will end up with derivations that contain ‘the dependencies’. These should be analysed so that their SBOMs can be included in the result, and for many ecosystems that functionality is still missing.
What we do well
If we manage to find good solutions to the challenges above (which seems entirely feasible), we have superpowers: in Nix, often your whole system is defined as a Nix expression – both things that were imported from the nixpkgs ‘distribution’ as well as any additional software added on top.
Users of traditional distros will have a hard time adding all the stuff that they curl | sh’d, Ansible’d or pip install’ed onto their machines to their SBOM, and will likely have to resort to tools such as syft to look at the filesystem and ‘get the ingredients back out of the soup’.
Ideally in the future, Nix users will be much better-positioned to get an accurate and complete image of their dependencies.
What’s next?
I believe there’s a lot of low-hanging fruit: as seen above, the current tools still appear to miss a lot of information that should be readily available to them. The available information can also be presented in a more easily-consumable way.
We can further enrich the metadata in nixpkgs, for example by allowing maintainers to add explicit Package URLs to their packages that go beyond what can already be inferred from the structure. This PR goes in that direction, but also introduces a mechanism to automatically ‘propagate’ ‘inferred’ PURL information. Perhaps the latter is not necessary (the examples above show that current SBOM tools can already infer a lot of this information themselves), and we should start by providing a place to add manually curated information. It might also be worth inferring or recording component type information.
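As an illustration of what ‘inferred from the structure’ can mean: the `src` of the example derivation is a `fetchFromGitHub` call, and its `owner`, `repo` and `rev` map directly onto a `pkg:github` Package URL. The Python helper below is a hypothetical sketch of that mapping, not the mechanism from the PR. It lets a manually curated value win over the inferred one. The `meta.purl` attribute name is an assumption. The helper only recognizes GitHub and returns nothing for other fetchers, which is where curated information would be needed.

```python
from typing import Optional
from urllib.parse import quote


def infer_purl(fetcher: str, args: dict) -> Optional[str]:
    """Infer a Package URL from how `src` was fetched (hypothetical helper)."""
    if fetcher == "fetchFromGitHub":
        # The purl spec asks for lowercased GitHub namespace and name.
        owner, repo = args["owner"].lower(), args["repo"].lower()
        return f"pkg:github/{owner}/{repo}@{quote(args['rev'], safe='')}"
    return None  # unknown fetcher: nothing to infer


def component_purl(meta: dict, fetcher: str, args: dict) -> Optional[str]:
    # A manually curated PURL (hypothetical meta.purl) takes precedence.
    return meta.get("purl") or infer_purl(fetcher, args)


print(component_purl({}, "fetchFromGitHub",
                     {"owner": "raboof", "repo": "nethogs", "rev": "v0.8.8"}))
# pkg:github/raboof/nethogs@v0.8.8
```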
As mentioned before, builders that use language-specific dependency management tools to create bundles of dependencies (e.g. based on lockfiles) are generally not well supported yet. bombon does have a feature to integrate those when they’re provided by a package, but as far as I know that is not common. Creating those will require additional ecosystem-specific work, which is nontrivial but seems within reach.
This pad has some further ideas on what would be good to improve.
If you’re interested in this topic, join the #nixpkgs-sbom:matrix.org Matrix channel.
