The open-source community narrowly avoided a major security crisis when a backdoor, tracked as the critical vulnerability CVE-2024-3094, was discovered in XZ Utils, a data compression library used by many Linux distributions.
XZ Utils provides a collection of command-line tools and libraries for lossless data compression using the LZMA algorithm. The most prominent program within this suite is xz, a powerful compressor frequently used for tasks like archiving files, creating software packages, and compressing disk images. Due to its efficiency and reliability, XZ Utils is installed on many mainstream Linux distributions.
The discovery of the backdoor
Andres Freund, a Microsoft PostgreSQL developer focused on performance, first noticed something strange on his machine while developing new functionality for an upcoming PostgreSQL release. He saw unexplained CPU spikes and, after some intense investigation, traced the issue to XZ Utils. On further review he found that malicious code had been introduced. Freund notified the wider community about the backdoor by sending a message to Openwall’s security mailing list:
After observing a few odd symptoms around liblzma (part of the xz package) on Debian sid installations over the last weeks (logins with sshd taking a lot of CPU, valgrind errors) I figured out the answer: The upstream xz repository and the xz tarballs have been backdoored. At first I thought this was a compromise of debian’s package, but it turns out to be upstream.
This vulnerability was intentionally introduced into XZ as a backdoor to allow a remote attacker access to a targeted system via remote code execution. The malicious code that was introduced could tamper with SSH authentication.
This backdoor targeted popular Debian and RPM-based distributions, including Ubuntu, Red Hat, Fedora, and CentOS. These distributions rely on package managers that download and install software from central repositories. If a backdoored version of XZ Utils had made it into their stable repositories, a significant number of users could have unknowingly installed the compromised software.
The backdoor specifically affected xz versions 5.6.0 and 5.6.1, released in February and March 2024 respectively. Thankfully, because it was discovered so soon after release, the backdoored versions hadn’t achieved widespread deployment.
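As a rough illustration of how an administrator might triage a machine, the sketch below checks a version string against the two affected releases named above. The function names are hypothetical, and the input is assumed to look like the text printed by `xz --version`. Distribution packages can carry their own patches or version suffixes, so this is only a first pass; the distribution's own advisory remains the authority.

```python
import re

# The two upstream releases this article names as backdoored.
AFFECTED_VERSIONS = {(5, 6, 0), (5, 6, 1)}


def parse_xz_version(version_output):
    """Return (major, minor, patch) from text such as 'xz (XZ Utils) 5.6.1'.

    Returns None when no dotted three-part version number is found.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_affected(version_output):
    """True if the first version number in the text is an affected release."""
    return parse_xz_version(version_output) in AFFECTED_VERSIONS
```

For example, passing the string `"xz (XZ Utils) 5.6.1"` to `is_affected` returns `True`, while `"xz (XZ Utils) 5.4.6"` returns `False`.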
How the backdoor was introduced
The entire internet is underpinned by infrastructure and services that rely on open-source software. Open-source projects thrive on collaboration: code lives on a central platform (frequently GitHub), and developers from around the world propose improvements by creating changes (code commits) and submitting them to the project. These proposals are then reviewed by other developers, often the maintainers of the project, for functionality, performance, style, and security. After discussion and additional tweaks, the changes are approved and merged into code branches that eventually feed into a release branch. Over time, these merged changes accumulate, and project maintainers decide when a new release, containing bug fixes, new features, or both, is ready. This updated code is then made available to everyone.
Open-source projects have a distributed trust model that can be subverted by determined threat actors, as we have seen with the introduction of this backdoor. The backdoor relies on the interaction of several Linux components: OpenSSH, systemd, liblzma and XZ Utils. OpenSSH and systemd do not contain any vulnerability themselves; the malicious code lives in liblzma and the wider xz project.
The xz project was created and maintained by a developer named Lasse Collin and has been stable since 2010 with a few updates since then. The introduction of this backdoor was the result of a planned scheme.
In 2021, a GitHub account was created under the name “Jia Tan” (@JiaT75). Tan consistently delivered valuable contributions as a developer, gaining the confidence of the project maintainers. In 2022, Tan submitted a patch for xz along with other commits that were strongly supported by several other users. Later analysis suggests that the users who supported the patch were sock puppet accounts with no presence elsewhere on the internet. The trust built up in this way allowed Tan to introduce the backdoor code into xz versions 5.6.0 and 5.6.1. Efforts were then made to expedite the adoption of these backdoored versions into Linux distributions, essentially trying to push the compromised releases onto millions of machines.
The backdoor
The backdoor enables a Remote Code Execution (RCE) exploit and grants attackers unauthorized access to vulnerable systems. To trigger the backdoor, an attacker needs a specific Ed448 private key, which allows them to bypass the standard SSH authentication protocols. Once authentication was bypassed, attackers would be able to steal data, install malware or even use the affected system to move laterally to other networks or systems.
The stealthy activation mechanism of this backdoor made it hard to detect. However, because the backdoor requires a specific key, only the holder of that key can exploit it.
Developer Sam James has an excellent summary of the backdoor:
This backdoor has several components. At a high level:
- The release tarballs upstream publishes don’t have the same code that GitHub has. This is common in C projects so that downstream consumers don’t need to remember how to run autotools and autoconf. The version of build-to-host.m4 in the release tarballs differs wildly from the upstream on GitHub.
- There are crafted test files in the tests/ folder within the git repository too. These files are in the following commits:
  - tests/files/bad-3-corrupt_lzma2.xz (cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0, 74b138d2a6529f2c07729d7c77b1725a8e8b16f1)
  - tests/files/good-large_compressed.lzma (cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0, 74b138d2a6529f2c07729d7c77b1725a8e8b16f1)
- A script called by build-to-host.m4 unpacks this malicious test data and uses it to modify the build process.
- IFUNC, a mechanism in glibc that allows for indirect function calls, is used to perform runtime hooking/redirection of OpenSSH’s authentication routines. IFUNC is a tool that is normally used for legitimate things, but in this case it is exploited for this attack path.
- Normally, upstream publishes release tarballs that are different than the automatically generated ones in GitHub. In these modified tarballs, a malicious version of build-to-host.m4 is included to execute a script during the build process.
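The mismatch between the release tarballs and the git repository is something a downstream packager could look for. The following is a minimal sketch, assuming the release tarball and the matching git tag have already been unpacked into two local directories. The function names and directory layout are hypothetical. It hashes every file in both trees and lists files that exist only in the tarball or whose contents differ. As the summary notes, legitimate tarballs routinely contain generated files that are not in git, so the output is a list for human review, not a verdict.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(tarball_dir, git_dir):
    """Return (only_in_tarball, changed) as two sorted lists of relative paths.

    only_in_tarball: files shipped in the tarball but absent from git.
    changed: files present in both trees whose contents differ.
    """
    tarball = hash_tree(tarball_dir)
    git = hash_tree(git_dir)
    only_in_tarball = sorted(set(tarball) - set(git))
    changed = sorted(
        path for path in set(tarball) & set(git) if tarball[path] != git[path]
    )
    return only_in_tarball, changed
```

Applied to a case like the one described above, a file such as m4/build-to-host.m4 would show up in one of the two lists, which is the cue to read it by hand.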
Akamai has a good explanation of the backdoor, and cryptography expert Filippo Valsorda has also published an analysis of it.
Who was behind this supply chain attack?
This attack was most likely not the work of a lone person, but rather a targeted operation by a nation-state actor. At the time of writing there is no definitive evidence of which nation was behind the attack. We will likely know more once further investigations have been completed.
The intelligence agencies of the Five Eyes (United States, Canada, United Kingdom, Australia, and New Zealand) are unlikely to be behind the attack given the mechanism that was used. The United States can be excluded, as its agencies would not get the legal authority to undermine a crypto subsystem.
Russia and China are the likely culprits behind this attack. Russian actors have already performed supply chain attacks such as the NotPetya attack in 2017 and the SolarWinds attack in 2020. In the SolarWinds attack, code was added to the build pipeline that could allow unauthorized access to systems. Linguistic analysis of the language used by the person pushing the commits suggests that the person is Russian, because the mistakes made are ones commonly made by Russian speakers writing in English. Time zone analysis shows that whoever was behind this was likely working in an Asia-Pacific time zone, which would include Russia and China.
China could also be behind the attack, as it has been targeting critical infrastructure around the world in preparation for a possible invasion of Taiwan. One of these operations was Volt Typhoon. The FBI and intelligence agencies shut down infrastructure associated with Volt Typhoon earlier this year. It is also possible that China hired Russians to carry out this attack.
The calculated approach employed by Tan highlights the potential dangers of social engineering within open-source projects, especially with regard to advanced supply chain attacks. Working for years to build trust in order to introduce a backdoor is a significant operation. While we got lucky that a developer spotted this backdoor, what other backdoors could have been introduced without anyone knowing?