Trojan Source: Invisible Vulnerabilities
We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers.
'Trojan Source' attacks, as we call them, pose an immediate threat both to first-party software and of supply-chain compromise across the industry. We present working examples of Trojan-Source attacks in C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We propose definitive compiler-level defenses, and describe other mitigating controls that can be deployed in editors, repositories, and build pipelines while compilers are upgraded to block this attack.
Additional details can be found in our related paper (also on arXiv) and at trojansource.codes.
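To make the gap between logical order and displayed order concrete, the following minimal Python sketch rewrites a string so that each Unicode Bidi control character appears as a visible tag. The helper name `reveal_bidi` and the sample line are hypothetical and are not taken from the proofs-of-concept in this repository; the control characters are written as escapes so that the snippet itself contains no invisible characters.

```python
# Explicit Bidi control characters: embeddings, overrides, isolates,
# and their terminators.
BIDI_CONTROLS = {
    "\u202A": "LRE",  # left-to-right embedding
    "\u202B": "RLE",  # right-to-left embedding
    "\u202C": "PDF",  # pop directional formatting
    "\u202D": "LRO",  # left-to-right override
    "\u202E": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def reveal_bidi(text: str) -> str:
    """Return text in logical order with each Bidi control shown as <NAME>."""
    return "".join(
        f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch for ch in text
    )


# Hypothetical line of source: a viewer that honors the controls may display
# it differently from the logical order a compiler consumes.
line = "access = 'user\u202E \u2066// check if admin\u2069 \u2066'"
print(reveal_bidi(line))
# access = 'user<RLO> <LRI>// check if admin<PDI> <LRI>'
```

The printed form is what a compiler or interpreter processes, character by character, regardless of how an editor chooses to render the line.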
This repository is divided into per-language subdirectories. Each subdirectory contains a series of proofs-of-concept implementing various Trojan-Source attacks as well as a README describing the compilers/interpreters with which these attacks were verified. The source code for the website publishing these attacks is located in the website/ subdirectory.
We include a summary of the languages evaluated in the table below:
Language | Vulnerable to Early Return | Vulnerable to Commenting-Out | Vulnerable to Stretched Strings | Tool Evaluated |
---|---|---|---|---|
C | ~ | ✓ | ✓ | GNU gcc v7.5.0, Apple clang v12.0.5 |
C++ | ~ | ✓ | ✓ | GNU g++ v7.5.0, Apple clang++ v12.0.5 |
C# | ~ | ✓ | ✓ | .NET 5.0 via dotnet-script |
JavaScript | ~ | ✓ | ✓ | Node.js v16.4.1 |
Java | ~ | ✓ | ✓ | OpenJDK v16.0.1 |
Rust | ~ | ✓ | ✓ | rustc v1.53.0 |
Go | ~ | ✓ | ✓ | go v1.16.6 |
Python | ✓ | ✓ | ✓ | Python 3.9.5 via clang, Python 3.7.10 via gcc |
SQL | ✓ | ✓ | ✓ | SQLite v3.39.4 |
Bash | ~ | ✓ | ✓ | zsh v5.8.1 |
Assembly | ✓ | ✓ | ~ | x86_64 gas on Apple clang v14.0.0 |
Solidity | ✓ | ✓ | ~ | Solidity v0.8.16 |
✓ means the rendered code visually matches common style for that language, while ~ means visual renderings adhere to language syntax but deviate from common style (e.g. the multiline comment terminator */ is written as /*/). The proofs-of-concept included in this repository provide explicit examples for clarity.
We note that this list of affected languages is non-exhaustive, and welcome community contributions to expand to further languages.
We further note that some of the above tools have been patched since the disclosure of Trojan-Source attacks, and we therefore list the version of each tool evaluated. For example, rustc now throws errors for unterminated Bidi control characters.
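A check for unterminated Bidi control characters can be approximated as in the sketch below. This is an illustration under stated assumptions, not rustc's implementation: it treats every line as its own paragraph, inspects whole lines rather than only comments and string literals, and uses hypothetical function names.

```python
EMBEDDINGS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202C"  # closes the most recent embedding or override
PDI = "\u2069"  # closes the most recent isolate


def has_unterminated_bidi(line: str) -> bool:
    """Return True if a Bidi control opened on this line is never closed."""
    stack = []
    for ch in line:
        if ch in EMBEDDINGS:
            stack.append("embedding")
        elif ch in ISOLATES:
            stack.append("isolate")
        elif ch == PDF:
            # PDF only closes an embedding/override that is innermost.
            if stack and stack[-1] == "embedding":
                stack.pop()
        elif ch == PDI:
            # PDI closes the nearest isolate and anything opened inside it.
            if "isolate" in stack:
                while stack.pop() != "isolate":
                    pass
    return bool(stack)


def unterminated_lines(source: str) -> list:
    """Return 1-based numbers of lines that leave a Bidi control open."""
    return [
        number
        for number, line in enumerate(source.split("\n"), start=1)
        if has_unterminated_bidi(line)
    ]
```

A build pipeline could, for example, fail whenever `unterminated_lines` returns a non-empty list for a source file, although a production check would need to restrict itself to the token types the language permits these characters in.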
Finally, in addition to the Bidi attacks shown above, we evaluated each language against the Homoglyph and Invisible character attacks also described in the related paper. These evaluations can be found in the README files of each language subdirectory.
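As a rough illustration of how homoglyph and invisible characters might be surfaced during review, the sketch below reports every format character (Unicode general category Cf, which includes zero-width characters as well as the Bidi controls) and every non-ASCII letter. The function name is hypothetical, and the heuristic is deliberately crude: it would also flag legitimate non-English identifiers, comments, and strings.

```python
import unicodedata


def suspicious_characters(text: str) -> list:
    """Return (index, codepoint, reason) for characters worth a second look."""
    findings = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        codepoint = f"U+{ord(ch):04X}"
        if category == "Cf":
            # Format characters are typically rendered with no visible glyph.
            findings.append((index, codepoint, "invisible format character"))
        elif category.startswith("L") and ord(ch) > 0x7F:
            # A non-ASCII letter may be a homoglyph of an ASCII letter.
            name = unicodedata.name(ch, "UNKNOWN")
            findings.append((index, codepoint, f"non-ASCII letter: {name}"))
    return findings


# Hypothetical identifier containing a Cyrillic letter in place of Latin "H".
print(suspicious_characters("say\u041Dello"))
# [(3, 'U+041D', 'non-ASCII letter: CYRILLIC CAPITAL LETTER EN')]
```

Reporting the character name rather than rendering the character avoids relying on the very display behavior that these attacks exploit.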
We include a summary of the code viewers evaluated in the table below:
Code Viewer | Bidi Attack (Windows) | Bidi Attack (MacOS) | Bidi Attack (Ubuntu) | Homoglyph Attack (Windows) | Homoglyph Attack (MacOS) | Homoglyph Attack (Ubuntu) |
---|---|---|---|---|---|---|
Visual Studio Code (v1.61) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Atom (v1.58.0) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
SublimeText (v4121) | Bidi unactioned | Bidi unactioned | Bidi unactioned | ✓ | ✓ | ✓ |
Notepad++ (v8.1.9) | Displays control symbol | ✓ | N/A | N/A | N/A | N/A |
Eclipse (v4.21) | Mangled | Missing Glyph | ✓ | ✓ | ✓ | ✓ |
IntelliJ (v2021.2.3) | Displays control char | Displays control char | Displays control char | ✓ | ✓ | ✓ |
Visual Studio (v16.11.5/v8.10.11) | Mangled | ✓ | N/A | ✓ | ✓ | N/A |
Xcode (v14.0.1) | N/A | ✓ | N/A | N/A | ✓ | N/A |
vim (v8.2.1790) | Mangled | Displays codepoint | Displays codepoint | Misrendered | ✓ | ✓ |
emacs (v27.2) | ✓ | Displays underscores | ✓ | ✓ | ✓ | ✓ |
GitHub (patched Oct '21) | ✓ | ✓ (except Safari) | ✓ | ✓ | ✓ | ✓ |
Bitbucket (patched Nov '21) | ✓ | ✓ (except Safari) | ✓ | ✓ | ✓ | ✓ |
GitLab (patched Oct '21) | ✓ | ✓ (except Safari) | ✓ | ✓ | ✓ | ✓ |
✓ means that the code viewer is vulnerable to the attack on that platform. N/A indicates that the code viewer is not available on that platform. All web-based products were tested on October 2021 releases of Google Chrome, Microsoft Edge, Mozilla Firefox, and Apple Safari. Any visualization deviations on non-vulnerable platforms are described.
We note that many of these code viewers have since been patched, and for patched versions Trojan Source defenses may need to be disabled in settings to visualize these attacks as described in the related paper.
To maximize reproducibility, we note that all evaluations were performed on the following operating systems:
- Windows: Windows 10 build 19043
- MacOS: MacOS Big Sur
- Ubuntu: Ubuntu 20.04
As noted, many of the compilers, code editors, and repository frontends examined in this work have since been patched with Trojan Source defenses. To reproduce the results, we recommend installing the known-vulnerable versions of software listed above, or disabling any defenses in the settings of later versions.
To validate our results, we recommend opening each of the proofs-of-concept in a vulnerable code viewer, confirming that the code is displayed as depicted in the related paper, and validating that the program executes the hidden logic rather than the visualized logic when compiled/executed with a vulnerable compiler/interpreter. Example compiler or interpreter commands are provided in the subdirectory README for each vulnerable language included in this repository.
To ease reproducibility, we provide a Dockerfile that pre-installs vulnerable tooling and uses it to compile the POCs in this repository. The following commands will build the image, launch a container, and attach a terminal to the container for faster reproduction of our findings:
docker build -t trojan-source .
docker run --name ts -d -it trojan-source
docker attach ts
Note that the Solidity and Assembly POCs are excluded from the Docker image because they target different platforms than the Ubuntu base image. Reproduction instructions for these two platforms are given in Solidity/README.md and Assembly/README.md.
Interested in analyzing source code files for the presence of Trojan Source attacks? Check out this repo, which visualizes bidirectional overrides.
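For a quick local check, a scan along these lines can be sketched in a few lines of Python. This is not the tool referenced above; it is a minimal illustration that relies on the standard library's `unicodedata.bidirectional` to classify characters, and the function names `scan_text` and `scan_tree` are hypothetical.

```python
import unicodedata
from pathlib import Path

# Bidi classes assigned to the explicit directional formatting characters.
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}


def scan_text(text):
    """Yield (line, column, bidi_class) for each Bidi control in text."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for column, ch in enumerate(line, start=1):
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class in BIDI_CONTROL_CLASSES:
                yield line_no, column, bidi_class


def scan_tree(root):
    """Scan every file under root that can be read and decoded as UTF-8."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        hits.extend((str(path), *hit) for hit in scan_text(text))
    return hits
```

Calling `scan_tree(".")` from the root of a checkout returns a list of `(path, line, column, bidi_class)` tuples, which can be printed or used to fail a build step. Columns count Unicode code points, not bytes or display cells.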
If you use anything in this repository, in the Trojan Source paper, or on trojansource.codes in your own work, please cite the following:
@inproceedings{boucher_trojansource_2023,
author = {Nicholas Boucher and Ross Anderson},
title = {Trojan {Source}: {Invisible} {Vulnerabilities}},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
address = {Anaheim, CA},
publisher = {USENIX Association},
month = aug,
url = {https://arxiv.org/abs/2111.00169}
}