David A. Wheeler’s Page on Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - Countering Trojan Horse attacks on Compilers

Here’s information about my work to counter the “Trusting Trust” attack. The “Trusting Trust” attack is an incredibly nasty attack in computer security; up to now it’s been presumed to be the essential uncounterable attack. I’ve worried about it for a long time, essentially since Ken Thompson publicly described it. After all, if there’s a known attack that cannot be effectively countered, should we be using computers at all? Thankfully, I think there is an effective countermeasure, which I have named “Diverse Double-Compiling” (DDC).

This page notes my 2009 PhD dissertation and its preceding 2005 ACSAC paper, a little about citing my work, and detailed data (to duplicate the experiments), It then has sections on countering misconceptions, what about applying this to hardware?, Software patents and application programmer interface (API) copyrights, credit where credit is due, and who’s talking about it?. We then have a section on real-world application of DDC, specifically discussing GNU Mes. It includes a large section on some related material.

2009 PhD dissertation

Fully Countering Trusting Trust through Diverse Double-Compiling (PDF version, HTML version, OpenDocument text version, Perma.cc link to PDF, arXiv:1004.5534 of PDF, GMU Mason Archival Repository Service (MARS)) is my 2009 PhD dissertation explaining how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique. This dissertation was accepted by my PhD committee on October 26, 2009.

The video of my official public defense is also available; this presentation was given on November 23, 2009, 1-3pm (podcast/RSS available). The presentation materials are also available in PDF and OpenDocument (ODP) formats. The public defense was held at George Mason University, Fairfax, Virginia, Innovation Hall, room 105 [location on campus] [Google map].

Here’s the abstract of the dissertation:

An Air Force evaluation of Multics, and Ken Thompson’s Turing award lecture (“Reflections on Trusting Trust”), showed that compilers can be subverted to insert malicious Trojan horses into critical software, including themselves. If this “trusting trust” attack goes undetected, even complete analysis of a system’s source code will not find the malicious code that is running. Previously-known countermeasures have been grossly inadequate. If this attack cannot be countered, attackers can quietly subvert entire classes of computer systems, gaining complete control over financial, infrastructure, military, and/or business system infrastructures worldwide. This dissertation’s thesis is that the trusting trust attack can be detected and effectively countered using the “Diverse Double-Compiling” (DDC) technique, as demonstrated by (1) a formal proof that DDC can determine if source code and generated executable code correspond, (2) a demonstration of DDC with four compilers (a small C compiler, a small Lisp compiler, a small maliciously corrupted Lisp compiler, and a large industrial-strength C compiler, GCC), and (3) a description of approaches for applying DDC in various real-world scenarios. In the DDC technique, source code is compiled twice: once with a second (trusted) compiler (using the source code of the compiler’s parent), and then the compiler source code is compiled using the result of the first compilation. If the result is bit-for-bit identical with the untrusted executable, then the source code accurately represents the executable.

The dissertation includes a section explaining how it extends my previous 2005 ACSAC paper. The dissertation generalizes the ACSAC paper (now compilers don’t need to self-parent), includes formal proofs, and includes demonstrations with GCC (to demonstrate scaleability) and with a malicious compiler.

If you read the dissertation you should also look at the dissertation errata (the errata are trivial and do not impact the fundamentals of anything in the dissertation).

My thanks go to the committee members, who were very helpful. A special thanks go to Dr. Ravi Sandhu; I wanted to do a PhD dissertation that was completely off the beaten path, and he was flexible enough to let me do it. He also had some great advice for getting through the process. Dr. Daniel A. Menascé asked me to demonstrate the approach with a malicious compiler (which I did). Dr. Jeff Offutt asked me about its relationship to N-version programming (so I added material about how this is different than N-version programming). Dr. Paul Ammann had some interesting comments about the N-version programming material; it turns out that he was personally involved in that landmark study! Dr. Yutao Zhong asked me about T-diagrams (so I added material about why I did not use them). Everyone on the committee asked good questions, especially in the private presentations before the public defense; thank you!

2005 ACSAC paper

Here’s my 2005 paper, which was formally reviewed and published by ACSAC:

Countering Trusting Trust through Diverse Double-Compiling (DDC), David A. Wheeler, Proceedings of the Twenty-First Annual Computer Security Applications Conference (ACSAC), December 5-9, 2005, Tucson, Arizona, pp. 28-40, Los Alamitos: IEEE Computer Society. ISBN 0-7695-2461-3, ISSN 1063-9527, IEEE Computer Society Order Number P2461. If you cannot get that paper from ACSAC, here’s a local copy of Countering Trusting Trust through Diverse Double-Compiling (DDC) as posted by ACSAC. You can also get this alternative PDF of “Countering Trusting Trust through Diverse Double-Compiling (DDC)” and OpenDocument form of “Countering Trusting Trust through Diverse Double-Compiling (DDC)”. (I have the rights to publish it here as well.)

I’m honored to have been accepted by the ACSAC 2005 conference. They get lots of good submissions, yet in 2005 they rejected 77% of their submitted papers. One reason that I submitted to ACSAC is that I believe publication on the web is absolutely critical for widespread use of a result; ACSAC has been publishing on the web for a long time now, and is an open access conference.

There’s a minor change in notation between the ACSAC paper and the later dissertation:
ItemACSAC (2005)Dissertation (2009)
Trusted compilerTcT
Compiler under testAcA
Parent compiler-cP

I have a presentation based on the ACSAC paper. I gave the original presentation at ACSAC; I’ve since updated it a little based on various feedback I’ve received.

You can get the presentation in:

Note: The ACSAC 2005 paper “Countering Trusting Trust through Diverse Double-Compiling” has a typo. In the last paragraph of section 4, just ahead of the figure, it says: “if c(sA,c(sA,T)), A, and c(sA,T) are identical, ...”. The “c(sA,T)” should be “c(sA,A)”; you can confirm this because the figure clearly shows “c(sA,A)” not “c(sA,T)”. My thanks to Ulf Dittmer for pointing this out to me!

Citing my work (it’s David A. Wheeler, please)

If you cite my work, at least include my middle initial “A.”, and if at all possible please use “David A. Wheeler”. Please do not cite me as “David Wheeler” or “D. Wheeler” in any written work (including electronic media like the Internet). There are too many David Wheelers, so it’s like not giving me credit at all. If you are required by forces outside your control to use initials, at least use “D. A. Wheeler”. However, I would really appreciate it if you showed me the courtesy of using my name as I use it, instead of changing it. In general, please cite the names that people actually use; please don’t change them into someone else’s name. Thanks. This doesn't apply to talking in person, of course; usually there aren’t that many David Wheelers in the room that it’s confusing :-).

Detailed data to duplicate the experiments

I strongly believe that scientific work must be repeatable. Sadly, much of the so-called computational sciences are no longer a science, because it is increasingly not possible to reproduce work. This problem is no secret; it is discussed in papers such as “Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science” by Victoria C. Stodden (Computing in Science & Engineering, 2010). It's not just computer science either; there is a widespread replication crisis in science. See also the blog post on the paper Why most of psychology is statistically unfalsifiable. "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant" by Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn discusses some of dubious practices that allow practically anything to be "experimentally proven". Science is not the only source of truth, but if you're going to call it science, it needs to actually be science.

In contrast, I do provide the information necessary to reproduce this work. For the ACSAC paper, see my Tiny C Compiler (tcc) page for how to duplicate the ACSAC experiment, as well as other tcc-related work too. For the PhD dissertation, see the separate page on detailed data for the PhD dissertation. These provide enough information to repeat or extend the experiments.

Countering misconceptions

Some misconceptions seems to be especially hard to shake, so let me counter them here (as well).

The DDC approach does not assume that two completely different compilers will produce the same binary output, given the same input.

I say it in the ACSAC paper, and again in the dissertation, but somehow it does not sink in, so let me try again.

Both the ACSAC paper and dissertation do not assume that different compilers produce equal results. In fact, both specifically state that different compilers normally produce different results. In fact, as noted in the paper, it’s an improvement if the trusted compiler generates code for a different CPU architecture than the compiler under test (say, M68000 and 80x86). Clearly, if they’re generating code for different CPUs, the binary output of the two compilers cannot always be identical in the general case!

This approach does require that the trusted compiler be able to compile the source code of the parent of the compiler under test. You can’t use a Java compiler to directly compile C code.

For the pedants: Yes, sometimes it’s possible to write machine code that runs on multiple yet radically different CPU architectures, depending on the architectures. You may even be able to devise code that determines which architecture it’s running on, and then jumps to the “right” code for that architecture. These would exploit the exact values of various machine codes, and are certainly clever hacks. But if you want to do that, fat binaries with multiple segments (each for a different architecture) are a better approach — they’re designed to do that cleanly. In any case, that’s not the point; the point is that the compiler-under-test and the trusted compiler are not required to generate identical code as output.

Non-deterministic hardware is okay in DDC

DDC does require that the parent compiler must be deterministic when it compiles the compiler under test. That’s not the same as assuming that two different compilers always produce identical results. A compiler is deterministic if, when run twice on identical input (with all the same option flags, etc.), it produces the same output. You can use a random number generator, as long as you give the user control over the random number generator seed (gcc, for example, has a command line option for setting the seed). For example, on a Unix/Linux system, you should be able to do this:

  $ mycompiler input.c     # Compile, store result in "a.out".
  $ mv a.out a.out.saved   # Save old result.
  $ mycompiler input.c     # Do it again
  $ cmp a.out a.out.saved  # If always identical, it's determinstic.

This is a relatively easy constraint, and one that most compiler authors want to be true anyway (since non-deterministic compilers are hard to debug). Compilers generally are deterministic, with the possible exception of embedded timestamps — and I discuss how to handle embedded timestamps in the paper. Sometimes you may need to use a flag (e.g., to set a random number generator seed as in the GCC C++ compiler).

The parent compiler may internally use constructs that are individually non-deterministic (such as threads with non-deterministic scheduling), but if it does it must use those mechanisms in a way that ensures that the output will be the same on each execution given the same input. Today’s underlying CPUs have all sorts of non-deterministic properties (e.g., from threading multiple cores, or timing variances); “modern CPUs are inherently random and a complex general purpose OS on top amplifies this inherent randomness substantially” [“Analysis of inherent randomness of the Linux kernel” by Nicholas Mc Guire, Peter Okech, and Georg Schiesser]. But if the CPU were so non-deterministic that you could not reliably write data in a particular order, you couldn’t get a compiler or any other program to run. So the parent compiler simply needs to be written in way that ensures that these effects will not impact its results. For example, the parent compiler could use locks to ensure that thread scheduling variation does not cause variation in the results. In practice, developers tend to do this anyway.

The trusted compiler (“compiler T” in the ACSAC paper, and “compiler cT” in the dissertation) doesn’t need to be deterministic.

See assumption sP_portable_and_deterministic in the dissertation if you want more details.

DDC’s use of trusted compiler(s) fundamentally increases trustworthiness

Some past approaches used a second compiler, but they basically just switched which compiler you had to trust completely. Indeed, you might make things worse, if you switch from an unsubverted compiler to a subverted compiler.

DDC, in contrast, uses additional compilers as a check on the first. This fundamentally changes things, because now an attacker must simultaneously subvert both the original compiler, and all of the compilers used in DDC. Subverting multiple compilers is much harder than subverting one, especially since the defender can choose which compilers to use in DDC and can choose the compilers used in DDC after the attack has been performed.

Why not use the trusted compiler for everything?

Using a different trusted compiler greatly increases the confidence that the compiler executable corresponds with its source code. When a second compiler is used as part of DDC, an attacker must subvert multiple executables and executable-generation processes to perform the “trusting trust” attack without detection. If you only used the trusted compiler, you’re back to the original problem, which I view as total trust on a single compiler executable without a viable verification process.

Also, as explained in section 4.6, there are many reasons the trusted compiler might not be suitable for general use. It may be slow, produce slow code, generate code for a different CPU architecture than desired, be costly, or have undesirable software license restrictions. It may lack many useful functions necessary for general-purpose use. In DDC, the trusted compiler only needs to be able to compile the parent; there is no need for it to provide other functions.

Finally, note that the “trusted” compiler(s) could be malicious and still work well well for DDC. We just need justified confidence that any triggers or payloads in a trusted compiler do not affect the DDC process when applied to the compiler-under-test. That is much, much easier to justify.

Does applying DDC by itself guarantee the compiler isn't malicious?

No, applying DDC by itself does not guarantee that the compiler isn't malicious, or that the compiler is not doing something surprising to you, or that the compiler has no defects. For example, in 2016 it was discovered that Microsoft Visual Studio 2015 Update 2 was quietly inserting telemetry calls into compiled programs by default, even though this was not well documented and could harm privacy. That's not the sort of thing that DDC could typically detect.

Passing the DDC test simply means that you can read compiler source code to see what the compiler does, instead of having to review executable (binary) code. But that's a difference that matters: Developers are used to looking at source code, since that's what they normally do. DDC turns an intractable challenge into a normal review process.

What does “fully” mean?

By “fully” I mean that “the trusting trust attack can be detected and effectively countered” (as I say in the thesis). A little background may help illustrate why I use the word “fully”.

First, complaining that people trust others is a waste of time. You must trust others in a modern world. No one grows all their own food, builds their own shelters from their own materials, and provides all their other needs by themselves; we all trust others. However, there is a serious systemic problem if you cannot independently verify what you trust. You should strive to “trust, but verify”.

I believe the fundamental problem caused by the trusting trust attack was that it was impractical to independently verify that what you depended on (the executable) corresponds to its human-readable representation (the source code). This is because program-handling programs can subvert the relationship between what humans review and what is actually used. Ken Thompson’s paper is not titled “Reflections on trust”; it is “Reflections on trusting trust”. Again, I believe problem was not trust, but the lack of a meaningful process for independent verification.

With DDC, we now have a practical process to independently verify that source code and executable correspond. DDC fully counters the problem that we lacked a practical independent verification process for program-handling programs (like compilers).

I believe it’s important that we understand the limitations of any result. Section 8.14 explains, in detail, how an attacker can subvert DDC. Because DDC has been proven using a formal mathematical proof, the only way to counter DDC is to falsify one of the proof assumptions. A defender can make such falsification very difficult. For example, the defender, not the attacker, gets to choose the compiler(s) used as the trusted compiler(s); the defender can even write one himself. It’s true that an unwise defender can depend on components that are not really diverse, but section 6 describes how to get that diversity. Once the defender knows that diversity is a goal, the defender can come up with all sorts of ways to provide it.

My goal was to create a process for independent verification. DDC provides an independent verification process, and one that can be practically applied. I applied the DDC process to four different compiler executables, and one of them was the widely-used gcc. Therefore, DDC fully meets the need for an independent verification process that can be practically applied.

So why did I put the word fully in the dissertation title at all? Well, I needed to find some way to diffentiate the titles of the ACSAC paper and the PhD dissertation. I realized that my older ACSAC paper had an important limitation: it only applied to self-parented compilers. Many compilers are not self-parented, and thus, the older ACSAC paper process could not apply to many compiler executables in use. In contrast, the 2009 dissertation can address all compilers, self-parenting or not. Thus, the dissertation “fully” provides a process for verifying compiler executables, whether they are self-parented or not. I should note that even if I wanted to, I cannot change the title now :-).

What about applying this to hardware?

I mentioned applying this DDC approach to hardware in the dissertation and at the ACSAC conference. Obviously, if your software is okay, but the hardware is subverted, you’re still subverted. The ACSAC presentation and dissertation talk about this in more detail. DDC can be applied to hardware as well as software. As I also mentioned, there are two problem areas: legal and technical.

The legal problem is that increasingly chip designers and chip manufacturers cannot legally know what is supposed to be on the chip. For example, developers of the various “IP cores” used on chips typically forbid chip designers and manufactureres from obtaining or using this information.

The key technical problem is creating a meaningful “equality” test in hardware. I speculate that various techniques, such as scanning electron microscopes, could be used to help implement an equality test. Other hardware validation mecahnisms (e.g., see Semiconductor IP Validation Gets Faster), might also play a role. But it is fundamentally harder to implement equality tests for hardware (compared to software). I cited several papers in my dissertation about this. You can learn more about the challenge from papers pubished since then, such as “Stealthy Dopant-Level Hardware Trojans” by Georg T. Becker, Francesco Regazzoni, Christof Paar, and Wayne P. Burleson (Bruce Schneier briefly discusses this), as well as “Integrated Circuit Security Threats and Hardware Assurance Countermeasures” by Karen Mercedes Goertzel (CrossTalk, November/December 2013) [alternate URL]. "A2: Analog Malicious Hardware" by Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester show "how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before effecting a chip’s functionality)." Researchers at University of Michigan demonstrated in 2016 a sabotaged processor called A2; it could be planted by a single employee at a chip factory.

However, while there are challenges, there is also hope. As noted in "X-Ray Tech Lays Chip Secrets Bare" by Samuel K. Moore (IEEE Spectrum, 2019-10-07, researchers in Switzerland and the U.S. have a non-destructive technique that has the poential to reverse engineer an entire chip without damaging it. This approach, called ptychographic X-ray laminography, could possibly "be used by integrated circuit designers to verify that manufactured chips match their designs."

Countering subverted hardware is definitely an area for potential future research. More generally, I think there's a growing problem in logic hardware verification. The Growing Semiconductor Design Problem is an especially easy-to-understand summary about the problem of hardware verification, aka the verification gap.

Software patents and application programmer interface (API) copyrights

The approach described here only works when you can create alternative implementations of computer languages (compilers). There is no technical problem in doing so, but some organizations are trying to make it difficult to legally create alternative implementations.

Any limitation on creating or distributing alternative implementations of a computer languages creates a dangerous threat to any user of that computer language. It also creates a threat to any user of programs developed (directly or indirectly) with that language.

Computer application programmer interfaces (APIs) and languages are generally held to be outside the scope of copyright. Specific implementations and their documentation can be copyrighted, but APIs and languages are fundamentally ideas and not just fixed expressions. This was long understood, but many rulings in 2012 (in the US and Europe) make this even clearer... though there are some stormclouds that threaten this. The Oracle v. Google “Order RE Copyrightability of Certain Replicated Elements of the Java Application programming Interface” of 2012 found that “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API. It does not matter that the declaration or method header lines are identical. Under the rules of Java, they must be identical to declare a method specifying the same functionality” even when the implementation is different. When there is only one way to express an idea or function, then everyone is free to do so and no one can monopolize that expression. And, while the Android method and class names could have been different from the names of their counterparts in Java and still have worked, copyright protection never extends to names or short phrases as a matter of law. ... This command structure is a system or method of operation under Section 102(b) of the Copyright Act and, therefore, cannot be copyrighted.” (Groklaw has this as text.) Similarly, the Court of Justice of the European Union found in SAS Institute v. World Programming Ltd., Judgment in Case C-406/10, that “The functionality of a computer program and the programming language cannot be protected by copyright.” (Here are the actual judgements of C-406/10.) Copyright, under U.S. law, specifically does not cover any “idea, procedure, process, system, method of operation, concept, principle, or discovery”; the history and justification of this (note that the list is much more than just ideas) is given in “Why Copyright Law Excludes Systems and Processes from the Scope of Its Protection” by Pamela Samuelson. However, on May 9, 2014, the Federal Circuit partially reversed the district court ruling, ruling in Oracle's favor on the copyrightability issue, and remanding the issue of fair use to the district court. I hope this will be construed narrowly; if broadly interpreted, then copyright might effectively prevent all future competition. As a practical matter, software must work with each other; if a a company can prevent compatible implementions, then that company can effectively prevent meaningful competition and verification measures like DDC. See Computer Scientists Ask Supreme Court to Rule APIs Can’t Be Copyrighted for more information about APIs and copyright.

Sadly, the risk from patents is still significant, as discussed in the dissertation. See my page on software patents for more.

Other thoughts on the dissertation

I used the word “trusted” when referring to the “trusted compiler”. I should note that there is a big difference between the words “trusted” and “trustworthy”. Something is trustworthy if there is evidence that it is worthy of trust; something is trusted if someone trusts it (hopefully because they have determined that it is trustworthy). If you use DDC, you need to use a trusted compiler — since you are trusting its results, by definition it is trusted. You should choose a trustworthy compiler as the trusted compiler, however.

The good news is that you do not need to use a totally perfect, never-makes-a-mistake compiler; such compilers are rare. Instead, you just have to use a compiler that meets the conditions described in the paper, which are much easier conditions to meet.

I tried to summarize some lessons learned on how to use tools to prove things in my short paper “How to prove stuff automatically”.

After my paper was published I learned of another subverted compiler example (in the trusting trust sense) in Mike Stute's answer to "What is a coder's worst nightmare?". He tried to modify a program but found he couldn't do it successfully. After 15 days of work, "I suddenly realize it's in the compiler... every time you compile the original code and run it puts in the subliminal message code into the source code... Several days later.. we recompile the compiler from the source. That solves it... Except it didn't... The ex-grad student had poisoned the compiler to poison itself when it was recompiled... We also found that if /sbin/login is compiled it puts in a backdoor allowing anyone who uses a specific password to login in as the root user. This computer is accessible by modem and Tymnet. Finally this gets the computing center's attention. Genius! But put to a horrible cause."

Credit where credit is due

As I clearly note in the paper, I didn’t come up with the original idea for the DDC countermeasure. The original idea was dreamed up by the amazingly bright Henry Spencer. However, he never pursued it; in fact over time he’d forgotten about it. I took his few sentences describing his idea and greatly expanded on it, including a much more detailed and analyzed description of it, as well as justifying and demonstrating it. For example, his original approach presumed self-parenting, a limitation my PhD dissertation removes. My thanks to him for his original idea, and for his helpful comments since.

I also want to credit those who made the world aware of the problem in the first place: Paul Karger, Roger Schell, and Ken Thompson. Paul Karger and Roger Schell’s groundbreaking analysis of Multics was the first time that this issue was identified. A key step in fixing a problem is knowing there’s a problem in the first place! I had several great conversations with Paul Karger, who was very enthusiastic about this work and provided several helpful comments. Sadly, Paul Karger died in 2010, and that is a loss for the world; the good news is that when he died, he knew about my solution and was quite happy about it. I also talked with Roger Schell about it. I also want to thank Ken Thompson, who (among his legion of accomplishments) demonstrated this attack and made far more people aware of the problem.

Who’s talking about it?

The first syllabus that included my ACSAC 2005 paper as required reading is CSC 593: Secure Software Engineering Seminar, a Spring 2006 class taught by Dr. James Walden at Northern Kentucky University. He paired my paper with Ken Thompson’s classic 1984 paper Reflections on Trusting Trust. It was also a subject of a class session at George Mason University (GMU)’s “Advanced Topics in Computer Security: Cyber-Identity, Authority and Trust” (IT962) taught by Ravi Sandhu. I had the honor of visiting for the day and giving the presentation myself for their Spring 2006 session. Technische Universitat Dortmund’s Lehrstuhl Informatik VI (Dr. Ulrich Flegel and Dr. Michael Meier) (WS 2007/2008) include it, too. It's specifically noted in Linux Luddites podcast #21 (August 2,2014) starting at 1:41 as well.

The ACSAC paper is cited in various places, including “Increasing Open Source Software Integration on the Department of Defense Unclassified Desktop” by Steven Anthony Schearer (June 2008), a Naval Postgraduate School (NPS) thesis, “How Practical Are Intrusion-Tolerant Distributed Systems?” by Obelheiro et al. (Sep 2006), Department of Informatics, University of Lisbon, and the PhD thesis “Tamper-resistant Peer-to-Peer Storage for File Integrity Checking” by Alexander Zangerl, Bond University, School of Information Technology (August 2006).

The ACSAC paper has been noted or discussed at many locations, including Bugtraq, comp.risks (the Risks digest), Bruce Schneier’s weblog (the source for Crypto-Gram), Lambda the ultimate, SC-L (the Secure Coding mailing list), LinuxSecurity.com, Chi Publishing’s Information Security Bulletin, Wikipedia’s “Backdoor” article, Open Web Application Security Project (OWASP) (mailing list), and others.

Bruce Schneier’s page in particular includes a lengthy commentary about it, and both his site and Lamba-the-Ultimate have various blog entries. The article Open Source is Securable discusses the paper and its ramifications -- in particular, it’s finally possible to make very strong claims through source code analysis.

BartK’s “Defeating the Trust Attack” summarized the PhD dissertation; this triggered a spirited reddit discussion in September 2013.

There was a lively discussion of the dissertation on Y Combinator's "Hacker News" in October 2016.

Diverse Double-Compiling to Harden Cryptocurrency Software by Niklas Rosencrantz, 2023, KTH, School of Electrical Engineering and Computer Science (EECS) ) is follow-on work. Its abstract is: "A trusting trust attack is a special case of a software supply-chain attack. The project in this report, named diverse double-compiling for cryptocurrency (DDC4CC), demonstrates and explains a defense for cryptocurrency software against trusting trust attacks. DDC4CC includes a case study that implements a trusting trust attack and the defense applied to a hypothetical theft of cryptocurrency on the Bitcoin blockchain. The motivation for such an attack is easy to understand: An adversary can acquire significant monetary funds by manipulating economic or decentralized financial systems. For a supply-chain attack in general, the outcome is potentially even more severe. An adversary can control entire organizations and even the systems belonging to the organization’s customers if the supply chain is compromised. Attacks are possible when targets are inherently vulnerable due to trust in their suppliers and trust in the supply chain, i.e., the hardware constructors and the software authors, the upstream development team, and the dependencies in the supply chain." This is a Master's thesis.

Isn’t this dissertation unusual?

Sure. In particular, this dissertation brings together technical areas that aren’t often combined. The practical demonstrations involved analyzing machine code (not just assembly code!) produced by C compilers, as well as S-expressions generated by Lisp. To prove that this really worked, I ended up using first-order predicate logic (a mathematical logic notation) and various tools to help automate its use. My mathematical models ended up having to account for stuff like different text encoding systems, because I wanted the models to accurately model the real world enough to really counter the attack. Some dissertations go deeply into the technical details of machine code, while others fly into the abstractions of mathematical proof; far fewer do both. Frankly, I think that unusual combination makes the result more interesting; I hope you do too.

A lot of people were sure that what I’m doing could not be done, so I did everything I could to prove it correct. I don’t just provide a mathematical proof; I provide a formal proof, where absolutely every step is spelled out (most proofs in math books “skip the details” but I do not). I presented the proof in Hilbert (3-column) style, giving justifications for absolutely every step. I directly used the output of a prover tool; I could have massaged it for clarity, but by using the output directly, I avoid the charge that I made a mistake in its transformation, and even more importantly I could use a separate tool (ivy) to double-check the proof. A lot of people do not have a background in this area of mathematics, so I give references to where the various steps come from, and I explain in detail each of the starting mathematical statements.

Real-world application of DDC

This section discusses the application of DDC in the real world. I know of at least one real-world application of DDC, specifically for the GNU Mes C compiler. So let's discuss that!

DDC Application: GNU Mes C Compiler boostrap

GNU Mes is "a Scheme interpreter and C compiler for bootstrapping the GNU System. Since version 0.22 it has again helped to halve the size of opaque, uninspectable binary seeds that are currently being used in the Reduced Binary Seed bootstrap of GNU Guix. The final goal is to help create a full source bootstrap as part of the bootstrappable builds effort for UNIX-like operating systems."

Rephrased differently, the idea is that GNU Mes, if trustworthy, can be used as a starting point to rebuild everything else from source code, giving you much greater confidence that the source code is an accurate representation. GNU Mes can in turn be used to bootstrap an entire distribution, and in particular is used to bootstrap GNU Guix (a complete operating system). But how can you be sure that the executable of GNU Mes is trustworthy? That seems like a chicken-and-egg kind of problem. One way to gain confidence is DDC.

The post Reproducible bootstrap of Mes C compiler describes their application of DDC to gain confidence in the Mes C compiler's binary. They used three different distributions (GNU Guix, Nix, and Debian) with three different major versions of GCC to recompile GNU Mes. They later used the tcc compiler as well (though details about that are sketchy). In all cases they recreated a bit-for-bit identical result of the GNU Mes C compiler!

As with all real-world uses there are limitations. DDC must be executed using only trusted processes and programs, and "trusted" in this case means that there is "justified confidence that it does not have triggers and payloads that would affect the results of DDC" (DDC dissertation section 6).

The application described here shows that several different distributions with different executables produce the same underlying result. However, three of these applications are using the same compiler, specifically GCC (albeit different versions). These tests use similar and highly related distributions; they even use many of the same underlying components like glibc, the Linux kernel, and so on (though again, with different versions).

So while this does use DDC, and it does increase confidence, it increases confidence so only to a limited extent because the checking systems are relatively similar. They hope to attempt to use an even more diverse set of compilers in the future, which would give even greater confidence.

This is an awesome application of DDC, and I believe it's the first publicly acknowledged use of DDC on a binary (beyond what I did). Basically, GNU Mes is designed to provide a safe starting point, and DDC can be used to verify that GNU Mes is providing a safe starting point. This application of DDC can be improved over time, and that is wonderful.

Some related material

Many people have worked in related areas, in particular, to implement reproducible (deterministic) builds (which enable exact recreation of executables given source code) or proving that programs do what they say they do. I mention some of the issues, and counter-measures, in "Countering Development Environment Attacks" at the 2015 RSA Conference in San Francisco (this was a presentation by Dan Reddy and me).

Here are some pointers.

Toolchain attacks: Real and claimed

It's important to protect the development environment, including its development toolchain. Ken Thompson demonstrated an attack on the toolchain in the 1980s, and it was a full-blown "trusting trust" attack. My dissertation also discussed an attack on the Delphi compiler.

Ken Thompson Really Did Launch His "Trusting Trust" Trojan Attack in Real Life provides a copy of the full Usenet message discussing the subversion demonstrated by Ken Thompson. More sources available via mail-archive.com and Google Groups. It's important to note that "the compiler was never released) outside", but that was because Ken Thompson was ethical, not because it could not be done.

Perhaps more importantly, today the source code for Ken Thompson's attack is now available and annotated. "Running the 'Reflections on Trusting Trust' Compiler" by Russ Cox (2023-10-25) shares that source code, along with a detailed discussion of it. This version is the corrected version that didn't keep generating slightly longer versions of the generated compiler. You can experiment with this attack on a web-based simulator.

I also want to correct the record on a minor point about Thompson's work. At one time I thought the subverted compiler wasn't detected. I no evidence it was detected as being an attack, but due to a bug in how Ken Thompson implemented his trusting trust implementation, it did eventually fail. The post by Ken Thompson on 2021-09-20 titled "[TUHS] Thompson trojan put into practice" noted that every time it was recompiled it grew by another byte. They noted this oddity, and after that "they played with it until they broke the quine part" (how it worked). Ken noted that "the extra byte was my bug" and said "I'm not sure ... if they ever realized what was going on." This was discussed in Hacker News. It would be unwise to assume that future attacks would be buggy, especially since it would be easy to detect this kind of bug (make sure the subverted compiler generates the same bytes each time). The now-posted version of the attack doesn't include this bug.

"The Octopus Scanner Malware: Attacking the open source supply chain" by Alvaro Muñoz (2020-05-28) discusses the Octopus Scanner malware, "designed to enumerate and backdoor NetBeans projects, and which uses the build process and its resulting artifacts to spread itself."

Operation ShadowHammer was disclosed in March 2019 by Kaspersky Lab, and one thing it did was sabotage developer tools. As noted in "Operation ShadowHammer Exploited Weaknesses in the Software Pipeline" by Fahmida Rashid, IEEE Spectrum, 1 May 2019, "Attackers also targeted at least three gaming companies... the attackers made a one-line change to their targets’ integrated development environment (IDE), a software program that developers use to write code. The effect was that whenever Microsoft Visual Studio compiled code with a specific Microsoft-owned library, the IDE used a similarly-named library file instead. Compilers and development platforms are at the core of the software supply chain... One infected compiler on a few developers’ machines can result in thousands of Trojanized software applications installed on millions of end-user computers. “It’s a poisonous seed. Plant your poisonous seed in a safe place, and it will turn into the poisonous tree with fruit.” ... Since the compiler pulls in relevant pieces of code from linked libraries and other components, using the tampered library meant code the developer did not intend to include was added to the application. A source code review won’t find the issue because the problem isn’t anywhere in the original code and the developer doesn’t know about the alternate library. “When your compiler lies to you, your product always contains a backdoor, no matter what the source code is,” Kamluk said."

In 2015 it was revealed that over 4,000 Apple iOS applications were subverted and got into the Apple app store through an attack called XCodeGhost. This attack convinced developers to use a subverted version of Apple's XCode development environment. Many popular applications were infected via XCodeGhost, including Angry Birds 2 and WeChat. FireEye estimated that XcodeGhost added malicious code to over 4000 apps. CNBC reported that Apple was "cleaning up its iOS App Store to remove malicious iPhone and iPad programs identified in the first large-scale attack on the popular mobile software outlet. The company disclosed the effort after several cyber security firms reported finding a malicious program dubbed XcodeGhost that was embedded in hundreds of legitimate apps... The hackers embedded the malicious code in these apps by convincing developers of legitimate software to use a tainted, counterfeit version of Apple's software for creating iOS and Mac apps, which is known as Xcode, Apple said." Reuters carried a similar report. Wikipedia has an article on XcodeGhost Manish Goregaokar's "Reflections on Rusting Trust" demonstrates an implementation of the "trusting trust" attack in the Rust programming language. This isn't an attack, it's a demo of the attack, but it's a nice demo. There's a Hacker News and Reddit discussion of it.

"How a VC-funded company is undermining the open-source community" (The Outline, 2017) makes a number of claims about actions of Kite, a venture capital-funded startup. The article states that that Kite has been modifying developer tools for Kite's benefit. It's not the same as the trusting trust attack at all, but it does suggest that tools for developers are a potential target.

Safe builds

It's important to build software in a safe way.

Security challenges for the Qubes build process has an interesting discussion. They state their goals as:

  1. "We want to build (and distribute) non-backdoored software.
  2. We don’t want the build process itself to be able to compromise the developer’s machine."

To do this, they focus on these tasks:

  1. "To perform verification of all the input sources, git repo commits, and other components (such as the stock RPMs and DEBs we also use), i.e. that they have proper digital signatures created by the select keys that we chose to trust."
  2. "Provide strong sandboxes for building the less trusted parts of the Qubes OS, such as the various templates, so that even if the (properly signed) sources or other components turn out to be malicious*, the rest of the generated system, such as the Xen hypervisor and dom0, are not affected (nor is the developer’s machine)."

They include this important footnote: "* Of course, one should understand that the mere fact that packages or sources are properly signed, even with key(s) we have decided to trust, doesn’t guarantee that the code has not been backdoored. This could happen if one of the developers turned out to be malicious or was somehow coerced to introduce a backdoor, e.g. via some kind of a warrant or blackmail, or if their laptop were somehow compromised. We would like to defend against such potential situations."

Reproducible (deterministic) builds

Creating reproducible builds (aka deterministic builds) is an excellent way to detect many development-time attacks, and is a precondition for applying DDC. The reproducible-builds.org web site has some great information on the topic. The video Reproducible builds: Two years in the trenches (2017) summarizes their work. They developed a tool called diffoscope that I wish I'd had! See below for more about the related topic semantically reproducible builds.

Here are a few pointers you may find useful.

The Tails Operating System image is reproducible. Verifying a Tails image for reproducibility describes the process to independently reproduce and verify an image.

"Preventing Supply Chain Attacks like SolarWinds" (Linux Foundation blog post) by David A. Wheeler (me!) discusses how to counter subverted build processes, like that in SolarWinds, by using verified reproducible builds. Basically, use reproducible builds to independently verify that your build result is correct. For a detailed technical discussion on how SolarWinds Orion was subverted, including a discussion of SUNSPOT (the malware that inserted the backdoor) and SUNBURST (the backdoor itself), see "SUNSPOT: An Implant in the Build Process" by the CrowdStrike Intelligence Team (2020-01-11).

Telegram supports reproducible builds so that others can verify that its open source code is the same as the code available in the Apple App Store and Google Play. As of early 2021 it's considered "somewhat experimental". Telegram notes that Reproducible Builds are especially hard for iOS due to Apple's current policies and MacOS limitations. As they say: (1) "Apple insists on using FairPlay encryption to “protect” even free apps from “app pirates” which makes obtaining the executable code of apps impossible without a jailbroken device." and (2) "macOS doesn't support containers like Docker." It's still possible, but challenging.

"Source and Executable" by Mike Lai (Microsoft) discussed reproducible builds and was given at the NIST/DHS Software and Supply Chain Assurance (SSCA) Forum in late 2019.

I've been told that the the Russian “Non-Documented Functionality” (NDF) certification regime may have specifically required demonstrating a reproducible build, and that at least at one time Microsoft did meet these NDF requirements. However, I have not been able to confirm this. One problem is that I don't speak Russian. Maybe someone can look at sites such as this Echelon site to confirm this?

"Improving Trust in Software through Diverse Double-Compiling and Reproducible Builds" by Yrjan Skrimstad (University of Oslo, 2018) is perhaps the closest paper to my work (since it builds on it). In my work I noted that it was possible to use more than one "trusted" compiler (that is, more than 2 grandparent compilers) to increase the difficulty for the attacker. This work by Yrjan Skrimstad expands that further, discussing implications when there are more than two such compilers. It demonstrates the trusting trust attack (using a quine) in the Go programming language, and then and demonstrates how to use DDC (with 3 grandparents) to detect the attack. It also discusses the relationship between DDC and reproducible builds. This work by Yrjan Skrimstad is demonstrated in the GitHub repo yrjan/untrustworthy_go.

The Tor project is very concerned about reproducible (deterministic) builds:

The Yocto project not only has reproducible builds, but its CI/CD pipeline verifies that every build is reproducible.

“Is that really the source code for this software?” by Jos van den Oever (2013-06-19) posts about the problems of trying to recreate executables from source code. Sometimes it’s not so bad, e.g., for Debian, “The binary package that was built from a Debian source package was not identical to the published binary package, but the differences are limited to timestamps and the build id in the executables.” But sometimes it’s very difficult, just as I had found years earlier, because you often need a lot more build information than you can get. You need much more than the source code and build script; you need to know the exact versions of all relevant build software, its configuration, and so on. But it is possible to record all that information, so that the process can be repeated... and you can repeat the process to make sure that you got it all. If you record that information, then you have the problem of “how do I know that my build tools are not malicious?” At that point, DDC comes to the rescue... because DDC can help you verify that.

"On business adoption and use of reproducible builds for open and closed source software" by Simon Butler, Jonas Gamalielsson, Björn Lundell, Christoffer Brax, Anders Mattsson, Tomas Gustavsson, Jonas Feist, Bengt Kvarnström & Erik Lönroth , 2022-11-29, Software Quality Journal discusses reproducible builds from a business point of view. "Through interviews with software practitioners and business managers, this study explores the utility of applying R-Bs in businesses in the primary and secondary software sectors and the business and technical reasons supporting their adoption. We find businesses use R-Bs in the safety-critical and security domains, and R-Bs are valuable for traceability and support collaborative software development. We also found that R-Bs are valued as engineering processes and are seen as a badge of software quality, but without a tangible value proposition. There are good engineering reasons to use R-Bs in industrial software development, and the principle of establishing correspondence between source code and binary offers opportunities for the development of further applications." This is an open access paper, so anyone can read it.

The Debian ReproducibleBuilds project has the goal of making it possible to reproduce, byte for byte, every build of every package in Debian. They have made a lot of progress, and I am really delighted to see it. Their Overview of known issues related to reproducible builds shows what commonly causes problems; these include embedded generated timestamps from various causes (this is a big one) and random/unsorted ordering. For example, SOURCE_DATE_EPOCH specification provides a simple mechanism to turn complicated timestamp issues into something simple. Also, the sources.debian.net site provides convenient browsing access to the Debian source code. How Debian Is Trying to Shut Down the CIA and Make Software Trustworthy Again also discusses this.

Reproducible Builds for Fedora is a similar project to deterministically reproduce the packages of Fedora.

F-Droid and The Guardian Project are working on reproducible builds for Android. For more information, see LWN.net, info on the first reproducible build by Guardian (a developers' tool), their success with the utility app Checkey.

How I compiled TrueCrypt 7.1a for Win32 and matched the official binaries describes a deterministic build (with explanable differences) was achieved for TrueCrypt. This is an encryption software capable of on-the-fly encryption on file-, partition- or disk-based virtual disks, yet its authors are anonymous, leading some to worry that the executables were backdoored. Note: Though its source code is visible, it does not use a standard OSS license and it imposes restrictions that probably mean is it is not OSS; it is not considered FLOSS by many major Linux distributions including Debian, Ubuntu, Fedora, openSUSE, and Gentoo. More recently, the TrueCrypt developers have stopped development, and its lack of a real OSS license may inhibit anyone else supporting it.

Gitian is a “secure source-control oriented software distribution method [so] you can download trusted binaries that are verified by multiple builders.”

Vagrant is designed to "create and configure lightweight, reproducible, and portable development environments". Seth Vargo discusses it briefly.

The paper "Reproducible Containers" by Omar S. Navarro Leija et al will be presented in March 2020 at the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2020 (this is a conference of the Association for Computing Machinery (ACM)). This paper "describes DetTrace, a reproducible container abstraction for Linux implemented in user space." This looks really promising. The implementation is OSS (MIT license).

Buildroot is a simple mechanism for creating embedded Linux systems through cross-compilation.

Byzantine Askemos Language Layer (BALL) is an implementation of the Askemos Distributed Virtual Machine. It creates an “autonomous virtual execution environment for applications” which unlike traditional cloud environments is specifically designed to provide fault tolerance and to be tamper-proof. It executes the code on several different machines, runtime libraries, compilers, operating systems and so on in parallel and compares cryptographic signatures. Thus, this tries to counter subversion of various lower-level components.

Christophe Rhodes has discussed the problems of reproducing builds of Steel Bank Common Lisp (SBCL) on different systems in Still working on reproducible builds and Reproducible builds - a month ahead of schedule. While his notes are specific to SBCL, they illustrate more general issues. He notes that one of the reasons that SBCL separated from its parent CMCL was to "make the result of its build be independent of the compiler used to build it." His goal was not primarily to counter attack, but to eliminate hard-to-find bugs: "... how do we know there aren't odd differences that depend on the host compiler lurking, which will not obviously affect normal operation but will cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments). I’ve been working intermittently on dealing with this, by attempting to make the Common Lisp code that SBCL!Compiler is written in sufficiently portable that executing it on different implementations generates bitwise-identical output. Because then, and only then, can we be confident that we are not depending in some unforseen way on a particular implementation-specific detail...". Here are some of the issues that he (and perhaps other the SBCL developers) found and fixed, as an example of what to look for:

  1. The Common Lisp specification permits implementations to compute (write-to-string '(quote foo) :pretty nil) as either "(QUOTE FOO)" or "'FOO". To create a reproducible build they had to use a workaround (involving function name counters) so that the difference would not matter.
  2. A related problem happens with backquote: the Common Lisp specification allows implementations to determine if values coalesce, but this can produce different results; the solution is to change the code so that its results are guaranteed.
  3. The various set-related functions (e.g., set-difference, uniion, intersection, etc.) do not return sets in an order, which can result in differing builds. The solution is to sort the result of set operations, to force them to a specific known order.
  4. A call to maphash was used to affect the Lisp image directly. In general, hash tables do not guarantee any particular order when you walk their contents, so you need to force an ordering if you iterate over their contents.
  5. Implementation-defined constants, especially most-positive-fixnum and most-negative-fixnum, but also array-dimension-limit and internal-time-units-per-second.
  6. Some functions like random and sxhash are understably different, which cause access patterns to differ.
  7. The sort routine in Lisp is not specified to be stable, so when trying to make it deterministic with multiple stages, stable-sort should be used instead.
  8. Initial values of arrays are undefined, so don't depend on their value!

The key thing to note is that creating compilers that can easily have reproducible (deterministic) builds on other compilers typically takes work in the real world... but it is very doable.

There are various tools that can help you create reproducible builds. For example, if build paths are embedded, you can force fixed directory values to make them reproducible. There are tools that enable this without requiring root permission, including my tools user-union and auto-destdir, as well as tools like proot.

LF-Edge EVE has worked hard on reproducibility. See EVE image reproducibility and EVE image sources for an interesting example of how to implement higher standards.

Semantically equivalent builds

Reproducible builds are great for showing that a package really was built from some given source, but sometimes they're hard to do. A useful backoff is something called a "semantically equivalent build".

As explained in the documentation for the oss-reproducible tool (part of OSSGadget), "A project build is semantically equivalent if its build results can be either recreated exactly (a bit for bit reproducible build,or if the differences between the release package and a rebuilt package are not expected to produce functional differences in normal cases. For example, the rebuilt package might have different date/time stamps, or one might include files like .gitignore that are not in the other and would not change the execution of a program under normal circumstances."

A semantically equivalent build has very low risk of being a subverted build as long as it's verified to be semantically equivalent. Put another way, verifying that a package has a semantically equivalent build counters the risk where the putative source code isn't malicious, but where someone has tampered with the build or distribution process, resulting in a built package that is malicious. It's quite common for builds to produce different date/time stamps, or to add or remove "extra" files that would have no impact if the original source code was not malicious.

It's much easier (and lower cost) for software developers to create a semantically equivalent build instead of always creating a fully reproducible build. Fully reproducible builds are still a gold standard for verifying that a build has not been tampered with. However, creating fully reproducible builds often require that package creators change their build process, sometimes in substantive ways. In many cases a semantically equivalent build requires no changes, and even if changes are required, there are typically fewer changes required.

OSSGadget includes a tool that can determine if a given package is semantically equivalent. It's still helpful to work to make a package a fully reproducible build. A fully reproducible build is a somewhat stronger claim, and you don't need a complex tool to determine if the package is fully reproducible. Even given that, it's easier to first create a package that's semantically equivalent, and then work on the issues remaining to make it a fully reproducible build.

The intended use case for semantically equivalent builds (instead of fully reproducible builds) really to help people make risk decisions when they're thinking about bringing in external software. I'm primarily trying to deal with the case where the developer has decided to not provide a reproducible build, and I have to estimate the likelihood of it being maliciously built (presumably as a part of decideing whether or not the package is safe to install). I'm primarily thinking of applying this process to mostly-unmanaged repositories like npm, PyPI, and RubyGems, *not* to managed repositories like most Linux distributions' repositories (which have other mechanisms to counter malicious builds). The problem is that I cannot make the developer provide me a reproducible build (I can beg, but that's not the same thing). I'm trying to make good decisions with the information I have, not the information I *want* to have.

The threat model is a little different, too. The assumption isn't that "it is impossible for these differences to cause damage". The assumption is that "the original source code was benign, reasonably coded, and did not do damage". The question is, "is this non-reproducible package likely to have been generated from it, even though it's not a reproducible build?"

Here's an example that might clarify the threat model. It's possible that a program could look for the file ".gitignore" and run it if present. The source code repo might not have a .gitignore file, but the malicious package might add a .gitignore file and fill it with a malicious application. That would cause malicious code to be executed. However, it would also be *highly* suspicious for the source code to run a ".gitignore" file (that's *not* what they are for), so it's reasonable to assume that the source code didn't do that. If an attacker can insert a file that *would* cause malicious code to execute in a reasonably-coded app, then that *would* be a problem. "What's reasonable" is hard to truly write down, but ignoring date/time stamps and ignoring a whitelisted list of specific filenames seems like a reasonable place to start.

Sure, ideally everything would have a reproducible build. Since that day isn't here, what can we do to take piecemeal steps towards that?

In short, making packages at least semantically equivalent (and verifying this) is a great countermeasure against subverted builds.

Bootstrappable builds

Bootstrappable builds focuses on minimizing the amount of bootstrap binaries. They're not just interested in the direct "bootstrap" code to boot a computer, but also what is necessary to generate the direct bootstrap code. The problem bootstrappable builds is trying to address is a real one, namely, they are worried about subverted bootstrap code.

In the ideal sense they want to build up "compilers and interpreters and tools from nothing" - but of course you can't really build up from nothing - there has to be a starting point. DDC provides a great complement to bootstrappable builds - bootstrappable builds tries to limit the binaries you depend on, and DDC can be used to verify the those binaries that you depend on.

One example of this approach is stage0.

Another example is builder-hex0, a kernel "for bootstrapping compilers without having to trust a prebuilt binary". They are now able to bootstrap a POSIX kernel from Hex0 and use it to bootstrap a cross-platform C compiler with it.

Formal methods / proofs

Coq is being used by Xavier Leroy (main developer of OCaml) to write a certified compiler, compcert, that guarantees that semantics of a C source program is kept up to PowerPC assembly. The *specification* (unfortunately not the Coq proofs) of the compiler back-end is available as GPL software.

You might also be interested in the results of the MITRE Vlisp project. Vlisp README says: “The Verified Programming Language Implementation project has developed a formally verified implementation of the Scheme programming language, called Vlisp... An overview of the project is presented in the Vlisp Guide. More accessible PDFs about Vlisp are available too. Another paper that you may find interesting is Jonathan A. Rees. “A Security Kernel Based on the Lambda-Calculus”. PhD. Thesis. February 1995.

Underhanded code / Malicously-misleading code

In my PhD dissertation I note the problem of code that intentionally written to look (to a human) like it does one thing, but actually does another. In my dissertation I called this "maliciously-misleading code"; more recently the term "underhanded code" seems to have become more common. Either way, it's not a good thing. There are contests such as the Obfuscated V Contest and Underhanded C Contest showing that it's very possible, and a 2003 attempted attack on the Linux kernel shows that it is not merely an academic issue.

If you are interested in the topic of underhanded code, I suggest looking at my paper “Initial Analysis of Underhanded Source Code” by David A. Wheeler, IDA Document D-13166, April 2020 (here's a Perma.cc link to the PDF of D-13166).

Building (more) trusted hardware

A related issue is developing (more) trusted hardware. Since it's much harder to determine if hardware is "equivalent" there are many ways an attacker can subvert hardware that aren't a trusting trust attack. It's especially easy to subvert an ASIC mask, and unfortunately it's hard to detect. Those who control foundries can easily insert attacks that are hard to detect. I can't cover this huge field, but I thought I should make a few notes about it.

First, it's well-known that it's possible to insert attacks on hardware that are hard to detect later. E.g., "A2: Analog Malicious Hardware” showed "how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before effecting a chip’s functionality)." It won “best paper” at the 37th IEEE Symposium on Security and Privacy and ComputerWorld had article about it.

There are some countermeasures. E.g., "Detecting Hardware Trojans with Gate-Level Information-Flow Tracking” (IEEE Computer August 2016) uses information flow tracking to discover hardware Trojans As noted in a UCSD article, it works by assigning a label to important data like a cryptographic key. That has some value, but it's not a good general-purpose technique for countering arbitrary Trojans - what if the Trojan isn't trying to violate that specific property?

Obviously, what we'd like to have is computers that we can really trust for important missions, either directly or for use as checks on "normal" computers. One approach I've suggested for decades is possibly using FPGAs to implement some computers. FPGAs can be subverted also, especially to add "kill switches", but at least the FPGA manufacturer has less information about the chip's use.

A really interesting set of work is ""A Trustworthy, Free (Libre), Linux Capable, Self-Hosting 64bit RISC-V Computer" by Gabriel L. Somlo (be sure to see the FOSDEM 2023 presentation "Self-Hosting (Almost) All The Way Down"). Gabriel L. Somlo has built an entire hardware and software system using an FPGA. He managed this by implementing RISC-V, an open instruction set that is supported by today's Linux systems. I'm especially delighted that he points to my DDC work (smile).

As of 2020 Somlo's FPGA-implemented system's speed was about 65MHz. That's obviously much slower than current systems which are measured in GHzr Still, this is a usable speed if you're patient; it's far faster than computers of decades past. I also believe that future versions could be much faster if resources were poured into it. This work used many off-the-shelf components; I expect that those components could be optimized to be much faster. Its speed is also seriously limited by the lack of capacity in current FPGAs (the 2020 implementation lacks hardware floating point and has a limited L1 cache). Future FPGA systems could be developed with more capacity (and more RAM) to improve its performance.


Attackers can exploit compiler bugs by intentionally writing code that triggers the bug in a way that subverts the program. This is yet another way to write maliciously-misleading programs (a general topic I discuss in my dissertation). This attack is not the same same as the "trusting trust" attack that DDC counters, but it is certainly related. The paper "Deniable Backdoors Using Compiler Bugs" by Scott Bauer, Pascal Cuoq, and John Regehr, Pastor Manul Laphroaig’s Export–Controlled Church Newsletter, June 20, 2015, demonstrates how a compiler defect (publicly known or found via techniques such as fuzzing) can be exploited. In their case, they demonstrated how sudo could be exploited via innocent-looking code. John Regehr's blog post about "Defending Against Compiler-Based Backdoors" (2015-06-21) points this out, noting that "the advantages of this kind of backdoor include subtlety, deniability, and target-specificity" - and he then makes some great points. In particular, he notes that compiler developers need to fix known miscompilation bugs as rapidly as possible and use fuzz tools; compilers are security-sensitive in a way that is often not appreciated. Maintainers of open source packages need to be suspicious of baroque patch submissions and consider rewriting patches. Such attacks are much more fragile than the traditional "trusting trust" attack, but they can occur in any program, are potentially very dangerous, and are currently difficult to detect. In the short term, it might be best to focus on detecting and eliminating defects in widely-used compilers. No one will complain about getting rid of compiler defects, we have lots of techniques today that we can use, and if compiler defects become more difficult to trigger these backdoor attempts would typically become more obvious. But that short-term strategy is not enough; I hope that people will develop longer-term strategies too.

“Some Remarks about Random Testing” by B A Wichmann, National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK, May 1998, discusses creating random tests for compilers.

Kegel’s building and testing gcc/glibc cross toolchains has lots of good information.

GCC explorer interactively shows the assembly output from GCC (given various inputs).

Compiler explorer interactively shows the assembly output from a variety of compilers and languages.

Aquine is a "computer program which takes no input and produces a copy of its own source code as its only output." Compilers written to attack themselves are highly related to quines (depending on your definition, you could consider them a special case of quines). "Quines (self-replicating programs)" by David Madore discusses quines in more detail. "How to write your first Quine program" by David Bertoldi (2019-07-26) also explains quines. Perhaps the most amazing quine mame quine relay, an Ouroborous quine that starts as a Ruby program to generate a Rust program, which generates a Scala program, which generates a Scheme program, and so on through 128 languages.

The RepRap Project is developing inexpensive 3D printer designs that will hopefully (eventually) be able to create themselves. Very interesting, and in the future, possibly quite relevant.

The Open proofs web site encourages the development of “open proofs”, where the implementation, proofs, and required tools are all open source software.

Mark Mitchell’s “Using C++ in GCC is OK” (Sun, 30 May 2010 17:26:16 -0700) officially reported that “the GCC Steering Committee and the FSF have approved the use of C++ in GCC itself. Of course, there’s no reason for us to use C++ features just because we can. The goal is a better compiler for users, not a C++ code base for its own sake.” Mark Mitchell later explains that he expects that GCC will use C++ cautiously. For DDC, this means that applying DDC to the GCC code base will require a C++ compiler (at least one that supports the parts that GCC uses), not just a C compiler. I used Intel’s icc, which was a C++ compiler anyway, so that would not have especially affected my example... and it certainly does not change the validity of the approach.

This paper has a number of connections back to the halting problem. Proof That Computers Can't Do Everything (The Halting Problem) is a delightful video that shows the traditional proof of the halting problem, but in a clever way. You might also want to see Beyond Computation: The P vs NP Problem - Michael Sipser.

Build tools like make are important for any large system. Improving make describes my efforts to improve the POSIX standard for make as well as make implementations, in particular to support the insights in Peter Miller’s Recursive Make Considered Harmful.

The Juniper backdoor is interesting - it appears that a crypto backdoor was itself backdoor'ed. There are interesting comments by Matthew Green and rpw.

"Hacking DNA: The Story of CRISPR, Ken Thompson, and the Gene Drive" by Geoff Ralston (April 3, 2017) discusses CRISPR, and in passing discusses Ken Thompson's work.

"Cory Doctorow: Demon-Haunted World" in Locus Magazine (2017-09-02) notes that "none of the world’s problems are ours to solve ... so long as the computers we rely on are sneaking around behind our backs, treating us as their enemies."

Compiler fuzzing, part 1 discusses some results from fuzzing compilers.

Breaking the Solidity Compiler with a Fuzzer discusses how they enhanced the American Fuzzy Lop (AFL) fuzzer to fuzz C-like languages and find defects in the Solidity compiler.

The Science Fiction story Coding Machines by Lawrence Kesteloot (January 2009) is based on these kinds of attacks.


The Open Trusted Technology Provider Standard (O-TTPS), Version 1.1: Mitigating Maliciously Tainted and Counterfeit Products from the Open Group may be of interest to you. Per its website description, "The O-TTPS is an open standard containing a set of organizational guidelines, requirements, and recommendations for integrators, providers, and component suppliers to enhance the security of the global supply chain and the integrity of commercial off-the-shelf (COTS) information and communication technology (ICT). This standard if properly adhered to will help assure against maliciously tainted and counterfeit products throughout the COTS ICT product life cycle encompassing the following phases: design, sourcing, build, fulfillment, distribution, sustainment, and disposal. The Open Group Trusted Technology Forum (OTTF) is a global initiative that invites industry, government, and other interested participants to work together to evolve this document and other OTTF deliverables."

TUF (The Update Framework) helps developers secure their new or existing software update systems. Between the system-level package managers, programming language specific package managers/repositories, and application-specific update systems, there are a lot of software updaters around.. and they all need to be secure.

How to get key previous papers

This paper cites key previous papers, here is how to get those (I provide multiple links to increase the likelihood you can get them):

Hints on using OpenOffice.org/LibreOffice and OpenDocument

I used OpenOffice.org to write the dissertation, and it worked out very nicely. OpenOffice.org was a great program for writing larger documents. LibreOffice has replaced OpenOffice.org as its successor; LibreOffice started from the same code and has a far more active community. So definitely check out The Document Foundation’s LibreOffice Productivity Suite.

I developed an OpenDocument template for George Mason University (GMU) that did nearly all the formatting for me automatically. That made it easy to concentrate on the text instead of the formatting.

The most important rule for writing large documents using LibreOffice or any other word processor is to automate everything you can, and in particular, always use styles. Never set the font size, font type, etc., for a range of characters or a paragraph (one exception: using italics/bold to emphasize a word or phrase is okay). Instead, all formatting information should be attached to a paragraph style, and then make sure that each paragraph has the right paragraph style. Use “Text body” (not “Default”) for normal text, and the various “Heading 1”, “Heading 2”, and so on for headings. Similarly, use Insert > Cross-Reference to refer to other parts of the document; that way, the program can renumber things correctly.

OpenOffice.org gives you lots of control over how words break (or not) on a line; for more, see “Easy way to insert nonbreaking hyphen, etc. in OpenOffice.org Writer” (by Solveig Haugland). Basically, to get more control over hyphenation, go to Tools > Options > Language Settings > Languages and select the “Enabled for Complex Text Layout” option. Now you can use “Insert>Formatting Mark” menu to insert more control over formatting. The “no width no break” character, aka the “glue” character”, “glues” the characters it’s between to prevent line breaks there which would otherwise there. Similarly, the “no width optional break” character, when inserted, tells OpenOffice.org that it’s okay to insert a line break there where normally it would not do so. You can also insert non-breaking spaces, non-breaking hyphens, and optional hyphens.

In most cases, the paragraph styles should make paragraphs break across pages in the right way (e.g., the paragraph styles should have reasonable default “widow” and “orphan” settings, and header paragraph styles should have “keep with next paragraph” set). But in some cases the paragraphs won’t break across pages well because the program doesn’t “understand” text. For example, if you have text that leads into the next paragraph, you may need to right-click on that paragraph and set “keep with next paragraph”. In special cases you may want a different widow or orphan setting, too.

OpenOffice.org supports formulas, which I use quite a bit. Its “stack” and “matrix” options are sometimes very useful for multi-line formulas, for example. For in-line formulas, I recommend making formula borders 0. You can do this while editing formulas by selecting Format>Spacing, category Borders, and then making the all borders 0 (indeed, I suggest making this the default). Otherwise, there’s embedded extra space in formulas that looks odd when you try to combine formulas with punctuation.

For the final version, I used Tools > Update All (to update the table of contents, cross-references, etc.), moved to the beginning and saved, and then ran File > Export as PDF.


After doing endless numbers of tedious compiles, Xkcd’s cartoon about compiling made me smile. The big picture solution to the halting problem is also relevant :-).

Dilbert has mentioned long compiling times too: Dilbert 2013-06-22 Dilbert 2005-09-23 Dilbert 1998-06-21. Dilbert once noticed that “maybe there’s a bug in the compiler program itself!”. Dilbert also makes it clear why software single source strategies are a bad idea.

I gave a brief example of readable Lisp s-expressions; the readable Lisp s-expressions project has specifications and implementations for curly-infix-expresssions, neoteric-expressions, and sweet-expressions, which can make Lisp notation a lot easier to read.

Mortality.pvs is a short demo of how to express the “All men are mortal” example using PVS.

Here’s how to install gcc on SGI IRIX.

ERESI (ERESI Reverse Engineering Software Interface) is a “unified multi-architecture binary analysis framework targeting operating systems based on the Executable & Linking Format (ELF).”. developerworks has a nice article on ELF. Elfkickers was written by Brian Raiter, who also wrote A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux and Albert Einstein’s Theory of Relativity: In Words of Four Letters or Less. This old article explains ELF’s advantages.

I have tried to make sure that this paper will stick around into the future. Here’s the GMU page for my dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling”, as well as the arXiv.org copy of “Fully Countering Trusting Trust through Diverse Double-Compiling” and the UMI ProQuest copy of my PhD dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling” (via ProQuest search). Archive.org also has a copy. These are just additional copies, with the same information. The PDF file, as I submitted it, has these properties:
TitleFully Countering Trusting Trust through Diverse Double-Compiling
AuthorDavid A. Wheeler
DateFall Semester 2009 (actually 2009-11-30)
Length1,971,698 bytes
MD5 hash5320ff082ec060e7f58409b3877cb687
SHA-1 hash20c8b702dd4b7c6586f2 59eb98f577dbadd359dd
SHA-256 hash024bccc5254eaffe9466f12afe39f72b 154f63a6919f4e1add5d0513092b2052
SHA-512 hash 0004998431af5da486a87794969a5314 07cb607ffc411c966a23343a58636c20 72ceb85835ffe6eef727696ffc41b1dd d6d9e0fd090cbc85a33041c25acd2e55


An aside: At ACSAC 2005, Aleks Kissinger (from the University of Tulsa) also presented work that he and I had done on micro-tainting. Since that seems to have disappeared from the web, I thought I should briefly describe it here.

Aleks’ presentation was titled “Fine-Grained Taint Analysis using Regular Expressions,” which was part of the ACSA Works in Progress Basically, we noted that instead of assigning “taint” to a whole value, such as a string, you could assign taint on subcomponents, such as each character. Then you could assign rules that identified the input paths and what could come in -- typically zero or more tainted characters -- and rules on output paths. We concentrated on defining regular expressions for what is legal, though any other expression for patterns such as BNFs would be fine too. We noted that you could then check statically or dynamically. For the static case, when you work backwards, if the check “fails” you can even trivially derive the input patterns that cause security failures (and from that information it should be easy to figure out how to fix it). Aleks has recently made some good progress by transforming the regular expressions into DFAs. There was another ACSAC presentation on doing taint analysis with Java, but this was the traditional “whole variable” approach that is used in many languages, but through which many vulnerabilities slip by. We hope this micro-tainting approach will lead to improved tools for detecting security vulnerabilities in software, before that software is delivered to end-users.

There is related work that we know about that has been going on in the University of Virginia (UVA), though we only found out about it halfway through our work (via Usenix). More information about the UVA work is in “Automatically Hardening Web Applications Using Precise Tainting” by Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Greene, Jeff Shirley, and David Evans. They focus on PHP, and only on the dynamic case; we were interested in both, but especially interested in the static case (where you can show that certain vulnerabilities never occur and thus don’t need any run-time overhead to deal with them).

Other related work includes the BRICS Java String Analyzer (GPL; uses the BSD-licensed dk.brics.automaton). Hampi might be able to implement this statically, which would be fantastic.

There is a long history of work on data flow, static typing, and security (such as work by Dennis Volpano et al). That’s good work, but not really focused on what we were looking at. Those works tend to view variables as a whole, while instead we’re tracking much smaller units of data. We’re also tracking sequences (like arrays) which contain data with different levels of security; most such works handled arrays like a single unit (a simplification that is fundamentally at odds with our approach).

You can also view my formal education timeline, my book on writing secure programs, FlawFinder, or my home page.