This paper discusses some relationships between high assurance software (for security or safety) and free-libre / open source software (FLOSS). In particular, it shows that many tools for developing high assurance software have FLOSS licenses, by identifying FLOSS tools for software configuration management, testing, formal methods, analysis, implementation, and code generation. It particularly focuses on formal methods, since formal methods are rarely encountered outside of high assurance. However, while high assurance components are rare, FLOSS high assurance components are even rarer. This is in contrast with medium assurance, where there are a vast number of FLOSS tools and FLOSS components, and the security record of FLOSS components is quite impressive. The paper then examines why this is the case. The most likely reason appears to be that decision-makers for high assurance components are not even considering the possibility of FLOSS-based approaches. The paper concludes that in the future, those who need high assurance components should consider FLOSS-based approaches as a possible strategy. The paper suggests that government-funded software development in academia normally be released under a GPL-compatible FLOSS license (not necessarily the GPL), to enable others to build on what tax dollars have paid for, and to prevent the vast waste of effort caused by current processes. Finally, developers who want to start new FLOSS projects should consider developing new high-assurance components or tools; given the increasing attacks on, and dependence on, computer systems, having more high assurance programs available will be vital to everyone's future.
This paper discusses some relationships between high assurance software (for security or safety) and free-libre / open source software (FLOSS). First, let’s define these key terms.
Free-libre / open source software (FLOSS) is software whose license gives users the freedom to run the program for any purpose, to study and modify the program, and to redistribute copies of either the original or modified program (without having to pay royalties to previous developers). It's also called libre software, Free Software, Free-libre software, Free-libre / Open source (FLOSS) software, or open source software / Free Software (OSS/FS). The term "Free Software" can be confusing, because there may be a fee for the Free Software; the term "Free" is derived from "freedom" (libre), not from "no price" (gratis). More formal definitions of open source software and free software (in the sense of libre software) are available on the web. Examples include the Linux kernel, the gcc compilation suite, the Apache web server, and the Firefox web browser. Many FLOSS programs are commercial, while many others are not.
For purposes of this paper, let’s define “high assurance software” as software where there’s an argument that could convince skeptical parties that the software will always perform or never perform certain key functions without fail. That means you have to show convincing evidence that there are absolutely no software defects that would interfere with the software’s key functions. Almost all software built today is not high assurance; developing high assurance software is currently a specialist’s field (though I think all software developers should know a little about it). To develop high assurance software you must apply many development techniques much more rigorously, such as configuration management and testing. You need to use implementation tools you can trust your life to. And in practice, I believe that you need to use mathematical techniques called “formal methods” for a product to be high assurance, for the simple reason that it’s usually hard to create truly convincing arguments otherwise. A significant fraction of this paper covers formal methods, since they are rarely encountered outside of high assurance. There isn’t a single universal definition of the term high assurance, and products have been labelled “high assurance” without having any formal methods applied to them. But this definition should be sufficient for my purpose. Other terms used for this kind of software are “high integrity” and “high confidence” software.
Usually high assurance software is developed because of serious safety or security concerns. Strictly speaking, software by itself has no safety or security properties -- it can only be safe or secure in the context of a larger system. A nice discussion of this issue from the safety point of view is in Nancy Leveson’s book Safeware (see section 8.3). But software tends to control safety and security systems, and such software is often called “safe software” or “secure software”. For this paper I’ll talk about the security or safety of the software, with the understanding that this only makes sense if you understand the system that the software will be part of.
For purposes of this paper, the identity of the software’s supplier is not part of the definition of high assurance. By supplier, I mean the provenance (origin) and pedigree (lineage -- who the software passed through) of the software. By keeping the supplier identity out of the definition of high assurance, I can concentrate on technological issues. In reality, there may be some people who you wouldn’t trust even if they’d “proved” their code correct... so in practice it’s quite reasonable to ask questions like, “Who developed or modified the software? Can I trust them?” For both FLOSS and proprietary software, provenance and pedigree can be considered in exactly the same way -- in both cases, you’d consider who originally developed the software (in terms of each change), and who controlled the software from development through deployment to you. In particular, you’d consider who has rights to modify the software repository, and whether or not you trusted them. You might also consider how well the development environment itself is protected from attack. Don’t be fooled into thinking that FLOSS is “riskier” than proprietary software because it can be legally modified by anyone. Anyone can modify a proprietary program with a hex editor, too -- but that doesn’t mean you’ll use that modified version. The issue with suppliers is who controls your supply chain, and FLOSS often has an advantage in provenance and pedigree (because it is often easier with FLOSS to determine exactly who did what, and who has modification rights). But provenance and pedigree issues have to be handled on a case-by-case basis, and trying to cover those issues as well would over-complicate this paper. For example, the whole issue of “who trusts who” varies depending on the organizations and the circumstances. In an ideal world this wouldn’t matter, because the proofs would be true and could be rechecked everywhere. 
Given the massive move to globalization, I think it would be worth trying to make who created the software irrelevant. In any case, let's concentrate on the technical aspects in this paper.
More generally, assurance is simply the amount of confidence we have that the software will do, and not do, the things it should and should not. Sometimes the things it should not do -- what I call "negative requirements" -- are the most important. Any particular piece of software can be considered by someone to be low, medium, or high assurance. These are obviously qualitative categories; two products could be in the same assurance category, yet one be more secure than the other.
For purposes of this paper, let’s define medium assurance to be software which doesn’t reach high assurance levels, but where there has been significant effort expended to find and remove important flaws through review, testing, and so on. Note that when creating medium assurance software, there’s no significant effort to prove that there are no flaws in it, merely an effort to find and fix the flaws.
Medium assurance software must undergo testing and/or peer review to reduce the number of flaws. Such mechanisms can be really valuable, and eliminate a great many flaws, but the normal way of using them won't guarantee their absence. Some activities can eliminate certain types of flaws completely -- e.g., you can completely eliminate buffer overflows by choosing almost any programming language other than C or C++ -- but doing so would not eliminate all flaws! In particular, testing by itself cannot practically prove anything about real software. After all, exhaustively testing a program that just adds three numbers would take 2.5 billion years (assuming each number was 32 bits, you could run the program a billion times per second, and you used 1,000 computers for testing). Real programs are much more complicated than this, which is why testing by itself can't reach the highest levels of assurance.
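As a sanity check, here is the back-of-the-envelope arithmetic behind that 2.5-billion-year figure, using exactly the constants assumed above:

```python
# Exhaustively testing a program that adds three 32-bit numbers,
# assuming 10^9 tests/second per computer and 1,000 computers.
total_cases = 2 ** (32 * 3)                # every combination of three 32-bit inputs
tests_per_second = 10 ** 9 * 1000          # aggregate testing rate
seconds = total_cases / tests_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.1e} years")                # about 2.5e+09 years
```

And that is for a trivial three-input program; real programs have vastly larger input spaces.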
The differences between medium and high assurance (as I mean the terms in this paper) seem to confuse people, so let me contrast them directly. When developing a high assurance program, the program is presumed to be wrong (guilty) until a preponderance of evidence shows that it's correct. When a medium assurance program is being developed, it is spot-checked in various ways throughout its development to try to detect and remove some of its worst defects. Medium assurance software development normally leaves some defects in the program afterward. Few like the presence of latent defects, but for most software today, few people are willing to pay (or invest the time) for high assurance development techniques.
It’s reasonable to think that as technology improves, high assurance programs will become more common. But even today there are some situations where medium assurance is not enough. Typically this is where people’s lives, or the security of a nation, is at stake. In such cases, some of today’s customers need serious evidence that there are no critical defects of any kind. They need something different: High assurance.
Ideally, all software would be high assurance, but ideally we’d all live in mansions. It’s very difficult to create truly high assurance software. The configuration management and testing requirements are usually more severe (and time-consuming) than those for other kinds of software. Applying formal methods requires significant mathematical training that most software developers don’t have, and can be very time-consuming.
Because of these challenges, high assurance software is usually only developed for critical security or safety components. When creating critical security or safety components, a number of regulations are often imposed.
High assurance software for security is the point of the Common Criteria for IT Security Evaluation (ISO standard 15408) when you select EAL 6 or higher -- and EAL 6 is really a compromise! For purposes of this paper, medium assurance software is in the EAL 4 to 5 range of the Common Criteria, so Red Hat Linux and Microsoft Windows would both be considered medium assurance products. I consider EAL 2 (or less) to be low assurance; EAL 3 is a compromise, but it’s basically low assurance.
Here are some other standards that are often mentioned in the security or safety world, which often impact this kind of development:
The rest of this paper looks at FLOSS tools that can be used to create high assurance components (there are many), and FLOSS components that are high assurance themselves (they are rare). It then contrasts this situation with medium assurance -- there are many medium assurance FLOSS tools, and FLOSS components with impressive results. The paper then speculates about why this is the case, and then concludes.
It turns out that there are a lot of FLOSS tools that can be used to help develop high assurance software. To prove that, I've identified a few important tool categories, and for each category I identify several FLOSS tools. The tool categories I discuss below are configuration management tools, testing tools, formal methods (specification and proof) tools, analysis tools, implementation tools, and code generation tools.
There are many other categories of tools, and many other specific FLOSS tools, that are not listed below. But the discussion below should prove my point that there are many FLOSS tools that can be used to help develop high assurance components.
There are many FLOSS software configuration management (SCM) tools; indeed, I've written a review of several FLOSS SCM tools. (I've also written a paper discussing SCM and security.)
CVS is an old and still very widely-used SCM tool. I suspect that most software worldwide, both proprietary and FLOSS, is still managed by CVS as of 2006. Subversion (SVN) is an SCM tool written as a replacement for CVS, and it's widely used, too. But the list of FLOSS SCM tools is amazingly long, including GNU Arch, git/Cogito, Bazaar, Bazaar-NG, Monotone, Mercurial, and darcs (see my paper for a longer list). Clearly, there's no problem finding a FLOSS SCM tool.
All developers test their software, but high assurance software requires much more testing to gain confidence in it. Once again, there's a massive number of FLOSS tools that support testing. In fact, there are so many FLOSS testing tools that there's a website (opensourcetesting.org) dedicated to tracking them; as of April 2006 it lists 275 tools! They range from bug-tracking tools like Bugzilla to frameworks for test scripts like DejaGnu.
Many high assurance projects are required to meet specific measurable requirements on their tests. One common measure of testing is "statement coverage" (aka "line coverage"), the percentage of program statements that are exercised by at least one test. One problem with the statement coverage measure is that statements containing decisions, such as the "if" statement, create different possible paths, and statement coverage doesn't report whether all of them were taken. Thus, another common measure of testing is "branch coverage," the percentage of "branches" from decision points that are covered. Branch coverage has its weaknesses too, so there are many other test measures as well -- but statement and branch coverage are the two most commonly-used measures, so we'll start with them.
Some experts believe that unit testing (low-level tests) should achieve 100% statement coverage and 100% branch coverage, with the simple argument that if you're not even covering each statement and each branch, your testing is poor. Most others argue, however, that 80%-90% in each is adequate, because the effort to create tests covering the last few percent is very large and less likely to find problems than spending that effort in other ways. No matter what, in my opinion you should create your tests first and then measure coverage -- don't write your tests specifically to get good coverage values. That way, you'll often gain insight into which portions of the code are hard to test or don't work the way you thought they would. That insight will help you create much better additional tests to bring the values up to whatever your project requires.
(Oh, and why measure both statement and branch coverage? It turns out it's possible to meet one without the other. For example, an "if" statement with a "then" clause but no "else" clause might have all its tests yield true... in which case all the statements are covered, but not all the branches are covered (the "false" branch is never taken). Normally, when you cover all branches you cover all statements, but there are special cases where that is not true. For example, if your program (or program fragment) doesn't contain any branches, or if there is an exception handler without any branches in its body, you can have all branches covered but not all statements covered. Exception handlers might be considered branches, but that interpretation is not universal.)
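A tiny made-up example may make the statement-versus-branch distinction concrete. Here every statement is executed by the tests, yet one branch is never taken:

```python
def clamp_negative_to_zero(x):
    # One decision point, two branches: x < 0 ("true") and x >= 0 ("false").
    if x < 0:
        x = 0          # the only statement unique to the "true" branch
    return x

# A test suite that happens to use only negative inputs:
for value in (-5, -1):
    assert clamp_negative_to_zero(value) == 0

# Result: 100% statement coverage (every line above ran), but only 50%
# branch coverage -- the "false" branch (x >= 0) was never exercised,
# so a defect on that path would go undetected.
```

This is exactly the kind of gap a branch-coverage measurement reveals and a statement-coverage measurement hides.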
There have been several recent developments in testing that improve test efficiency:
Even for test coverage measurement, there are FLOSS tools that can meet the need. There are several FLOSS "test coverage" tools, such as gcov, that can report which statements or which branches were not exercised by your test suite.
Many software developers have no idea what “formal methods” are. Yet my definition of high assurance implies that we’ll usually need to use “formal methods” to create high assurance software. This section explains what formal methods are, shows that there are lots of FLOSS tools even in this area, and then discusses some of the implications.
Formal methods, broadly, are the application of rigorous mathematical techniques to software development (see An International Survey of Industrial Applications of Formal Methods for a lengthier definition and discussion). Ideally, we'd like a rigorous mathematical specification stating exactly what we want the program to do and not do, and then prove all the way down to the machine code that the software meets the specification. This is normally hard to do, so various compromises are often made. Many have identified three different broad levels of the use of formal methods, in order of increasing cost and time: level 0, where a formal specification is created but the program is then developed without proving that it meets the specification (sometimes called "formal methods lite"); level 1, where rigorous (possibly hand-written) proofs are used to show that the design or code refines the formal specification; and level 2, where the proofs are fully machine-checked by theorem-proving tools.
The "levels" are a little misleading, because you can actually do things partially (perhaps only a part of the software is formally specified), and level 1 is somewhat ambiguous. But these levels give the basic flavor; there is a trade-off between rigor and effort.
Now we come to the decision of where to draw the line, and this isn’t an easy decision. For purposes of this paper, to count as “high assurance” there needs to be some carefully-reasoned explanation as to why the running code meets its key requirements. How much effort is needed for this justification depends on the risk you’re willing to take, and where you perceive the risks to be. Thus, while level 0 may be less costly, that is often not enough, so high assurance development often moves into level 1 and uses a focused application of level 2 on the parts that cannot be shown correct otherwise. Almost no one tries to prove all code down to the machine code; typically developers with such concerns will check the machine code by hand to ensure that it corresponds with the source code. Some may prove down to the source code, or at least the parts of the source code that are most worrisome. Others may use proofs to a detailed software design, and then use other less rigorous arguments to justify the source code. You can even back off further, using formal methods only for the specification (level 0), or not at all. In all cases, though, there needs to be some careful reasoning that convinces others that the code actually meets the key requirements, typically by showing a stepwise refinement from specification through to the code. Mantras such as “correct by construction” come into play in these kinds of systems.
We would love to formally prove that every line of code, down to the machine code, is correct; doing so has lots of benefits. Why is it so costly? Simply put, creating proofs is incredibly hard; often tools and knowledgeable humans must work together to create them. To prove code correct, you generally must write the code and proofs simultaneously (so that the code is in a form that is easier to prove). Requiring proofs also limits the size of programs (and thus their functionality), because our ability to do proofs does not scale well. The historical rule of thumb for the largest amount of code that could reasonably be proven correct all the way down to the code level was about 5,000 lines of code. Cleanly-separated components can be verified separately (e.g., a computer's boot and initialization programs might be separable from an operating system kernel), and that helps. This rule of thumb is (I believe) out of date; the tools for verifying code have improved, and good tools (including languages designed for provability) can help today's developers go significantly beyond this scale. SPARK Ada's developers in particular claim they can go way beyond this. But it's not clear where the upper bounds really are, and it is clear that formally proving code gets harder as the software gets larger. Typical operating systems have millions of lines of code and are growing fast, so no matter what the upper bound is, there is a real gap between typical commercial demands for functionality and the ability of today's formal methods tools to verify it. Don't expect Windows, MacOS, the Linux kernel, or *BSD kernels to be formally proved down to their code level. Proving only general models of code (instead of the system itself) eliminates this problem, but as I noted above, this doesn't show that the code itself is highly assured.
Note that all formal methods have a basic weakness: They must make assumptions, because you have to start somewhere. In any such system, humans have to check the assumptions very, very carefully. If you start with a false assumption, a "proof" could produce an invalid conclusion. This problem -- that your assumptions may be invalid -- is a key reason that testing and other activities are still needed for high assurance software, even if you use formal methods extensively.
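To illustrate why assumptions matter, here is a small sketch (the function is my own example, not from any particular verified system): binary search can be proved correct under the assumption that its input is sorted, and the "proof" says nothing about what happens when that assumption is violated:

```python
def contains(sorted_xs, target):
    """Binary search: provably correct -- IF sorted_xs really is sorted."""
    lo, hi = 0, len(sorted_xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sorted_xs) and sorted_xs[lo] == target

print(contains([1, 3, 5, 7], 5))   # True: the assumption holds, the answer is right
print(contains([5, 1, 7, 3], 5))   # False: assumption violated -- 5 is present,
                                   # but the "verified" code fails to find it
```

The code is unchanged between the two calls; only the assumption changed. This is why checking assumptions (and testing) remains essential even for formally verified software.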
Another trade-off in formal methods is between expressiveness and analyzability. Fundamentally, any formal method has some sort of language, a set of axioms, and inference rules (the rules that let you determine whether something else is true). A language that is extremely flexible (expressive) typically tends to be harder to analyze. As a result, there are many different languages, each better at different things.
For more information about formal methods, you can see the Wikipedia information on formal methods (particularly the main article on formal methods, automated theorem proving, and model checking), the Formal Methods Virtual Library, the NASA Langley Formal Methods Site, and the Formal Methods Education Resources. QPQ ("QED Pro Quo") is intended to be an "online journal for publishing peer-reviewed source code for deductive software components", and has links to various tools and papers. Aleksey Nogin's "A Review of Theorem Provers" has a nice short summary comparing theorem provers. The Seventeen Provers of the World compares 17 proof assistants for mathematics, some of which are relevant to software development. Most surveys seem to be old, unfortunately. A quick overview is available from the CMU 1996 Strategic Directions in Computing Research Formal Methods Working Group report, and from E. Clarke and J. Wing's 1996 "Formal Methods: State of the Art and Future Directions" [PDF]. I can point out "Notes on PVS from a HOL perspective" by Michael J.C. Gordon (1995), An Analysis of two Formal Methods: VDM and Z (1997), and Vienneau's 1993 "A Review of Formal Methods" (even the full version is incomplete for its time, but at least it is easy to read). The History of the HOL System explains some of the convoluted history of LCF, HOL, and their various derivatives, and also notes some other systems. "Formal Methods for IT Security" (May 2007) has a quick overview of tool types (one quibble: I agree that automatic theorem provers like prover9 take less effort than interactive tools like PVS and HOL, but automatic tools don't give less assurance -- it's just that they cannot be effectively used on certain classes of problems). The "Handbook of Automated Reasoning" (edited by J. Alan Robinson and Andrei Voronkov) is a survey. A short 2003 survey of tools commented on many tools, and the Formal Methods Framework report (1999) summarizes many tools.
DACS' list of formal methods literature is old, but it's nicely focused on key works. Ingo Feinerer's "Formal Program Verification: A Comparison of Selected Tools and Their Theoretical Foundations" (2005) is a much more recent comparison of formal methods tools (in this case, the Frege Program Prover, KeY, Perfect Developer, and the Prototype Verification System). Griffioen and Huisman's 1998 work compares PVS and Isabelle; Zolda's 2004 work compares Isabelle and ACL2. Formal Methods Europe is an independent association with the aim of stimulating the use of, and research on, formal methods for software development; their website has some summaries (though it has a very anemic list of tools as of May 2006). The mathematically-oriented papers Information technology implications for mathematics: a view from the French riviera and Deliverable 4.1: Survey of Existing Tools for Formal MKM compare tools' mathematical foundations (these have the informal look of notes not intended for the masses, but they are still interesting). Johann Schumann's Automated Theorem Proving in High-Quality Software Design discusses integrating automated theorem provers into larger development approaches and tools. High-Integrity System Specification and Design by Bowen and Hinchey is a collection of older key essays. Bowen and Hinchey's "Ten Commandments of Formal Methods ...Ten Years Later" (IEEE Computer, January 2006) discusses previously-identified lessons learned (the "ten commandments" of their IEEE Computer April 1995 article) and argues that they have generally stood the test of time. Two oft-referenced formal methods advocacy pieces were published in IEEE Software: "Seven Myths of Formal Methods" by Anthony Hall (Sep/Oct 1990) and "Seven More Myths of Formal Methods" by Bowen and Hinchey (July 1995). Richard Sharpe argues that formal methods may be increasingly used in the future. Palshikar's "An introduction to model checking" is a gentle introduction to that topic.
Tony Hoare and Jay Misra have proposed a "Grand Challenge" effort to speed the maturation of formal methods -- for their pitch, see Verified software: theories, tools, experiments (July 2005). The VSTTE conference of October 2005 was a response to this proposal, focusing on systematic methods for specifying, building, and verifying high-quality software. Shankar's presentation The Challenge of Software Verification gives interesting comments on this challenge. The paper Formal specification and verification of data separation in a separation kernel for an embedded system (Constance L. Heitmeyer, Myla Archer, Elizabeth I. Leonard, and John McLean of the Naval Research Laboratory) describes a very promising approach to proving all the way down to the code. These tools generally presume that you already understand the basics of formal logic; if you don't, books such as P. D. Magnus' "forall x" may be of use to you. (A serious problem in the U.S. is that many software developers have never studied discrete math, including logic, even though that's the basis of their field; few would allow a civil engineer to design a bridge without first learning calculus.) No doubt there are many other sources of information. Peter Gutmann's article on "Verification Techniques" (a chapter of his thesis) takes a much more pessimistic view of verification techniques, and has important insights on the limitations of formal methods and some other verification techniques.
After I wrote this paper, I discovered David Mentré's very interesting list Free software tools for formal verification of computer programs. You should definitely take a look at it as well if you're interested in the topic! Trac's list of Theorem Proving systems identifies their licenses, many of which are FLOSS.
Here are some other lists of formal methods related tools:
Note -- don't treat "formal methods" as a checklist item for high assurance (oh look, some math, we're done!). The point in high assurance is to identify the risk areas, and then use tools (like formal methods) to convincingly show that there isn't a problem. There is more than a little overlap between those developing high assurance software and the research community; applying these techniques can be difficult in some domains, particularly if you need really high levels of confidence for complex systems.
There are many different kinds of formal methods tools, which I will group into these categories:
Note that these are very rough and imprecise categories. All formal methods tools must support some kind of specification notation, tools often have multiple capabilities, and there is a general trend of combining these tools into larger interoperable capabilities. A general discussion about issues in integrating tools is in the paper “PVS: Combining Specification, Proof Checking, and Model Checking”. Thus, any categorization is imperfect, but hopefully this division will help.
There are some interesting competitions, particularly for the automated tools. Here are the CASC results of 2008 using TPTP, for example.
It turns out that there are many FLOSS tools that support using formal methods, in all of those categories, as the following sections will show.
All formal tools have some sort of specification language, but some languages are often focused on higher-level specifications -- helping users enter, syntactically check, and cleanly display the specifications with a minimum of effort. These are often used for level 0 and 1 type of work (though they can be used for more -- often by devising connections to other tools). Here is a partial list of specification languages, and FLOSS tools that support them:
Here are FLOSS theorem provers and checkers (increasingly they are combined with model checkers, in which case I list them here and not under model checkers):
ACL2 is powerful enough to be very useful, and has been used for many important commercial projects. ACL2 directly supports mathematical induction, meaning that it can directly handle computer programs with loops or recursion (something many other theorem provers cannot handle as directly and thus must handle in other ways). ACL2's defchoose and defun-sk capabilities add the ability to handle "there-exists" and "for-all" statements, which the ACL2 developers say adds "all the abilities of full first order logic" (though ACL2's support for these quantifiers is limited; see below). ACL2 supports encapsulation (the "encapsulate" form lets you describe general properties of a function, instead of having to define all functions), so you do not have to define an executable function to use ACL2. ACL2 also supports some capabilities of second-order logic (higher-order functions, though not higher-order variables).
ACL2's support for executability is among the strongest of any theorem prover: you can interactively enter runnable LISP functions and begin proving properties about them. That is really powerful! There are good LISP compilers, so the execution can be really fast (especially if you use the usual LISP speedup tricks, such as tail recursion, arrays, and declaring numeric types). ACL2 also supports "single threaded objects", which you can use to make models with state run much faster.
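To make this concrete, here is a minimal sketch of the ACL2 style of working, based on the classic list-append example from the tool's documentation (the theorem name here is my own label):

```lisp
; Define an ordinary, executable function (list append):
(defun app (x y)
  (if (endp x)
      y
    (cons (car x) (app (cdr x) y))))

; The function runs immediately at the LISP prompt:
;   (app '(1 2) '(3 4))  evaluates to  (1 2 3 4)

; Then ask ACL2 to prove a property of it; ACL2 finds the
; inductive proof automatically:
(defthm app-is-associative
  (equal (app (app x y) z)
         (app x (app y z)))))
```

The same definition serves both as a runnable model and as the object of proof -- that is the executability advantage described above.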
ACL2 has weaknesses (and perceived weaknesses) as well, though there is ongoing work to address many of them:
If you are thinking about using ACL2, I highly recommend getting two books by the tool’s creators. These are Computer-Aided Reasoning: An Approach, which describes how to use the tool, and Computer-Aided Reasoning: ACL2 Case Studies, which gives worked examples on various problems. They are absurdly pricey in hardback (around $215-$224 each!), so buy softcovers from the authors instead.
mCRL2 (Boost license, which is OSI-approved) stands for micro Common Representation Language 2. "It is a specification language that can be used to specify and analyse the behaviour of distributed systems and protocols and is the successor to μCRL. It is a formal specification language with an associated toolset. Using its accompanying toolset, systems can be analysed and verified automatically. The toolset can be used for modelling, validation and verification of concurrent systems and protocols. The toolset supports a collection of tools for linearisation, simulation, state-space exploration and generation, and tools to optimise and analyse specifications. Moreover, state spaces can be manipulated, visualised and analysed." "mCRL2 is based on the Algebra of Communicating Processes (ACP) which is extended to include data and time. Like in every process algebra, a fundamental concept in mCRL2 is the process. Processes can perform actions and can be composed to form new processes using algebraic operators. A system usually consists of several processes (or components) in parallel."
SInE (Sumo Inference Engine) (GPLv3) is "a metaprover targeted on large theories, especially on SUMO". This is actually a support program designed to make other first-order theorem provers (like prover9 and E) much more effective on large problems. Programs like prover9 and E take the assumptions (axioms) and negated goal and try to derive everything that can be derived from them, looking for a contradiction. That's fine, but if there are a vast number of irrelevant assumptions, they get overwhelmed, and that's where SInE comes in. SInE selects only "relevant" axioms that "define" the meaning of symbols and then runs an underlying theorem prover.
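The selection idea can be sketched in Python. This is a simplified illustration of SInE-style relevance filtering, not SInE's exact algorithm, and the axiom data is invented: each axiom is "triggered" by its rarest symbol, and starting from the goal's symbols we repeatedly pull in triggered axioms and the new symbols they mention.

```python
# Simplified sketch of SInE-style axiom selection (illustrative only, not
# SInE's exact algorithm; the axioms below are toy data). An axiom is
# "triggered" by its rarest symbol; starting from the goal's symbols, we
# repeatedly add triggered axioms and their symbols until a fixed point.

from collections import Counter

axioms = {  # axiom name -> set of symbols it mentions (hypothetical data)
    "a1": {"human", "mortal"},
    "a2": {"socrates", "human"},
    "a3": {"planet", "orbits"},     # irrelevant to the goal
    "a4": {"orbits", "star"},       # irrelevant to the goal
}
goal_symbols = {"socrates", "mortal"}

occurrences = Counter(s for syms in axioms.values() for s in syms)

def trigger(syms):
    # The rarest symbol is taken to "define" the axiom.
    return min(syms, key=lambda s: occurrences[s])

reachable, selected = set(goal_symbols), set()
changed = True
while changed:
    changed = False
    for name, syms in axioms.items():
        if name not in selected and trigger(syms) in reachable:
            selected.add(name)
            reachable |= syms
            changed = True

print(sorted(selected))  # → ['a1', 'a2']
```

Only the two axioms relevant to the goal survive; the planetary axioms are filtered out before the underlying prover ever sees them.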
SPASS (GPLv2) is an automated theorem prover for first-order logic with equality. It can be used for the "formal analysis of software, systems, protocols, formal approaches to AI planning, decision procedures, and modal logic theorem proving." SPASS+T (GPLv2) is an extension of SPASS that "enlarges the reasoning capabilities of SPASS using some built-in arithmetic simplification rules and an arbitrary SMT procedure for arithmetic and free function symbols as a black-box." Unfortunately, SPASS+T requires Yices (proprietary) or CVC Lite (license currently unacceptable to distributors due to a dangerous legal clause), so it cannot be included in the main repository of a typical Linux distribution.
I have not included some tools in this list because I can't confirm that they have a FLOSS license. MAYA (originally part of Inka, something that supports graphs and connects to various other useful components) has no license that I can find; its "mathweb" component is clearly GPL'ed, but it's unclear whether the whole is GPL'ed, and it depends on the proprietary Allegro Common LISP. RRL has no license I can find, and I can't download it. The lesson here is that if you develop a tool, you need to clearly identify its license so that others can use it.
Here are tools that are model checkers that at least say they are FLOSS:
However, although the front page of the Spin project says it has an open source license, and I believe that was their intent, there are significant concerns that suggest it may not be a FLOSS license at all. Spin's developers created their own unique license, an unwise practice that is broadly discouraged (because it's so easy to get it wrong). When people create their own licenses but are serious about making them FLOSS, they generally submit them to opensource.org or the Free Software Foundation, but neither the Opensource.org license list nor the FSF license list identifies the Spin license as a FLOSS license. That's rather suspicious. What's worse, the Debian-legal team noted some very serious problems with the Spin license, suggesting that it's not a FLOSS license at all. Thankfully, there are other tools available now which do not have a cloud of licensing problems hanging over them.
The Satisfiability Modulo Theories (SMT) problem is an extension of the SAT problem (above). Basically, given expressions with boolean variables and/or predicates (functions that take potentially non-boolean values yet return boolean values), determine the conditions that would make it true (or conversely, show it's false). An SMT solver adds one or more "theories" for various predicates, e.g., it might add real numbers (adding predicates like less-than and equal-to), integers, lists, and so on. SMT solvers are sometimes implemented on top of SAT solvers.
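The idea can be sketched in Python as a toy (real SMT solvers reason symbolically, not by brute-force search, and the formula below is invented): search for an assignment over booleans plus a small integer "theory" domain that makes the formula true.

```python
# Toy illustration of the SMT idea (not a real solver): find an assignment
# over a boolean variable plus small-integer theory variables that makes
# the formula true. Real SMT solvers do this symbolically, not by search.

from itertools import product

def formula(b, x, y):
    # (b OR x < y) AND (NOT b OR x + y == 4) AND x != y
    return (b or x < y) and ((not b) or x + y == 4) and x != y

for b, x, y in product([False, True], range(4), range(4)):
    if formula(b, x, y):
        print(f"satisfiable: b={b}, x={x}, y={y}")
        break
else:
    print("unsatisfiable within this domain")
# prints: satisfiable: b=False, x=0, y=1
```

A real SMT solver would treat `<` and `+` through a decision procedure for integer arithmetic, typically layered on top of a SAT solver as the text describes.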
Many other systems build on top of SMT solvers. SMT-LIB (The Satisfiability Modulo Theories Library) and SMT-COMP (The Satisfiability Modulo Theories Competition) are important to many SMT solver implementors. Here is a list of SMT solvers (current and abandoned, FLOSS and not).
Examples include:
Here are FLOSS tools that don’t easily fit into the above categories:
Alloy includes the Alloy Analyzer, “which is a model finder (not a model checker): given a logical formula, it finds a model of the formula. When an assertion is found to be false, the Alloy Analyzer generates a counterexample... Alloy Analyzer is essentially a compiler. It translates the problem to be analyzed into a (usually huge) boolean formula. This formula is handed to a SAT solver, and the solution is translated back by the Alloy Analyzer into the language of the model. All problems are solved within a user-specified scope that bounds the size of the domains, and thus makes the problem finite.” Because of its different approach, Alloy supports many higher-level structures (such as sets, relations, tables, and trees); “most model checking languages provide only relatively low-level data types (such as arrays and records)”. The tool is written in Java, and includes a GUI interface.
This tool looks like it’d be very useful for writing specifications in some medium assurance environments, and I think it would be useful for high assurance at level 0 (with a little more strength than usual at level 0). The notation is fairly clear, and the notation is specifically designed so that assertions can be analyzed in a completely automated way (unlike today’s theorem-proving). Those are big advantages, and thus this is a good example of a “formal methods light” tool.
However, note that it cannot prove that certain things can never happen; instead it can prove something like "X cannot happen within Y steps" (you can choose Y to be as large as you like). The phrase they use to describe it is a "model finder" approach; basically, they try to create a model that falsifies the claims. Thus, while it can certainly give some confidence that the specification is right, it often cannot “prove” things to the strength that you usually want at level 1 or 2 for high assurance. Notationally, Alloy is very different from the tools called "model checking" tools; model checkers are typically designed to analyze compositions of state machines running in parallel, and usually only support arrays and records inside the state machines. In contrast, Alloy supports more abstract notations such as sets and relations. I can easily imagine this tool being combined with other tools (a theorem-prover or model checker)... this approach supports quick tests for sanity, and then you could prove in more depth if you needed to.
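The "model finder within a scope" idea can be sketched in Python (an illustration of the concept, not Alloy's SAT-based machinery): exhaustively try every relation over a small scope, looking for one that falsifies a claim. Here the (deliberately false) claim is "every symmetric relation is transitive".

```python
# Sketch of the "model finder" idea behind Alloy (concept only; Alloy
# actually translates to SAT). We check the false claim "every symmetric
# relation is transitive" by trying every relation over a small scope.
# A counterexample disproves the claim; a failed search within the scope
# proves nothing beyond that scope.

from itertools import chain, combinations

scope = range(3)                     # user-chosen finite scope, as in Alloy
pairs = [(a, b) for a in scope for b in scope]

def symmetric(r):
    return all((b, a) in r for (a, b) in r)

def transitive(r):
    return all((a, c) in r for (a, b) in r for (b2, c) in r if b == b2)

def relations():
    return (frozenset(s) for s in chain.from_iterable(
        combinations(pairs, n) for n in range(len(pairs) + 1)))

counterexample = next((r for r in relations()
                       if symmetric(r) and not transitive(r)), None)
print(sorted(counterexample))  # → [(0, 1), (1, 0)]
```

The relation {(0,1), (1,0)} is symmetric but not transitive (transitivity would require (0,0) and (1,1)), which is exactly the kind of small concrete counterexample Alloy displays graphically.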
Brant Hashii’s “Lessons Learned Using Alloy to Formally Specify MLS-PCA Trusted Security Architecture” describes using Alloy to model security. Pamela Zave’s “A Formal Model of Addressing for Interoperating Networks” describes using Alloy to model network addressing. A number of places use Alloy as a teaching tool as well, because its ability to easily generate graphically-displayed examples seems to help people understand its analysis results.
"Why" is a software verification tool; it is a general-purpose verification conditions generator (VCG) for other verification tools (including Coq and PVS), which it can call on. It can be used as a front-end for many tools, including calling out to many automated tools (so it can actually combine the results of many different tools in a useful way). On top of "Why" are two very interesting tools:
"The core Forge library... operates on programs constructed in the Forge Intermediate Representation (FIR), a simple, relational programming language. To analyze a program written in a conventional programming language, like Java or C, that program and its specification must first be encoded in FIR. We have built a command-line tool called JForge that analyzes Java code against specifications written in the Java Modeling Language (JML) by translating them both to FIR, and we have made this tool available for download as well. Others are working on a translation from C to FIR, and we welcome and encourage you to encode your own favorite language in FIR."
"Unlike many other safe C variants such as Cyclone and CCured, Deputy is incremental and thread safe. That is, programmers are free to add annotations and modify code function-by-function. This is possible because Deputy does not change the representation of the data visible across function boundaries, which allows “deputized” modules to interoperate with standard modules. While the initial version of the file may contain several blocks of trusted code, subsequent versions will gradually eliminate this trusted code in favor of fully annotated and checked code." [Beyond Bug-finding].
Another FoCaL overview adds some detail: "Focal, a joint effort with LIP6 (U. Paris 6) and Cedric (CNAM), is a programming language and a set of tools for software-proof codesign. The most important feature of the language is an object-oriented module system that supports multiple inheritance, late binding, and parameterisation with respect to data and objects. Within each module, the programmer writes specifications, code, and proofs, which are all treated uniformly by the module system. Focal proofs are done in a hierarchical language invented by Leslie Lamport. Each leaf of the proof tree is a lemma that must be proved before the proof is detailed enough for verification by Coq. The Focal compiler translates this proof tree into an incomplete proof script. This proof script is then completed by Zenon, the automatic prover provided by Focal."
As of May 2008 it looks more like early research work, but it will probably mature over time. They seem to have focused primarily on implementing computer algebra systems (CAS) so far.
It's worth noting the integrating program SAGE (GPL and GPL-compatible licenses), a Python-based program that integrates several CAS and other mathematical programs with the goal of "Creating a viable free open source alternative to Magma, Maple, Mathematica, and Matlab."
Examples of specific CAS programs include:
(Figure: SUMO's top-level hierarchy, including Physical (Object, SelfConnectedObject, ContinuousObject, CorpuscularObject, Collection, Process) and Abstract (SetClass, Relation, Proposition, Quantity, Number, PhysicalQuantity, Attribute).) SUMO maps to the widely-used WordNet (next!).
These tools appear to be no longer maintained, but they may still be useful as a basis for new work:
Not all formal methods tools are free-libre / open source software (FLOSS). I thought I should briefly note a few, so that people can save time instead of trying to track down their licenses.
"Proprietary" simply means that the program is not FLOSS, and thus users do not have the right to use the software for any purpose, view the source code, modify it, and redistribute it (modified or not). Many tools were developed at public expense at universities, but are nevertheless proprietary. Even if a tool's development was completely paid for with public funds, do not assume that the software will be released to the public as FLOSS. In some cases there are "demo" binaries of the tool available to the public, but the source code is not distributed and/or there are significant limitations on the tool's use.
The following are proprietary tools: Barcelogic, Boolector, HySAT, Spear, Yices, and Microsoft's Z3. ESC/Java and Simplify are not FLOSS; they were once widely distributed, but since they were never FLOSS, when their company's direction changed they became abandoned, legally risky to use, and impractical to maintain. Similarly, Z/Eves was once widely distributed, but it was never FLOSS, so when its distributor stopped distributing it, users had no legal recourse.
MathSAT is not FLOSS, even though it has the text of the GPL and LGPL licenses in its release. Alberto Griggio explained the real situation to me in 2008: "It is linked with the GNU multiprecision library GMP, which is covered by the LGPL. So, the sourcecode doesn't have to be available, and in fact at the moment it is not, sorry. The tarball includes a copy of the GPL as well as of the LGPL, because, as far as we know, you must ship a copy of the LGPL if you link against an LGPL'd library, and the LGPL itself requires that you also ship a copy of the GPL. Sorry if this was unclear."
Epigram has no clear license statement. I've sent an email requesting that this be clarified.
CBMC is a model checker with an almost-free license, so it's probably not FLOSS. It allows change, but requires notification before installation; that certainly fails Debian's requirements (e.g., it fails the "Desert Island" test). Once again, it's frustrating that people keep creating new licenses; please STOP!
Bogor is non-commercial-use only, and thus not FLOSS.
HCMC is a model-checking tool, but I haven't found a clear license statement about it.
My point is simply that there are many FLOSS tools in this space, and I think I have proven that convincingly. This is certainly not a complete list; see the resources listed above, the Yahoda verification tools database, CoLogNET’s list of automated reasoning tools, the CASC entrants, Formal Methods Education (FMEd) Resources tools page, the Formal methods virtual library, Automation of reasoning net links, and Freek Wiedijk’s overview for even more programs. There are many non-FLOSS formal methods tools, just as there are non-FLOSS SCM and testing tools, but that does not invalidate my point.
Many of these FLOSS tools are considered very strong, innovative, and/or have been used for serious applications. Spin and the Boyer-Moore Theorem Prover (the basis of ACL2) each received the ACM’s prestigious Software System Award, which recognizes a “software system that has had a lasting influence”. Both have been used for extremely important applications, from checking space probe algorithms through microprocessor design. PVS is also widely considered to be one of the better tools of its kind. Otter has been used to find new proofs previously unknown to mathematics. HOL 4 and Isabelle are widely used among these kinds of tools. MiniSat, Paradox, and Vallst have won awards in recent competitions against other similar tools. Alloy is a new tool, but I think it’s pretty innovative.
One type of tool I’m not including is the probabilistic / statistical / Monte Carlo model checker, such as PRISM (GPL license). These appear to be valuable for medium assurance, but I am skeptical that they are appropriate for high assurance. One tool in this space I should note is GMC (GPL license, likely), which is a highly experimental Monte Carlo-based software model checker for the gcc compiler suite. Open-Source Model Checking explains more about GMC. GMC is very experimental, and does not appear suitable for development use at this time; I note it here because it embodies some very interesting ideas, for those who are interested in the up-and-coming research. Monte Carlo model checkers have great promise for medium assurance, but because they only cover statistical likelihoods (not all possible situations), I would be nervous about using any of them for high assurance.
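The statistical approach can be sketched in Python (a toy illustration with an invented transition table, not how PRISM or GMC actually work): estimate the probability of reaching a bad state by random simulation, which yields statistical confidence rather than the exhaustive guarantee high assurance needs.

```python
# Sketch of statistical (Monte Carlo) model checking: estimate the
# probability of reaching a "bad" state by random simulation. The protocol
# and its transition probabilities below are hypothetical toy data.

import random

transitions = {
    "idle":     [("send", 0.7), ("retry", 0.3)],
    "retry":    [("send", 0.9), ("deadlock", 0.1)],   # injected "bug"
    "send":     [("idle", 1.0)],
    "deadlock": [("deadlock", 1.0)],                  # absorbing bad state
}

def run(steps=20):
    state = "idle"
    for _ in range(steps):
        labels, weights = zip(*transitions[state])
        state = random.choices(labels, weights=weights)[0]
        if state == "deadlock":
            return True
    return False

random.seed(0)
trials = 10_000
hits = sum(run() for _ in range(trials))
print(f"estimated P(deadlock within 20 steps) ~ {hits / trials:.3f}")
```

The estimate converges as trials increase, but rare paths may never be sampled; an exhaustive model checker would instead enumerate every reachable state.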
Sadly, many tools have completely disappeared from the world because they were not released under FLOSS licenses. What’s particularly galling is that many governments pay for academics to develop code, yet fail to require releases of that code (that they paid for!) under FLOSS licenses. In my mind, this is shameful; if my taxes paid for the tool, then I should have the right to see, use, and improve it, unless there is strong and specific evidence that an alternative license would be better for that particular circumstance. FLOSS licenses allow others to study, use, improve, and release those original or improved versions, and are thus a much better vehicle for making continued research possible. Instead, often a tool is released (if it’s released at all) only as a proprietary binary file (which is unmodifiable and will eventually be unrunnable) or possibly as restricted source code; in either case its license often has lots of anti-FLOSS restrictions (such as forbidding modification or commercial use). The tool often cannot be used for commercial purposes except through special licensing deals; that might sound fine, but in practice this generally squelches research through trial commercial use, and it also prevents distributors (such as Linux distributors) from making the tool widely available. Then either (1) the academic loses interest, or (2) a proprietary company builds on the tool and tries to build a business based on it. If the original academic loses interest, then no one else (even other academics!) has the rights to build on that code. The second option (proprietary commercialization) sounds good, but remember, we've had decades of almost uniform failure in trying to sell formal methods tools. (Most restaurant start-ups fail, never mind a niche like this, with few users and tools that take time to learn.) In the end, the project usually fails.
In either case, the software ends up being unavailable to all -- even if government funds were used to develop it. This is incredibly wasteful, and in my opinion this is one of the primary reasons people don’t use formal methods as often: most of the research work is locked up in software with proprietary licenses that is eventually thrown away. A company can do what it likes with its own money, of course, but if it will not sell a tool as a proprietary product, I think it should at least release the tool as FLOSS so others can build on it and improve the field. Governments have no such excuse; it is their citizens’ money they are squandering. I believe that, with a little creativity, governments could ensure that such projects can continue and grow.
Obviously, it is possible to have proprietary tools with long support, but investigate any such vendor very carefully. Depending on proprietary tools can be a big risk; if the vendor goes out of business (which is historically very likely), you are suddenly stuck with an unsupportable tool. In large markets, open standards help: with multiple competing implementations that you can switch between, competition lowers costs and incentivizes improvement. There are a few standards in this area (e.g., those for Z and VDM). But depending on competition often fails when there is a thin market, as in the case of formal methods tools. You might consider requiring the vendor to escrow their code as FLOSS if they decide to stop selling or supporting their product, so at least you can have a support option if the vendor leaves the business (as most do). This isn’t an empty concern; here are some examples where a proprietary tool license has caused problems for its users:
In contrast, some of the tools that have been released as FLOSS have resulted in incredible benefits to the world, and lower risks for their users. ACL2, Isabelle, HOL4, and splint (for example) come from very long lines of research, and their continued use today demonstrates that releasing software developed during academic research under FLOSS licenses can have tremendous, long-lasting benefits. (The computer algebra system Maxima has demonstrated the same thing; it’s been around since the late 1960s and is still actively maintained.) The NuSMV project specifically re-implemented the SMV tool, so they could get the benefits of being a FLOSS project (permitting extensions like TSMV, an extension of NuSMV to deal with timed versions of circuits). Any company doing research would be wise to consider releasing its code as FLOSS -- if it’s research, they can often receive far more than they release. I think it would be much wiser to require that government-funded software development in academia be released under a FLOSS license under usual circumstances. That way, anyone can start with what was developed through government funds and build on it, instead of starting over.
Indeed, in some ways, FLOSS is an ideal way to commercialize formal methods tools. Formal methods tools require people to learn and apply new skills, so for bigger projects you generally need someone to help you understand how to apply the tool. Thus, the FLOSS commercial model of “give away the code and sell support services” is especially easy to apply in this area. And if the commercial company flops, the work is still available for future research or for combining with other components.
I do not think that within a few years suddenly everyone will be using formal methods, for a variety of reasons. But I do think that over the next many years we will see a very gradual increase in use of these tools in very critical areas.
Of course, one challenge is that assurance tools are often not assured themselves. Assurance tools could even be maliciously undermined; see the discussion under compilers for more about “trusting trust” types of attacks. Here are a few items related to assuring the assurance tools:
Tools can’t do everything for you; humans have to help create proofs, and they often have to try many different paths to find the proof. This haiku by Larry Paulson expresses some of the challenges of that process:
Hah! A proof of False.
Your axioms are bogus.
Go back to square one.
(Yes, many tools are designed to counter bogus axioms, but the basic point is still true. Namely, anyone trying to prove properties of real systems often struggles and has to restart several times, going through many different approaches, before succeeding.)
A different issue is how to run the various formal methods tools listed above. A few of them are implemented in widely-available languages also used for other purposes (like C, C++, or Java). Obviously, they’ll need an operating system, and usually need other common tools like text editors. (Warning: Most analysis tools run on Linux/Unix and are either not available for Windows, or only work on Windows with an emulation tool like Cygwin -- making the tool slower.) But there are already well-known FLOSS implementations for these, so we don’t need to discuss them in more detail.
However, specification and proof systems are often built “on top” of other (less common) programming languages. These other languages are often specialized themselves, and in some cases using the specification or proof tools also involves interacting directly with the underlying implementation tools as well (e.g., to control/“drive” the proof system).
Programming languages which are functional programming languages, or have a functional programming subset, are very common for these purposes. A functional programming language is simply a language where assignment is not (normally) used, and thus there are no “side-effects” -- instead, functions accept values and return values (like a spreadsheet does). There are many arguments for the advantages of such systems, but one reason is simply that such systems make it possible to use much more of the arsenal of mathematics. Functional languages usually have built-in support for lists and other constructs useful for the purpose. J Strother Moore’s position is that all highly-assured software should be written in a functional programming language, because it is much easier to prove properties about programs written in them. (Most widely-used languages are “imperative” languages, including C, C++, Ada, Java, C#, Perl, Python, PHP, and Ruby. Techniques for proving programs in imperative languages are known; C. A. R. Hoare’s 1969 paper on Floyd-Hoare logic did so, as did Edsger Dijkstra’s weakest precondition work that was part of his 1975 work on predicate transformers. Moore argues that their complexities are not worth it, and that using a functional approach makes proofs much easier.)
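The contrast can be sketched in Python: the imperative version mutates state on every iteration, while the functional version is a pure recurrence whose defining equations can be reasoned about directly with mathematical induction.

```python
# Imperative vs functional style (Python sketch). The functional version's
# defining equations -- sum([]) = 0 and sum(x:xs) = x + sum(xs) -- map
# directly onto a proof by induction, which is Moore's point.

def sum_imperative(xs):
    total = 0
    for x in xs:          # state ("total") is mutated on every iteration
        total += x
    return total

def sum_functional(xs):
    # No assignment, no side effects: the definition *is* the recurrence
    # you would reason about mathematically.
    return 0 if not xs else xs[0] + sum_functional(xs[1:])

data = [3, 1, 4, 1, 5]
assert sum_imperative(data) == sum_functional(data) == 14
```

To prove a property of the imperative loop you must first invent a loop invariant; for the functional version the induction hypothesis falls straight out of the recursive definition.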
In some cases it’s hard to figure out where to place some language. In particular I’ve placed Maude here, but Maude could arguably be considered an analysis tool (and be placed above). In any case, there are many useful FLOSS implementations of these tools too:
A good FLOSS Common Lisp implementation is GNU Common LISP (GCL) (LGPL license); this implementation isn’t quite ANSI compliant, but since it compiles to efficient machine code it’s often used for proof-checking work (because this task is very compute-intensive). Another high-performance Common Lisp implementation is CMUCL (public domain). GNU CLISP (GPL license) also implements Common Lisp; it has a bytecode implementation which makes it a little slower, but it's very capable.
There are many FLOSS implementations of the Scheme variant of Lisp as well. These include GNU guile (LGPL; small and quick to start up, used in many GNU programs as an extension language), Bigloo (compiler and tools GNU GPL, library is GNU LGPL; this has a higher-performance compiler and other components focused on bigger development efforts, including a compile-time type checking system), umb-scheme (public domain), and Gauche (command-line "gosh") (BSD license). Practical Scheme is a useful place to start looking for libraries. Teach yourself Scheme in Fixnum days is a reasonable intro to Scheme.
I should note that although Scheme and Common Lisp have a lot of shared history, as languages they are basically incompatible with each other. Common Lisp has multiple namespaces (versus Scheme's one). Scheme uses the special values #t and #f for true and false, with distinct values for #f, NIL, and the empty list '(); Common Lisp uses the older Lisp convention of using the symbols T and NIL, with NIL also representing the empty list, and non-nil being considered true. In practice, Scheme programs use recursion where a Common Lisp program would not (because Scheme guarantees tail-call optimization while Common Lisp does not). Most of the built-in functions have different names (because of different naming conventions), and many have subtly different semantics (because of their fundamentally different notions about lists and boolean values). The program scmxlate by Dorai Sitaram translates Scheme into Common Lisp; the license in the package itself isn't FLOSS, but Sitaram has separately released scmxlate and other packages under the LGPL, so it is FLOSS even if at first it does not appear to be.
Both Common Lisp and Scheme have formal standards, so as long as you stay with the standards, you can generally port from one to another. Although Lisp's usual programming notation is different from most other languages (strictly prefix s-expressions), and it has a history of being an "AI" language, studies have found that both development time and performance can be quite good. Ron Garret (né Erann Gat) did a study in writing a test program in Lisp; the resulting Lisp programs ran faster on average than C, C++ or Java programs (although the fastest Lisp program was not as fast as the fastest C program), and the Lisp programs took less development time than the other languages. Norvig's "Lisp as an Alternative to Java" adds some commentary about this.
A new and interesting variant of Lisp is implemented in Qi (GPL license). This is another Lisp variant implemented on top of Common Lisp; it supports pattern-matching, optional static type checking, and is “lambda-calculus consistent” (supporting partial things, like partial applications). The authors claim that the type system of Qi is more powerful and flexible than the very powerful capabilities of ML or Haskell, because it is based on sequent notation. Their manual is interesting in its own right; they even discuss implementing simple automated reasoning systems. One problem with Lisps is that their usual programming notation (s-expressions) is painful for most people to read; see my Readable s-expression work (including sweet-expressions) for techniques to overcome that.
Clearly, you need to have a way to execute the highly-assured source code... which requires either a compiler or interpreter.
If you write code in C or C++, it is quite common to use the gcc compiler suite (GPL license), which is FLOSS. Developers of high-assurance software who choose to use C or C++ often use gcc as well, so clearly there are FLOSS tools covering these languages.
However, programming languages often used for other assurance levels don’t work as well for high assurance. Here are comments on some programming languages if you are interested in high assurance:
One additional problem: many high assurance programs must also be hard real-time systems (with guaranteed execution times), and/or must prevent “covert channels” of communication. In practice, this often means that runtime memory heap allocation can’t be used, including automated garbage collection. Without automated garbage collection, many languages -- including Java and C# -- are impractical; their designs fundamentally require it.
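The usual workaround can be sketched in Python (a conceptual illustration; real hard real-time code would do this in C or Ada): allocate a fixed pool of buffers once at startup, so no heap allocation or garbage collection happens during real-time operation.

```python
# Sketch of the no-runtime-allocation pattern: a fixed pool of message
# buffers, all created at startup. After initialization, acquire/release
# only moves indices around -- no heap growth, and exhaustion is a
# bounded, analyzable failure rather than an unpredictable GC pause.

class MessagePool:
    def __init__(self, count, size):
        # All storage is created once, before real-time operation begins.
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = list(range(count))

    def acquire(self):
        if not self._free:
            raise RuntimeError("pool exhausted")
        return self._free.pop()

    def release(self, idx):
        self._free.append(idx)

    def buffer(self, idx):
        return self._buffers[idx]

pool = MessagePool(count=4, size=64)
i = pool.acquire()
pool.buffer(i)[0:5] = b"hello"
pool.release(i)
print("pool cycle complete; no heap allocation after init")
```

Because the pool size is fixed, both the worst-case memory use and the worst-case acquire/release time can be bounded statically, which is exactly what hard real-time analysis requires.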
“BitC is conceptually derived in various measure from Standard ML, Scheme, and C. Like Standard ML, BitC has a formal semantics, static typing, a type inference mechanism, and type variables. Like Scheme, BitC uses a surface syntax that is readily represented as BitC data. Like C, BitC provides full control over data structure representation, which is necessary for high-performance systems programming. The BitC language is a direct expression of the typed lambda calculus with side effects, extended to be able to reflect the semantics of explicit representation.”
“In contrast to ML, BitC syntax is designed to discourage currying [because it] requires dynamic storage allocation... Since there are applications of BitC in which dynamic allocation is prohibited, currying is an inappropriate idiom for this language. In contrast to both Scheme and ML, BitC does not provide or require full tail recursion [but restricts it in a way that] preserves all of the useful cases of tail recursion that we know about, while still permitting a high-performance translation of BitC code to C code. Building on the features of ACL2, BitC incorporates explicit support for stating theorems and invariants about the program as part of the program’s text. As a consequence of these modifications, BitC is suitable for the expression of verifiable, low-level ‘systems’ programs. There exists a well-defined, statically enforceable subset language that is directly translatable to a low-level language such as C.”
An implementation (BitCC) is being developed as part of the Coyotos project; it generates C code, which is then compiled. Currently BitC uses a Scheme (LISP)-like syntax, though a C-like syntax may be built eventually.
But choosing Ada doesn’t prevent the use of FLOSS. The FLOSS GNAT Ada compiler (GPL license), part of the gcc toolsuite, compiles Ada code, and commercial support for GNAT is available from AdaCore. GNAT is one of the very best Ada compilers around; it is widely used. In any case, you don’t have to use Ada for high assurance software development, but if you choose it, you can use a very good FLOSS implementation. Also, check out SPARK (discussed separately).
Sometimes the best approach is to create domain-specific language, use that to define at least part of your system, and then create a code generator for your language. This makes it much easier to use languages you might not be able to use otherwise. This can be very effective, but you must still decide how to implement the code generator (and the rest of the code, if applicable), and you must somehow verify that the generated program does what you want.
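A minimal sketch of the idea in Python (the "DSL" and generator here are invented for illustration): a declarative state-machine spec is turned into ordinary code by a small generator, and the generated code must still be verified against intent.

```python
# Tiny sketch of the domain-specific-language approach (hypothetical DSL):
# a declarative state-machine spec is compiled into ordinary Python by a
# code generator. Verifying the generator (or its output) is still needed.

spec = {  # DSL: state -> {event: next_state} (toy turnstile example)
    "locked":   {"coin": "unlocked"},
    "unlocked": {"push": "locked"},
}

def generate(spec):
    lines = ["def step(state, event):"]
    for state, rules in spec.items():
        for event, nxt in rules.items():
            lines.append(f"    if state == {state!r} and event == {event!r}:")
            lines.append(f"        return {nxt!r}")
    lines.append("    return state  # ignore unexpected events")
    return "\n".join(lines)

code = generate(spec)
namespace = {}
exec(code, namespace)        # "compile" the generated code
step = namespace["step"]
assert step("locked", "coin") == "unlocked"
assert step("unlocked", "push") == "locked"
print("generated state machine behaves as specified")
```

Because the generator is simple and the spec is declarative, checking that the generated program matches the spec is far easier than verifying hand-written control logic; that check is still your responsibility, as the text notes.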
There are a lot of guidance documents out there on how to use various languages to implement high assurance software, or that give more general guidance on development. ISO/IEC JTC 1/SC 22 has a list of some related documents. If you're interested in how to develop secure software, see my book on developing secure programs. Some relevant documents include:
In all cases, you want to use the tools in a way that tries to catch as many mistakes as possible before the code gets out to the user. For example, you would normally turn on essentially all warning flags, and you would normally set up guidelines on how to use the language (to avoid things that are known to be problematic).
Compilers are hard to verify too. There has been some research progress on formally verifying that a compiler correctly implements its language, but it’s not ready for typical use with real compilers yet; current work focuses on tiny toy languages, and even these are difficult. The article Qualification of Software Development Tools From the DO-178B Certification Perspective notes some of the challenges in qualifying tools directly (from the DO-178B perspective). An approach more commonly used today is to have the compiler generate code, and then hand-check that the generated code matches the source code. This is obviously not ideal, but it easily beats writing the assembly code by hand, and if the compiler directly supports it (the GNAT Ada compiler does) this is not too hard to do. Using a widely-used compiler, and avoiding “stressing” the compiler, reduces the risk of accidentally inserting an error into the final machine code to a very small amount, and hand-checking can detect such errors.
Intentional malicious subversion of compilers turns out to be much more difficult to counter. An Air Force evaluation of Multics, and Ken Thompson’s famous Turing award lecture “Reflections on Trusting Trust”, showed that compilers can be subverted to insert malicious Trojan horses into critical software, including themselves, and that it’s shockingly hard to counter the attack. Thompson even performed the attack under controlled experimental conditions, and the victim never discovered it. For decades it was assumed that this was the “uncounterable attack”. Thankfully, my own academic work on Countering Trusting Trust through Diverse Double-Compiling shows how to counter subversive compilers, so even this nasty problem is now solvable.
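The core idea of diverse double-compiling can be shown in a toy model. The Python sketch below is purely illustrative (the “binary” string encoding, the `run` interpreter, and all names are invented): it models a faithful compiler and a subverted one that Trojans its own recompilation, then shows the DDC comparison, in which recompiling the compiler’s source with a second, trusted compiler exposes the subversion.

```python
# Toy model of Diverse Double-Compiling (DDC). All names and the
# string encoding of "binaries" are invented for illustration.

COMPILER_SRC = "source-of-compiler-v1"

def faithful(src):
    """A correct compiler: its output is a pure function of its input."""
    return f"binary({src})"

def subverted(src):
    """A Trojaned compiler: it miscompiles the compiler's own source
    (propagating a backdoor) and compiles everything else honestly."""
    if src == COMPILER_SRC:
        return f"binary({src})+BACKDOOR"
    return f"binary({src})"

def run(binary, src):
    """Toy 'execute a compiler binary' step: a backdoored binary
    behaves like the subverted compiler, a clean one is faithful."""
    return subverted(src) if "BACKDOOR" in binary else faithful(src)

def ddc_check(suspect_self_build, compiler_src, trusted_compiler):
    """Stage 1: compile compiler_src with a trusted, diverse compiler.
    Stage 2: run that stage-1 binary on compiler_src again.
    If stage 2 matches the suspect's self-build bit-for-bit, the
    suspect binary corresponds to its claimed source."""
    stage1 = trusted_compiler(compiler_src)
    stage2 = run(stage1, compiler_src)
    return stage2 == suspect_self_build
```

In the real technique the trusted compiler only needs to be functionally correct on this one input; it can be old, slow, or of completely different origin, and that diversity is exactly what makes simultaneous subversion of both compilers implausible.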
There are very few high assurance FLOSS components available. A later section will discuss this, but for now, here are the closest to high assurance FLOSS components that I’ve been able to identify (some are quite a stretch):
To be fair, though, Halfs isn't really high assurance as far as I'm concerned. Yes, they implemented a filesystem in Haskell, but you can implement junk in any language (including Haskell). Haskell does provide some type guarantees, but that's not enough. It's true that transformations to machine code can be checked thoroughly, but on the other hand, it's difficult to analyze the machine code that results from Haskell (in traditional procedural languages the correspondence between source and machine code is clearer, and thus easier to check). What's potentially interesting from a high assurance viewpoint is that Haskell can be deeply analyzed in a way many other languages can't, and so you could verify many interesting properties of Halfs. Unfortunately, I don't see much evidence that this has actually occurred with Halfs.
More generally, Galois's whole business model rests on developing high-assurance software (including heavy use of formal methods). They do most coding in Haskell, and they do contribute to many Haskell FLOSS efforts (eg, the Glasgow Haskell Compiler which is under a BSD3 license), although often the core software they develop is proprietary. John Launchbury's talk at CUFP may be of interest.
Unfortunately, the TCX project doesn’t seem to be working to ensure that the project can live on or be supported as a FLOSS project. Many FLOSS developers will only work on projects if they can be supported with FLOSS tools (see The Java Trap for a discussion). Yet the project chose to use the proprietary CM tool Perforce, and was seriously considering the use of a tool (PVS) that was proprietary at the time instead of using a FLOSS tool (such as ACL2). PVS is now FLOSS, but that is not the point. The point is that the project failed to even consider licensing as a selection factor when it performed its survey of tools. If the TCX fundamentally depends on non-FLOSS tools, it will probably be mostly ignored by FLOSS developers, because it will essentially be unusable to them (after the Java Trap and the Linux kernel’s Bitkeeper crisis, few will be interested in depending on non-FLOSS tools). There’s also no evidence of outreach to the FLOSS community; if FLOSS were being considered as a future maintenance strategy, that’s a curious omission. Also, since TCX is focusing on being a demonstration, it is unclear if the results will be usable as an implementation in real projects. And besides all this, there is no plan for it to undergo a formal Common Criteria evaluation or any other independent evaluation, even though that is the specification they are using as their baseline. And finally, I suspect that the project has been halted; I did not get any responses from email queries, and I’ve heard separately that the project stopped.
There’s much more available if you are simply looking for a real-time operating system (RTOS) kernel, and aren’t really looking for high assurance. eCos is a FLOSS RTOS, for example, and there are several projects that create “real-time” versions of the Linux kernel. Also, if you want a hypervisor, Xen is very popular and is FLOSS. But none of these are high assurance.
Recently several proprietary Multiple Independent Levels of Security (MILS) separation kernels have become available. It’s not clear that FLOSS implementations are becoming available as well, though clearly it’s possible (the TCX project might produce one, but it’s not clear that the code will be FLOSS, and it’s not clear that it will be sufficiently capable to be useful... it’s currently not available). The July 2004 draft separation kernel protection profile, which defines the requirements for a separation kernel, is a Common Criteria-based document requiring EAL 6 plus augmentation. More information about the MILS architecture is available in the August 2005 Crosstalk article on MILS, and you can look at John Rushby’s papers that discuss this in more technical detail. Greve, Wilding, and vanFleet have defined a formal specification for a MILS separation kernel using ACL2 [code]; Alves-Foss and Taylor comment on it. I could imagine a variant or re-implementation of Xen meeting this; however, Xen is designed for performance rather than for meeting these requirements.
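To give a flavor of what such a formal specification asserts, here is a loose Python sketch of a separation-style non-interference property. Everything here (the policy, the step function, the names) is invented for illustration, and the real Greve/Wilding/vanFleet work is an ACL2 theorem rather than runtime code; but the property checked is the same in spirit: if two memory states agree on everything a partition is permitted to see, one kernel step must leave that partition's result in agreement, so nothing it cannot see can influence it.

```python
# Illustrative sketch of a separation (non-interference) property,
# loosely in the spirit of the GWV formulation. All names invented.

# Which partitions each partition is permitted to receive data from.
# Here "A" is fully isolated; "B" may additionally read from "A".
POLICY = {"A": {"A"}, "B": {"A", "B"}}

def step(mem, active):
    """One kernel step: the active partition computes only from segments
    it is permitted to see, and writes only its own segment."""
    visible = {p: mem[p] for p in POLICY[active]}
    mem = dict(mem)                          # memory after the step
    mem[active] = sum(visible.values()) + 1  # arbitrary computation
    return mem

def separated(mem1, mem2, active):
    """Separation check: if two memories agree on everything the active
    partition may see, its segment agrees after a step -- i.e., segments
    it may NOT see cannot influence its result."""
    if all(mem1[p] == mem2[p] for p in POLICY[active]):
        return step(mem1, active)[active] == step(mem2, active)[active]
    return True  # premise doesn't hold; nothing to check
```

In the ACL2 specification this is stated and mechanically proved as a theorem over an abstract kernel model, rather than tested over example states as done here.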
There are a lot of FLOSS tools to help achieve medium assurance, and there are also many FLOSS programs that achieve medium assurance. Let’s look at them, so we can contrast the situation of FLOSS tools and components in medium assurance against those in high assurance.
There are a number of FLOSS tools to help find defects in programs. These include splint and flawfinder; my Flawfinder home page identifies many of these program analysis tools, such as PMD for Java. The Linux kernel developers even developed their own static analysis tool to examine the kernel (kernels are very different from application programs, and their common failure modes are different from applications’). But note that these tools are generally designed to look for specific common defects and defects that are easy for tools to find. These could have some value in high assurance applications as well, since these tools could quickly filter out problems before bringing out the “big guns”. But these tools aren’t enough; a program can pass every such tool and still have serious flaws. It’s also worth noting that these tools often have many false positives (“bug” reports that aren’t really defects at all).
I should note that it’s been widely and repeatedly proven that the most efficient method for finding and repairing defects, in terms of found and repaired defects per person-hour spent, is peer review -- groups of people actually looking at the code. There are various processes for this, going under names such as “software inspections” (Fagan and Gilb have different processes sharing that name). Peer reviews are low-tech, and often not exciting to do... but they’re so effective that anyone leading a software project should be interested in them. Andy German’s 2003 paper “Software Static Code Analysis Lessons Learned” reports that “independent code walkthroughs are the most effective [static code analysis] technique for software anomaly removal [, finding] up to 60 percent of the errors present in the code.” I co-edited and co-authored an IEEE Computer Society Press book, “Software Inspection: An Industry Best Practice”, which if you can get it (it’s now out of print) presents lots more information about the topic. The Formal Technical Review Archive lists many other texts. For high assurance, 60% is not 100%, but for medium assurance it’s fantastic. The known effectiveness of peer review explains why the “many eyeballs” idea of FLOSS really can work.
Several FLOSS implementations have undergone traditional medium assurance evaluations. There are at least two Linux distributions that have undergone Common Criteria evaluations (Red Hat and Novell/SuSE) at what I would term medium assurance levels. Similarly, the cryptographic library OpenSSL has undergone FIPS evaluation. To be fair, a program that has not undergone a Common Criteria evaluation process might be more secure than one that has, and a program that underwent a lower EAL Common Criteria evaluation might be more secure than one at a higher level. The Common Criteria is a standardized set of processes for evaluating the security of software, and higher EAL values add more evaluation processes. A product that has not gone through the higher levels might still be very secure -- all we know for sure is that no one has paid for the higher-level evaluation to be performed.
Unsurprisingly, many other popular FLOSS programs and systems have undergone many tool-based evaluations searching for flaws as well:
Looking more broadly, it is clear that there are many FLOSS projects which take significant steps to search for and remove defects.
Many experts have concluded that FLOSS has a potential advantage over proprietary software when it comes to security or reliability, though not all FLOSS programs are more secure than their competitors. This is borne out by many studies of actual FLOSS programs, which show that FLOSS does very well in terms of security and reliability (often much better than their proprietary competition).
Several papers and presentations discuss the general issue of security and FLOSS, noting some potential advantages of FLOSS. Examples of such sources include “Issues of Dependability in Open Source Software Development” (ACM SIGSOFT, Software Engineering Notes, May 2002), the section on open source software in my book Secure Programming for Linux and Unix HOWTO, and my presentation on open source software, security, and software assurance. There’s no guarantee that a FLOSS program will be more secure than a proprietary program -- do not mistakenly think otherwise. However, there are potential advantages to FLOSS, for example:
There’s certainly lots of evidence that FLOSS programs can be more reliable and secure than their proprietary competition. My paper “Why Open Source Software / Free Software (OSS/FS or FLOSS)? Look at the Numbers!” has lengthy lists of studies showing that various FLOSS programs do very well in terms of reliability and security; see them for more. For example:
In addition, many FLOSS programs now have built-in countermeasures against security problems in other components. "Security Enhancements in Red Hat Enterprise Linux (beside SELinux)" by Ulrich Drepper describes many mechanisms in Red Hat Enterprise Linux that counter or limit damage even if another component has a vulnerability. I'm a strong advocate of this belt-and-suspenders approach to security; by all means, eliminate vulnerabilities in components, but also deploy defensive measures so that unfound vulnerabilities are less serious.
In short, there is every reason to believe FLOSS components can be at least as reliable and secure as proprietary components, if not more so.
So now we come to an interesting question -- why are high assurance FLOSS components so rare? There are very few real high assurance components available, period, so we would expect there to be few FLOSS components. Yet FLOSS still seems under-represented.
Jeffrey S. Norris and Poul-Henning Kamp’s “Mission-Critical Development with Open Source Software: Lessons Learned” (IEEE Software) clearly shows that people do develop mission-critical systems at NASA using FLOSS components; they relied heavily on FLOSS components, reporting that they kept the project within budget and resulted in a more robust and flexible tool. The text suggests that these were medium assurance applications, but the lessons are worth considering at any assurance level.
Key vendor-differentiating software is usually not FLOSS. If a company or government uses a particular piece of software as part of their competitive advantage, they should usually not release it at all (as either proprietary or FLOSS). But most software is actually not company or government differentiating, and even if the software as a whole is, some of its pieces are still typically commodity components. A company that sells software licenses to others will often not choose to release that program as FLOSS -- but if they have customers that use the software, sometimes those users find that using a FLOSS program instead has its advantages. Components that are commodities shared among many devices, such as a separation kernel or real-time operating system kernel, would make sense as FLOSS.
There’s fragmentation in the high assurance market (e.g., different detailed requirements in different standards and different circumstances) which probably doesn’t help. But this harms other approaches for developing software too, and increasingly this problem is being recognized and addressed.
Clearly high assurance components are normally required to go through expensive independent evaluations (due to various regulations). But again, that is a hurdle that has been overcome before, many times.
FLOSS is widely represented in other areas. There are many medium assurance FLOSS programs that have better security or reliability records than competing proprietary programs. FLOSS is certainly well-represented by tools to create medium and high assurance components as well. A vast number of FLOSS tools are available, in fact, for creating high assurance components (and many have already been used for the purpose, including ACL2, SPIN, and GNAT).
It’s even more bizarre when you compare software proofs with the way normal mathematical proofs are made. “Normal” mathematicians publish their proofs, and then depend on worldwide peer review to find the errors and weaknesses in their proofs. And for good reason; it turns out that many formally published math articles (which went through expert peer review before publication) have had flaws discovered later, and had to be corrected later or withdrawn. Only through lengthy, public worldwide review have these problems surfaced. If those who dedicate their lives to mathematics often make mistakes, it’s only reasonable to suspect that software developers who hide their code and proofs from others are far more likely to get it wrong. Joyner and Stein's "Open Source Mathematical Software" (Notices of the AMS, Nov. 2007) notes how well FLOSS matches the methods in mathematics for producing high-quality results, and includes some interesting quotes. Andrei Okounkov (2006 Fields medalist) notes that "I think we need a symbolic standard to make computer manipulations easier to document and verify... An open source project could, perhaps, find better answers to the obvious problems such as availability, bugs, backward compatibility, platform independence, standard libraries, etc... I do hope that funding agencies are looking into this." Neubüser notes that with proprietary software "two of the most basic rules of conduct in mathematics are violated: In mathematics information is passed on free of charge and everything is laid open for checking.”
The nonsense that FLOSS is more vulnerable to insertion of malicious code doesn't wash, either. Any FLOSS or proprietary program can be modified - just get a hex editor. The trick is to get the maliciously-modified program into a customer's supply chain, and that's much harder: You have to get the malicious code into the FLOSS component's trusted repository, and not noticed (else it will be removed). "Application Security: Is the Backdoor Threat the Next Big Threat to Applications?" by Scott Berinato (in CSO Online) interviewed security researcher Chris Wysopal of Veracode. "As detection and scanning technology gets better at finding the accidental coding errors like buffer overflows, Wysopal believes the malicious will turn more and more to using backdoors--holes in programs usually intentionally programmed in to allow access to an application." Wysopal goes on to note that "The lifetime of a backdoor in open source is very short. It’s measured in weeks. The lifetime of a backdoor in closed source is measured in years. The many eyes concept of open source is working to detect backdoors. We found that in most open source cases, the malicious or accidental opening was detected in a matter of days, sometimes a few weeks. But every backdoor in the binary of proprietary software was there for years or an indeterminate length of time. It never happened that closed source backdoors were discovered in months. With an old one, Borland Interbase, we saw seven years worth of versions where a backdoor was there." The interviewer was surprised, and appears to have been unaware of FLOSS trusted repositories and how they work. He asked, "with so many people manipulating open source code, the number of backdoors to detect must be exponentially higher than proprietary systems, and the potential virulence, of spreading backdoors, must be much higher with open source?" 
Rather than explain this, Wysopal went immediately to the measure that mattered: "when we looked at special credential backdoors, the four biggest were all closed source products." The notion that FLOSS always makes you more vulnerable to backdoors is contrary to real world experience.
Yet the source code and proofs for high assurance programs are almost never published publicly at all (never mind being released as FLOSS). This means that many "high assurance" programs fail to exploit the methods used by mathematics to strengthen claims. Thus, there’s a good case to be made that high assurance FLOSS programs would tend to be much higher assurance than proprietary programs, because there could be a worldwide review of the proofs. At least for safety-critical work making FLOSS (or at least world-readable) code and proofs would make sense; why should we accept safety software that cannot undergo worldwide review? Are mathematical proofs really more important than software that protects people’s lives?
Several potential explanations come to mind, and I suspect only the last two are really true:
Commercial organizations often support FLOSS projects when it’s in their financial interest to do so. Essentially all work in major FLOSS projects like the Linux kernel and the Apache web server is done by software developers paid to do so. Even in high assurance tools there are examples; ACL2 is financially supported by many corporations (AMD, Rockwell Collins, and Sun Microsystems are three of them), and they’ve supported ACL2 because they’ve used it in microprocessor development. Not all FLOSS projects are financially supported like this, of course, but today’s corporations are willing to do so when it’s in their own interests.
Certainly, there’s a big problem that there is a very high “initial fee” to create and evaluate an initial product, but that’s true for other FLOSS products too. For years it was assumed that modern compilers and operating system kernels were “too hard” for FLOSS; that myth has been completely dispelled. Besides, these initial costs also work against proprietary suppliers, who have difficulty raising enough money in these cases too. These initial fees could be addressed by having several large integrators or users (who use lots of high assurance components) pool their funds to create such a component, with the idea that they’ll reduce their overall costs. Such organizations manage to create consortia in other areas; there’s little reason they can’t handle this too.
Of course, such organizations could claim that they’re worried about “free riders” who use the component without paying for initial development, but that’s missing the economic point. Nearly all costs for software are not in development -- they are in maintenance. One reason FLOSS projects do so well is that many people copy the software (starting out as “free riders”), but a small fraction eventually contribute valuable information (bug reports, patches, etc.)... because they must to cost-effectively use the product. As a result, releasing a component as FLOSS allows the maintenance costs -- the majority of all costs -- to be shared, even among those who did not pay for initial development. This means that a proportion of those who start as “free riders” eventually help share the maintenance costs, reducing the maintenance costs for all. Since maintenance is the primary cost, there is a reasonable economic rationale for releasing software this way (in some cases). The initial investors have an additional reason to invest: they will decide what will be implemented first, and how, and many have decided that this additional control is worth the initial investment. That doesn’t mean that FLOSS is always the best economic approach, but it does mean that it’s an approach worth exploring.
The only obvious true difference I see is liability, but this could actually help FLOSS implementors. Some people may think that the liability costs of high assurance software will make it “impossible” to use FLOSS -- yet this misses the point. Liability is something customers are willing to pay for, and even better for a FLOSS business, customers would typically have to pay for liability protection and support. AdaCore, who support the GNAT Ada compiler, depend on this and seem to be doing well. So this makes high assurance unusually tempting for a FLOSS business -- customers can try out the product (making it more likely to be considered), but will be required to pay the supplier through laws and contracts requiring liability protection.
Many different economic models could be devised; let’s imagine just one. A consortium could be established to create such high assurance components, and give a liability price break to its founders -- with others paying a higher (but reasonable) price to gain liability protection and support (perhaps the costs go down after paying double the costs of the initial founders). Such a consortium could encourage code and monetary contributions through a variety of means (e.g., by using the LGPL license, or using a dual-license approach with the GPL license being one of them and a for-pay license being the other). Organizations like major government integrators might be interested in such a consortium, because it would be a way to reduce their expenditures... and they would not want to have to compete against those in the consortium, if they were outside it, since those inside might have lower future expenditures. Many variations of these ideas are possible, of course. There seems to be little evidence that there’s no economic model for FLOSS high assurance components.
Halloran and Scherlis’ paper “High Quality and Open Source Software” notes that in the medium assurance realm, the “quality and dependability of today’s open source software is roughly on par with commercial [proprietary] and government developed software”, and then asks, “what attributes must be possessed by quality-related interventions for them to be feasibly adoptable in open source practice?” They then note several attributes, which would presumably apply to high assurance as well. They note that FLOSS projects generally bootstrap on top of other FLOSS tools, in part due to ideology, but also because it lowers barriers for new participants and enables developers to fluidly shift their attention from tool use to tool development/repair. As noted above, there are FLOSS tools available. But in their conclusions they note these criteria: “(1) an incremental model for quality investment and payoff (e.g., incrementally adding analysis support, test cases, measurement, or other kinds of evidence collection), (2) incremental adoptability of methods and tools both within the server wall and in the baseline client-side tool set, (3) a trusted server-side implementation that can accept untrusted client-side input, and (4) a tool interaction style that is adoptable by practicing open source programmers (i.e., that does not require mastery of a large number of unfamiliar concepts).” Based on this list, they conclude that “With the exception of testing technology and some code analysis technology, these requirements suggest that some adaptation will be required before adoption is possible for tools that embody, say, lightweight formal methods approaches or advanced program analysis approaches. Clearly, any technique or tool is not feasibly adoptable if it requires a major (client-visible) overhaul of a project web portal, collaboration tools, development tools, or source code base. 
Discernible increments of benefit from increments of participant effort is key to adoptability.” In short, formal methods require a lot more training before the benefits can accrue -- and because they aren’t incremental (and are rare), it’s harder to do.
I think the difficulty of acquiring the necessary skills before being able to do any work is a real and valid issue. This is a real problem for using these techniques to develop proprietary software, too! But there is such experience, so while valid I don’t think that’s the primary issue.
People tend to do what they know how to do, and repeat approaches they've used before. In many cases, the people who are interested in developing a small high-assurance component have done this before and used a proprietary approach... and since that is what they did before, they do not think to consider an alternative.
For example, I went to a detailed economic review in 2006 of one particular high assurance effort, where the big contrast was between “government-owned” and “commercial proprietary software vendors”. The possibility of a FLOSS approach (e.g., establishing a consortium to create a FLOSS implementation) simply never entered any decision-maker’s mind! When an alternative is never considered, it’s not surprising when it isn’t chosen. The developers of high-assurance software are quite familiar with FLOSS, and in fact are developers of it themselves (just look at all the FLOSS tools for high assurance!). But the decision-makers about high-assurance software are not the same people, and in fact tend to be very conservative people because they’re worrying about security and safety. Conservatism about technology makes sense for these stakes, but that doesn’t mean that valid economic alternatives should be ignored.
The TCX research project is even more bizarre, because it specifically notes that it hopes to aid both the proprietary commercial and the “open-source” sectors. The TCX project even mentions planning to release its results “using a philosophy similar to the open source approaches”. But the information published so far does not suggest that the materials will ever be released under a FLOSS license, nor have I found evidence that this option was considered. Licensing was not even a criterion in its tool survey, yet depending on a language only implemented by a proprietary tool would be certain to drive away many FLOSS developers. The TCX published works so far include a mathematical model, which could be used as a basis for further work, but it is only released as a file in (uneditable) PDF format, with no rights to make improvements. There is no FLOSS license granting rights to improve any of the works released as of April 2006, even though the TCX text seems to imply that the MIT and BSD-new licenses might be appropriate for them. There’s also no published rationale for why the works will not be released under a FLOSS license; given the many pages discussing other project decisions, the failure to discuss the licensing of the results (where dissemination is the whole point of the project) is jarring. The reader is left with the impression that the ideas of using FLOSS licensing terms to release their work, and working with the larger FLOSS community, never occurred to the project leaders.
In short, I think this is the primary explanation: the FLOSS option does not seem to be even considered as a potential strategy. I don’t think every high assurance component of the future will be FLOSS, in fact, I expect there to be a continued number of proprietary high assurance products. But there also seems to be no special reason that it should be ignored. I suspect that the number of FLOSS high-assurance products will grow as decision-makers start to consider FLOSS options as well. In the long term, I expect a mixture of proprietary and FLOSS components, just as this is already true for compilers, operating system kernels, web browsers, web servers, and so on.
There are a large number of FLOSS tools available for creating high assurance components. There are a vast number of tools supporting configuration management, testing, formal methods, and code generation. This paper focused on formal methods, since this is distinctive for high assurance, and showed that there are many tools in this space (both research and industrial-strength). Some of these tools are ACL2, HOL 4, Isabelle, Otter, ProofPower, ZETA, CZT, Spin, NuSMV, BLAST, and Alloy. (I find ACL2, Otter, Spin, BitC, and Alloy especially interesting, though with different reasons for each.)
In contrast, few high assurance components are FLOSS. After looking at the options, the most likely reason for this appears to be that decision-makers are not even considering the possibility of FLOSS-based approaches. Decision-makers should consider FLOSS-based approaches as future high assurance components are needed, including the possibility of creating consortia, so that a FLOSS-based strategy can be chosen where appropriate.
Governments should require that government-funded research normally release all software it develops under a FLOSS license, unless the government is convinced that in this particular case there is a better alternative. This is amply demonstrated by the many discarded and underused proprietary projects in these fields; the waste and lost opportunity alone is enough to justify this. More fundamentally, governments use money from their people to do research; it is only fair to ensure that all the people (who pay for the research) can reap the benefits, unless they will be better served some other way. If software research results are FLOSS, then anyone can start with what was developed and build on it, instead of starting over. The Open Informatics Petition text goes farther and suggests that governments simply mandate this no matter what. I will not go that far; I think there are good reasons for cases where government-funded research should not be released as FLOSS, but the petition does explain in more detail the advantages of this approach. Most detractors are concerned about this being required on all research. If you simply say that FLOSS is a wiser “default”, I think a much better balance is struck than the current system in many countries. The Open Informatics web site has more information. Requiring FLOSS release does not prevent commercialization; there are many FLOSS-based businesses, and many FLOSS licenses permit adding extensions and making the result proprietary. The many success stories from FLOSS-based approaches (e.g., ACL2, Security-Enhanced Linux, etc.) suggest that releasing software under FLOSS licenses is a very effective way to improve tech transition and establish sustainable research. Since the GPL is the most common FLOSS license (in formal methods tools and in general), whatever FLOSS license is used in this case should at least be GPL-compatible -- that way, research efforts can be combined into larger works as needed. 
For the same reason, I would recommend that one of the “classic” FLOSS licenses be used in most cases (i.e., MIT, BSD-new, LGPL, or GPL) -- they are well-understood, and can be combined as necessary.
Finally, developers who want to start new FLOSS projects should consider developing or improving high-assurance components or tools (including tools that combine other tools). Improving the user interfaces, capabilities, or integration of tools would be very valuable. Sample assured components, especially ones that are useful (like separation kernels or RTOSs) would be of value too, both for potential users and for others developing such programs (because there are few publicly-available examples that people can experiment with and learn from). These are technically interesting, and given the increasing attacks and dependence on computer systems, having more high assurance programs available will be vital to everyone’s future.
You can see David A. Wheeler’s home page, https://dwheeler.com
You should also see the Open proofs web site.