When designing software, if something looks large or complicated, try to think of a way of making it simple... possibly by entirely eliminating components. It may be much faster to implement, more reliable, and easier to maintain as a result. In short: Rethink your problem, and try to make a simple solution.
That’s easy to say, but hard to put into practice. So here’s a story from the mid-1990s that most software developers have never heard... it may inspire you.
My story involves Robert Dewar, who in the early 1990s was trying to implement a compiler for the Ada programming language. Not just any compiler - he wanted to create a compiler that could be freely used, shared, and modified. This was generally called Free software; the phrase open source software (OSS) would not be invented until years later.
At the time Ada compilers were large, slow, and complex... and as a result, they were outrageously expensive. Alsys (the company founded by Ada’s designer) sold special extended memory cards for x86 PCs just so people could run an Ada compiler. There were a variety of reasons for this. One was that the language specification seemed to imply that implementations should have a centralized library for the object files it generated. In particular, the language specification required type-safety across an entire program.
The language specification authors weren’t crazy; this requirement prevented some really dangerous hard-to-debug errors. It’s quite possible in C or C++ to modify a header (.h) file, recompile only some of the files that needed compiling, and them link them all together into an executable; the result can be mysterious defects. The central library according to the language specification was only a model... it was generally agreed that there was no technical requirement that the compiler really have a central library. However, it was expected that compilers would work this way at the time, since that was the obvious way to ensure type-safety across an entire program. However, having a central library meant that Ada compilers had to generate, store, and manage a central library in some complex intermediate format.
Instead of building an entire compiler from scratch, Robert Dewar decided to build on an existing good compiler: gcc. Dewar then had lengthy conversations with Richard Stallman, gcc’s original developer. Richard Stallman believed that Ada’s type-safety across-the-program rule did not need to create order of compilation dependencies [Gasperoni2012]. After these discussions Robert Dewar carefully looked through the language specification to figure out how to meet the specification by using a more conventional compilation model... and found a much simpler way to implement the language specification.
The GNAT compiler that Dewar co-developed, unlike previous Ada compilers, simply read source files and generated a single object file. That was conventional for other languages, but something new for Ada because of its program-wide type-safety requirement. There was no centralized library information of any kind; instead, the “central library” was the source code itself (along with some library information generated by the compiler and distributed into different files). The developers still needed to ensure that the type-safety was maintained across an entire program, but they instead did this using a tiny “binder” program that checked timestamps - implementing the type-safety requirement in a much simpler manner [Dewar1994]. This design made the entire compiler much simpler - all the code for managing a central library (including saving it, loading it back, and dealing with race conditions) completely disappeared. This simplicity also made parallel compilation trivial, and the race conditions from previous compilers simply couldn’t happen.
One problem with this design is that a whole lot of “extra” compilation had to be done compared to how other Ada compilers did the job. In this design Ada spec files (which are similar to C/C++ header files) had to be processed far more often than in a traditional Ada design. In previous Ada compilers, an Ada spec file would only be processed once; in this new design cooked up by Dewar, spec files had to be processed many times. Instead of giving up the simplicity of the design, Dewar and his co-developers looked for ways to keep the simplicity.
One of their approaches, which I’m not sure is documented elsewhere, is that they hand-optimized the lexer. The lexer was a small piece of code that directly processes source code. A lexer is a relatively small routine, so hand-optimizing a small routine was not a big deal. What’s more, the hand-optimization didn’t make anything else complex; the hand-optimization was entirely limited to one tiny part of the compiler. By hand-optimizing a very small part of the program, they could completely eliminate many other components while maintaining high performance - a very worthwhile tradeoff. They also used a hand-coded recursive decent parser for the rest of the front-end; this did improve performance, but the main reason they did this was to create dramatically better error messages. Page 53 of [Schonberg1994] (which describes the compiler design) notes states that in the end, “given the speed of the front-end [both lexer and parser], our approach is no less efficient than the conventional [Ada] library mechanism, and has... important advantages”. In short, instead of caching compilation results in a complex central data store (a central library), they made the compiler front-end so fast that it was just as fast to recreate the necessary data from the original sources. You can see more information on the GNAT Wikipedia page.
Their result was a compiler that took much less time to implement, was very fast, and was generally easy to maintain. Dewar personally told me that when implementing the GNAT compiler, whenever they looked at the language specification and thought it might be complicated to implement, they assumed they didn’t understand the language specification and looked for a simpler way to do it. This mindset that “there is probably an easy way to do this” seems to have been different from others, who seemed to assume that complexity was always an inevitable part of the job.
Dewar went on to found the AdaCore, a company that over 20 years later still sells service and support for the Ada compiler and related tools that are Free/Libre/Open Source Software (aka “Freely Licensed Open Source Software”). The GNAT Ada compiler is part of the gcc toolsuite, and is included in the packages of Red Hat and Debian. Even today, the current version of the compiler he helped write produces faster code than many other compilers. And of course, Robert Dewar did other things; he was a long advocate for strong skills in software development, including knowing multiple programming languages [Dewar2008].
Robert Dewar died from cancer on June 30, 2015, as noted by both the ACM and AdaCore. The world is lesser place without him, but his ideas live on. In a way, I’m writing this essay as a tribute to him.
My point is that whenever you’re about to write some software, stop and think about how you can make it simple. Stop-and-think is especially true if some part seems complicated, works in an annoyingly different way than other systems, or is incompatible with other tools you would like to use. If you’re creating a whole separate system for managing a database or fileset, for example, see if there’s a way you could eliminate it entirely. If you have to create a special subsystem to manage adding, renaming, and deleting intermediate components, and that’s not the primary purpose of the program, check to see if there’s an easier way.
By rethinking things, you may find a simpler way that is faster to implement, and results in software that is more reliable, easier to maintain, and in all other ways better.
Unfortunately, many of the important sources are paywalled and thus inaccessible to most people, but they’re the only sources I could find. I am tempted to call paywalled documents “unpublished”, because for most people “not freely available on the Internet” has the same effect. Hopefully in the future more of these materials will become truly publicly available.
[Dewar1994] Dewar, Robert. “The GNAT compilation model”. Proceedings of the conference on TRI-Ada ‘94. Pages 58-70. ACM New York, NY, USA. 1994. ISBN:0-89791-666-2. http://dl.acm.org/citation.cfm?id=197708
[Dewar2008]). Dewar, Robert and Edmond Schonberg. “Computer Science Education: Where Are the Software Engineers of Tomorrow?” Crosstalk. 2008. http://www.crosstalkonline.org/storage/issue-archives/2008/200801/200801-Dewar.pdf
[Gasperoni2012] Thoughts on Ada and Language Technology in Our Times. Franco Gasperoni (AdaCore). 2012. http://www.ada2012.org/files/Thoughts_on_Ada.pdf
[Schonberg1994]. Edmond Schonberg and Bernard Banner. “The GNAT project: a GNU-Ada 9X compiler”. Proceedings of the conference on TRI-Ada ‘94. Pages 48-57. ACM New York, NY, USA. 1994. ISBN:0-89791-666-2. http://dl.acm.org/citation.cfm?doid=197694.197706
Feel free to see my home page at https://dwheeler.com. You may also want to look at my paper Why OSS/FS? Look at the Numbers! and my book on how to develop secure programs.
(C) Copyright 2015 David A. Wheeler. Released under Creative Commons Attribution-ShareAlike version 3.0 or later (CC-BY-SA-3.0+).