Improving make

David A. Wheeler

2014-10-27 (original 2014-10-21)

This article describes how to improve the standard and the implementations of make, and some of the progress already made. Make is a widely-used software build tool, but the POSIX standard that covers it lacks key functions. As a result, make is difficult to use portably in many common cases. This article is primarily inspired by Peter Miller’s paper Recursive Make Considered Harmful (1997). This article is intended for those who already understand make and software build processes.

Introduction

The make tool is a very widely-used build tool for software development. The make tool was first created by Stuart Feldman in April 1976 at Bell Labs, and was originally described in “Make - A Program for Maintaining Computer Programs” by Stuart I. Feldman (Software - Practice and Experience, Vol. 9, 255-265, 1979). In 2003 Dr. Feldman received the ACM Software System Award for creating make, a sign of how important the tool is. There are certainly other build tools, including Apache Ant (often used for Java), Apache Maven (also often used for Java, as long as you can accept its preset structure), Gradle (whose build scripts are written in Groovy), Rake, SCons (a Python-based system), ASDF 3 (for Common Lisp), and Jake. There are other tools that layer themselves on top of make, such as CMake and automake (automake is part of the autotools). Wikipedia even maintains a list of build automation software. However, for a variety of reasons, make is still very widely used directly, especially when you’re writing code where performance really matters (such as most C and C++ code). Make is standardized (it’s part of the POSIX standard), widely available, and widely understood. Make is also baked into a huge number of larger systems. Yes, make has some weirdnesses (in particular, the standard requires tabs in certain places), but it is still a handy tool. Recent articles such as “Build Tools - Make, no more”, “Using GNU Make as a Front-end Development Build Tool”, and “The Ultimate Frontend Build Tool: make” all state that make is often a very useful tool, even today. Improving the standards for make also helps tools that layer on top of it, including automake and CMake.

[Figure: XKCD.com #303 by Randall Munroe (CC-BY-NC 2.5), captioned “The #1 programmer excuse for legitimately slacking off: ‘My code’s compiling.’”]

For example, most make implementations support parallel compilation, so using make typically means that the parallel capabilities of the underlying machine are easily used. This saves a lot of developer time. On one system I measured a from-scratch build of the Linux kernel version 3.10.5 using make; it took 166.85 minutes with 1 CPU and 28.5 minutes (optimizable to 23.86 minutes) with 16 CPUs (Parallel Compilation on Virtual Machines in a Development Cloud Environment by David A. Wheeler, IDA Document D-4996, September 2013). That saves about two hours each time you compile, and many developers constantly do a small edit followed by a compile! This speedup through parallelism is enabled by exploiting dependency information, information that make is designed to support. Rebuilds after a small edit are typically much faster still, because not everything needs to be redone, and tools like make use the same dependency information to quickly determine what can be skipped.
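To illustrate the point about dependency information, here is a minimal sketch of a makefile for a hypothetical program built from a.c, b.c, and c.c. It relies on make’s built-in .c.o inference rule and states only that prog needs all three object files, which tells make that the three compilations are independent of one another. An implementation that supports parallel jobs (for example, GNU make or a BSD make run as “make -j3”) can then run them at the same time; after an edit to b.c alone, a plain “make” recompiles b.o, relinks prog, and skips a.o and c.o.

```makefile
# Hypothetical program built from three independent source files.
# Each .o depends only on its own .c (via make's built-in .c.o rule),
# so the three compilations can run in parallel; "prog" must wait for all.
OBJ = a.o b.o c.o

prog: $(OBJ)
	$(CC) $(LDFLAGS) -o prog $(OBJ)
```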

However, the POSIX standard version of make is extraordinarily feature-poor; it lacks many basic functions that people need in practice. As a result, most people create makefiles (the input files to make) that require specific implementations instead of just following the POSIX standard. Typically that specific implementation is GNU make; since GNU make runs on nearly every Unix-like system, and is FLOSS, this is an easy requirement to meet. GNU make is a great tool, but it’s absurd that even common uses of make essentially require non-standard extensions and cannot be used on most other make implementations. One of the rules of makefiles from GNU make’s maintainer is “Use GNU make. Don’t hassle with writing portable makefiles, use a portable make instead!”. Today many people do follow this advice, because using only portable make can be painful even for common cases. GNU make includes a lot of excellent functionality, and if you need its advanced features, enjoy! But many people do not need advanced features. Today many oft-needed capabilities are lacking in the standard, and implementations often provide them only in syntactically incompatible ways. I would like to see the POSIX standard for make extended, and widely implemented, so that in typical cases people who want to use make can stick to a standard portable subset that easily handles common cases.

Peter Miller’s paper Recursive Make Considered Harmful (1997) is rightfully considered a really important work about software development. In that paper he notes that the common way of using make (recursively, with a separate make invocation for each directory) is a bad idea, and that developers should use make in a non-recursive way instead. Peter Miller’s paper and the related paper Implementing non-recursive make (by Emile van Bergen) discuss how to implement efficient makefiles. Sadly, Peter Miller died on 2014-07-27, but his good ideas live on. I’ve been trying to get the POSIX standard extended so that standard make is rich enough to use in typical cases, and in particular, so that it properly supports non-recursive make. If something is too hard to do correctly, then people won’t do it correctly. In particular, if something commonly done is hard to do portably, then people won’t do it portably.

Specific improvements

Here are a few extensions to make, and their status in both the POSIX specification and in implementations. In most cases, at this time, they are proposed improvements to standard make.

Immediate evaluation, not just deferred evaluation

Peter Miller noted the need for immediate evaluation (sections 5.1 and 5.2). Standard “variables” in make are not variables at all, but macros. Every macro reference must be transitively evaluated, leading to an exponential growth in execution time as projects get large... and today’s projects are often very large. What’s more, this transitive evaluation is rarely useful. In most cases people using make want immediate evaluation of variables, not deferred evaluation; immediate evaluation is more familiar to most developers and does not lead to exponentially-growing overhead.

I proposed fixing this by adding immediate evaluation support to the POSIX specification as bug#330. I am happy to report that the POSIX committee accepted this proposal and has added support for immediate evaluation. (This change also added the widely-used “+=” and “?=” assignment operators, which is more good news.) POSIX uses the syntax “::=”, which eliminates the exponential growth in time and does not interfere with the syntax of existing systems. GNU make has added support for the POSIX syntax, so users of GNU make can use the standard POSIX syntax today (hooray!).
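A small makefile with hypothetical macro names shows the difference, assuming a make that accepts “::=”, such as GNU make 4.0 or later:

```makefile
# Deferred ("=") versus immediate ("::=") assignment.
NAME = first

# Stored unexpanded; $(NAME) is looked up again at every use.
DEFERRED = $(NAME)

# Expanded once, right here, while NAME is still "first".
IMMEDIATE ::= $(NAME)

NAME = second

show:
	@echo "deferred=$(DEFERRED) immediate=$(IMMEDIATE)"
```

Running “make show” should print “deferred=second immediate=first”: DEFERRED picked up the later value of NAME when it was finally used, while IMMEDIATE kept the value NAME had when the assignment was read.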

I’m delighted by this addition. Software systems are getting bigger and bigger; this capability eliminates a common reason that build times grew exponentially.

However, we are not done. The maintainers of other make implementations, such as the *BSDs, need to add support for this syntax to their make implementations so that this capability is truly portable. (The *BSDs have a somewhat-similar functionality using “:=”, but they do not currently support the POSIX standard syntax and semantics.) Tools like automake should support it as well in some way. This capability will only be truly portable if it is widely implemented, and people are more likely to use this important capability if it is portable.

Shell invocation and function calls

Peter Miller’s recommendations relied on GNU make’s function call notation. Standard makefiles have no way to do function calls anywhere macro substitutions are allowed, nor do they provide a way to execute shell programs and then put the result in a macro variable. This makes it hard to create makefiles that easily adjust to their environment. For example, here’s part of section 5.2 showing the recommended rewrite (using “:=” instead of the POSIX “::=”):

SRC := $(shell echo 'Ouch!' \
  1>&2 ; echo *.[cy])
OBJ := \
  $(patsubst %.c,%.o,\
  $(filter %.c,$(SRC))) \
  $(patsubst %.y,%.o,\
  $(filter %.y,$(SRC)))

I think two mechanisms need to be added to standard POSIX make. We need both a simple mechanism for shell invocations, as well as a general function call notation:

  1. make should support a simple mechanism for shell invocation. Shell invocation is so common that there should be a simple, easily-read mechanism specifically for it. In particular, shell invocations can involve complicated expressions; having a mechanism that doesn’t conflict with a character like a closing parenthesis reduces error and increases readability. POSIX bug report #337 from me proposes adding standard support for “!=” (macro shell assignment). The “=” hints that “this is an assignment”, the “!” hints that “this is a shell”. At the time of this writing this proposal has neither been accepted nor rejected by the POSIX committee, but I am hopeful it will be accepted. This capability has been present in the *BSDs for many, many years. GNU make added support for “!=” in 2011 (I wrote the patch). As a result, if you need shell invocation to work relatively portably, “!=” is the closest available today (it works on GNU make and the various *BSD makes). This is great, because it creates a simple mechanism for doing a common thing. For example, the SRC line above from Peter Miller’s paper could be written simply as:
         SRC != echo 'Ouch!' 1>&2 ; echo *.[cy]
    
  2. make should support macro function calls for the general case, where you need a more flexible way to invoke functions within make and possibly combine them with the results of calls to external programs (like the shell). POSIX bug #512 by me proposes adding them, with a small set of common ones such as patsubst (pattern substitution).
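For Miller’s particular fragment, a version that needs no function calls at all is possible, assuming every word in SRC ends in .c or .y (which “echo *.[cy]” ensures whenever at least one file matches), so the filter step can be dropped. This sketch combines “!=” with the long-standard suffix substitution “$(macro:old=new)” and should work with GNU make 4.0 or later and the BSD makes; the general patsubst/filter combination is what bug #512 would address.

```makefile
# Shell assignment: the command runs once, when the makefile is read;
# newlines in its output become spaces. "Ouch!" on stderr shows how
# often the shell is invoked. (If nothing matches, the shell leaves
# the pattern "*.[cy]" unexpanded.)
SRC != echo 'Ouch!' 1>&2 ; echo *.[cy]

# Suffix substitution, applied in two steps: x.c -> x.o, then y.y -> y.o.
OBJ_C = $(SRC:.c=.o)
OBJ = $(OBJ_C:.y=.o)

show:
	@echo "OBJ = $(OBJ)"
```

Running “make show” prints “Ouch!” once on standard error, because the shell command runs only when the makefile is read, and then prints the object list.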

In theory you could get the necessary capability with only one or the other, but I think that kind of minimal support is a bad idea. Most make implementations support both “!=” (for easy shell invocation) and some sort of function call or mini-language system. Each syntax is better in different situations, and we want it to be easy to write portable makefiles.

Automatic dependencies

Make works well once you tell it the dependencies; it uses the dependency information to figure out what needs to be done, and in what order. However, in many cases you want the dependency information to be generated automatically.

It is easy to do automatic dependency determination with make, but you have to use various non-standard extensions to completely get there. That said, various changes in the POSIX specification of make have resulted in this being much more portable than it used to be, and I hope for further improvements in the future.

This section first discusses current approaches for automatic dependency handling. It then discusses the two key capabilities they require: lists of possibly non-existent include files, and compiler-generated dependency information. This section ends with a few other comments.

Current approaches for automatic dependencies

Historically a common way to do automatic dependency generation was to have a special target called “depend”, and then run “make depend” to compute it. As noted in Advanced Auto-Dependency Generation, this is a simple approach but it has serious problems. Dependencies typically go out-of-date (because developers forget to re-run make depend), and this inefficiently requires rechecking of many files that have not changed. It also creates an extra step for later users. We can do better; I do not recommend using a “make depend” approach today.

There is a related issue: in many cases “make depend” would invoke a separate program, such as makedepend, that would parse the source code (typically C) and generate the dependency information. If you use a separate tool (like makedepend) other than the compiler to determine dependencies, there is always the risk that they will compute different dependencies. This difference can create subtle problems. Thus, where possible, it is best if the compiler actually being used reports the dependencies (since it has the true information).

Current recommended approaches for automatic dependency generation are discussed in places such as the history of automake, the GNU make manual, Miller’s paper section 5.4, Implementing non-recursive make (by Emile van Bergen), and Advanced Auto-Dependency Generation. Tom Tromey developed the overall approach, which is simple and ingenious. Basically, have the compiler generate dependency information when it is run, in a make-compatible format. Where possible, generating this information should be a side-effect of compilation. Each compilation unit should produce a separate file of dependency information. This dependency information, as stored in all these files, is then “included” by the makefile. See the citations for more details if you’re curious.

For this approach to work:

  1. We need make to load many include files, without failing if the files do not (yet) exist. We need this mechanism so that make can load the dependencies that are automatically generated during a compilation. If no automatic dependency information is available for a file, we do not need to know more: with this approach it has not been compiled yet, so by definition we need to compile it, and that compilation will generate the information. This avoids a separate “make depend” run.
  2. We need to actually generate the dependency information that make will read. The compiler should do this, really. If possible, it should usually generate dependencies even for files that are probed but do not exist; that way, if the file is added later it will be automatically detected and used.
  3. All generated prerequisites need to be listed as targets with no commands or prerequisites; otherwise, removing or renaming a prerequisite will cause make to fail with “No rule to make target...” errors. This can easily be done by the compiler or by postprocessing whatever generates the dependency information. Note that we don’t need to change make for this to work.
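As an illustration, assume hypothetical files main.c and main.h, where main.c contains #include "main.h". Compiling with gcc’s “-MMD -MP” options writes a make-format file, main.d, roughly like the one below (the comments are added here for explanation, and real output may wrap long lines). The first rule is the dependency information of item 2; the empty main.h rule is the extra target that item 3 calls for.

```makefile
# Approximately what "gcc -MMD -MP -c -o main.o main.c" writes to main.d.
main.o: main.c main.h

# Empty rule: if main.h is later deleted or renamed, make treats it as
# freshly updated (so main.o is rebuilt) instead of failing with
# "No rule to make target".
main.h:
```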

Thus, we need to add two basic capabilities to the older POSIX standards: make must be able to include a list of include files (some of which might not exist), and we need to convince compilers to provide this dependency information that make will use. The next two subsections discuss each one.

Lists of possibly non-existent include files

The POSIX specification for make has long included a standard method for including files. However, originally POSIX was not quite powerful enough to support automatic dependency generation. For years I’ve been working with the POSIX standards committee to address this. I am happy to report that these issues have been fixed in the POSIX specification, as I describe here.

The original make include mechanism had a key weakness: it would cause make to fail if the file didn’t exist. This is a problem when you are trying to use make to create that file in the first place, which is always the case for dependency information generated through compilation. I proposed POSIX bug#333 to support “silent include” as “-include”; this is like include, but does not cause make to fail if the file does not exist. This change has been accepted by the POSIX committee, and both GNU make and NetBSD make already implement it. I do not know how many other make implementations currently support it; I hope that those that lack it will soon add it.

I also proposed, in POSIX bug #518, that POSIX require make implementations to support multiple files in the “include” directive. The include mechanism is key for automatic dependency handling; many compilers can generate dependency information, but the makefile needs to easily read that information from many files (with one dependency file for every source file). Without this, you would have to regenerate a single large dependency file, which would slow down compilation for no reason. With multiple-file include, the list of files to include is easy to create, since it can be derived automatically from the list of files to compile. This proposal has also been accepted by the POSIX committee, and I know that GNU make (at least) supports this.
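A minimal sketch of how the two accepted changes fit together, assuming a make that implements both “-include” and multiple files per include line (such as GNU make); the object names are hypothetical:

```makefile
# Hypothetical object list; each object has a dependency file of the
# same base name (main.d, util.d) written when the object is compiled.
OBJ = main.o util.o

# Suffix substitution derives the dependency-file list: main.d util.d.
DEP = $(OBJ:.o=.d)

# Silently include all of them. Files that do not exist yet (for example,
# before the first build) are skipped instead of causing an error.
-include $(DEP)
```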

The result: standard make now has the functions necessary for including dependency information that is automatically generated, and that is a key requirement for automatic dependency handling. However, we also need to automatically generate the information that make will read.

Compiler-generated dependency information

In many common cases tools should be able to automatically determine the dependencies, typically as a side-effect of compilation. Most compilers (e.g., for C and C++) can provide this information, though there is no standard way to request it. For example, the GNU C compiler has various flags that generate dependency information, including “-M”, “-MM”, “-MF”, “-MG”, “-MT”, “-MP”, “-MQ”, “-MD”, and “-MMD”. The LLVM/clang compiler also implements some of these. Automake (part of the autotools) includes a helper script, depcomp, that knows the flags needed to make many compilers generate dependency information; when gcc version 3 or later is used, automake instead generates optimized makefile rules (for speed) that invoke gcc directly. As an example, here is a snippet of an automake-generated makefile when using gcc version 3 or greater:

    include src/$(DEPDIR)/hello.Po

    .c.o:
            depbase=`echo $@ | sed 's|[^/]*$$|$(DEPDIR)/&|;s|\.o$$||'`;\
            $(COMPILE) -MT $@ -MD -MP -MF $$depbase.Tpo -c -o $@ $< &&\
            $(am__mv) $$depbase.Tpo $$depbase.Po

Sadly, there is no standard way to request that a compiler do this. POSIX bug#328 by a bird-loving participant has proposed that broader automatic dependency functionality be added to make. This would require some way of standardizing how to ask compilers to generate this information. Back in 2011 I was more bullish about using a separate makedepend tool, because that would be easier to standardize, but today I have more concerns about subtle inconsistencies between a separate tool and the compiler.

It is not easy to standardize this capability, for a variety of reasons. In particular, it turns out to be hard to standardize compiler option flags; it might be easier to standardize new environment variable values. I think it would be great to find standards for common cases if we can.

Automatic dependencies today

If you need this functionality immediately, you will probably need to use non-standard extensions or a layered tool (often automake or CMake) to implement at least the part that generates the dependency information. It only takes a few lines in common cases (e.g., if you only use gcc).

As noted earlier, automake implements automatic dependencies, through a few scripts on top of make. The GNU make manual explains how to add a few makefile lines that implement automatic dependency generation (what they call automatic prerequisites). GCC has flags to generate dependency information (e.g., -MD and -MF), and some other compilers (notably clang) support them. Since these are very widely used, many people have already solved this by requiring a few GCC extensions or by using automake.
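A minimal sketch of this approach follows, assuming a compiler that accepts GCC’s “-MMD” and “-MP” flags (GCC or clang) and a make that supports “-include” with several files, such as GNU make. The file names are hypothetical, and this is an illustration rather than the exact recipe from the GNU make manual or automake.

```makefile
# Hypothetical program; assumes gcc or clang for -MMD/-MP.
CC = gcc
CFLAGS = -O2
OBJ = main.o util.o

prog: $(OBJ)
	$(CC) $(LDFLAGS) -o $@ $(OBJ)

# Inference rule for x.c -> x.o. As a side effect of compiling, -MMD
# writes x.d listing the non-system headers x.o depends on, and -MP
# adds an empty rule for each header so deleted headers are harmless.
.c.o:
	$(CC) $(CFLAGS) -MMD -MP -c -o $@ $<

# Read whatever dependency files exist; on the first build there are none,
# and every object is compiled anyway because it does not exist yet.
-include $(OBJ:.o=.d)
```

On a second run, make reads main.d and util.d, so editing a header listed there triggers recompilation of exactly the objects that include it.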

But this should not be necessary; it should be easy to portably have automatic dependency generation for common cases just with the POSIX standard and have it work everywhere.

Variable indirection

A macro reference (aka a variable reference) is easy in make. Whenever you want to refer to the contents of FOO, you can just write $(FOO). A variable indirection occurs whenever a variable name itself includes another variable reference, e.g., $(FOO_$(d)).

Variable indirection is surprisingly useful in make. In particular, it is useful in the same way that associative arrays are useful in many other languages. For example, it allows easy selection between sets of values, such as selecting different file sets for different platforms.
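As a minimal sketch, with hypothetical platform and file names, indirection lets one macro act like a lookup table keyed by another macro. This works in GNU make and, per the reports later in this section, in many other implementations:

```makefile
# Hypothetical per-platform source lists, used like an associative array
# indexed by PLATFORM.
PLATFORM = linux
SRC_linux = main.c poll_epoll.c
SRC_bsd = main.c poll_kqueue.c

# Variable indirection: $(PLATFORM) is expanded first, giving the name
# of the macro to expand (SRC_linux or SRC_bsd).
SRC = $(SRC_$(PLATFORM))

show:
	@echo "SRC = $(SRC)"
```

“make show” uses the Linux list; “make show PLATFORM=bsd” selects the other one without editing the makefile.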

Implementing non-recursive make (by Emile van Bergen) shows a general approach for having makefile fragments in each directory while using non-recursive make as described by Peter Miller. Van Bergen’s approach tweaks Miller’s general approach by automatically handling directory names, and it depends on variable indirection, using macro references such as $(TGTS_$(d)).

Variable indirection is not currently part of the POSIX standard, but I have proposed it as POSIX bug #336. As of this writing it has neither been accepted nor rejected, but I have high hopes. Eric Blake reports that automake would like this standardized as well.

Eric Blake also reports that a huge number of make implementations already support this in practice. These include not only GNU make, but also those in NetBSD, FreeBSD, OpenBSD, and even the old ones in IRIX 6.5, AIX 4.3.3, Tru64 4.0D, Solaris 2.6, and HP-UX 10.20. Variable indirection is typically very easy to add to any make implementation that doesn’t already support it. In my mind, the POSIX standard should bless what is already common practice.

Conditionals

Practically all make implementations include an if-then-else conditional, generally of two kinds: an “is-defined” conditional (e.g., “ifdef”, “.ifdef”, or “#ifdef”) and a more general conditional (e.g., “if”, “ifeq”, or “#if”). However, there is no standard syntax for either kind! Conditionals are important to make it easy to do different things depending on the platform. My POSIX bug#805 proposal recommends adding this functionality to the standard.
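To show why the lack of a standard spelling hurts, here is one hypothetical platform test written for GNU make, with the equivalent BSD make form in comments. The two spellings are not interchangeable, so a makefile written with one is generally not usable with the other implementation.

```makefile
# GNU make syntax (hypothetical PLATFORM and file names).
PLATFORM = linux

ifeq ($(PLATFORM),linux)
SRC = main.c poll_epoll.c
else
SRC = main.c poll_kqueue.c
endif

# The same logic in BSD make syntax:
# .if ${PLATFORM} == "linux"
# SRC = main.c poll_epoll.c
# .else
# SRC = main.c poll_kqueue.c
# .endif

show:
	@echo "SRC = $(SRC)"
```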

Again, if you need this functionality immediately, you will probably need to use a specific make implementation (typically GNU make) or a layered tool (often automake). But that should not be necessary.

Pattern rules

Many make users will be surprised to know that pattern rules like “%.o: %.c” are not in the current POSIX standard. My proposal POSIX bug #513 recommends adding this widely-used extension to the standard.
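A short sketch, assuming GNU make and hypothetical directory names, shows something a pattern rule can do that a POSIX inference rule such as “.c.o:” cannot: build an object in one directory from a source file in another.

```makefile
# Pattern rule (GNU make syntax): build obj/NAME.o from src/NAME.c.
# A suffix rule such as ".c.o:" cannot express this, because its target
# and source must have the same name apart from the suffix.
obj/%.o: src/%.c
	mkdir -p obj
	$(CC) $(CFLAGS) -c -o $@ $<
```

With this rule, “make obj/util.o” would compile src/util.c.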

Pattern substitution

I do not want to give the impression that POSIX lacks a great deal and is unwilling to fix it. In POSIX bug #519 I proposed adding support for pattern substitutions, e.g., “$(foo:%.o=%.c)”. These were already widely supported; they just weren’t in the standard. After deliberation, this was accepted by the POSIX committee.
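A small sketch with hypothetical file names contrasts the long-standard suffix form with the pattern form; the output described below assumes GNU make:

```makefile
# Hypothetical object list.
OBJ = main.o util.o

# Suffix substitution: only the ending can change.
SRC = $(OBJ:.o=.c)

# Pattern substitution: "%" matches the stem, so a prefix can be added too.
SRC_IN_DIR = $(OBJ:%.o=src/%.c)

show:
	@echo "SRC = $(SRC)"
	@echo "SRC_IN_DIR = $(SRC_IN_DIR)"
```

“make show” should print “SRC = main.c util.c” and then “SRC_IN_DIR = src/main.c src/util.c”.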

VPATH (Virtual PATH)

Miller’s original paper discussed using virtual paths (VPATH builds) in section 6. I proposed standardizing virtual paths (VPATHs) in POSIX bug #766. GNU make, for example, supports VPATH builds, and automake works hard to support its version of VPATH builds.

However, a lack of consensus for any particular VPATH semantic led to rejection of this proposal by the POSIX committee. I am disappointed, but this is not such a serious problem. Modern systems typically have lots of storage space, reducing the need for VPATH support in the first place. You can always make another copy of the files every time you want to use them, and if that’s not desired, various tricks with hard and soft links also make the lack of standard VPATH support less of a problem. In practice, most people download their own personal copy of a program to work on (e.g., using git). Indeed, the design of git specifically assumes that storage space is not a problem, and it is widely used. Since the need for VPATH support is less pressing today, I have not tried to pursue VPATH standardization further.

Miscellaneous proposals

There are various other miscellaneous functions that are widely used or useful, but have not been in older POSIX standards.

I proposed POSIX bug#523 to add support for .PHONY. This functionality was supported in at least GNU make, NetBSD make, FreeBSD make, OpenBSD make, and fastmake... but it was not in the POSIX standard. Thankfully, this has been accepted by the POSIX committee.
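A minimal sketch with hypothetical file names; .PHONY marks targets that name actions rather than files:

```makefile
# "all" and "clean" are actions, not files that make should look for.
.PHONY: all clean

all: prog

prog: main.o
	$(CC) -o prog main.o

# Without .PHONY, a file that happened to be named "clean" would make
# this target look up to date, and "make clean" would do nothing.
clean:
	rm -f prog main.o
```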

I also proposed POSIX bug#514 for enhancing internal macros. Currently “$<” has odd limitations, and the POSIX standard lacks the common extensions “$^” and “$+”. At the time of this writing these are still under discussion.
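For reference, POSIX specifies “$<” only for inference rules (and .DEFAULT). The sketch below uses the GNU make meanings of the two extensions: “$^” lists all prerequisites with duplicates removed, while “$+” keeps them in order with duplicates, which matters when a library must appear more than once on a link line. The library names are hypothetical.

```makefile
# Hypothetical link where libfoo.a and libbar.a refer to each other, so
# libfoo.a is listed twice on purpose.
#   $^ expands to: main.o libfoo.a libbar.a           (duplicates removed)
#   $+ expands to: main.o libfoo.a libbar.a libfoo.a  (order and duplicates kept)
prog: main.o libfoo.a libbar.a libfoo.a
	$(CC) -o $@ $+
```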

Improvements in the POSIX standard for other tools can also help make, since make is typically used by calling out to other tools. I am happy to report there has been some progress there too. For example, I proposed that POSIX should be modified to support extended regular expressions (EREs) in sed (bug #528). This has been accepted in POSIX, it is already supported by GNU sed and FreeBSD sed, and I submitted patches for BusyBox to do this that have also been accepted. I also submitted a proposal to add an “ignore case” flag to sed’s “s” command in POSIX.

Of course, I'm most aware of proposals I made. There are other proposals for the specification or implementations that should help as well, and I welcome them.

Non-proposals

I have not tried to address how to modify make so it can fully deal with “unusual” filenames such as those including spaces or shell metacharacters like “&”. I also know of no one who has proposed a standard way to do it. It would be useful for make to be able to handle these cases. Sadly, changing make to deal with them is challenging. Supporting spaces in filenames turns out to be particularly difficult, because nearly all makefiles and related tools use spaces as separators between filenames. Escape mechanisms have their own problems, especially when dealing with Windows (where the backslash character is a directory separator). The GNU make developers are interested in dealing with spaces in filenames, but this effort is non-trivial. In practice, software developers control both the code and its build environment, so it is usually easy to limit filenames for software development to only characters that do not cause problems (such as Latin alphanumerics, periods, underscores, non-leading dashes, and commas). Thus, while I think this should be fixed, it’s difficult to fix, and there’s typically no pressing need to fix it.

I also have not tried to simplify handling of make metacharacters in filenames, such as $. It is possible to handle them in make, but in some cases it is painful. The typical advice, though, is simply not to use $ in filenames processed by make. Again, there’s typically no pressing need to fix it.
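For example, a hypothetical file named cost$1.txt has to be written with a doubled dollar sign wherever make reads it, and then quoted again for the shell in the recipe. This is a sketch of that double escaping, assuming GNU make’s expansion rules; other implementations may differ in details.

```makefile
# Hypothetical file whose name contains a dollar sign: cost$1.txt
# In the makefile, each literal "$" must be written as "$$".
all: cost$$1.txt

# $@ expands to cost$1.txt, which the shell would mangle ($1 is a shell
# positional parameter), so the recipe must also single-quote it.
cost$$1.txt:
	touch '$@'
```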

I certainly have not tried to help make deal with the absurdly permissive Unix/Linux/POSIX filename rules such as allowing newline, tab, leading dash, and non-UTF-8 filenames. For these cases I think it would be better for Unix-like systems to forbid filenames from having constructs such as control characters at all, as proposed in POSIX bug#251. After all, common patterns in shell do not work with all Unix/Linux/POSIX filenames; it can be hard to write shell scripts to fully handle them, and make actions use shell.

Conclusions

If you are interested in this area, encourage the POSIX committee to accept these proposals (or improve them), and get them implemented in various make implementations if they are not already in place. I think they are doing an important job. Of course, building software is only one part of the issue; if you are worried about countering attacks on the development and build environment, see my work on countering the trusting trust attack and related work on reproducible (deterministic) builds.

Not all programs use make, but it is a widely-used tool. This list of proposals does not mean that these are the best possible ways to improve make, or the only possible options. However, I think we should update our tools so they are easy and pleasant to use, especially when wise heads like Peter Miller have identified problems with the tools.

This paper is dedicated to the memory of Peter Miller, who died in 2014. Miller’s paper on recursive make has helped millions of developers around the world. Miller’s version control system Aegis inspired Monotone, which then inspired git and mercurial; I don’t think we would have tools like git without Aegis leading the way. Miller’s work on gettext() led to FLOSS support of people around the world in their native language. In short, we are all better off because Peter Miller was here. I am happy to note that he hoped to live to see his son’s graduation in June 2014, and he succeeded; he managed a post in July 2014, and he died on 2014-07-27. Peter: I did not know you well, but I know you will be missed.


Feel free to see my home page at https://dwheeler.com. You may also want to look at my paper Why OSS/FS? Look at the Numbers! and my book on how to develop secure programs.

(C) Copyright 2014 David A. Wheeler.