Secure Programming for Linux and Unix HOWTO
Prev	Chapter 10. Language-Specific Issues	Next

10.1. C/C++

It is possible to develop secure code using C or C++, but both languages include fundamental design decisions that make it more difficult to write secure code. C and C++ easily permit buffer overflows, force programmers to do their own memory management, and are fairly lax in their typing systems. For systems programs (such as an operating system kernel), C and C++ are fine choices. For applications, C and C++ are often over-used. Strongly consider using an even higher-level language, at least for the majority of the application. But clearly, there are many existing programs in C and C++ which won't get completely rewritten, and many developers may choose to develop in C and C++.

One of the biggest security problems with C and C++ programs is buffer overflow; see Chapter 6 for more information. C has the additional weakness of not supporting exceptions, which makes it easy to write programs that ignore critical error situations.

Another problem with C and C++ is that developers have to do their own memory management (e.g., using malloc(), alloc(), free(), new, and delete), and failing to do it correctly may result in a security flaw. The more serious problem is that programs may erroneously free memory that should not be freed (e.g., because it's already been freed). This can result in an immediate crash or be exploitable, allowing an attacker to cause arbitrary code to be executed; see [Anonymous Phrack 2001]. Some systems (such as many GNU/Linux systems) don't protect against double-freeing at all by default, and it is not clear that those systems which attempt to protect themselves are truly unsubvertable. Although I haven't seen anything written on the subject, I suspect that using the incorrect call in C++ (e.g., mixing new and malloc()) could have similar effects. For example, on March 11, 2002, it was announced that the zlib library had this problem, affecting the many programs that use it. Thus, when testing programs on GNU/Linux, you should set the environment variable MALLOC_CHECK_ to 1 or 2, and you might consider executing your program with that environment variable set with 0, 1, 2. The reason for this variable is explained in GNU/Linux malloc(3) man page:

Recent versions of Linux libc (later than 5.4.23) and GNU libc (2.x) include a malloc implementation which is tunable via environment variables. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free() with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort() is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down.

There are various tools to deal with this, such as Electric Fence and Valgrind; see Section 11.7 for more information. If unused memory is not free'd, (e.g., using free()), that unused memory may accumulate - and if enough unused memory can accumulate, the program may stop working. As a result, the unused memory may be exploitable by attackers to create a denial of service. It's theoretically possible for attackers to cause memory to be fragmented and cause a denial of service, but usually this is a fairly impractical and low-risk attack.

Be as strict as you reasonably can when you declare types. Where you can, use ``enum'' to define enumerated values (and not just a ``char'' or ``int'' with special values). This is particularly useful for values in switch statements, where the compiler can be used to determine if all legal values have been covered. Where it's appropriate, use ``unsigned'' types if the value can't be negative.

One complication in C and C++ is that the character type ``char'' can be signed or unsigned, depending on the compiler and machine; the C standard permits either. When a signed char with its high bit set is saved in an integer, the result will be a negative number; in some cases this can be exploitable. In general, use ``unsigned char'' instead of char or signed char for buffers, pointers, and casts when dealing with character data that may have values greater than 127 (0x7f). And when compiling, try to invoke a compiler option that forces unspecified "char"s to be unsigned. Portable programs shouldn't depend on whether a char is signed or not, and by forcing it to be unsigned, the resulting executable can avoid a few nasty security vulnerabilities. In gcc, you can make this happen using the "-funsigned-char" option.

C and C++ are by definition rather lax in their type-checking support, but you can at least increase their level of checking so that some mistakes can be detected automatically. Turn on as many compiler warnings as you can and change the code to cleanly compile with them, and strictly use ANSI prototypes in separate header (.h) files to ensure that all function calls use the correct types. For C or C++ compilations using gcc, use at least the following as compilation flags (which turn on a host of warning messages) and try to eliminate all warnings (note that -O2 is used since some warnings can only be detected by the data flow analysis performed at higher optimization levels):
gcc -Wall -Wpointer-arith -Wstrict-prototypes -O2
Doc Shankar (of IBM) recommends the following set of compiler options when using gcc; it may take some effort to make existing programs conform to all these checks, but these checks can also help find a number problems:
gcc -Werror -Wall \ -Wmissing-prototypes -Wmissing-declarations \ -Wstrict-prototypes -Wpointer-arith \ -Wwrite-strings -Wcast-qual -Wcast-align \ -Wbad-function-cast \ -Wformat-security -Wformat-nonliteral \ -Wmissing-format-attribute \ -Winline
You might want ``-W -pedantic'' too. Remember to add the "-funsigned-char" option to this set.

Many C/C++ compilers can detect inaccurate format strings. For example, gcc can warn about inaccurate format strings for functions you create if you use its __attribute__() facility (a C extension) to mark such functions, and you can use that facility without making your code non-portable. Here is an example of what you'd put in your header (.h) file:
/* in header.h */ #ifndef __GNUC__ # define __attribute__(x) /*nothing*/ #endif extern void logprintf(const char *format, ...) __attribute__((format(printf,1,2))); extern void logprintva(const char *format, va_list args) __attribute__((format(printf,1,0)));
The "format" attribute takes either "printf" or "scanf", and the numbers that follow are the parameter number of the format string and the first variadic parameter (respectively). The GNU docs talk about this well. Note that there are other __attribute__ facilities as well, such as "noreturn" and "const".

Avoid common errors made by C/C++ developers. Using warning systems and style checkers can help avoid common errors. For example, be careful about not using ``='' when you mean ``==''. The gcc compiler's -Wall option, recommended above, turns on a -Wparenthesis option. This -Wparenthesis option warns you when incorrectly use "=", and requires adding extra parentheses if you really mean to use "=").

Some organizations have defined a subset of a well-known language to try to make common mistakes in it either impossible or more obvious. One better-known subset of C is the MISRA C guidelines [MISRA 1998]. If you intend to use a subset, it's wise to use automated tools to check if you've actually used only a subset. There's a proprietary tool called Safer C that checks code to see if it meets most of the MISRA C requirements (it's not quite 100%, because some MISRA C requirements are difficult to check automatically).

Other approaches include building many more safety checks into the language, or changing the language itself into a variant dialect that is hopefully easier to write secure programs in. have not had any experience using them: The Safe C Compiler (SCC) is a C-to-C compiler that adds extended pointer and array access semantics to automatically detect memory access errors. Its front page and talk provide interesting information, but its distribution appears limited as of 2004. Cyclone is a variant of C with far more "compile-time, link-time, and run-time checks designed to ensure safety" (where they define safe as free of crashes, buffer overflows, format string attacks, and some other problems). At this point you're really starting to use a different (though similar) language, and you should carefully decide on a language before its use.