SLOCCount (pronounced "sloc-count") is a suite of programs for counting physical source lines of code (SLOC) in potentially large software systems. Thus, SLOCCount is a "software metrics tool" or "software measurement tool". SLOCCount was developed by David A. Wheeler, originally to count SLOC in a GNU/Linux distribution, but it can be used for counting the SLOC of arbitrary software systems.
SLOCCount is known to work on Linux systems, and has been tested on Red Hat Linux versions 6.2, 7, and 7.1. SLOCCount should run on many other Unix-like systems (if Perl is installed), in particular, I would expect a *BSD system to work well. Windows users can run sloccount by first installing Cygwin. SLOCCount is much slower on Windows/Cygwin, and it's not as easy to install or use on Windows, but it works. Of course, feel free to upgrade to an open source Unix-like system (such as Linux or *BSD) instead :-).
SLOCCount can count physical SLOC for a wide number of languages. Listed alphabetically, they are Ada, Assembly (for many machines and assemblers), awk (including gawk and nawk), Bourne shell (and relatives such as bash, ksh, zsh, and pdksh), C, C++, C# (also called C-sharp or cs), C shell (including tcsh), COBOL, Expect, Fortran (including Fortran 90), Haskell, Java, lex (including flex), LISP (including Scheme), makefiles (though they aren't usually shown in final reports), Modula-3, Objective-C, Pascal, Perl, PHP, Python, Ruby, sed, SQL (normally not shown), TCL, and Yacc. It can gracefully handle awkward situations in many languages; for example, it can determine the syntax used in different assembly language files and adjust appropriately, it knows about Python's use of string constants as comments, and it can handle various Perl oddities (e.g., perlpods, here documents, and Perl's __END__ marker). It even has a "generic" SLOC counter that you may be able to use to count the SLOC of other languages (depending on the language's syntax).
SLOCCount can also take a large list of files and automatically categorize them using a number of different heuristics. The heuristics automatically determine if a file is a source code file or not, and if so, which language it's written in. For example, it knows that ".pc" is usually a C source file for an Oracle preprocessor, but it can detect many circumstances where it's actually a file about a "PC" (personal computer). For another example, it knows that ".m" is the standard extension for Objective-C, but it will check the file contents to see if it really is Objective-C. It will even examine file headers to attempt to accurately determine the file's true type. As a result, you can analyze large systems completely automatically.
Finally, SLOCCount has some report-generating tools to collect the data generated, and then present it in several different formats and sorted different ways. The report-generating tool can also generate simple tab-separated files so data can be passed on to other analysis tools (such as spreadsheets and database systems).
SLOCCount will try to quickly estimate development time and effort given only the lines of code it computes, using the original Basic COCOMO model. This estimate can be improved if you can give more information about the project. See the discussion below about COCOMO, including intermediate COCOMO, if you want to improve the estimates by giving additional information about the project.
SLOCCount is open source software/free software (OSS/FS), released under the GNU General Public License (GPL), version 2; see the license below. The master web site for SLOCCount is http://www.dwheeler.com/sloccount. You can learn a lot about SLOCCount by reading the paper that caused its creation, available at http://www.dwheeler.com/sloc. Feel free to see my master web site at http://www.dwheeler.com, which has other material such as the Secure Programming for Linux and Unix HOWTO, my list of OSS/FS references, and my paper Why OSS/FS? Look at the Numbers! Please send improvements by email to dwheeler, at, dwheeler.com (DO NOT SEND SPAM - please remove the commas, remove the spaces, and change the word "at" into the at symbol).
The following sections first give a "quick start" (discussing how to use SLOCCount once it's installed), then discuss basic SLOCCount concepts, how to install it, how to set your PATH, how to install source code on RPM-based systems if you wish, and more information on how to use the "sloccount" front-end. This is followed by material for advanced users: how to use SLOCCount tools individually (for when you want more control than the "sloccount" tool gives you), designer's notes, the definition of SLOC, and miscellaneous notes. The last sections state the license used (GPL) and give hints on how to submit changes to SLOCCount (if you decide to make changes to the program).
Once you've installed SLOCCount (discussed below), you can measure an arbitrary program by typing everything after the dollar sign into a terminal session:
$ sloccount topmost-source-code-directory
The directory listed and all its descendants will be examined. You'll see output while it calculates, culminating with physical SLOC totals and estimates of development time, schedule, and cost. If the directory contains a set of directories, each of which is a different project developed independently, use the "--multiproject" option so the effort estimations can correctly take this into account.
You can redisplay the data different ways by using the "--cached" option, which skips the calculation stage and re-prints previously computed information. You can use other options to control what's displayed: "--filecount" shows counts of files instead of SLOC, and "--details" shows the detailed information about every source code file. So, to display all the details of every file once you've previously calculated the results, just type:
sloccount --cached --details
You'll notice that the default output ends with a request. If you use this data (e.g., in a report), please credit that data as being "generated using 'SLOCCount' by David A. Wheeler." I make no money from this program, so at least please give me some credit.
SLOCCount tries to ignore all automatically generated files, but its heuristics to detect this are necessarily imperfect (after all, even humans sometimes have trouble determining if a file was automatically generated). If possible, try to clean out automatically generated files from the source directories -- in many situations "make clean" does this.
There's more to SLOCCount than this, but first we'll need to explain some basic concepts, then we'll discuss other options and advanced uses of SLOCCount.
SLOCCount counts physical SLOC, also called "non-blank, non-comment lines". More formally, physical SLOC is defined as follows: ``a physical source line of code (SLOC) is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.'' Comment delimiters (characters other than newlines starting and ending a comment) are considered comment characters. Data lines only including whitespace (e.g., lines with only tabs and spaces in multiline strings) are not included.
In SLOCCount, there are 3 different directories to keep straight: the "source directory" containing the source code being measured, the "bin directory" containing SLOCCount's executable programs, and the "data directory" where SLOCCount stores its intermediate results (by default, "~/.slocdata").
SLOCCount can handle many different programming languages, and separate them by type (so you can compare the use of each). Here is the set of languages, sorted alphabetically; common filename extensions are in parentheses, with SLOCCount's ``standard name'' for the language listed in brackets:
Obviously, before using SLOCCount you'll need to install it. SLOCCount depends on other programs, in particular perl, bash, a C compiler (gcc will do), and md5sum (you can get a useful md5sum program in the ``textutils'' package on many Unix-like systems), so you'll need to get them installed if they aren't already.
If your system uses RPM version 4 or greater to install software (e.g., Red Hat Linux 7 or later), just download the SLOCCount RPM and install it using a normal installation command; from the command line you can use:
rpm -Uvh sloccount*.rpm
Everyone else will need to install from a tar file, and Windows users will have to install Cygwin before installing sloccount.
If you're using Windows, you'll need to first install Cygwin. By installing Cygwin, you'll install an environment and a set of open source Unix-like tools; Cygwin essentially creates a Unix-like environment in which sloccount can run. You may be able to run parts of sloccount without Cygwin - in particular, the Perl programs should run in the Windows port of Perl - but you're on your own: many of the sloccount components expect a Unix-like environment.

To install Cygwin, go to the Cygwin main page and install it. Install it to use Unix newlines, not DOS newlines - DOS newlines will cause odd errors in SLOCCount (and probably other programs, too). I have only tested a "full" Cygwin installation, so I suggest installing everything. If you're short on disk space, at least install binutils, bash, fileutils, findutils, gcc, grep, gzip, make, man, perl, readline, sed, sh-utils, tar, textutils, unzip, and zlib; you should probably install vim as well, and there may be other dependencies.

By default Cygwin will create a directory C:\cygwin\home\NAME, and will set up the ability to run Unix programs (which will see that same directory as /home/NAME). Now double-click on the Cygwin icon, or select from the Start menu Programs / Cygnus Solutions / Cygwin Bash shell; you'll see a terminal screen with a Unix-like interface. Now follow the instructions (next) for tar file users.
If you're installing from the tar file, download the file (into your home directory is fine). Unpacking the file will create a subdirectory, so if you want the unpacked subdirectory to go somewhere special, "cd" to where you want it to go. Most likely, your home directory is just fine. Now gunzip and untar SLOCCount (the * replaces the version #) by typing this at a terminal session:
gunzip -c sloccount*.tar.gz | tar xvf -

Replace "sloccount*.tar.gz" shown above with the full path of the downloaded file, wherever that is. You've now created the "bin directory", which is simply the "sloccount-VERSION" subdirectory created by the tar command (where VERSION is the version number).
Now you need to compile the few compiled programs in the "bin directory" so SLOCCount will be ready to go. First, cd into the newly-created bin directory, by typing:
cd sloccount*
You may then need to override some installation settings. You can do this by editing the supplied makefile, or alternatively, by providing options to "make" whenever you run make. The supplied makefile assumes your C compiler is named "gcc", which is true for most Linux systems, *BSD systems, and Windows systems using Cygwin. If this isn't true, you'll need to set the "CC" variable to the correct value (e.g., "cc"). You can also modify where the files are stored; this variable is called PREFIX and its default is /usr/local (older versions of sloccount defaulted to /usr).
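For example, here's a hypothetical invocation that overrides both settings at build time instead of editing the makefile (the compiler name and install prefix shown are purely illustrative):

make CC=cc PREFIX=/opt/sloccount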
If you're using Windows and Cygwin, you must override one of the installation settings, EXE_SUFFIX, for installation to work correctly. One way to set this value is to edit the "makefile" file so that the line beginning with "EXE_SUFFIX" reads as follows:
EXE_SUFFIX=.exe

If you're using Cygwin and you choose to modify the "makefile", you can use any text editor on the Cygwin side, or you can use a Windows text editor if it can read and write Unix-formatted text files. Cygwin users are free to use vim, for example. If you're installing into your home directory and using the default locations, Windows text editors will see the makefile as the file C:\cygwin\home\NAME\sloccount-VERSION\makefile. Note that the Windows "Notepad" application doesn't work well, because it's not able to handle Unix text files correctly. Since this can be quite a pain, Cygwin users may instead decide to override the makefile values during installation (as described next).
Finally, compile the few compiled programs in it by typing "make":
make

If you didn't edit the makefile in the previous step, you need to provide options to make invocations to set the correct values. This is done by simply giving (after "make") the name of the variable, an equal sign, and its correct value. Thus, to compile the program on a Windows system using Cygwin, you can skip modifying the makefile by typing this instead of just "make":
make EXE_SUFFIX=.exe
If you want, you can install sloccount for system-wide use without using the RPM version. Windows users using Cygwin should probably do this, particularly if they chose a "local" installation. To do this, first log in as root (Cygwin users don't need to do this for local installation). Edit the makefile to match your system's conventions, if necessary, and then type "make install":
make install

If you need to set some make options, remember to do that here too. If you use "make install", you can uninstall it later using "make uninstall". Installing sloccount for system-wide use is optional; SLOCCount works without a system-wide installation. However, if you don't install sloccount system-wide, you'll need to set up your PATH variable; see the section on setting your path.
A note for Cygwin users (and some others): some systems, including Cygwin, don't set up the environment quite right and thus can't display the manual pages as installed. The problem is that they forget to search /usr/local/share/man for manual pages. If you want to read the installed manual pages, type this into a Bourne-like shell:
MANPATH=/usr/local/share/man:/usr/share/man:/usr/man
export MANPATH

Or, if you use a C shell:
setenv MANPATH "/usr/local/share/man:/usr/share/man:/usr/man"

From then on, you'll be able to view the reference manual pages by typing "man sloccount" (or by using whatever manual page display system you prefer).
Obviously, you must install the software source code you're counting, so somehow you must create the "source directory" with the source code to measure. You must also make sure that permissions are set so the software can read these directories and files.
For example, if you're trying to count the SLOC for an RPM-based Linux system, install the software source code by doing the following as root (which will place all source code into the source directory /usr/src/redhat/BUILD):
mount /mnt/cdrom
cd /mnt/cdrom/SRPMS
rpm -ivh *.src.rpm
cd ../SPECS
(look at the contents of the spec files, removing what you don't want)
rpm -bp *.spec
chmod -R a+rX /usr/src/redhat/BUILD
Here's an example of how to download source code from an anonymous CVS server. Let's say you want to examine the source code in GNOME's "gnome-core" directory, as stored at the CVS server "anoncvs.gnome.org". Here's how you'd do that:
export CVSROOT=':pserver:anonymous@anoncvs.gnome.org:/cvs/gnome'
cvs login
cvs -z3 checkout gnome-core
Of course, if you have a non-anonymous account, you'd set CVSROOT to reflect this. For example, to log in using the "pserver" protocol as ACCOUNT_NAME, do:
export CVSROOT=':pserver:ACCOUNT_NAME@cvs.gnome.org:/cvs/gnome'
You may need root privileges to install the source code and to give another user permission to read it, but please avoid running the sloccount program as root. Although I know of no specific reason this would be a problem, running any program as root turns off helpful safeguards.
Although SLOCCount tries to detect (and ignore) many cases where programs are automatically generated, these heuristics are necessarily imperfect. So, please don't run any programs that generate other programs - just do enough to get the source code prepared for counting. In general you shouldn't run "make" on the source code, and if you have, consider running "make clean" or "make really_clean" on the source code first. It often doesn't make any difference, but identifying those circumstances is difficult.
SLOCCount will not automatically uncompress files that are compressed/archive files (such as .zip, .tar, or .tgz files). Often such files are just "left over" old versions or files that you're already counting. If you want to count the contents of compressed files, uncompress them first.
SLOCCount also doesn't delve into files using "literate programming" techniques, in part because there are too many incompatible formats that implement it. Thus, run the tools to extract the code from the literate programming files before running SLOCCount. Currently, the only exception to this rule is Haskell.
Otherwise, if you haven't installed SLOCCount for system-wide use, you'll need to add the directory containing SLOCCount's executable files to your PATH. In Bourne-shell variants, type:
PATH="$PATH:the directory with SLOCCount's executable files" export PATHCsh users should instead type:
setenv PATH "$PATH:the directory with SLOCCount's executable files"
You can also give "sloccount" a list of directories, in which case the report will be broken down by these directories (make sure that the basenames of these directories differ). SLOCCount normally considers all descendants of these directories, though unless told otherwise it ignores symbolic links.
This is all easier to explain by example. Let's say that we want to measure Apache 1.3.12 as installed using an RPM. Once it's installed, we just type:
sloccount /usr/src/redhat/BUILD/apache_1.3.12

We'll see status reports while it analyzes things, and then it prints out:
SLOC    Directory       SLOC-by-Language (Sorted)
24728   src_modules     ansic=24728
19067   src_main        ansic=19067
8011    src_lib         ansic=8011
5501    src_os          ansic=5340,sh=106,cpp=55
3886    src_support     ansic=2046,perl=1712,sh=128
3823    src_top_dir     sh=3812,ansic=11
3788    src_include     ansic=3788
3469    src_regex       ansic=3407,sh=62
2783    src_ap          ansic=2783
1378    src_helpers     sh=1345,perl=23,ansic=10
1304    top_dir         sh=1304
104     htdocs          perl=104
31      cgi-bin         sh=24,perl=7
0       icons           (none)
0       conf            (none)
0       logs            (none)

ansic:    69191 (88.85%)
sh:        6781 (8.71%)
perl:      1846 (2.37%)
cpp:         55 (0.07%)

Total Physical Source Lines of Code (SLOC) = 77873
Estimated Development Effort in Person-Years (Person-Months) = 19.36 (232.36)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Estimated Schedule in Years (Months) = 1.65 (19.82)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 11.72
Total Estimated Cost to Develop = $ 2615760
 (average salary = $56286/year, overhead = 2.4).
Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
Interpreting this should be straightforward. The Apache directory has several subdirectories, including "htdocs", "cgi-bin", and "src". The "src" directory has many subdirectories in it ("modules", "main", and so on). Code files directly contained in the main directory /usr/src/redhat/BUILD/apache_1.3.12 are labelled "top_dir", while code directly contained in the src subdirectory is labelled "src_top_dir". Code in the "src/modules" directory is labelled "src_modules" here. The output shows each major directory broken out, sorted from largest to smallest. Thus, the "src/modules" directory had the most code of the directories, 24728 physical SLOC, all of it in C. The "src/helpers" directory had a mix of shell, perl, and C; note that when multiple languages are shown, the list of languages in that child is also sorted from largest to smallest.
Below the per-component set is a list of all languages used, with their total SLOC shown, sorted from most to least. After this is the total physical SLOC (77,873 physical SLOC in this case).
Next is an estimation of the effort and schedule (calendar time) it would take to develop this code. For effort, the units shown are person-years (with person-months shown in parentheses); for schedule, total years are shown first (with months in parentheses). When invoked through "sloccount", the default assumption is that all code is part of a single program; the "--multiproject" option changes this to assume that all top-level components are independently developed programs. When "--multiproject" is invoked, each project's efforts are estimated separately (and then summed), and the schedule estimate presented is the largest estimated schedule of any single component.
By default the "Basic COCOMO" model is used for estimating effort and schedule; this model includes design, code, test, and documentation time (both user/admin documentation and development documentation). See below for more information on COCOMO as it's used in this program.
Next are several numbers that attempt to estimate what it would have cost to develop this program. This is simply the amount of effort, multiplied by the average annual salary and by the "overhead multiplier". The default annual salary is $56,286 per year; this value comes from ComputerWorld's September 4, 2000 Salary Survey, which reported it as the average U.S. programmer/analyst salary in the year 2000. You might consider using other numbers (ComputerWorld's September 3, 2001 Salary Survey found U.S. programmer/analysts averaging $55,100, senior systems programmers averaging $68,900, and senior systems analysts averaging $72,300).
Overhead is much harder to estimate; I did not find a definitive source for information on overheads. After informal discussions with several cost analysts, I determined that an overhead of 2.4 would be representative of the overhead sustained by a typical software development company. As discussed in the next section, you can change these numbers too.
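If you want to check these figures yourself, here's a small sketch (not part of SLOCCount) that reproduces the Apache estimates above using the standard Unix "bc" calculator and SLOCCount's default parameters (effort factor 2.4, effort exponent 1.05, schedule factor 2.5, schedule exponent 0.38, $56,286/year salary, 2.4 overhead):

KSLOC=77.873
echo "2.4 * e(1.05 * l($KSLOC))" | bc -l      # effort: about 232.4 person-months
echo "2.5 * e(0.38 * l(232.36))" | bc -l      # schedule: about 19.8 months
echo "(232.36 / 12) * 56286 * 2.4" | bc -l    # cost: about $2.6 million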
You may be surprised by the high cost estimates, but remember, these include design, coding, testing, documentation (both for users and for programmers), and a wrap rate for corporate overhead (to cover facilities, equipment, accounting, and so on). Many programmers forget these other costs and are shocked by the high figures. If you only want to know the cost of the coding itself, you'll need to isolate just that figure; see the discussion of COCOMO below for how to do so.
Note that if any top-level directory has a file named PROGRAM_LICENSE, that file is assumed to contain the name of the license (e.g., "GPL", "LGPL", "MIT", "BSD", "MPL", and so on). If there is at least one such file, sloccount will also report statistics on licenses.
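For example, assuming your source tree lives at /usr/src/myproject (an illustrative path), you could record its license like this before counting:

echo "GPL" > /usr/src/myproject/PROGRAM_LICENSE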
Note: sloccount internally uses MD5 hashes to detect duplicate files, and thus needs some program that can compute MD5 hashes. Normally it will use "md5sum" (available, for example, as a GNU utility). If that doesn't work, it will try to use "md5" and "openssl", and you may see error messages in this format:
Can't exec "md5sum": No such file or directory at /usr/local/bin/break_filelist line 678, <CODE_FILE> line 15. Can't exec "md5": No such file or directory at /usr/local/bin/break_filelist line 678, <CODE_FILE> line 15.You can safely ignore these error messages; these simply show that SLOCCount is probing for a working program to compute MD5 hashes. For example, Mac OS X users normally don't have md5sum installed, but do have md5 installed, so they will probably see the first error message (because md5sum isn't available), followed by a note that a working MD5 program was found.
There are several options that control which files are selected for counting:
--duplicates   Count all duplicate files as normal files
--crossdups    Count duplicate files if they're in different data directory children
--autogen      Count automatically generated files
--follow       Follow symbolic links (normally they're ignored)
--addlang      Add languages to be counted that normally aren't shown
--append       Add more files to the data directory

Normally, files which have exactly the same content are counted only once (data directory children are counted alphabetically, so the child "first" in the alphabet will be considered the owner of the master copy). If you want them all counted, use "--duplicates". Sometimes when you use sloccount, each directory represents a different project, in which case you might want to specify "--crossdups". The program tries to reject files that are automatically generated (e.g., a C file generated by bison), but you can disable this as well. You can use "--addlang" to show makefiles and SQL files, which aren't usually counted.
Possibly the most important option is "--cached". Normally, when sloccount runs, it computes a lot of information and stores this data in a "data directory" (by default, "~/.slocdata"). The "--cached" option tells sloccount to use data previously computed, greatly speeding up use once you've done the computation once. The "--cached" option can't be used along with the options used to select what files should be counted. You can also select a different data directory by using the "--datadir" option.
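For example, here's a hypothetical pair of runs (the paths are illustrative): the first computes and caches the data in an alternate data directory, and the second redisplays file counts without recounting:

sloccount --datadir /tmp/slocdata /usr/src/myproject
sloccount --datadir /tmp/slocdata --cached --filecount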
There are many options for controlling the output:
--filecount      Show counts of files instead of SLOC
--details        Present details: one line per source code file
--wide           Show "wide" format; ignored if "--details" is selected
--multiproject   Assume each directory is for a different project
                 (this modifies the effort estimation calculations)
--effort F E     Change the effort estimation model to use F as the factor and E as the exponent
--schedule F E   Change the schedule estimation model to use F as the factor and E as the exponent
--personcost P   Change the average annual salary to P
--overhead O     Change the overhead multiplier to O
--               End of options
Basically, the first time you use sloccount, if you're measuring a set of projects (not a single project) you might consider using "--crossdups" instead of the defaults. Then, you can redisplay data quickly by using "--cached", combining it with options such as "--filecount". If you want to send the data to another tool, use "--details".
If you're measuring a set of projects, you probably ought to pass the option "--multiproject". When "--multiproject" is used, efforts are computed for each component separately and summed, and the time estimate used is the maximum single estimated time.
The "--details" option dumps the available data in 4 columns, tab-separated, where each line represents a source code file in the data directory children identified. The first column is the SLOC, the second column is the language type, the third column is the name of the data directory child (as it was given to get_sloc_details), and the last column is the absolute pathname of the source code file. You can then pipe this output to "sort" or some other tool for further analysis (such as a spreadsheet or RDBMS).
You can change the parameters used to estimate effort using "--effort". For example, if you believe that in the environment being used you can produce 2 KSLOC/month scaling linearly, then that means that the factor for effort you should use is 1/2 = 0.5 month/KSLOC, and the exponent for effort is 1 (linear). Thus, you can use "--effort 0.5 1".
You can also set the annual salary and overheads used to compute estimated development cost. While "$" is shown, there's no reason you have to use dollars; the unit of development cost is the same unit as the unit used for "--personcost".
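For example, to redisplay a previously computed count using the 2001 senior systems programmer salary mentioned above and a lower overhead multiplier (both values are purely illustrative), you might type:

sloccount --cached --personcost 68900 --overhead 2.0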
By default SLOCCount uses a very simple estimating model for effort and schedule: the basic COCOMO model in the "organic" mode (modes are more fully discussed below). This model estimates effort and schedule, including design, code, test, and documentation time (both user/admin documentation and development documentation). Basic COCOMO is a nice simple model, and it's used as the default because it doesn't require any information about the code other than the SLOC count already computed.
However, basic COCOMO's accuracy is limited for the same reason: basic COCOMO doesn't take a number of important factors into account. At the very least, you can quickly check whether the right "mode" is being used to improve accuracy. You can also use the "Intermediate COCOMO" and "Detailed COCOMO" models, which take more factors into account and are likely to produce more accurate estimates as a result. If you have the necessary information, you can take these factors into account and pass the result to sloccount using its "--effort" and "--schedule" options (as discussed in the section on options). Take these estimates as just that - estimates - they're not grand truths.
To use the COCOMO model, you first need to determine your application's mode, which can be "organic", "semidetached", or "embedded". Most software is "organic" (which is why it's the default). Briefly: an organic project is a relatively small project developed by a small team in a familiar, flexible environment; an embedded project must operate within tight, inflexible constraints (of hardware, software, or operational procedures); and a semidetached project falls between the two.
ID | Driver Name | Very Low | Low | Nominal | High | Very High | Extra High
---|---|---|---|---|---|---|---
RELY | Required software reliability | 0.75 (effect is slight inconvenience) | 0.88 (easily recovered losses) | 1.00 (recoverable losses) | 1.15 (high financial loss) | 1.40 (risk to human life) | 
DATA | Database size | | 0.94 (database bytes/SLOC < 10) | 1.00 (D/S between 10 and 100) | 1.08 (D/S between 100 and 1000) | 1.16 (D/S > 1000) | 
CPLX | Product complexity | 0.70 (mostly straightline code, simple arrays, simple expressions) | 0.85 | 1.00 | 1.15 | 1.30 | 1.65 (microcode, multiple resource scheduling, device timing dependent coding)
TIME | Execution time constraint | | | 1.00 (<50% use of available execution time) | 1.11 (70% use) | 1.30 (85% use) | 1.66 (95% use)
STOR | Main storage constraint | | | 1.00 (<50% use of available storage) | 1.06 (70% use) | 1.21 (85% use) | 1.56 (95% use)
VIRT | Virtual machine (HW and OS) volatility | | 0.87 (major change every 12 months, minor every month) | 1.00 (major change every 6 months, minor every 2 weeks) | 1.15 (major change every 2 months, minor changes every week) | 1.30 (major changes every 2 weeks, minor changes every 2 days) | 
TURN | Computer turnaround time | | 0.87 (interactive) | 1.00 (average turnaround < 4 hours) | 1.07 | 1.15 | 
ACAP | Analyst capability | 1.46 (15th percentile) | 1.19 (35th percentile) | 1.00 (55th percentile) | 0.86 (75th percentile) | 0.71 (90th percentile) | 
AEXP | Applications experience | 1.29 (<= 4 months experience) | 1.13 (1 year) | 1.00 (3 years) | 0.91 (6 years) | 0.82 (12 years) | 
PCAP | Programmer capability | 1.42 (15th percentile) | 1.17 (35th percentile) | 1.00 (55th percentile) | 0.86 (75th percentile) | 0.70 (90th percentile) | 
VEXP | Virtual machine experience | 1.21 (<= 1 month experience) | 1.10 (4 months) | 1.00 (1 year) | 0.90 (3 years) | | 
LEXP | Programming language experience | 1.14 (<= 1 month experience) | 1.07 (4 months) | 1.00 (1 year) | 0.95 (3 years) | | 
MODP | Use of "modern" programming practices (e.g. structured programming) | 1.24 (no use) | 1.10 | 1.00 (some use) | 0.91 | 0.82 (routine use) | 
TOOL | Use of software tools | 1.24 | 1.10 | 1.00 (basic tools) | 0.91 (test tools) | 0.83 (requirements, design, management, documentation tools) | 
SCED | Required development schedule | 1.23 (75% of nominal) | 1.08 (85% of nominal) | 1.00 (nominal) | 1.04 (130% of nominal) | 1.10 (160% of nominal) | 
For example, imagine that you're examining a fairly simple application that meets the "organic" requirements. Organic projects have a base factor of 2.3 and an exponent of 1.05, as noted above. We then examine all the cost drivers to determine a corrected base factor. For this example, imagine that we determine the values of these cost drivers are as follows:
ID | Driver Name | Rating | Multiplier
---|---|---|---
RELY | Required software reliability | Low (easily recovered losses) | 0.88
DATA | Database size | Low | 0.94
CPLX | Product complexity | Nominal | 1.00
TIME | Execution time constraint | Nominal | 1.00
STOR | Main storage constraint | Nominal | 1.00
VIRT | Virtual machine (HW and OS) volatility | Low (major change every 12 months, minor every month) | 0.87
TURN | Computer turnaround time | Nominal | 1.00
ACAP | Analyst capability | Nominal (55th percentile) | 1.00
AEXP | Applications experience | Nominal (3 years) | 1.00
PCAP | Programmer capability | Nominal (55th percentile) | 1.00
VEXP | Virtual machine experience | High (3 years) | 0.90
LEXP | Programming language experience | High (3 years) | 0.95
MODP | Use of "modern" programming practices (e.g. structured programming) | High (routine use) | 0.82
TOOL | Use of software tools | Nominal (basic tools) | 1.00
SCED | Required development schedule | Nominal | 1.00
So, starting with the base factor (2.3 in this case) and multiplying by each driver value in turn, we compute the final effort factor:
2.3 * 0.88 * 0.94 * 1 * 1 * 1 * 0.87 * 1.00 * 1 * 1 * 1 * 0.90 * 0.95 * 0.82 * 1 * 1 = 1.1605

For this example, the final factor for the effort calculation is 1.1605. You would then invoke sloccount with "--effort 1.1605 1.05" to pass in the corrected factor and exponent for the effort estimation. You don't need to use "--schedule" to set the factors when you're using the organic mode, because SLOCCount's default values are the values for the organic mode. You can set the scheduling parameters manually anyway by giving "--schedule 2.5 0.38". You do need to use the --schedule option for embedded and semidetached projects, because those modes have different schedule parameters. The final command would be:

sloccount --effort 1.1605 1.05 --schedule 2.5 0.38 my_project_directory
The detailed COCOMO model requires breaking information down further.
For more information about the original COCOMO model, including the detailed COCOMO model, see the book Software Engineering Economics by Barry Boehm.
You may be surprised by the high cost estimates, but remember, these include design, coding, testing (including integration and testing), documentation (both for users and for programmers), and a wrap rate for corporate overhead (to cover facilities, equipment, accounting, and so on). Many programmers forget these other costs and are shocked by the high cost estimates.
If you want to know a subset of this cost, you'll need to isolate just those figures that you're trying to measure. For example, let's say you want to find the money a programmer would receive to do just the coding of the units of the program (ignoring wrap rate, design, testing, integration, and so on). According to Boehm's book (page 65, table 5-2), the percentage varies by product size. For effort, code and unit test takes 42% for small (2 KSLOC), 40% for intermediate (8 KSLOC), 38% for medium (32 KSLOC), and 36% for large (128 KSLOC). Sadly, Boehm doesn't separate coding from unit test; perhaps 50% of the time is spent in unit test in traditional proprietary development (including fixing bugs found from unit test). If you want to know the income to the programmer (instead of cost to the company), you'll also want to remove the wrap rate. Thus, a programmer's income to only write the code for a small program (circa 2 KSLOC) would be 8.75% (42% x 50% x (1/2.4)) of the default figure computed by SLOCCount.
In other words, less than one-tenth of the cost as computed by SLOCCount is what actually would be made by a programmer for a small program for just the coding task. Note that a proprietary commercial company that bid using this lower figure would rapidly go out of business, since this figure ignores the many other costs they have to incur to actually develop working products. Programs don't arrive out of thin air; someone needs to determine what the requirements are, how to design it, and perform at least some testing of it.
There's another later estimation model for effort and schedule called "COCOMO II", but COCOMO II requires logical SLOC instead of physical SLOC. SLOCCount doesn't currently measure logical SLOC, so SLOCCount doesn't currently use COCOMO II. Contributions of code to compute logical SLOC and then optionally use COCOMO II will be gratefully accepted.
If you want to count a specific subset, you can use the "--details" option to list individual files, pipe this into "grep" to select the files you're interested in, and pipe the result to my tool "print_sum" (which reads lines beginning with numbers, and returns the total of those numbers). If you've already done the analysis, an example would be:
sloccount --cached --details | grep "/some/subdirectory/" | print_sum
If you just want to count specific files, and you know what language they're in, you can just invoke the basic SLOC counters directly. By convention the simple counters are named "LANGUAGE_count", and they take on the command line a list of the source files to count. Here are some examples:
c_count *.c *.cpp *.h    # Count C and C++ in the current directory.
asm_count *.S            # Count assembly.

All the counter (*_count) programs accept a "-f FILENAME" option, where FILENAME is a file containing the names of all the source files to count (one file per text line). If FILENAME is "-", the list of file names is taken from the standard input. The "c_count" program handles both C and C++ (but not Objective-C; for that, use objc_count). The available counters are ada_count, asm_count, awk_count, c_count, csh_count, exp_count, fortran_count, f90_count, java_count, lex_count, lisp_count, ml_count, modula3_count, objc_count, pascal_count, perl_count, python_count, sed_count, sh_count, sql_count, and tcl_count.
There is also "generic_count", which takes as its first parameter the ``comment string'', followed by a list of files. The comment string begins a comment that ends at the end of the line. Sometimes, if you have source for a language not listed, generic_count will be sufficient.
The basic SLOC counters will send output to standard out, one line per file (showing the SLOC count and filename). The assembly counter shows some additional information about each file. The basic SLOC counters always complete their output with a line saying "Total:", followed by a line with the total SLOC count.
count_unknown_ext

This will look at the resulting data (in its default data directory location, ~/.slocdata) and report a sorted list of the file extensions for uncategorized ("unknown") files. The list will show every file extension and how many files had that extension, and is sorted by most common first. It's not a problem if an "unknown" type isn't a source code file, but if there are a significant number of source files in this category, you'll need to change SLOCCount to get an accurate result.
One error report that you may see is:
c_count ERROR - terminated in string in (filename)

The cause of this is that c_count (the counter for C-like languages) keeps track of whether or not it's in a string, and when the counter reached the end of the file, it still thought it was in a string.
Note that c_count really does have to keep track of whether or not it's a string. For example, this is three lines of code, not two, because the ``comment'' is actually in string data:
a = "hello /* this is not a comment */ bye";
Usually this error means you have code that won't compile given certain #define settings. For example, XFree86 has a line of code that's actually wrong (it has a string that's not terminated), but people don't notice because the #define that enables it is not usually set. Legitimate code can trigger this message, but code that triggers it is horrendously formatted and is begging for problems.
In either case, the best way to handle the situation is to modify the source code (slightly) so that the code's intent is clear (by making sure that double-quotes balance). If it's your own code, you definitely should fix this anyway. You need to look at the double-quote (") characters. One approach is to just grep for double-quote and look at every line containing a string that isn't terminated, e.g., printf("hello %s, myname);
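For example, here's one hypothetical way to scan a suspect file (the filename is illustrative) and page through every line containing a double-quote:

grep -n '"' suspect.c | less

Then check each reported line to confirm that every string constant is closed before the end of the line.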
SLOCCount reports warnings when it detects an unusually large number of duplicate files. A large number of duplicates may suggest that you're counting two different versions of the same program as though they were independently developed. You may want to cd into the data directory (usually ~/.slocdata), cd into the child directories corresponding to each component, and then look at their dup_list.dat files, which list the filenames that appeared to be duplicated (and what they duplicate with).
For some languages, you may be able to use the ``generic_count'' program to implement your counter - generic_count takes as its first argument the pattern which identifies comment begins (which continue until the end of the line); the other arguments are the files to count. Thus, the LISP counter looks like this:
#!/bin/sh
generic_count ';' $@

The generic_count program won't work correctly if there are multiline comments (e.g., C) or multiline string constants. If your language is identical to C/C++'s syntax in terms of string constant definitions and commenting syntax (using // or /* .. */), then you can use the c_count program - in this case, modify compute_sloc_lang so that the c_count program is used.
Otherwise, you'll have to devise your own counting program. The program must generate files with the same format, e.g., for every filename passed as an argument, it needs to return separate lines, where each line presents the SLOC for that file, a space, and the filename. (Note: the assembly language counter produces a slightly different format.) After that, print "Total:" on its own line, and the actual SLOC total on the following (last) line.
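For instance, here's a minimal sketch of such a counter (not shipped with SLOCCount) for a hypothetical language whose comments run from "%" to the end of the line; a real counter would also need to handle that language's string constants and any multiline constructs:

#!/bin/sh
# mylang_count (hypothetical): emits "SLOC filename" for each file,
# then "Total:" on its own line, then the grand total.
total=0
for f in "$@"
do
  # Count lines that are neither blank nor comment-only.
  sloc=`grep -c -v -E '^[[:space:]]*(%.*)?$' "$f"`
  echo "$sloc $f"
  total=`expr $total + $sloc`
done
echo "Total:"
echo "$total"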
Here's how to manually create a "data directory" to hold intermediate results, and how to invoke each tool in sequence (with discussion of options):
mkdir ~/data
cd ~/data

The rest of these instructions assume that your current directory is the data directory. You can set up many different data directories if you wish, to analyze different source programs or analyze the programs in different ways; just "cd" to the one you want to work with.
You use the "make_filelists" command to initialize a data directory. For example, if your source code is in /usr/src/redhat/BUILD, run:
make_filelists /usr/src/redhat/BUILD/*
Internally, make_filelists uses "find" to create the list of files, and by default it ignores all symbolic links. However, you may need to follow symbolic links; if you do, give make_filelists the "--follow" option (which will use find's "-follow" option). Here are make_filelists' options:
--follow      Follow symbolic links
--datadir D   Use this data directory
--skip S      Skip basenames named S
--prefix P    When creating children, prepend P to their name
--            No more options
Although you don't normally need to do so, if you want certain files to not be counted at all in your analysis, you can remove data directory children or edit the "filelist" files to do so. There's no need to remove files which aren't source code files normally; this is handled automatically by the next step.
If you don't have a single source code directory where the subdirectories represent the major components you want to count separately, you can still use the tool but it's more work. One solution is to create a "shadow" directory with the structure you wish the program had, using symbolic links (you must use "--follow" for this to work). You can also just invoke make_filelists multiple times, with parameters listing the various top-level directories you wish to include. Note that the basenames of the directories must be unique.
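For example, here's a hypothetical shadow directory pulling together two unrelated source trees (all the paths are illustrative):

mkdir ~/shadow
ln -s /usr/src/project-a ~/shadow/project-a
ln -s /home/me/src/project-b ~/shadow/project-b
make_filelists --follow ~/shadow/*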
If there are so many directories (e.g., a massive number of projects) that the command line is too long, you can run make_filelists multiple times in the same directory with different arguments to create them. You may find "find" and/or "xargs" helpful in doing this automatically. For example, here's how to do the same thing using "find":
find /usr/src/redhat/BUILD -maxdepth 1 -mindepth 1 -type d \ -exec make_filelists {} \;
break_filelist *

At this point you might want to examine the data directory subdirectories to ensure that "break_filelist" has correctly determined the types of the various files. In particular, the "unknown" category may have source files in a language SLOCCount doesn't know about. If the heuristics got some categorization wrong, you can modify the break_filelist program and re-run break_filelist.
By default break_filelist removes duplicates, doesn't count automatically generated files as normal source code files, and only gives some feedback. You can change these defaults with the following options:
--duplicates  Count all duplicate files as normal files
--crossdups   Count duplicate files if they're in different data directory children (i.e., in different "filelists")
--autogen     Count automatically generated files
--verbose     Present more verbose status information while processing
Duplicate control in particular is an issue; you probably don't want duplicates counted, so that's the default. Duplicate files are detected by determining if their MD5 checksums are identical; the "first" duplicate encountered is the only one kept. Normally, since shells sort directory names, this means that the file in the alphabetically first child directory is the one counted. You can change this around by listing directories in the sort order you wish followed by "*"; if the same data directory child is requested for analysis more than once in a given execution, it's skipped after the first time. So, if you want any duplicate files with child directory "glibc" to count as part of "glibc", then you should provide the data directory children list as "glibc *".
Beware of choosing something other than "*" as the parameter here, unless you use the "--duplicates" or "--crossdups" options. The "*" represents the list of data directory children to examine. Since break_filelist skips duplicate files identified in a particular run, if you run break_filelist on only certain children, some duplicate files won't be detected. If you're allowing duplicates (via "--duplicates" or "--crossdups"), then this isn't a problem. Or, you can use the ``--duplistfile'' option to store and retrieve hashes of files, so that additional files can be handled.
If there are so many directories that the command line is too long, you can run break_filelist multiple times and give it a subset of the directories each time. You'll need to use one of the duplicate control options to do this. I would suggest using "--crossdups", which means that duplicates inside a child will only be counted once, eliminating at least some of the problems of duplicates. Here's the equivalent of "break_filelist *" when there are a large number of subdirectories:
find . -maxdepth 1 -mindepth 1 -type d -exec break_filelist --crossdups {} \;

Indeed, for all of the later commands where "*" is listed as the parameter in these instructions (for the list of data directory children), just run the above "find" command and replace "break_filelist --crossdups" with the command shown.
count_unknown_ext

(Note that this command is unusual - it doesn't take any arguments, since it's hard to imagine a case where you wouldn't want every directory examined.) Unlike the other commands discussed, this one specifically looks at ${HOME}/.slocdata. This command presents a list of extensions which are unknown to break_filelist, with the most common ones listed first. The output format is a name, followed by the number of instances; the name begins with a "." if it's an extension, or, if there's no extension, it begins with "/" followed by the base name of the file. break_filelist already knows about common extensions such as ".gif" and ".png", as well as common filenames like "README". You can also view the contents of each of the data directory children's files to see if break_filelist has correctly categorized the files.
compute_all *

If you only want to compute SLOC for a specific language, you can invoke compute_sloc_lang, which takes as its first parameter the SLOCCount name of the language ("ansic" for C, "cpp" for C++, "ada" for Ada, "asm" for assembly), followed by the list of data directory children. Note that these names are a change from version 1.0, which called the master program "compute_all" and had "compute_*" programs for each language.
Notice the "*"; you can replace the "*" with just the list of data directory children (subdirectories) to compute, if you wish. Indeed, you'll notice that nearly all of the following commands take a list of data directory children as arguments; when you want all of them, use "*" (as shown in these instructions), otherwise, list the ones you want.
When you run compute_all or compute_sloc_lang, each data directory child (subdirectory) is consulted in turn for a list of the relevant files, and the SLOC results are placed in that data directory child. In each child, the file "LANGUAGE-outfile.dat" lists the information from the basic SLOC counters. That is, the outfile lists the SLOC and filename (the assembly outfile has additional information), and ends with a line saying "Total:" followed by a line showing the total SLOC of that language in that data directory child. The file "all-physical.sloc" has the final total SLOC for every language in that child directory (i.e., it's the last line of the outfile).
compute_c_usc *
get_sloc * | less

The get_sloc program takes many options, including:
--filecount   Display number of files instead of SLOC (SLOC is the default)
--wide        Use "wide" format instead (tab-separated columns)
--nobreak     Don't insert breaks in long lines
--sort X      Sort by "X", where "X" is the name of a language ("ansic", "cpp", "fortran", etc.) or "total". By default, get_sloc sorts by "total".
--nosort      Don't sort - just present results in the order of the directory listing given
--showother   Show non-language totals (e.g., # of duplicate files)
--oneprogram  When computing effort, assume that all files are part of a single program. By default, each subdirectory specified is assumed to be a separate, independently-developed program.
--noheader    Don't show the header
--nofooter    Don't show the footer (the per-language values and totals)
Note that unlike the "sloccount" tool, get_sloc requires the current directory to be the data directory.
If you're displaying SLOC, get_sloc will also estimate the time it would take to develop the software using COCOMO (using its "basic" model). By default, this figure assumes that each of the major subdirectories was developed independently of the others; you can use "--oneprogram" to make the assumption that all files are part of the same program. The COCOMO model makes many other assumptions; see the paper at http://www.dwheeler.com/sloc for more information.
If you need to do more analysis, you might want to use the "--wide" option and send the data to another tool such as a spreadsheet (e.g., gnumeric) or RDBMS (e.g., PostgreSQL). Using the "--wide" option creates tab-separated data, which is easier to import. You may also want to use the "--noheader" and/or "--nofooter" options to simplify porting the data to another tool.
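For example, here's a hypothetical way to produce a tab-separated file ready for import into a spreadsheet (the output filename is illustrative):

get_sloc --wide --noheader --nofooter * > sloc.tsv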
Note that in version 1.0, "get_sloc" was called "get_data".
If you have so many data directory children that you can't use "*" on the command line, get_sloc won't be as helpful. Feel free to patch get_sloc to add this capability (as another option), or use get_sloc_detail (discussed next) to feed the data into another tool.
get_sloc_details *
Here are some ``designer's notes'' on how SLOCCount works, including what it can handle.
The program break_filelist has categories for each programming language it knows about, plus the special categories ``not'' (not a source code file), ``auto'' (an automatically-generated file and thus not to be counted), ``zero'' (a zero-length file), ``dup'' (a duplicate of another file as determined by an md5 checksum), and ``unknown'' (a file which doesn't seem to be a source code file nor any of these other categories). It's a good idea to examine the ``unknown'' items later, checking the common extensions to ensure you have not missed any common types of code.
The program break_filelist uses lots of heuristics to correctly categorize files. Here are a few notes about its heuristics:
One complicating factor is that I wished to separate C, C++, and Objective-C code, but a header file ending with ``.h'' or ``.hpp'' file could be any of these languages. In theory, ``.hpp'' is only C++, but I found that in practice this isn't true. I developed a number of heuristics to determine, for each file, what language a given header belonged to. For example, if a given directory has exactly one of these languages (ignoring header files), the header is assumed to belong to that category as well. Similarly, if there is a body file (e.g., ".c") that has the same name as the header file, then presumably the header file is of the same language. Finally, a header file with the keyword ``class'' is almost certainly not a C header file, but a C++ header file; otherwise it's assumed to be a C file.
None of the SLOC counters fully parse the source code; they just examine the code using simple text processing patterns to count the SLOC. In practice, by handling a number of special cases this seems to be fine. Here are some notes on some of the language counters; the language name is followed by common extensions in parentheses and the SLOCCount name of the language in brackets:
Much of the code is written in Perl, since it's primarily a text processing problem and Perl is good at that. Many short scripts are Bourne shell scripts (it's good at short scripts for calling other programs), and the basic C/C++ SLOC counter is written in C for speed.
I originally named it "SLOC-Count", but I found that some web search engines (notably Google) treated that as two words. By naming it "SLOCCount", it's easier to find by those who know the name of the program.
SLOCCount only counts physical SLOC, not logical SLOC. Logical SLOC counting requires much more code to implement, and I needed to cover a large number of programming languages.
This tool measures ``physical SLOC.'' Physical SLOC is defined as follows: ``a physical source line of code (SLOC) is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.'' Comment delimiters (characters other than newlines starting and ending a comment) are considered comment characters. Data lines only including whitespace (e.g., lines with only tabs and spaces in multiline strings) are not included.
To make this concrete, here's an example of a simple C program (it strips ANSI C comments out). On the left side is the running SLOC total, where "-" indicates a line that is not considered a physical "source line of code":
1    #include <stdio.h>
-
-    /* peek at the next character in stdin, but don't get it */
2    int peek() {
3      int c = getchar();
4      ungetc(c, stdin);
5      return c;
6    }
-
7    main() {
8      int c;
9      int incomment = 0;  /* 1 = we are inside a comment */
-
10     while ( (c = getchar()) != EOF) {
11       if (!incomment) {
12         if ((c == '/') && (peek() == '*')) {incomment=1;}
13       } else {
14         if ((c == '*') && (peek() == '/')) {
15           c= getchar(); c=getchar(); incomment=0;
16         }
17       }
18       if ((c != EOF) && !incomment) {putchar(c);}
19     }
20   }
Robert E. Park et al.'s Software Size Measurement: A Framework for Counting Source Statements (Technical Report CMU/SEI-92-TR-20) presents a set of issues to be decided when trying to count code. The paper's abstract states:
This report presents guidelines for defining, recording, and reporting two frequently used measures of software size: physical source lines and logical source statements. We propose a general framework for constructing size definitions and use it to derive operational methods for reducing misunderstandings in measurement results.
Using Park's framework, here is how physical lines of code are counted:
Thus, SLOCCount generally follows Park's ``basic definition'', but with the following exceptions depending on how you use it:
Otherwise, this counter follows Park's ``basic definition'' of a physical line of code, even down to Park's language-specific definitions where Park defined them for a language.
There are other undocumented analysis tools in the original tar file. Most of them are specialized scripts for my circumstances, but feel free to use them as you wish.
If you're packaging this program, don't just copy every executable into the system "bin" directory - many of the files are those specialized scripts. Just put in the bin directory every executable documented here, plus the files they depend on (there aren't that many). See the RPM specification file to see what's actually installed.
You have to take any measure of SLOC (including this one) with a large grain of salt. Physical SLOC is sensitive to the format of source code. There's a correlation between SLOC and development effort, and some correlation between SLOC and functionality, but there's absolutely no correlation between SLOC and either "quality" or "value".
A problem of physical SLOC is that it's sensitive to formatting, and that's a legitimate (and known) problem with the measure. However, to be fair, logical SLOC is influenced by coding style too. For example, the following two phrases are semantically identical, but will have different logical SLOC values:
int i, j;   /* 1 logical SLOC */
int i;      /* 2 logical SLOC, but it does the same thing */
int j;
If you discover other information that can be divided up by data directory children (e.g., the license used), it's probably best to add that to each subdirectory (e.g., as a "license" file in the subdirectory). Then you can modify tools like get_sloc to add them to their display.
I developed SLOCCount for my own use, not originally as a community tool, so it's certainly not beautiful code. However, I think it's serviceable - I hope you find it useful. Please send me patches for any improvements you make!
You can't use this tool as-is with some estimation models, such as COCOMO II, because this tool doesn't compute logical SLOC. I certainly would accept code contributions to add the ability to measure logical SLOC (or related measures such as Cyclomatic Complexity and Cyclomatic density); selecting them could be a compile-time option. However, measuring logical SLOC takes more development effort, so I haven't done so; see USC's "CodeCount" for a set of code that measures logical SLOC for some languages (though I've had trouble with CodeCount - in particular, its C counter doesn't correctly handle large programs like the Linux kernel).
Here is the SLOCCount License; the file COPYING contains the standard GPL version 2 license:
=====================================================================
SLOCCount
Copyright (C) 2000-2001 David A. Wheeler (dwheeler, at, dwheeler.com)

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
=====================================================================
While it's not formally required by the license, please give credit to me and this software in any report that uses results generated by it.
This document was written by David A. Wheeler (dwheeler, at, dwheeler.com), and is (C) Copyright 2001 David A. Wheeler. This document is covered by the license (GPL) listed above.
The license does give you the right to use SLOCCount to analyze proprietary programs.
One available toolset is CodeCount. I tried using this toolset, but I eventually gave up. It had too many problems handling the code I was trying to analyze, and it does a poor job automatically categorizing code. It also has no support for many of today's languages (such as Python, Perl, Ruby, PHP, and so on). However, it does a lot of analysis and measurements that SLOCCount doesn't do, so it all depends on your need. Its license appeared to be open source, but it's quite unusual and I'm not enough of a lawyer to be able to confirm that.
Another tool that's available is LOCC. It's available under the GPL. It can count Java code, and there's experimental support for C++. LOCC is really intended for more deeply analyzing each Java file; what's particularly interesting about it is that it can measure "diffs" (how much has changed). See A comparative review of LOCC and CodeCount.
CCCC is a tool which analyzes C++ and Java files and generates a report on various metrics of the code. Metrics supported include lines of code, McCabe's complexity, and metrics proposed by Chidamber & Kemerer and Henry & Kafura. (You can see Tim Littlefair's comments). CCCC is in the public domain. It reports on metrics that sloccount doesn't, but sloccount can handle far more computer languages.
The GPL license doesn't require you to submit changes you make back to its maintainer (currently me), but it's highly recommended and wise to do so. Because others will send changes to me, a version you make on your own will slowly become obsolete and incompatible. Rather than allowing this to happen, it's better to send changes in to me so that the latest version of SLOCCount also has the features you're looking for. If you're submitting support for new languages, be sure that your change correctly ignores files that aren't in that new language (some filename extensions have multiple meanings). You might want to look at the TODO file first.
When you send changes to me, send them as "diff" results so that I can use the "patch" program to install them. If you can, please send ``unified diffs'' -- GNU's diff can create these using the "-u" option.