This page is obsolete; see http://readable.sourceforge.net instead. |
Many people find Lisp s-expressions hard to read as a programming notation. This paper briefly describes 3 layered alternatives I've developed (curly infix, modern-expressions, and sweet-expressions), and shows how to download and run free-libre/open source software that implements them. It also has some style suggestions. These formats are not tied to any particular Lisp system, and can do everything regular s-expressions can do. You don't need to be familiar with Lisp-like languages to understand this tutorial, though it helps.
Lisp-derived systems normally represent programs as s-expressions, where an operation and its parameters is surrounded by parentheses. The operation to be performed is identified first, and each parameter afterwards is separated by whitespace. So the traditional “2+3” is written as “(+ 2 3)” in a Lisp-derived language. S-expressions make it unusually easy for programs to read, process, and generate programs. This can make some programs considerably easier to create, as well as much easier to automatically analyze or prove.
Unfortunately, programs in S-expression format can be hard to read. Lisp S-expressions fail to support infix notation, its notation for making function calls is completely different than most programming languages and math books, and it requires using and balancing endless parentheses. The last problem can be partly overcome through tools, but why use a notation that's so bad that such tools are important? It'd be better to have a notation that people can easily read in the first place.
So I've developed three notations, each building on the previous; the final "sweet-expressions" is the most readable. They are simply new abbreviations for common cases (Lisp has traditionally had several abbreviations, but not these important ones). These new notations/abbreviations are:
Take a look at some examples, and compare sweet-expressions to the ugly old s-expressions:
(Ugly) S-expression | Sweet-expression 0.2 |
---|---|
(define (fibfast n) (if (< n 2) n (fibup n 2 1 0))) |
define fibfast(n) ; Typical function notation if {n < 2} ; Indentation, infix {...} n ; Single expr = no new list fibup(n 2 1 0) ; Simple function calls |
(define (fibup max count n-1 n-2) (if (= max count) (+ n-1 n-2) (fibup max (+ count 1) (+ n-1 n-2) n-1))) |
define fibup(max count n-1 n-2) if {max = count} {n-1 + n-2} fibup max {count + 1} {n-1 + n-2} n-1 |
(define (factorial n) (if (<= n 1) 1 (* n (factorial (- n 1))))) |
define factorial(n) if {n <= 1} 1 {n * factorial{n - 1}} ; f{...} => f({...}) |
This example uses variable names with embedded "-" characters; that's not a problem, because the infix operators must be surrounded by whitespace and are only used when {...} requests them.
Credit where credit is due: The Fibonacci number code is loosely based on an example by Hanson Char.
In short, "Sweet-expressions" are a mostly-backward-compatible format of s-expressions that are easier to read. A sweet-expression reader will accept traditional s-expressions, but it also supports various extensions that make it easier to read (especially for those less familar with s-expressions). Sweet-expressions are automatically translated into s-expressions, and are carefully crafted so they lose no power.
If you are already very familiar with Lisp-derived languages, see my web page on readable s-expressions which discusses this further.
I've implemented a demo implementation of curly infix, modern-expressions, and sweet-expressions. I really want people to try this out and give feedback.
To try the demos out, make sure you have GNU Guile (an implementation of the Scheme dialect of Lisp), expect, and the software configuration management tool "Subversion" (whose command line name is "svn"). You should also have a Common Lisp implementation; here I'll presume it is clisp, but it doesn't really matter. You should be running on a POSIX system; if you use Windows, you may need to install Cygwin first. Any modern GNU/Linux system will do nicely.
First, download the latest version of the demo and related files using Subversion:
svn co https://readable.svn.sourceforge.net/svnroot/readable/trunk readable
This will create a subdirectory called readable, so "cd" into it:
cd readable
Let's first try out "curly infix". Start up your Common Lisp (I'll presume it's clisp), and load the curly-infix demo:
clisp (load "curly-infix.cl")
Any of these three new notations (curly infix, modern-expressions, and sweet-expressions) can also accept normally-formatted s-expressions. So you can type in at the command line:
(+ 2 3)and when you press enter you'll see 5.
But "Curly infix" adds the ability to use {...} around infix operators. So just type this in:
{2 + 3}and you will see 5. Look! You now have a calculator! It's important to realize this is a simple syntactic convenience — an abbreviation. Traditional Lisp actually includes many abbreviations, for example, 'x is a traditional abbreviation for (quote x), so this is just another one and does not change how "Lisp works". Thus, if you enter:
'{2 + 3}it will respond with '(+ 2 3).
There is intentionally no support for precedence between different operators. While precedence is useful in some circumstances, in typical uses for Lisp-derived languages and sweet-expressions, it doesn't help and is often harmful. This is not a problem at all, just use curly braces or parentheses when mixing different infix operators:
{2 + {3 * 4}}
You can "chain" the same infix operator, e.g., {2 + 3 + 4} is fine, and it will map to (+ 2 3 4). Note that the infix operator must be surrounded by whitespace - otherwise, it would have no way to know where the name of the operation begins or ends.
What happens if we do use multiple infix operators in a single list? Instead of trying to guess the precedence, the reader will simply turn it into a list with "nfx" at front. You can then define a macro named "nfx" to process the list, or define "nfx" as an error to prevent its use.
So this explains the real rule: in curly infix, the {...} contains an "infix list". If the enclosed infix list has (1) an odd number of parameters, (2) at least 3 parameters, and (3) all even parameters are the same symbol, then it is mapped to "(even-parameter odd-parameters)". Otherwise, it is mapped to "(nfx list)" — you'll need to have a macro named "nfx" to use it.
Note that curly infix intentionally does not force a particular precedence, nor does it automatically switch to infix operators recursively inside {...}. Many previous systems did that, but this turned out to interfere with many of Lisp's power (homoiconicity, multiple levels of language, and so on). It also does not attempt to guess when infix operators are used. After many experiments with real problems, I found that the rule as given actually works better for Lisps than those alternatives.
You can type control-D (Windows: control-Z) to end this demo.
Now let's try out modern-expressions, but this time using guile (a Scheme implementation):
./modern-guile
Modern-expressions include curly-infix, but adds special meanings to the grouping symbols (), [], and {} if they immediately follow a symbol or list (instead of being separated by whitespace).
This means that you can use more "traditional" functional notation, e.g., f(1 2) maps to (f 1 2). Just type in the name of a function, an opening "(", its parameters (space-separated), and a closing ")". Make sure that you do not have a space before the (prefixed) function name and the following "(". For example, type this in:
cos(0)and get a very reasonable response. Here's another - try this:
substring("Hello" 1 3)This will produce "el".
You can nest them, just as you'd expect:
substring("Hello" 1 string-length("xyz"))
Using function name prefixes is a nice way of showing negation, e.g., -(x) computes the value of 0 - x. So while curly infix by itself doesn't handle prefix functions, modern-expressions can handle them nicely:
{-(n) / 2}
You can even use function name prefixes with traditional binary operators, such as:
*(5 4)
This works with zero parameters, too; if you have a command called "help" (guile does), and choose not to give it any parameters, just type this (without pressing space before typing it in):
help()
It's actually quite common to have a function call pass one parameter, where the parameter is calculated using infix notation, so there's a rule to simplify this common case. You can use f{x + 1}, which maps to (f {x + 1}) which then maps to (f (+ x 1)). This makes it easy to pass a single parameter which happens to be calculated using infix. For example, factorial{n - 1} maps to factorial({n - 1}) which maps to (factorial (- n 1)).
not{#t and #f}
Just like traditional s-expressions, spaces separate parameters, so it's important that there be no space between the function name and the opening "(". Since spaces separate parameters, a space between the function name and the opening "(" would create two parameters instead of a single function call. The same is basically true for traditional s-expressions, too; (a b) and (ab) are not the same thing.
Here's the real rule: in modern-expressions, f(...) maps to (f ...), f{...} maps to (f {...}), and f[...] maps to (bracketaccess f), but in all cases only when f is a symbol or list. The "bracketaccess" is so that you can write a macro to access arrays and other mappings. Unprefixed (...) and [...] mean "create a list" (as they do in Scheme R6RS), but they may contain other modern-expressions.
Normally, people and pretty-printers will format Lisp code so that parameters inside a list are separated by whitespace, e.g., (a b c), so it turns out that this change in interpretation doesn't change the meaning of typically-formatted modern Lisp code (and you can pretty-print code to fix it). What's more, typical Lisp-aware text editors can work with modern-expressions as they are, without change... so if you don't want to change the way you work, but have a somewhat more readable notation, modern-expressions can help. But we still have to do all that parentheses-balancing, which hinders readability. So let's fix that too. Type control-D to get out of this demo.
Sweet-expressions takes modern-expressions and adds indentation as being syntactically meaningful. Let's try them out:
./sweet-guile
Here, an indented line is a parameter of its parent, later terms on a line are parameters of the first term, and lists of lists are marked with "group". A line with exactly one term, and no child lines, is simply that item; otherwise that line and its child lines are themselves a new list. Lines with only a ;-comment, and nothing else, are completely ignored - even their indentation is irrelevant. Whitespace-only lines at the beginning of a new expression are ignored, but a whitespace-only line (including a zero-length line) ends an expression. So, just type in your expression, and type a blank line (an extra Enter) to indicate that you're done.
Here's a trivial example:
substring "Hello" 1 3
What happens if the parameters are not constants, but something to be calculated? No problem, just put them on new lines and give them parameters. If something has parameters, then it must be something to calculate too! Here's another example (be sure to press at least one space before the 'substring'):
substring "Hello" 1 string-length "xyz"You can use parentheses, too; inside any grouping characters (...), [...], and {...}, indentation is ignored:
substring "Hello" 1 string-length("xyz")Here are some other valid sweet-expressions:
if {7 < 5} {3 + 4} {5 * {2 + 3}} abs{0 - 5}
Here's a more substantial example, one we've seen earlier:
define fibfast(n) ; Typical function notation if {n < 2} ; Indentation, infix {...} n ; Single expr = no new list fibup(n 2 1 0) ; Simple function calls define fibup(max count n-1 n-2) if {max = count} {n-1 + n-2} fibup(max {count + 1} {n-1 + n-2} n-1)
Sometimes you want to have a parameter that is a list of lists, or where the function to be called is in fact determined by another calculation. This is indicated with the "group" keyword; basically, "group" maps into a null function name, so you can use forms like "let" easily.
sweet-filter that lets you read in many sweet-expressions, and outputs the underlying s-expressions. You can use this to process files (say, in a makefile) so that you can write in sweet-expressions, yet be able to see what will be run by the underlying system.
When you're done with the demo, you can exit:
exit()
If you're familiar with traditional s-expressions, here are more examples. The left-hand-side are sweet-expressions; the right-hand-side are the transitional s-expression forms, though the sweet-expression reader can read them as well:
factorial(z) <==> (factorial z) foo(x y) <==> (foo x y) bar(x y z) <==> (bar x y z) factorial{x - 1} <==> (factorial {x - 1}) <==> (factorial (- x 1)) f(g(y) h(y) a) <==> (f (g y) (h y) a) f({x + 2} y {z - 3}) <==> (f (+ x 2) y (- z 3)) f{{x + 2} * {y - 3}} <==> (f (* (+ x 2) (- y 3)))
If you aren't familiar with Lisp, you may say "what's the big deal"? After all, this looks a lot like traditional languages. Many have commented that it looks like Python, with its use of indenting, and of course nearly all other languages use infix notation.
But that's the point - the results look much more familiar (and thus are more acceptable to non-Lispers), but all of Lisp's more exotic capabilities still work. You can use techniques like quoting (') and quasi-quoting (`) with lifting (,), which enable powerful capabilities. Many people have created "infix" notations with Lisp-like languages before, but they all failed to work with many other Lisp features. I think this approach succeeds instead, where others before have failed.
Here are the full rules, in one place, now that we've gone over them:
If you're using sweet-expressions, the bottom line for style is "make it easy to read". But here are a few suggestions that should help you do so in most cases. I presume an 80-character width for program text; sweet-expressions don't require them, but it can be annoying to handle overwide source code.
In general, use indentation to make it easy to see the larger-scale structure of a program or data. Typically major structural atoms should start a new line, including defining a new term (e.g., "define" and "let"), conditionals (e.g., "if" and "cond"), and loops (e.g., "loop"). For example, the traditional "cond" function of Common Lisp looks like this:
(cond (test-1 consequent-1-1 consequent-1-2 ...) (test-2) (test-3 consequent-3-1 ...) ... )This is easily implemented as:
cond test-1 consequent-1-1 consequent-1-2 ... test-2() test-3 consequent-3-1 ... ...
Indent as necessary to make it easy to understand. For example, if the consequents for test-1 (above) are long, indent them too:
cond test-1 consequent-1-1 consequent-1-2 ... test-2() test-3 consequent-3-1 ... ...
Use a consistent amount of indenting for each level. I tend to use 2 spaces for indentation; indentation nesting is more common in sweet-expressions, so 8-character indentations are often too much.
When calling a function, if the parameters will fit easily on a line if you use function notation like f(x y(z)), then put them all on a line. When you're calling a function with no parameters, use function-calling format with "()" at the end, e.g., "f()". Here's a good example (this example is for Scheme):
cond list?(x) extractify(car(x)) atom?(x) x #t #f
In general, indentation is used for the major "structural" elements of a program, and function calls get used once you're "near the leaf" of structure (where you won't go beyond the end of the line).
If you are providing a list of data (and not performing a function/method call), then use the traditional list notation such as "(a b c)". This is exactly equivalent to "a(b c)", but expressing it as a list will give the human reader a hint that this data is not considered a potential program. If it's used as both data and as program, then consider it a program, and use function call notation.
Occasionally you may what to use function call notation even if the parameters won't fit in a line, because once inside a function call indentation processing isn't relevant - and that can be useful.
If the function is typically written as infix (including "+", "*", "or", and "<"), use {...} to write it as an infix value. Generally these operators will be "and", "or", or an operator that only uses punctuation. If you're calling a function with only one parameter, and that parameter is calculated with an infix operation, use the f{...} shorthand. You can see all of these suggestions in this example:
define factorial(n) if {n <= 1} 1 {n * factorial{n - 1}} ; f{...} => f({...})
However, you may want to keep using prefix form if indentation still matters and one or more of the parameters is exceedingly complex (e.g., it's nested very deeply or includes program structuring forms like "cond" and "define"). This situation can often occur with "and" and "or" if you're using a functional programming style.
Where it's understandable, don't include unnecessary parentheses. For example, when indentation processing is active, don't do this:
myfunction(1 a(2) 3)
Instead, do this:
myfunction 1 a(2) 3
The latter is easier for humans to read, because the human reader has one less pair of parentheses to track.
Remember that "f(x)" is completely different from "f (x)"; the former means the 2-element list "(f x)", while the latter means an atom followed by a single-element list, "f (x)". In sweet-expressions (and traditional Lisp notation), whitespace is a separator, so for sweet-expression prefixed function calls, be sure to not put a space between the function name and the open paren.
You can put whitespace after the "(", or before the ")", and it'll make no difference, so "f(x)" and "f( x )" are equivalent. However, as a style I suggest not inserting this unnecessary space in most cases. That way, any whitespace between an atom and open paren is "unusual" and will catch the eye, making it easier to read.
If one of the parameters is a "list of lists", and indentation is still valid, use "group". The "let" form of Common Lisp is an example; it's traditionally written as:
(let ((var1 value1) (var2 value2) ... (varm valuem)) declaration1 declaration2 ... declarationp body1 body2 ... bodyn)
I suggest writing such functions this way; after a while, you mentally realize that "let's first parameter is a group":
let group var1 value1 var2 value2 ... varm valuem declaration1 declaration2 ... declarationp body1 body2 ... bodyn
But if the group-of-groups so trivial that the traditional list notation will easily fit on one line, use the traditional list notation instead (it's short). For example:
let ((var1 value1) (var2 value2)) body1 body2
Sweet-expressions can take a few minutes to learn how to use, just like anything else new. But I think they are worth it. If you want more details on the rules, and/or why they are the way they are, see the sweet expressions paper.
For more information, see my website page at http://www.dwheeler.com/readable. For example, I provide a sweet-expression reader in Scheme (under the MIT license), as well as an indenting pretty-printer in Common Lisp. In particular, you can see my lengthy paper about why sweet-expressions do what they do, and some plausible alternatives, and my follow-on paper on sweet-expressions version 0.2. You can also download some other implementation code. I've also set up a SourceForge project where options like sweet-expressions can be discussed; if you're interested, please join!