This page is obsolete; see http://readable.sourceforge.net instead.
Many people find Lisp s-expressions hard to read as a programming notation. This paper discusses various ways to extend/modify s-expressions so they can be more readable without losing their power (such as quasiquoting, macros, and easily-manipulated program fragments). The goal is a notation that can be trivially translated to and from traditional s-expression notation (both by computer and in people’s heads), and isn't dependent on a particular underlying semantic. The paper identifies and discusses three approaches that seem particularly promising: indentation, name-prefixing (so func(x y) is the same as (func x y)), and infix support.
It then defines a particular way of combining these approaches, called “sweet-expressions”, that can be viewed as an essentially backward-compatible extension of s-expressions. A sweet-expression reader can accept typical cleanly-formatted s-expressions without change, but it also supports various extensions (optional syntactic sugar) that make much clearer code possible. This is purely a matter of screen presentation; underlying systems can continue to use s-expressions, unchanged. For example, here’s a trivial Common Lisp program that takes advantage of sweet-expression’s formatting extensions (the Scheme version is similar):
defun factorial (n)         ; Parameters can be indented, but need not be
   if (n <= 1)              ; Supports infix, prefix, & function <=(n 1)
      1                     ; This has no parameters, so it's an atom.
      n * factorial(n - 1)  ; Function(...) notation supported
Sweet-expressions add the following abilities:
Infix operators must match the pattern [+-\*/<>=&\|\p{Sm}]{1,4}|\:|\|\|
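The operator pattern above can be checked with an ordinary regular expression. Here is a minimal sketch in Python; note that the Unicode math-symbol class \p{Sm} is omitted, since Python's standard `re` module does not support it, so this is a simplified approximation:

```python
import re

# Simplified sweet-expression infix-operator pattern: 1-4 characters drawn
# from the listed operator characters, or ":" (the "||" case already matches
# the character-class branch). The \p{Sm} class is omitted (see lead-in).
INFIX_RE = re.compile(r'^(?:[+\-*/<>=&|]{1,4}|:)$')

def is_infix_operator(token: str) -> bool:
    """Return True if token would be treated as an infix operator."""
    return bool(INFIX_RE.match(token))

print(is_infix_operator("<="))    # True
print(is_infix_operator("||"))    # True
print(is_infix_operator("foo"))   # False
print(is_infix_operator("+++++")) # False (longer than 4 characters)
```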
For more information, see my website at http://www.dwheeler.com/readable.
This paper describes the rationale behind the older sweet-expressions version 0.1; see Sweet-expressions: Version 0.2 for information on the changes made to sweet-expressions since this paper.
S-expression notation is a very simple notation for programs, and programs in variants of the Lisp programming language have traditionally been written using s-expressions. In this notation, an operation and its parameters is surrounded by parentheses; the operation to be performed is identified first, and each parameter afterwards is separated by whitespace. So “2+3” is written as “(+ 2 3)”. As noted in the May 2006 Wikipedia, this syntax “is extremely regular, which facilitates manipulation by computer. The reliance on [s-]expressions gives the language great flexibility. Because Lisp functions are themselves written as lists, they can be processed exactly like data: allowing easy writing of programs which manipulate other programs (metaprogramming).” In short, s-expressions are a powerful and regular way to represent programs and other data.
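The regularity of s-expressions is easy to demonstrate: a complete reader fits in a few lines. Here is a minimal sketch in Python (the function names and the list-of-strings representation are illustrative choices, not part of any Lisp standard):

```python
def tokenize(text):
    """Split an s-expression string into '(', ')' and atom tokens."""
    return text.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens):
    """Parse one s-expression from a token list (consumes the tokens)."""
    token = tokens.pop(0)
    if token == '(':
        lst = []
        while tokens[0] != ')':
            lst.append(parse(tokens))
        tokens.pop(0)  # discard the closing ')'
        return lst
    return token       # an atom

print(parse(tokenize("(+ 2 3)")))
# ['+', '2', '3']
print(parse(tokenize("(* n (factorial (- n 1)))")))
# ['*', 'n', ['factorial', ['-', 'n', '1']]]
```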
I’ve written a lot of Lisp code, so I’ve learned to read s-expressions fairly well. (I wrote a lot of Lisp code in the late 1980s on a $120,000 system.) But I am never the only one who reads my programs -- I need to make sure others can read my programs too.
Here’s the problem: for most software developers, programs written solely using s-expressions are hard to read, and they will only voluntarily use programming languages that allow, at least optionally, a more common notation. This is particularly true for the usual infix operations (+, <, and so on). People who use Lisp-based languages all the time eventually learn, but not everyone wants to use them all the time, and even developers who are comfortable with programs in s-expression notation need to share their work with others. Wikipedia notes that “the heavy use of parentheses in S-expressions has been criticized -- some joke acronyms for Lisp are ‘Lots of Irritating Superfluous Parentheses’, ‘Let’s Insert Some Parentheses’, or ‘Long Irritating Series of Parentheses’ “.
Yes, I know the arguments. “S-expressions are powerful and regular” ! Of course they are. They are a wonderful intermediate representation for lots of things, in fact. But they are a terrible user interface, especially if you are trying to share your results with others. People today -- even most programmers -- want systems that are “easy to use”, and one of the best ways to make something easy to use is to make it familiar. Most software developers have been trained for many years to use traditional infix mathematical notation, and S-expression notation fails to use it. Programming is often not a solo effort; development today is practically always a group effort, and readability for a large diverse group matters.
All the loud statements about the power of S-expressions cannot compete with time looking at real programs. Most software developers laugh at languages that are “obviously weak” (to them) because their default parser cannot handle <, *, and - in conventional ways. Below is a trivial Common Lisp program to compute a factorial - now ask, “is this really the most readable format possible for other readers?”
(defun factorial (n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))
Maybe you think so now, but I use this trivial program as an example to show some alternatives that I think many people would prefer. For example, here’s the same program, but exploiting the abilities of sweet-expressions; a reader for sweet-expressions could read both the previous expression and this one:
defun factorial (n)
  if (n <= 1)
    1
    n * factorial(n - 1)
The 1993 paper “The Evolution of Lisp” by Guy L. Steele, Jr. and Richard P. Gabriel section 3.5.1 discusses “Algol-style Syntax”, gives a history, and slyly makes fun of efforts to try to use anything other than S-expressions: “Algol-style syntax makes programs look less like the data structures used to represent them. In a culture where the ability to manipulate representations of programs is a central paradigm, a notation that distances the appearance of a program from the appearance of its representation as data is not likely to be warmly received... it is always easy for the novice to experiment with alternative notations. Therefore we expect future generations of Lisp programmers to continue to reinvent Algol-style for Lisp, over and over and over again, and we are equally confident that they will continue, after an initial period of infatuation, to reject it. (Perhaps this process should be regarded as a rite of passage for Lisp hackers.)” Paul Graham posts a longer version of this quote.
Perhaps. But it’s worth noting that thousands of languages have been invented over the years, and almost none chose to use S-expressions as their surface syntax. Smalltalk and Python took many ideas from Lisp (see Norvig’s “Python for Lisp programmers” for more)... but not its surface syntax. Languages like ML and Haskell have strong academic communities... and do not use S-expressions for their surface syntax either. Logo was devised to have Lisp’s power, but intentionally chose not to use its syntax. Dylan actually switched from S-expressions to a more traditional syntax when its developers wanted “normal” people to use it. Yacas is intentionally Lisp-like underneath, but completely abandoned Lisp's surface syntax for normal user interaction (as have essentially all computer algebra systems, even though many have a Lisp underneath).
The fact that so many people are compelled to find a “fix” to Lisp syntax indicates to me that there is a problem... the fact that so many efforts fail suggests that it is a hard problem. Norvig’s Lisp retrospective found that even in 2002, when comparing Lisp to Python and Java, Lisp was far faster and more extensible, and had many powerful properties... but it was gaining few users (it was mostly stagnant). I argue that one key reason is the syntax; Lisp's syntax looks genuinely hostile to most “normal programmers”. Instead of Lisp-like languages gaining converts who adjust to prefix syntax, many people are ignoring or abandoning Lisp-like languages to use languages designed to be readable by humans. John Foderaro correctly said, “Lisp is a programmable programming language”, but people expect standard notation to be already available. Obviously it’s easy to create some kinds of front-end syntax for Lisp-like systems, but they generally don’t catch on -- in part because they usually don’t provide the full power of the system underneath (e.g., you often can’t do backquoting with comma-lifting, or insert new parameters in a control structure, or handle macros well). I prefer to use the best tool for the job; when it's not Lisp, don't use Lisp. But some jobs are naturals for Lisp-based languages, yet their poor readability seriously interferes with that. I think there are ways to retain the power of Lisp-based languages, while improving their readability and retaining their flexibility.
Oh, and let’s debunk one claim here. Steele and Gabriel claim one problem with Algol-like syntax is that there are “not enough symbols”. Yet even if that were true, lots of symbols can be created by combining characters, without resorting to fancy characters. Indeed, <= is a common two-character combination for “less than or equal to”; nobody has trouble understanding it.
So below, I discuss a lot of past and current work to improve the syntax of Lisp-like languages. I then focus on a few especially promising areas that can make Lisp more readable while still accepting normal s-expression notation and having only minimal changes to it. These include:
My goals were originally for the BitC project, but now I think there might be ways to work more widely for any language that uses Lisp-like notations. Here were my goals for a programming notation for Lisp-like systems:
My notion is that the underlying s-expression system would not change... instead, the system would support a reader that takes an extended notation and converts it into s-expressions. A printer could then redisplay s-expressions later in the traditional notation, or in the same kind of notation used to input them. It would also be nice if the notation were not confusing and did not invite likely errors.
I’ve written this document to put at least a few ideas down in writing. In the long term, I suspect what needs to happen is for there to be some sort of “neutral” forum where ideas can be discussed and code shared. For any syntax to be widely used, there must be trustworthy, widely-usable implementations. I think there must be at least one implementation with a generous license that permits use by both proprietary programs and Free-libre / open source software (FLOSS) programs. Note that the LGPL doesn’t work as intended with most Lisp implementations; Franz has created the Lisp LGPL (LLGPL), a specific clarification of the LGPL for use with Lisp. Lisp code is typically licensed under the LLGPL instead of the LGPL, since the LLGPL clarifies some otherwise sticky issues and ambiguities in the LGPL. Note that any FLOSS software should be GPL-compatible, since there is so much GPL’ed code. The implementations would need to be widely portable and modular, so that they can be widely depended on.
Scheme and Common Lisp aren’t really compatible at all. Translators like scm2cl (Scheme to Common Lisp translator) could help initially, certainly to get started (suggesting that it'd be best to start with Scheme, and then transition at least an initial version to Common Lisp). But in the end, specialized implementations well-tuned to different environments will be necessary for an improved syntax to really work.
There are lots of Lisp code repositories and other Lisp-related sites, including Common-Lisp.net, Lisp.org / Association of Lisp users (ALU), CLiki (Common Lisp Wiki), and schemers.org.
Most readability efforts focus on creating special syntax for every language construct; these often end up unused (because they cannot keep modifying the grammar to match the underlying system), or end up creating a completely new language less suitable for self-analysis of program fragments.
“The Evolution of Lisp” lists many efforts to create "Algol-like notations for Lisp", which generally included infix notations and attempts to be more “readable”. Originally Lisp was supposed to be written in M-expressions, which were more traditional in format. Function calls were written as F[x;y...] instead of (F x y...), for example. One problem is that they kept adding new syntax, and never found a good “final” M-expression format, so it was never implemented... and it just receded into the never-finished future. Many other notations were developed, including those of LISP 2. These generally had if ... then ... else and other more traditional naming conventions.
For example, RLisp (used by Reduce, among others) is a Lisp with an infix notation. A yacc grammar for RLisp by A. C. Norman (2002) is available. Norman notes that the grammar is “ambiguous or delicate in several areas”:
One trouble with many of these notations is that it becomes difficult to see where the end of an expression is (e.g., there might be no way to indicate the “end” of an if statement); this can create ambiguities and makes it harder to match the infix notation to the s-expression when you need to. And if you want to describe arbitrary Lisp s-expressions, not having an end-marker means that you may not be able to access some of the capabilities of the underlying s-expressions (though that may not be an issue for all uses).
Using a hierarchy of Domain Specific Languages in complex software systems design by V. S. Lugovsky discusses using Lisp to create domain-specific languages, including syntactic transformations.
The ACL2 language is Common Lisp-based, but it has a separate front-end for a more traditional interface including an infix processor; it is named IACL2. This is not often used, for several reasons:
Logo is basically Lisp with an infix and more readable syntax. Instead of “(”...”)”, Logo uses “[”...”]”. Normally, all commands begin with the name of the function, just like Lisp, and Logo even has text names for math functions: sum, product, difference, and quotient. More interestingly, infix is also available... by using symbols, Logo automatically uses the infix forms instead. Logo knows the number of parameters for each function, so once the number of parameters is provided you can just provide another function call on the same line without any marking of the end of the call (see the “rt” call below). This is not as flexible as s-expressions, where you can always add another parameter. Here’s a sample Logo program:
to spiral :size
   if :size > 30 [stop]  ; a condition stop
   fd :size rt 15        ; many lines of action
   spiral :size *1.02    ; the tailend recursive call
end
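Logo's arity trick can be sketched directly: given each function's known parameter count, a prefix token stream parses without any end-of-call markers. The following is a hypothetical illustration in Python, not Logo's actual implementation; the arity table is an illustrative assumption:

```python
# Hypothetical arity table: how many parameters each known function takes.
ARITY = {"fd": 1, "rt": 1, "sum": 2, "product": 2}

def parse_call(tokens):
    """Parse one call from a prefix token stream, using known arities."""
    head = tokens.pop(0)
    if head not in ARITY:          # not a known function: a plain value
        return head
    args = [parse_call(tokens) for _ in range(ARITY[head])]
    return [head] + args

def parse_line(tokens):
    """Parse a whole line: several calls may share a line, unmarked."""
    calls = []
    while tokens:
        calls.append(parse_call(tokens))
    return calls

# "fd :size rt 15" parses as two separate calls, no delimiters needed:
print(parse_line(["fd", ":size", "rt", "15"]))
# [['fd', ':size'], ['rt', '15']]
```

As the text notes, this is less flexible than s-expressions: since the arity decides where a call ends, you cannot simply add another parameter.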
Dylan is another Lisp with more conventional notation, including an infix format. Here’s a Lisp-to-Dylan translator, exploiting the Common Lisp pretty-printer. D-Expressions: Lisp Power, Dylan Style even shows it is possible to combine infix forms with Lisp’s abilities to manipulate programs. Dylan is a little wordy, in part because of namespace problems (types are in the same namespace, which is often not what you want), but it’s easy to read. It ends blocks with “end blockname”, e.g., “if (a) b else c end if”; a little long but clear. Here’s a simple example from the page Procedural Dylan:
define method distance (x1 :: <real>, y1 :: <real>,
                        x2 :: <real>, y2 :: <real>)
    => distance :: <real>;
  sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1))
end method distance;
There is also CGOL (an Algol-like language that compiles into Common Lisp); this was developed originally by Vaughan Pratt, and written by a UC Berkeley graduate student, Tom Phelps. This program is a Common Lisp implementation of CGOL that is translated into Lisp before execution. I do not know what its license is. You can look more generally at the CMU archive, though beware of licensing problems.
Clike (see also Clike on Sourceforge) is a compiler which converts code written in a simple C-like language to Scheme. Pico is an educational Scheme with C-ish syntax.
TwinLisp is a "new way of programming in Common Lisp"; it creates a new reader with a more conventional syntax, which it then translates to Common Lisp and executes. It predefines precedence - which unfortunately means that it is tied to a particular interpretation of all operations. It also defines special control structures; thus, like most past efforts to create "readable Lisp", it ends up sacrificing generality of the resulting format (the reader is rigged to a specific set of operators with special syntax - so creating new operators is not simply a matter of defining the meaning of the operator).
Lisp resources gives pointers for where to go for more information. For extensions specifically focused on indentation, see the text below on indentation; for extensions specifically focused on infix notation, see the section on infix.
Skill from Cadence is a proprietary Lisp-based extension language. By examining that paper, this text, and this document, a hint of its syntax can be inferred. Skill supports name-prefixing, where FUNC(x y) is treated as meaning (FUNC x y). Skill also supports infix processing, automatically translating infix operators into an internal prefix format. In Skill, it appears that all of the prefix functions have alphabetic names, e.g., "(plus 3 4)" adds 3 to 4. This means that infix operators can be unambiguously detected and used without surrounding whitespace; thus "3+4" is automatically translated to "(plus 3 4)". Skill does not consider indentation significant.
I point out Skill specially because it shows that it's possible to have a very general syntax that is easier to use.
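Because Skill's prefix functions all have alphabetic names, an infix operator can never be confused with part of a name, so "3+4" splits unambiguously even without whitespace. A rough sketch of that translation in Python; "plus" is the only name the text above confirms, and the other operator-to-name mappings are illustrative guesses:

```python
import re

# Map infix operators to alphabetic prefix names. "plus" comes from the
# description above; "difference" and "times" are illustrative guesses.
OPERATOR_NAMES = {"+": "plus", "-": "difference", "*": "times"}

def to_prefix(expr):
    """Translate a simple 'a OP b' infix string into prefix form."""
    # Operators need no surrounding whitespace: split on the operator itself.
    m = re.match(r'^\s*(\w+)\s*([+\-*])\s*(\w+)\s*$', expr)
    if not m:
        raise ValueError("not a simple infix expression: " + expr)
    left, op, right = m.groups()
    return [OPERATOR_NAMES[op], left, right]

print(to_prefix("3+4"))    # ['plus', '3', '4']
print(to_prefix("x * y"))  # ['times', 'x', 'y']
```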
Paul Graham began devising a Lisp variant named Arc, and promulgated the term “syntax as abbreviation” for his approach. Here’s what he said:
Arc has inspired many people to re-evaluate Lisp syntax, and see if there are ways it can be made more readable. Some comments on Arc made some interesting points:
Kragen Sitaker’s email “two-dimensional syntax for Lisp” summarizes some ideas from Paul Graham and Arc, but the author seems to add his own points too. He says:
A conversation about Arc and related ideas is posted on reddit.
BitC version 0.9 uses this “syntax as abbreviation” approach. You can indicate that “Value” has type “Type” using the quasi-function “the”, e.g., as (the Type Value), but the preferred form is “value : type”. It does not go far right now, though; + is still prefix (not infix).
Infpre is a Common Lisp infix utility (LGPL license).
Quack is yet another Lisp derivative. It adds two syntactic items. First, "postfix colon syntax" - a colon with indentation as a shortcut for parentheses. Second, "infix colon syntax" - a colon without spaces around it is a shortcut for a function/macro call, right-associative, so cdr:cdr:obj is (cdr (cdr obj)). He doesn't know if they're workable, and I don't find the examples convincing. Indentation has promise, but the colon ends up in a bad place and it's completely unnecessary. And while an infix operator like the infix colon is useful, making it the only infix operator is rather weak; where's 3 + 4?
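Quack's infix colon is easy to sketch: a right-associative split on ":" nests each name around the rest. A minimal illustration in Python (function name is mine, not Quack's):

```python
def expand_colons(text):
    """Expand Quack-style 'f:g:x' into nested calls: (f (g x))."""
    head, sep, rest = text.partition(":")
    if not sep:                          # no colon left: a plain atom
        return head
    return [head, expand_colons(rest)]   # right-associative nesting

print(expand_colons("cdr:cdr:obj"))
# ['cdr', ['cdr', 'obj']]
```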
Lisp sans parentheses hell discusses some similar ideas:
1. We use () to indicate grouping. ((((a)))) will mean the same thing as (a).
2. We use right associative infix operator notation - a * b + c translates to (* a (+ b c)) in lisp.
3. We use indentation (tab-size = 4) to convey grouping -
       say hello
           to lisp
   becomes (say hello (to lisp)).
4. We use square brackets notation to encode lists - for instance - [a,b,c] means (a b c). The items in a list can also be separated by line breaks. So
       [ one two
         three ]
   means (one two three). Since indentation is significant,
       [ one two
           three ]
   means ((one two) three).

There are a few special operators that help reduce the need for parentheses and aid readability -

1. a : b translates to (a . b) and due to right associativity, a : b : c translates to (a b . c)
2. a :: b translates to (a b)
3. a := b translates to (define a b)

Here are the cases where you won’t need parentheses -

1. Your expression starts on its own line. In this case, the first term is the head term and the rest of the terms on the same line are tail terms.
2. Your expression is the second argument to an infix operator. For example - mid :: quotient (low + high) 2 is equivalent to mid :: (quotient (low + high) 2)

Exceptions -

1. Use func() notation to indicate that you mean (func) (a function call) and not just the value of func.

Limitations of current implementation -

1. Back-quoting not supported.
2. The use of comma separator outside of list expressions is not defined and handled well enough.

Enjoy! ... and, for the record, I’ll stick to the parentheses, thank you :)

This aids readability, but note the inability to handle all of Lisp's capabilities (like back-quoting), and that the creator won't use it (clearly a sign that it's not good enough).
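The right-associative infix rule above (a * b + c translates to (* a (+ b c))) can be sketched in a few lines of Python:

```python
def infix_to_prefix(tokens):
    """Right-associative infix: [a, op, b, op2, c] -> [op, a, [op2, b, c]]."""
    if len(tokens) == 1:
        return tokens[0]                 # a lone atom
    left, op, rest = tokens[0], tokens[1], tokens[2:]
    return [op, left, infix_to_prefix(rest)]

print(infix_to_prefix(["a", "*", "b", "+", "c"]))
# ['*', 'a', ['+', 'b', 'c']]
```

Note the right-associative reading differs from conventional mathematical precedence, which is one reason such schemes can surprise readers.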
So now, let’s turn to three areas that have great promise, and then see if we can combine them together.
Lisp programs are normally presented using indentation; the Common Lisp standard even includes a pretty printer! Experienced Lisp programmers eventually stop seeing the parentheses, and instead see the structure of a program as suggested by indentation. So why not use indentation to identify structure, since people do anyway, and eliminate many unnecessary parentheses?
Other languages, including Python, Haskell, Occam, and Icon, use indentation to indicate structure, so this is a proven idea. Other recently-developed languages like Cobra (a variant of Python with strong compile-time typechecking) have decided to use indentation too, so clearly indentation-sensitive languages are considered useful by many. In praise of mandatory indentation notes that it can be helpful to have mandatory indentation.
Paul Graham (developer of Arc) is known to be an advocate of indentation for this purpose. As I noted above, Kragen Sitaker’s notes on Graham and Arc discusses how indentation can really help (in this notation, functions with no parameters need to be surrounded by parentheses, to distinguish them from atoms - “oh well” ). Graham's RTML is implemented using Lisp, but uses indentation instead of parentheses to define structure. RTML is a proprietary programming language used by Yahoo!’s Yahoo! Store and Yahoo! Site hosting products, though Yahoo are transitioning away from it. Paul Graham’s comments about the RTML language design and this introduction to RTML by Yahoo.
Darius Bacon's "indent" file includes his own implementation of a Python/Haskell-like syntax for Scheme using indentation in place of parentheses, and in that file he also includes Paul D. Fernhout's implementation of an indentation approach. Bacon's indentation syntax uses colons in a way that is limiting (it interferes with other uses of the colon in various Lisp-like languages). I have not had a chance to examine Paul D. Fernhout's yet. (The file also includes an I-expression implementation.) All of the files are released under the MIT/X license. (Darius Bacon also created mlish, an infix syntax front end listed earlier.) Lispin discusses a way to get S-expressions with indentation.
I-expressions are an alternative method for presenting s-expressions (either program or data), using indentation. They are defined in SRFI ("surfie") 49; this has final status, making I-expressions a quasi-official part of Scheme. I-expressions have no special cases for semantic constructs of the language. SRFI 49 includes a sample implementation with an MIT-style license (based on the Sugar project).
Here’s an example, quoted from Scheme Request for Implementation (SRFI) number 49 (this example uses Scheme’s “define” function, not Common Lisp’s “defun” function):
define
  fac x
  if
    = x 0
    1
    * x
      fac
        - x 1
I-expressions can include traditional s-expression representation too; here's an example (using Scheme):
define (fac x)
  if (= x 0) 1
    * x
      fac (- x 1)

As you can probably guess, both of these are equivalent to this traditional s-expression representation:
(define (fac x)
  (if (= x 0)
      1
      (* x (fac (- x 1)))))
The keyword “group” is used to begin a list of lists. (One minor drawback: it's more difficult to use a function named "group", so it's best to simply not create such a function.) You can drop back several indentation levels without dropping them all:
let
  group
    foo
      + 1 2
    bar
      + 3 4
  + foo bar
The SRFI permits both tab and space characters. I-expressions can also be quoted (including being quasi-quoted); see the proposal for information. Note this means that ’ followed by whitespace is now the beginning of a quote, because the initial whitespace is significant, and newlines become important.
In any indentation system for s-expressions, you need to be able to express s-expressions such as (f a), (g (h 1)), and (j (k)). In the first case, the first parameter is a simple symbol; in the second case, the first parameter is a call with at least one parameter; and in the third case, the call to k has no parameters at all. I-expressions represent 0-parameter function calls by surrounding the call with (...) (or by using () as a name-ender). Let’s see what the indentation rules by themselves look like:
; Here is (f a):
f a
; or
f
  a

; Here is (g (h 1)):
g
  h 1
; or
g
  h
    1
; or
g (h 1)

; Here is (j (k)), note the treatment of 0-parameter calls:
j (k)
; or
j
  (k)
The idea is simple, but it can be hard to reason about how it works -- so here is one way to think about it (if you're using this to write a program). On any line, the first term is the function to be called; the rest of the items on the line, and each indented line below, are its parameters. If a line has no other parameters (on the rest of the line or indented), then it's an atom; otherwise, it's treated as one expression and parentheses surround that line and its indents (if any). The “group” term creates an extra surrounding (...) in the resulting s-expression, so you can create lists of lists.
So how do you present computing a function and then calling it? Personally, I’d just switch to traditional s-expression notation in this case. But the I-expression representation actually isn’t bad; just use a “group” followed by how to compute the function to be called:
; ((getfunction x) a b) can be represented as:
group
  getfunction x
  a
  b
; or:
group (getfunction x) a b
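The rules described above (the first term on a line is the head; indented lines below are its parameters; a lone term is an atom; "group" wraps its children in an extra list) can be sketched as a small indentation parser. This is a simplified Python sketch handling only atoms and indentation, not the (...) escapes, quoting, or other features of a real I-expression reader:

```python
def parse_block(lines, pos, indent):
    """Parse consecutive (indent, tokens) lines at `indent`.

    Returns (expressions, next_pos). More-deeply-indented lines become
    trailing parameters of the line above; a lone term is an atom;
    'group' yields a bare list of its children.
    """
    exprs = []
    while pos < len(lines) and lines[pos][0] == indent:
        tokens = list(lines[pos][1])
        pos += 1
        children = []
        if pos < len(lines) and lines[pos][0] > indent:
            children, pos = parse_block(lines, pos, lines[pos][0])
        if tokens[0] == "group":
            expr = tokens[1:] + children       # extra surrounding list
        elif len(tokens) == 1 and not children:
            expr = tokens[0]                   # lone term: an atom
        else:
            expr = tokens + children           # head plus parameters
        exprs.append(expr)
    return exprs, pos

def parse_text(text):
    """Split text into (indent, tokens) pairs and parse the top level."""
    lines = [(len(line) - len(line.lstrip()), line.split())
             for line in text.splitlines() if line.strip()]
    return parse_block(lines, 0, 0)[0]

FAC = """\
define
  fac x
  if
    = x 0
    1
    * x
      fac
        - x 1
"""
print(parse_text(FAC)[0])
# ['define', ['fac', 'x'], ['if', ['=', 'x', '0'], '1',
#  ['*', 'x', ['fac', ['-', 'x', '1']]]]]
```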
The SRFI supplies Guile code (not fully portable Scheme code) to implement I-expressions (e.g., it uses define-public and not define). However, it should be easy to port.
In the I-expression sample implementation a “(” disables I-expression processing until its matching “)”, and this is an intended part of the definition of I-expressions. This has all sorts of wonderful side-effects:
There are a few special cases where a reader with I-expressions enabled parses a file differently. When two top-level s-expressions follow each other, either (1) on the same line or (2) on consecutive lines with the second expression indented, the lines will be combined by an I-expression reader. The first seems unlikely to me; the second is a little unfortunate. What's worse, though, is that in a read-eval-print loop, users have to enter return twice to see their results, and it's easy to end up trying to evaluate an empty list. We'll discuss those below in possible extensions to I-expressions.
The mailing list discussion of I-expressions includes a posting of an I-expression pretty-printer; the author says "I just couldn't get to sleep, so I wrote one". Be warned: I believe this pretty-printer has a bug, because it does not handle (f (g)) correctly. The first pretty-printer I wrote for I-expressions in Common Lisp had the same error, too, so this is a common mistake when doing this task - be warned. The expression (f (g)) should print like this:
f
  (g)
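The bug is easy to make because (g) has no parameters, so a naive printer flattens it onto its parent's line as if it were the atom g. Here is a minimal Python sketch of the rule a correct printer must follow (the function name and two-space indent are illustrative choices):

```python
def iexpr_lines(expr, indent=0):
    """Render a nested list as I-expression lines, indenting each parameter."""
    pad = " " * indent
    if not isinstance(expr, list):
        return [pad + expr]                  # an atom
    if len(expr) == 1:
        return [pad + "(" + expr[0] + ")"]   # 0-parameter call: keep the parens!
    lines = [pad + expr[0]]                  # head on its own line...
    for param in expr[1:]:                   # ...each parameter indented below
        lines += iexpr_lines(param, indent + 2)
    return lines

print("\n".join(iexpr_lines(["f", ["g"]])))
# f
#   (g)
```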
I believe that users who use indenting expressions should only use space characters, and never tabs, to indent. The R5RS Scheme specification doesn't even officially permit tab characters (though implementations generally do). The real problem is that tabs produce a varying number of spaces on real systems; experience with Python suggests that using tabs can cause a lot of problems. Most of today’s text editors can be configured to turn tabs into spaces automatically.
I-expressions and similar indentation approaches are very promising. But there are several ways they could be extended or modified, now that I've had a chance to experiment with them.
If you use indented I-expressions interactively (in a read-eval-print loop), you have to enter an extra blank line (an extra "enter") before the expression is evaluated. That's because the system needs to know when an expression has ended; it believes you might indent the next line and add more expressions. This is fine for multi-line expressions, but it's very annoying for simple single-line commands -- and those are the common case; few people type long, multi-line definitions into a read-eval loop, because they would stick those into a file instead!
So how can we make indenting easy to use at a command line? We could create a special command for use at the beginning of a line, such as "! ", that means "at the end of the line, you're done" . Or we could create a special indicator that means "execute now" . But these don't seem any easier than entering a blank line, and they are yet something else to remember.
One simple modification could be to add a special case, based on whether or not text was entered in column 1. There are several obvious variants:
Under one variant, the following will execute immediately when entered on the leftmost edge:

x
(load "file")
load("file")
(3 + 4)

However, the following will not (they will wait for a blank line) if entered on the leftmost edge:
load "file"
define x
3 + 4

The system will wait for a blank line in these cases, because the first term is followed by a space, not a newline... so it is waiting to see if you'll add more parameters by indenting them. That 'define x' does not trigger an immediate return is good news (probably); the waiting 'load' command is bad news. Another problem is that the rule is a little complicated to explain. With these semantics, if you want to enter an indented expression, you'll need to indent the first line or include at least one parameter (expression) on the same line. If you indent at all on the command line, or enter one or more terms on one line, you must type an extra blank line to execute the sequence; this seems reasonable.
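This variant's rule can be sketched as a small classifier: an unindented line is complete at the newline only when it is a single term, such as a lone atom, a parenthesized expression, or a name(...) call. This Python sketch only illustrates the rule described above; it is not a full reader, and the function name is mine:

```python
def complete_at_newline(line):
    """Would this left-edge input line finish at the newline (this variant)?

    True for a lone atom, a (...)-wrapped expression, or a name(...) call;
    False when the first term is followed by further terms, since the
    reader must wait to see if parameters will be indented below.
    """
    line = line.rstrip("\n")
    if line.startswith(" "):          # indented: never ends immediately
        return False
    if line.startswith("("):          # e.g. (load "file"), (3 + 4)
        return line.count("(") == line.count(")")
    first, _, rest = line.partition(" ")
    if rest:                          # e.g. load "file", define x, 3 + 4
        return False
    # A single term: "x", or a name-prefixed call like load("file").
    return first.count("(") == first.count(")")

print(complete_at_newline('x'))              # True
print(complete_at_newline('load("file")'))   # True
print(complete_at_newline('(3 + 4)'))        # True
print(complete_at_newline('load "file"'))    # False
print(complete_at_newline('define x'))       # False
```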
Under another variant, each of the following will execute immediately when entered on the leftmost edge:

load "file"
3 + 4
define x
The good news with this approach is that 'load "file"' on the left edge works as a user might expect - it will trigger an immediate return. The same is true for 3 + 4 ... suddenly an interactive Lisp interpreter makes a plausible calculator!
The bad news is that 'define x' on the left edge followed by return will also trigger an immediate return of the read expression, which is almost certainly not what was intended. That is not such a big deal in an interactive command loop, since the user could try again. But more seriously, this would be a likely thing to do in a file, and it could cause very hard-to-detect bugs.
How can we counter the risks of not noting the change in semantics for text on the left edge? It might be possible to have a special loop for file-reading that detected the case of "unindented expression followed immediately by indented expression", and trigger a warning or error. Another variant would be to have two slightly different modes for read (or two different starting functions): one intended for interactive use, where unindented lines are ended on newline where possible, and one intended for file reading (where there is no such distinction). This does complicate reasoning and implementation, and it's quite normal to cut-and-paste into an interactive session.
One promising approach to countering the problem is to configure other tools to help detect such problems. Text editors, for example, could colorize such constructs: if a left-edge line that would run immediately is followed by an indented line, it would be labelled with red lines. And special tools could be designed to detect this as well.
One good point is that the rule is simple: "if you want to indent, you must start by indenting the first line". When you are done, you enter a blank line or another expression at the same indentation level. Frankly, that's much simpler than the rules above.
x
(load "file")
load("file")
(3 + 4)
3 + 4
3 + (4 * 2)

And these will not:
define x
load "file"
3 + 4 * 2
3 + 4 +
For sweet-expressions, I kept going back and forth for a long time between option 2 and option 3. Option 3 presents the risk of subtle bugs, and that is of great concern. However, option 3 is a joy to work with interactively. I then said, "This is a hard choice to make, so I plan to experiment. Expressions like load("filename") and (3 + 4) seem particularly easy to explain, and do not have the drawbacks of this alternative, so they suggest using alternative 2." It was only in November 2007 that I finally realized that what I wanted was option 4.
Note that when activated these also partly eliminate one of the minor incompatibilities of I-expressions with traditional s-expressions. If there are two separate outermost s-expressions on consecutive lines, with the second expression indented from the first, as long as the first expression was not indented (a likely case) the lines will no longer be combined by the I-expression reader. Of course, this could be misleading, because the second line might appear to be combined with the first; this is a trade-off with no perfect answer.
Excess blank lines get interpreted oddly, as '(), which the underlying system may then try to run (Scheme in particular complains about them). It can also cause confusion if interpretation gets "out of sync".
A plausible modification would be to ignore any additional blank lines before a (next) expression. In short, a blank line shouldn't really return '() to the system; users who want that should enter it directly. If there are blank lines at the end of the file, the system should probably return the end-of-file marker (if it supports that) once it has consumed the final blank lines while trying to read the next expression.
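As a rough sketch of this rule, here is a toy reader loop in Python (the function name is mine, and each non-blank line stands in for one complete expression; a real implementation would sit inside the s-expression reader and call its expression parser):

```python
def expressions(lines):
    """Yield one item per expression, consuming blank lines in between.

    Toy model: each non-blank line stands in for one complete expression.
    Blank lines are swallowed instead of being returned as '(), and
    trailing blanks simply lead to end-of-file (iterator exhaustion).
    """
    for line in lines:
        if line.strip() == "":
            continue  # swallow blank lines instead of emitting '()
        yield line

print(list(expressions(["(a)", "", "", "(b)", ""])))  # ['(a)', '(b)']
```

The blank lines between and after expressions produce nothing, so the underlying system never sees spurious '() values.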
An alternative would be to ignore any additional blank lines after an expression, but that is undesirable: you would then have to treat the beginning of a file specially (where no expression has yet been read).
Also, should horizontal spaces followed by newline be treated the same as a blank line? Users cannot easily see the difference, especially on typical printouts. That isn't clear, so I leave that as a question for now.
Lisp’s standard notation is different from “normal” notation in that the parentheses precede the function name, rather than follow it. Jorgen ‘forcer’ Schaefer argues that this is a more serious problem than the lack of infix notation; in July 2000 he said “I think most people would like Scheme a lot better if they could say lambda (expression) ... instead of (lambda (expression) ...”
Peter Norvig had some interesting ideas, as noted earlier. Let’s look at one of his rules, the rule that says “if a function name is immediately followed by an open parenthesis, move it inside the list (when converting to an s-expression)”. This means that “(fact x)” and “fact(x)” will mean the same thing.
Obviously, this is trivial to parse. We don’t lose any power, because this is completely optional -- we only use it when we want to, and we can switch back to the traditional s-expression notation whenever we want. It’s trivially quoted: if you quote a symbol followed by “(”, just keep going until its matching “)” -- essentially the same rule as before! Technically, this is a change from some official Lisp s-expression notations and implementations. For example, entering “a(b)” into CLisp (a Common Lisp implementation) is the same as “a (b)” -- its parser tries to return the value of a, followed by running the function b. But it’s not clear it’s a big change in practice; commonly accepted style always separates parameters (including the function call’s name) with whitespace. So normally, what follows a function call’s name is whitespace or “)”, and this is enforced by pretty-printers. Thus, many large existing Lisp programs could go through this kind of parsing without a change in meaning!
Does this help? Let’s rewrite the CL factorial example, applying this rule to everything except the infix operations:
defun(factorial (n)
  if((<= n 1)
    1
    (* n factorial((- n 1)))))
It looks slightly more familiar, but not that much. Let’s try again, but move the infix ops out too; the result is actually not bad:
defun(factorial (n)
  if(<=(n 1)
    1
    *(n factorial(-(n 1)))))
If we really wanted it to look conventional, we could use wordy names instead of symbols that are traditionally infix; that isn’t too horrible for + (use “sum”), but that is rather wordy for others (I don’t like it):
defun(factorial (n)
  if(lessequal(n 1)
    1
    product(n factorial(subtract(n 1)))))
Note that as far as I can tell, you do not need any whitespace after the opening parenthesis, because atoms (including function names) cannot have parentheses in their names in s-expressions. A variant of this idea would be to require whitespace after the opening paren if name-prefixing is used, but I see no need for that.
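To make the rule concrete, here is a rough textual sketch in Python (the function name and the symbol-character regex are mine, not part of any proposal; a real reader would apply the rule token-by-token while reading, rather than with a regular expression over raw text):

```python
import re

def expand_name_prefix(text):
    """Rewrite name-prefixed calls like f(x y) as plain s-expressions (f x y):
    a symbol immediately followed by '(' moves inside the list."""
    # A "name" here is any run of characters legal in a Lisp symbol
    # (no whitespace, parentheses, or quoting characters).
    return re.sub(r"([^\s()'`,]+)\(", r"(\1 ", text)

print(expand_name_prefix("factorial(n)"))  # (factorial n)
print(expand_name_prefix("f(g(x) y)"))     # (f (g x) y)
```

Note that traditionally formatted s-expressions, where a name is always followed by whitespace or “)”, pass through unchanged, which is the backward-compatibility property described above.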
Skill from Cadence, a proprietary Lisp-based extension language, also supports name-prefixing.
Mathematica does something similar. As noted in How Purely Nested Notation Limits The Language's Utility, its FullForm notation can be transformed into Lisp using simple rules, starting with transforming f[a,b] into (f a b). But it isn't designed to be backward-compatible with existing Lisp notations; its use of "," to separate arguments would cause confusion since "," is also used in macro handling.
Name-prefixing can be combined with indentation (e.g., I-expressions). Let’s presume that if you indent further and begin the expression with a parenthesis, or with a function name followed immediately (no space) by a parenthesis, that parenthesis and its mate are silently ignored (they don’t add yet another subexpression to the s-expression). Furthermore, let’s presume that if you have a name-ender format (function name followed immediately by an open paren), it does not switch to some “no more I-expression” mode. Then you could do this (using English names for operators):
defun factorial (n)
  if lessequal(n 1)
    1
    product n factorial(subtract(n 1))

Or this (using traditional symbols for operators):
defun factorial (n)
  if <=(n 1)
    1
    *(n factorial(-(n 1)))
It turns out that name-prefixing works well with indentation. Here are some examples:
; Here is (g (h 1)) <=> g(h(1))
g
  h(1)
; or
g h(1)

; Here is (g (h)) <=> g(h())
g
  h()
; or
g h()

; Here is (g (a) (b 1) c) <=> g(a() b(1) c)
g
  a()
  b(1)
  c
; or
g a() b(1) c

; Here is (g a (b 1) c()) <=> g(a b(1) c())
g
  a
  b(1)
  c()
; or
g a b(1) c()
This does introduce the risk of someone inserting a space between the function name and the opening “(”. But whitespace is already significant as a parameter separator, so this is consistent with how the system works anyway... this is not really a change at all.
I think this is slightly better for untrained eyes. It’s hard to argue that this is a major improvement, at least by itself, but the more I look at it the more I like it. What’s more, the pain is tiny, and that’s a good thing. This is a lower pain, lower gain approach, and it can be combined with SWP-expressions, which I’ll describe in a moment.
The article Improving lisp syntax is harder than it looks discusses name-prefixing, but I think it makes a number of errors. It first claims that this would be hard to integrate with macros - this actually isn't true if it's built into the reader (and the macros aren't doing reading themselves). If the reader transforms a(x) into (a x), then when the macro has a chance to run all it sees is (a x) - exactly what it was expecting to see. He also says that "Another disadvantage to this change of syntax is that it makes functional programming much more odd looking. Lets say you have a list containing functions and you want to call the first one. In Scheme you write ((car lst) params) and in Common Lisp (funcall (car lst) params). However in our new syntax it looks like: car(lst)(params) and funcall(car(lst) (params)). Neither of these is very elegant, and it only gets worse if that call in turn returns a function, which would look like: car(lst)(params)(params2) and funcall(funcall(car(lst) (params)) (params2))." But I find this remarkably elegant, and better than the traditional notation - to do functional programming, just cuddle up the parentheses. It's much easier to understand sequential parentheses compared to a deeply nested list.
Again, let’s continue the thought that we want to support a syntax that is maximally Lisp-like, generally accepting existing expressions. The biggest issue in making Lisp s-expressions easier to read is whether or not to provide infix support, and if so, how.
The first question is, should you support infix at all? Because if we do, and we presume that the underlying s-expressions always have the operation first, this means that we are changing the external presentation of (some) data. Note that we are only changing the presentation for entering programs and program-like data, and possibly changing how they are externally displayed; the traditional s-expression would still be used internally. Thus, if “{3 + 4}” is interpreted as an infix expression, it will quietly be transformed to the s-expression “(+ 3 4)”, so the “car” (head) of “{3 + 4}” would be “+”. The ll1-discuss mailing list includes discussions on the issues of infix.
The reality is that nearly everyone prefers infix notation; people will specifically avoid Lisp-based systems solely because they lack infix support built-in. Even Paul Graham, a well-known Lisp advocate, admits that "Sometimes infix syntax is easier to read. This is especially true for math expressions. I've used Lisp my whole programming life and I still don't find prefix math expressions natural." Paul Prescod remarked, “[Regarding] infix versus prefix... I have more faith that you could convince the world to use esperanto than prefix notation.” Nearly all developers prefer to read infix for many operations. I believe Lisp-based systems have often been specifically ignored even where they were generally the best tool for the job, solely because there was no built-in support for infix operations. After all, if language creators can’t be bothered to support the standard notation for mathematical operations, then clearly it isn’t very powerful (as far as they are concerned). So let’s see some ways we can support infix, yet with minimal changes to s-expression notation.
If we are willing to support infix, there are many options. The big issues are:
There are some code samples that might be used as a starting point for implementations; after going over the issues, I identify some of them.
Nathan Baum (of Bournemouth, United Kingdom) posted on 4/14/2006 8:19:16 PM the idea of using [...] to surround infix notation (I’m sure he’s neither the first nor last). He said, “Lisp actually has a few syntactic shortcuts of its own. Essentially anything that isn’t part of a name is a character which triggers a specialised reader function. Strings, for example, don’t need to be built in to the Lisp core: a reader can be associated with the double quote character, and that’ll return a string object. This also means you can redefine it yourself to get shell/Perl/Ruby-style interpolated strings, for example. Even lists themselves are read using a reader function triggered by [parentheses]. One simple shortcut is to have ‘normal’ Lisp expressions delimited by [parentheses], and infix expressions delimited by brackets...
(print (+ x (- (/ y z) a)))
; or
(print [x + [[y / x] - a]])”.
There are several major options:
Pairs of characters can force infix interpretation, and seem like the obvious idea, but they do have some disadvantages. Infix operations work across a list, so it makes sense to mark the end of such a list by using a special pair of characters like {...} instead of (...) to mark an “infix list”. There are two pairs available: {...} and [...]. Advantages of {...} are that braces are more commonly used for statement blocks in other programming languages (Logo being an exception), and in particular users may want [...] for other purposes (such as identifying lists, as in BitC). A disadvantage of using {...} is that in many fonts these characters look extremely similar to (...), making them harder to see; the characters [...] are more distinct. But in either case ({...} or [...]), using a pair means you have to match the correct closing character, so inside many parens you have to make sure you use the right ones in the right places. This can create much extra work, and it is debatable whether they help much in error detection. Also, infix notation is really obvious when you see it... so there really isn’t a big need for a special marker to show its end. Combining this with name-prefixing does not seem hard; just accept “{” for name-prefixing as well.
Combining the paired characters {...} or [...] with indented forms like I-expressions requires careful thinking about what you are doing. In particular, if you want to identify something as a parameter of a function f and switch to infix notation, do you indent inside function f? I think you should, so the system should only add one level of s-expression (from the indent) if the first character after the indenting whitespace is “{”, instead of the two that would otherwise be implied. In all other cases, other than the beginning of a line, a “{” would normally end up matching a “(” in the corresponding s-expression if you want to stay similar to usual s-expression syntax (if not, see the relaxed infix notation such as ZY-expressions, below). This means that you need to think carefully about the meaning of {...} (or [...]). Basically, don’t think that {...} create a matching s-expression (...). Instead, have the mindset that the infix operators (+, -, etc.) create the s-expressions, and that the surrounding characters {...} simply clarify that what’s inside has an infix interpretation. Also, do {...} disable I-expressions inside them (as I’ve proposed (...) do), or should a “{” at the beginning of a line only switch to infix without disabling I-expressions? I’m not sure what the meaning of indented sub-expressions would even be, if indentation still mattered. Thus, for the moment, I’ll assume that {...} disable I-expressions inside them, as I’ve proposed (...) do, for the simple reason of simplicity: if an infix expression is so complicated that it goes beyond one line, indentation like I-expressions are more likely to harm than help.
Note that if it’s not automatic, there’s a follow-on question... how deep does infix go when it is enabled? Do the infix operators only work at one level (one list), or do they keep going down to lists contained inside? This question doesn’t come up if the detection is automatic, so let’s address that next.
There are two major variants that I can see: (1) an infix-type operator in either the first or second position switches to infix notation (Norvig’s approach), or (2) only an infix operator in the second position triggers infix processing (my approach). The first approach has the advantage that it can handle a traditionally prefixed “-”, e.g., (- x * y) works, becoming (* (- x) y). But there are also many problems with that approach. For example, it is then much harder to use traditional Lisp notation (with the infix operator in front) at the same time, and it’s easier to get confused when looking at traditional s-expression notation. It is all too easy to accidentally trigger the infix notation when it wasn’t intended. And the traditional Lisp notation (- x) is not that hard to write or read; if you support traditional function notation, like -(x), it's even easier. Finally, if you only trigger on the second parameter, you can add a subtle safety check: the switch to infix notation can be made to occur only if there are three or more parameters. This safety check avoids unintentionally enabling infix in some cases, and makes it very simple to implement trivial quoting mechanisms: since expressions like “(self +)” will not be interpreted as infix, using functions as parameters is simplified. In fact, you can go further on the safety check: you can say that infix lists have to have an odd number of parameters, and that the first parameter (as displayed) cannot be an infix operator (so a list of infix operators won't cause problems). I think the second approach is much better. For fully-traditional infix notation the first is slightly better, and we’ll discuss that later, but for our purposes I don't think it's as useful.
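A minimal Python sketch of this second, safety-checked detection rule (the function names are mine, and the operator pattern here is a deliberately simplified ASCII placeholder; the real pattern is discussed later in the text):

```python
import re

# Placeholder operator pattern: 1-4 ASCII operator characters.
OP = re.compile(r"[+\-*/<>=&|]{1,4}")

def is_op(tok):
    """True if a token looks like an infix operator."""
    return isinstance(tok, str) and OP.fullmatch(tok) is not None

def looks_infix(lst):
    """Trigger infix only when: there are at least three elements, the
    count is odd, the first element is NOT an operator, and every even
    position (2nd, 4th, ...) holds an operator."""
    return (len(lst) >= 3 and len(lst) % 2 == 1
            and not is_op(lst[0])
            and all(is_op(lst[i]) for i in range(1, len(lst), 2)))

print(looks_infix(["n", "<=", "1"]))  # True
print(looks_infix(["self", "+"]))     # False: two elements, so (self +) stays prefix
print(looks_infix(["-", "x", "y"]))   # False: operator in first position
```

The last two examples show the safety checks in action: lists that merely mention operators, or that begin with one, are left in ordinary prefix form.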
If it’s automatic, now you need to determine the legal pattern of infix operators, since they will trigger the infix interpretation. This is probably some sort of regular expression; we’ll discuss this below.
You’ll also need a syntax to say when something is not infix (to disable the automatic part). Again, this can be identified using surrounding pairs such as {...} or [...] around non-infix operators, or using single expressions before the first or second parameter to declare that this is not infix. For functions, in many real-world use cases Common Lisp’s #’ notation or simply “(function +)” would do. Indeed, if at least three parameters are required for infix, any function that simply returns its second parameter could do the escaping. As noted below, I'm currently using "as(+)" aka "(as +)" to do this escaping.
Automatic approaches, either way, are in some sense a little “hackish”: the second parameter determines the order of parameters in the resulting s-expression! Yet this produces very nice-looking results, and I suspect that over time it will become workable. After all, in all other languages you have to look for infix operators too, so this is actually not that strange a requirement. And the results (shown elsewhere) are startlingly clear.
You might still have an optional “make this infix” syntax indicator as well, or at least a “warning, this is infix”, which can be used when printing. That way, when you read a result as infix, you’ll be warned that the expression is infix. If this is optional, you might still want to make it very short, since you may have to read a lot of them in printouts. For example, let’s say that #_ means “no infix” and is placed before the first parameter (mnemonic: “base case”), while ? means “infix notation follows” and is placed before the first (displayed) parameter (#^ might make a reasonable alternative). Thus, “(3 + 4)”, “(?3 + 4)”, and “(#_+ 3 4)” could all mean the same as “(+ 3 4)”, and the “car” of all of them is “+”. Or, if [...] are the infix warnings, [x >= 4] is a way to hint that the s-expression is actually (>= x 4).
Of course, if the notation permits mixed infix and prefix notations, you can easily have a “manual” notation that just means “turn on autodetection all the way down from here” and end up having a combination of properties. Alan Manuel K. Gloria’s “infix notation macro” is a good example of a mixture - in this implementation, there must be an initial marker (nfx ...), but it then descends all the way down to all subexpressions. As a result, you could use the marker at a very high level, and have a result similar to automatic detection. A similar result could be had by having a function like “(enable-infix)” or “(sugar)” which you could insert at the beginning of a file as a top-level command (this could also enable indenting and the name-prefixing notation), which would then enable the infix operations (even at the topmost level) until specifically disabled. Using such a command would mean that everything below, for the rest of the file, would use automatic infix detection... but since there would be a specific command to enable it first, it would be compatible with existing code (which didn’t enable infix). A command-line flag, or invoking a command with a different name (possibly via a different file extension), could enable automatic detection of infix notation.
If detection of infix operators is automatic, you now need to determine the legal pattern of infix operators, since they will trigger the infix interpretation. Even if it’s manually triggered, if you permit arbitrary expressions, or want error-checking of the operators (“you said this was infix but there were none”), you’ll need to detect infix operators.
There are several options:
If we use a pattern, what is the pattern? Patterns are often expressed as regular expressions (REs), and an obvious RE for this purpose is: [+-\*/<>=]+. Basically, it has to be a set of one or more special characters. (One alternative, which might be nice, would be that it has to start with a special character, but after that, other characters are allowed; be wary that +h1 is then an infix operator but +1 is not). This is enough to cover the 4 arithmetic operators (+, -, *, and /), and all the usual comparisons (<, <=, =, >=, >). It’s also enough to cover other operations that people like to have as infix, such as “**” (exponentiate), “->” (implies), “<->” (if and only if), and so on. You can probably add “&”, which is enough to represent common representations of “and”. But there are some issues, especially with the characters not noted here:
One possibility is to use a different form for assignment, such as "<--" or "<==". The term "<==" is actually fairly easy to distinguish, so "x -> a <== b -> c" is easier to read, and "(x -> a) <== (b -> c)" is actually especially nice.
Another possibility is to use "<-" for assignment, and use [...] as a field accessor. That looks nice:
x[a] <- b[c]

But if we use [...], we need to figure out what general function to map [...] to, and we just lost the phrase [...] for other purposes (such as describing special lists). (For sweet-expressions, I'll presume that the language or user will define the infix macros/functions, and not add [...] simply because it's easier and more general not to.)
If the printing routines normally convert to infix, then you need a pattern that is not likely to be triggered unintentionally... which is a good reason not to have the terms “and” and “or” be infix operators, and certainly a good reason to stay away from anything other than an isolated “:”. You can probably limit the regular expression length, say to 3-6 characters, to help limit unintentional conversions as well. I would not include “and” and “or”; instead I would add & and | to the infix operators (and say that | has to be escaped), and limit operators to 1 through 4 characters in length. I would accept “:” alone (so it can be used for infix type declarations), but nothing else with colons.
These considerations suggest that “=>” be quietly prevented from being an infix operator, and then use this regular expression for infix operators:
[+-\*/<>=&\|\p{Sm}]{1,4}|\:|\|\|

where “\|” means match the “|” character, and \p{Sm} means “match mathematical symbols, Unicode range U+2200 through U+22FF”.
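As an illustration (not part of the proposal itself), the pattern can be sketched in Python; the standard re module has no \p{Sm}, so this sketch substitutes the explicit range U+2200 through U+22FF given above:

```python
import re

# Approximation of the proposed infix-operator pattern: 1-4 operator
# characters (including Unicode math symbols U+2200..U+22FF), a lone
# colon, or "||".
INFIX_OPERATOR = re.compile(r"[+\-*/<>=&|\u2200-\u22FF]{1,4}|:|\|\|")

def is_infix_operator(tok):
    """True if the whole token matches the infix-operator pattern."""
    return INFIX_OPERATOR.fullmatch(tok) is not None

for tok in ["+", "<=", "<->", "**", ":", "||", "\u2208", "x-y", "define"]:
    print(tok, is_infix_operator(tok))
```

Names like x-y and define correctly fail to match, because an operator may contain only operator characters; this is why symbols with embedded “-” remain usable as long as operators are whitespace-separated.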
As a related matter, you need to decide how to separate infix operators from non-infix operators. In many other language families, “x-y” would be parsed as x minus y. Unfortunately, in many Lisp-like languages, names often include typical infix operator characters such as “-”. Both Common Lisp and Scheme would be unusable without the - symbol, because they have many functions with “-” in the name (in addition, Scheme has a convention where “->” embedded in the name indicates a conversion). The simplest and most obvious approach is to require that any infix operator be surrounded by whitespace; this is very easy for humans to remember, and is consistent with normal s-expression syntax anyway. You could require escaping such characters as part of a name, e.g. |simple-string-p| or simple\-string\-p, but this is both ugly and unnecessarily incompatible with common practice. If you’re devising a new language, you could simply forbid such characters in regular symbol names, and then you could automatically detect all the infix operators without requiring that they be surrounded by whitespace. However, for general-purpose parsing, sweet-expressions will require that infix operators be surrounded by whitespace (in cases where function names don't have such symbols, this requirement could be relaxed).
Once we detect that we are using infix notation, what are the semantics of the “infix” notation?
Note that this does not support precedence, so you have to surround different operators with parentheses or indents, e.g., (3 + (5 * 6)). In one view, that's a disadvantage - you still have to indicate some information that would be automatic in other languages. On the other hand, this makes it explicit where new lists begin, and that has some pretty big advantages. It also sidesteps the problems of defining precedence (either fixed, or a way to add your own). Those are pretty big advantages.
There are many options for determining when things aren't correct, and then deciding whether that is an error or merely an indicator that infix was not intended. Note that this only supports an odd number of parameters, and obviously an infix operator shouldn't appear as the first parameter in the text. Doing otherwise could be an error, or could be an indicator that the list should not be treated as infix; for the moment, I'll treat both as indicators that infix was not intended. This approach also only permits all the operators to be identical - if they're different, but all infix operators, then the writer probably intended some sort of precedence (and so an error should be reported). Otherwise, infix was probably not intended, so again, I'd probably interpret that as "not infix".
Here, we don’t allow prefix operators, such as “- x”, or suffix operators, such as “x !”. There is a trade-off here; prefix/suffix operators don’t work well with automatic detection or with trying to also read traditional s-expressions correctly - it would be all too easy to misread such expressions. Using name-prefix notation for them, such as -(x), is a very reasonable alternative.
If you have precedence rules, what are they, and are they controllable? It's not hard to add a command that sets precedence rules, but there's always the problem that you risk not getting that called at the "right time" (before reading). Worse, if there are global precedence rules that can be changed, setting them in one environment may manipulate another (unintended) environment, so now we have to have a parameter for (read), and that may be hard to pass down. This is particularly a problem for Lisp-like systems, where you may have multiple levels and meta-levels that you might not want to interact. Another big challenge is that if different developers set precedences, it's harder to combine their code. Different operations may have different meanings in different circumstances, so the lack of control over precedence can be a problem too.
A chain of the same operation can normally be combined into a single operation with all those parameters, e.g., (a + b + c) should become (+ a b c). This adds capability, without any problems or loss.
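These combined semantics can be sketched in Python (hypothetical helper; operators and operands are plain strings here): same-operator chains collapse into one call, and mixed operators are rejected since there is no precedence:

```python
def infix_to_prefix(lst):
    """Transform a detected infix list into prefix form, e.g.
    ['a', '+', 'b', '+', 'c'] -> ['+', 'a', 'b', 'c'].
    With no precedence, every operator in the list must be identical."""
    operands, ops = lst[0::2], lst[1::2]
    if len(set(ops)) != 1:
        # Mixed operators suggest the writer expected precedence.
        raise ValueError("mixed infix operators; add parentheses or {...}")
    return [ops[0]] + operands

print(infix_to_prefix(["a", "+", "b", "+", "c"]))  # ['+', 'a', 'b', 'c']
```

So (a + b + c) becomes (+ a b c) in one step, while (3 + 4 * 2) is reported as an error rather than silently given some precedence.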
Everyone agrees that * and / should have the same precedence, both of which bind more tightly than the equal-precedence binary + and -, and that all of these associate left-to-right. So in theory that, at least, could be implemented. Originally I thought it might be best to implement just those universal rules, and use parentheses most everywhere else for controlling infix precedence (or at least discourage lots of precedence-setting). Chaining is fine, but as discussed below, precedence causes a lot of problems in many use cases, so in the end I decided against precedence rules for sweet-expressions.
So far, I’ve primarily discussed how to read in expressions, but expressions need to be printed too. A likely answer is “just print s-expressions as usual” by default. That means that the output would be “(+ 3 4)”, even if the input was “(3 + 4)”. For debugging code, this is probably a Good Thing. There might be value in presenting expressions back with some infix operations “re-inserted”, at least as some sort of “pretty-printing” function or option. It might be wise to identify which lists are being interpreted as infix in these cases (e.g., by surrounding them with {...}). I think there is no reason to try to redo precedence when re-displaying; showing how the lists are actually stored is very clear and eliminates questions about precedence. Thus, if “(3 + 4 * 5)” is input, the resulting s-expression would be “(+ 3 (* 4 5))”, and the infix-printer might show this as “{3 + {4 * 5}}”.
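Such an infix-aware printer can be sketched in Python (the operator set and function name are illustrative only; s-expressions are modeled as nested lists of strings):

```python
# Illustrative operator set; a real printer would share the reader's pattern.
INFIX_OPS = {"+", "-", "*", "/", "<", "<=", "=", ">=", ">"}

def show(expr):
    """Render an s-expression, surrounding with {...} any list whose head
    is a known infix operator -- displaying the stored structure as-is,
    with no attempt to reconstruct precedence."""
    if not isinstance(expr, list):
        return expr
    parts = [show(e) for e in expr]
    if len(parts) >= 3 and parts[0] in INFIX_OPS:
        return "{" + (" " + parts[0] + " ").join(parts[1:]) + "}"
    return "(" + " ".join(parts) + ")"

# Reading "(3 + 4 * 5)" would yield (+ 3 (* 4 5)); re-displaying it:
print(show(["+", "3", ["*", "4", "5"]]))  # {3 + {4 * 5}}
```

Because the printer shows each stored list explicitly, the reader of the output never has to guess at precedence: {3 + {4 * 5}} leaves no doubt about how the expression is grouped.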
There is a subtlety when combining name-prefixing with automatic infix: I think people expect an infix expression to be considered a single expression, even though at first it appears to be a multi-parameter list.
For example, most people would accept this syntax for calling function f with two parameters, x and y:
f(x y)
However, if infix is normally done automatically, they would expect this to be computed with f given one parameter, not three:
f(x + y)
Note that this is actually a very nice thing; it means that in many cases, name-prefixing reduces the number of parentheses that need to be written. So the original Lisp:
(f (+ x y))

becomes:
f(x + y)
There is likely to be a need to control infix processing. There should be a standard way of saying “disable infix” and “enable infix”. You'd like to be able to enable or disable infix at only one level, in particular, so that you could leave infix on as the default and yet disable it in one particular expression. It would be especially useful to have an “enable infix for this one level of expression” operation that must receive an expression that “looks like an infix expression” or it is in error.
The following text are some very early ideas on the topic; I include them here, but it's all very early, and at least some is likely to be very wrong. Still, you may find useful thoughts here.
One implementation issue is that macros work from outside in, and most macros will not expect special additional macros. In particular, it might be very useful to cuddle an infix operator with a macro that says do not use infix, like this (where "as" means "as-is"):
(define as(+) ...)

However, the obvious solution will fail; typical implementations of "define" do not expect to be followed by another macro in the definition! This means that any such function reference will need to be removed at read time or very early in eval time (e.g., by an outer cuddling nfx(...)), because other macros won't know what to do with this.
There should be a standard way of inserting, at the beginning of a file or read-eval-print loop, a request to switch to and from special processing, and for setting some of its options (e.g., infix control). For debugging, you could just print s-expressions as now. However, it would be a good idea to have a standard way to request printing that shows infix expressions as such, marks infix and non-infix operators, and adds another marker to each list that could be misinterpreted as an infix expression but is not. This implies a need for a marker that says “the following list is an infix expression, and it’s an error if it is not”. If adding specific new infix operations is supported (such as “and” and “or”), a standard name and syntax for this add operation would be a good idea too. Ideally, the external interfaces for these operations would be pseudo-standard across all Lisp-like notations. Bridging the gap between Common Lisp and Scheme might be challenging for a standard interface (in some cases string parameters might be needed; string syntax is now universal, but other syntax is not - e.g., the keyword syntax of Common Lisp is incompatible with the proposed keyword syntax for Scheme).
The notation for controlling all this is to be determined. Here’s a start. Originally, I looked at implementing another # character option. If you want to do that, you might look for sharpsign options that no one currently uses, examining sources such as the Common Lisp Hyperspec (particularly the sharpsign section), the R5RS Scheme specification, the GNU Emacs Lisp Reference Manual, as well as less-common systems like NewLisp. Using #I for infix is tempting, but Scheme uses #i for inexact; it’s not clear if #I would get interpreted the same way on some systems, but it’s not worth risking and might be confusing to developers anyway. I especially considered using #/ to mean "infix". The #@ combination is used by Emacs Lisp, so we should avoid that. However, Scheme only supports one-character lookahead; if you want to be able to call the "ordinary" Scheme reader for other #-beginning constructs, there isn't an easy way to do that - because to read the character after #, you have to consume the #, making that more difficult to do. So I don't think beginning with # is a good idea.
I’ve looked for some possible notation, under the presumption that this is a future standard format common to Common Lisp and Scheme (so we need something unallocated by either).
A simple approach might be this naming convention, with function-call-like macros:
Each of the above could either accept a single list (in which case it's that list that is being referred to); otherwise, the expression being referred to is the rest of its parameters. Since any infix expression must have at least three parameters (two if unary operators are used), this isn't ambiguous.
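Here is a minimal Python sketch of that "single list or rest-of-parameters" convention; the name nfx and the list representation are just illustrative assumptions, not part of any actual sweet-expression implementation:

```python
# Hypothetical sketch: an nfx()-style marker that accepts either a single
# list (the expression itself) or the expression spread across its
# parameters. Since a real infix expression always has 3+ elements,
# the two calling styles cannot be confused.
def nfx(*args):
    """Return the infix expression the caller marked, as a list."""
    if len(args) == 1 and isinstance(args[0], list):
        return args[0]      # nfx([2, '+', 3]) -> the list itself
    return list(args)       # nfx(2, '+', 3)   -> the rest of the parameters

assert nfx([2, '+', 3]) == [2, '+', 3]
assert nfx(2, '+', 3) == [2, '+', 3]
```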
We could interpret unfx(...) around a second parameter as quietly disabling infix processing. Alternatively, we could use a different function/macro name such as "as(...)". This would treat the inside (function) as-is, so it won't be considered an infix operation (use this if an infix operator is the second parameter, but you don't want the expression to be considered infix). Here's an example:
defun as(+) (left right) ...
One interesting challenge in doing this is recursion in the read function. I decided to implement much of the processing in the reader, and to do the processing "as I go" in the reader, rather than have the reader automatically add nfx() in front of expressions and then have eval() fix it up by calling a macro. Since the reader is called on ordinary data in s-expression format, it is very inconvenient to have the reader automatically add nfx(), etc. calls! Yet if there is no outermost call to "fix up" the expression, then the outermost parameters would be defun, define, or other terms that would not handle these macro calls correctly. And you dare not remove things piecemeal - read can be called recursively, so if you call read, remove things partly, but then later "fix up" all the way down again, you can "unfix" things. This is one reason to have a separate "as" function, so that these different forms can be differentiated.
Note that “(s (b))” is interpreted as you might expect during sweet-expression processing, but (s(b)) is interpreted as the s-expression ((s b)), not (s (b)). In practice this is not a problem - nobody likes the format (s(b)) for s-expressions, and not all s-expression processors would even accept it. More also needs to be done to describe the interaction of sweet-expressions with macros; this is a critical area and hard to get right, but since there is relatively little that is changing, this may not be so bad.
A simple program could transform an existing program to sweet-expression format, including its comments, and with any necessary markers to control infix interpretation. I expect that the need for controlling infix interpretation will be exceedingly rare, so the results should look very nice.
Many, many people have implemented infix processors. Here are some, not including the many larger notational systems (like IACL2) that have an infix notation built into them:
(define (square x) #[ x ^ 2 ])
(square 2) ==> 4
(define (cube x) #[ square(x) * x ])
(cube 2) ==> 8
Obviously, there are lots of different ways to try to make Lisp-like languages easier to read. So let’s combine them. In particular, I’ve identified a particular set of options that I think are tasty.
I have developed a combination I call “sweet-expressions” that I hope will be a good solution. In this notation, normally-styled s-expressions will typically work “as normal”, but various extensions are added that make it possible to create easier-to-read programs:
Previously I planned to use built-in precedence rules if the infix operators differ, but after some experience I'm currently thinking that it'd be better to not support precedence at all. Precedence rules are nearly universal among programming languages; they're very useful, and only a few infix-supporting languages (such as Smalltalk) lack them. But in the typical use cases of a Lisp-like language, precedence has some significant downsides that are irrelevant in other languages. First, let's talk about a big advantage to not supporting precedence in sweet-expressions: it makes the creation of every new list obvious in the text. That's very valuable in a list processing language; the key advantage of list processing languages is that you can process programs like data, and data like programs, in a very fluid way, so having clear markers of new lists like parentheses and indentation is very valuable.
What's surprising is that precedence turns out to have a host of problems in this specific context, leading me to believe that it's a bad idea for this usage. Precedence is very useful in general (and it could be added for special cases), but adding precedence rules to a general-purpose list expression processor creates a slippery slope of complexity. You can trivially create commands that create new infix operators, and express their precedence and direction - no problem. But when you're using a Lisp-like language, the use case tends to be different, with objects shifting between being data and being code. Often you're working at multiple levels and meta-levels of input - and it's not likely that you'll want their precedences to be the same. Yet trying to handle that turns out to require very complex management of the reader routine. Also, combining code from different sources can be a nightmare if they have different precedence values. In short, having precedences vary by meta-level creates complexities you just don't want to deal with. That suggests that a completely static, pre-canned set of precedence levels is required. Okay, but someone has to predefine that set. That turns out to be really hard; the meanings of various infix operators are not cast in stone, so it's difficult to preset the precedence levels meaningfully (since you don't know what the operators mean). The one obvious case is that * and / have higher precedence than binary + and - basically everywhere. That particular rule could be built in, but if that's all you build in, it's not clear that it's worth the trouble. Especially since completely forbidding the mixing of different operators makes the mapping between lists and the textual representation much clearer. It's easier to add precedence later, if that turns out to be important after more experimentation.
But after the current experimentation it appears that precedence simply isn't worth it in this case; it creates complexity, and hides where the lists begin/end. Precedence is great in many circumstances, but not in this one.
Of course, you can write code in some Lisp dialect to implement a language that includes precedence. Many programs written in Lisp, including PVS and Maxima, do just that. But when you're implementing another language, you know what the operators are, and you're probably implementing other syntactic sugar too, so adding precedence is a non-problem. But sweet-expressions are intended to be a universal representation, just like S-expressions are, so their role is different... and in that different role, precedence causes problems that don't typically show up in other uses. Not supporting precedence turns out to be better in this different role.
This approach supports binary infix operators, but doesn't include any special mechanism for unary minus. For unary minus, you must use the name-prefixed form “-(x)” or the prefix form “(- x)” . By only handling binary infix operators, it is much less likely for an expression to be “accidentally” interpreted as an infix expression when we did not mean it to. For example, you can even have quoted lists of infix operators, and freely mix s-expressions that begin with infix operations, without trouble. But there are a few cases where automatic infix interpretation while reading can be troublesome, so there will be ways to control it. Note that an expression is not infix if you do just about anything with the second parameter, or if you use an infix operator as the first operator. This means that:
(2 + (3 * 5)) <==> (2 + (* 3 5)) <==> (+ 2 (* 3 5))
factorial(z) <==> (factorial z)
foo(x y) <==> (foo x y)
bar(x y z) <==> (bar x y z)
factorial(x - 1) <==> (factorial (x - 1)) <==> (factorial (- x 1))
f(g(y) h(y) a) <==> (f (g y) (h y) a)
f((x + 2) y (z - 3)) <==> (f (+ x 2) y (- z 3))
f((x + 2) * (y - 3)) <==> (f (* (+ x 2) (- y 3)))
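The infix rule described above (a list is infix only when its even-positioned elements are all the same punctuation operator) can be sketched in a few lines of Python. This is an illustrative model using nested Python lists for s-expressions, not the actual sweet-expression reader; the operator regex follows the punctuation pattern given earlier in this paper:

```python
# Sketch of infix detection and infix-to-prefix rewriting.
# A list is treated as infix when it has an odd length >= 3 and every
# element at position 1, 3, 5, ... is the SAME infix operator symbol.
import re

INFIX_OP = re.compile(r'^([+\-*/<>=&|]{1,4}|:|\|\|)$')

def is_infix(xs):
    if not isinstance(xs, list) or len(xs) < 3 or len(xs) % 2 == 0:
        return False
    ops = xs[1::2]                      # the would-be operators
    return (all(isinstance(o, str) and INFIX_OP.match(o) for o in ops)
            and len(set(ops)) == 1)     # all the same operator

def to_prefix(x):
    """Recursively rewrite infix lists into prefix (op a b ...) form."""
    if not isinstance(x, list):
        return x
    xs = [to_prefix(e) for e in x]
    if is_infix(xs):
        return [xs[1]] + xs[0::2]       # (a op b op c) -> (op a b c)
    return xs

# (2 + (3 * 5)) <==> (+ 2 (* 3 5))
assert to_prefix([2, '+', [3, '*', 5]]) == ['+', 2, ['*', 3, 5]]
# (+ 2 3) is untouched: the operator is first, not second.
assert to_prefix(['+', 2, 3]) == ['+', 2, 3]
```

Note how the "operator as first element" case falls through unchanged, which is exactly why ordinary prefix s-expressions keep working.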
I call this combination “sweet-expressions” - by adding syntactic sugar (which are essentially abbreviations), I hope to create a sweet result. In the few cases where the sweet-expression is awkward, just use the usual s-expression notation (with infix escapes if necessary) -- well-formatted s-expressions are still legal. The result supports quoting and backquoting, is very compatible with typical s-expressions, and should work well with most macros.
There should be standard ways of controlling this, particularly infix interpretation. Here are some early ideas about this.
For the moment, I plan to just implement an "as(..)" expression that can be cuddled around the infix operator (the second parameter). This must be removed at read time or very early during eval time, because other macros won't know what to do with it. In other words, to ensure that "(defun as(+) ...)" will work correctly, we need to process as(..) before we process "defun". (Alternatively, use name-prefixing: defun(+ ...) works as well.) I plan to implement this at read time for now, so that all other macros will work perfectly normally; by the time normal macros get the expressions, they only see standard s-expression format. (Instead of as(...), I could use some notation beginning with #, but this turns out to be awkward in Scheme (because handling # as a secondary reader is painful), and it also turns out to be awkward in Common Lisp (because functions have a different name space, so you have chained # operators that look confusing).)
Controlling infix processing is probably the most troublesome issue. For new code, it's more convenient to presume infix unless disabled - you really only need the occasional "as". Yet for maximum compatibility with existing code, it's better to expressly turn it on and to have control over it, so you want default off, with various ways to enable it. I believe this is very solvable.
For now, I'll just implement the "as" version, which is enough to see how it works.
A related issue is defining what the infix operations are. It seems clear to me that = (equality), <=, +, and so on should map to their prefix versions. I also think that && should map to "and", and || should map to "or" -- these are common conventions in other languages, and thus should be easy to read. I think "**" should become "expt" (exponentiate). But there are many other prefix operations with text-based names that should be defined (by defining a function or macro) that aren't as obvious. All these mappings can be done using regular definitions, without anything special happening in the language.
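The operator-name mapping just described can be modeled as a simple table. This is an illustrative Python sketch; the exact set of mappings is an open design question in the text above, not a fixed standard:

```python
# Mapping from punctuation infix operators to prefix names, as suggested
# in the text. Operators not in the table (=, <=, +, ...) map to themselves.
OP_MAP = {
    '&&': 'and',    # common convention in other languages
    '||': 'or',
    '**': 'expt',   # exponentiation
}

def map_op(op):
    return OP_MAP.get(op, op)

assert map_op('&&') == 'and'
assert map_op('**') == 'expt'
assert map_op('<=') == '<='   # unchanged
```

In a real implementation these would be ordinary definitions (functions or macros named &&, ||, etc.), with nothing special happening in the reader.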
First, assignment (setf in Common Lisp, set! in Scheme). Gloria suggests using "=" for setting values (setf in Common Lisp), but I think that's a bad idea. After all, "=" is already the equality test, and having the same symbol mean different things is likely to be disastrous; C already has trouble simply because "=" and "==" look too similar. The phrase ":=" has its problems too; it'd be considered "=" in the keyword space (confusing!) in Common Lisp, and doesn't prefix well. A plausible alternative is "<-", so "x <- 5" reads as "set x to the value 5". This also combines well with C-like operators, e.g., "+<-" could be "add to"... that means that "x +<- 5" would be the same as "(setf x (+ x 5))" in Common Lisp or "(set! x (+ x 5))" in Scheme.
There are lots of other potential operators. Gloria suggests "@" become prefix "aref" (which accesses array "left" at element "right"), "->" become "slot-value-q" (though this conflicts with using it as "implies"), "%" become "mod", "===" become eq, "==" become equal, "&" become logand, "|" become logior, and ":" become "values". I worry about using | and & because they're easily mistaken for the double-character kind. Indeed, Dennis Ritchie admits that the precedence values for bitwise and/or, vs. logical and/or, were a mistake - made worse because of their visual similarity. Languages should be designed so that people are less likely to make obvious mistakes, based on previously measured evidence that such mistakes occur... and this evidence certainly exists in the case of && vs. &.
What should ++ be... increment, string concatenation, something else? There should probably be a string concatenation infix operator, but I'm not sure what to name it.
I would use "->" for implies, and "<->" for iff (if and only if). One danger is that mixing "->" and "<-" in the same expression can be very confusing, but I don't see a simple alternative.
A "standard list" of common infix operations and their meaning would make it easier to read and understand such expressions. But this is difficult to do, and in case only makes sense once there's more experience with sweet-expressions.
What about writing a list back out? It'd be nice to have a written form that could always be read back in as a sweet-expression. An optional writer would be nice - one that added as(...) where needed, and which optionally showed infix as infix, or optionally used indenting. It could be made really obvious that it was the special writer by using name-prefixing where possible -- that would be a glaringly obvious difference from the traditional s-expression writer. So it would write stuff like:
if((x = 5) display("Five") display("Not five"))Alternatively, the writer could normally write without using name-prefixing:
(if (x = 5) (display "Five") (display "Not five"))It could also just write traditional s-expressions, but adding the as(...) in the few places where it's necessary; that'd be good for debugging:
(if (= x 5) (display "Five") (display "Not five"))
Now, let’s look again at our sample Common Lisp program, using traditional s-expression notation:
(defun factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))
The Scheme version is similar:
(define (factorial n) (if (<= n 1) 1 (* n (factorial (- n 1)))))

The two samples above are also legal sweet-expressions. Why? Well, the initial “(” disables I-expression processing, so even if the formatting is ruined, we’re simply processing as usual. There’s no function name joined to the right with “(”; normal formatting rules wouldn’t even consider that. As is usual for Lisp code, “infix” operators are listed first (not second), so there’s no infix operator as a second parameter to cause a change.
But what if we take advantage of the new capabilities of sweet-expressions? A pretty-printer could even do this automatically, as a way to start... and the results look fantastic. Here's the Common Lisp version:
defun factorial (n)
    if (n <= 1)
        1
        n * factorial(n - 1)
And here's the Scheme version:
define factorial(n)
    if (n <= 1)
        1
        n * factorial(n - 1)
What’s even more interesting, we haven’t lost any power. We can still use quotes, or quasiquotes with comma-lifting. We can still have functions that return functions, and so on. We can also mix up regular s-expressions as part of sweet-expressions, or even mix things up back and forth:
defun factorial (n)
    if (n <= 1)
        1
        (* n (factorial (n - 1)))
Here's a trivial example derived from the Common Lisp Tutorial (by Gordon, Haible, and Taube):
(do ((x 1 (+ x 1)) (y 1 (* y 2))) ((> x 5) y) (print y))
Again, the above is also a valid sweet-expression, but we can make it look much nicer instead:
do
    group
        x 1 (x + 1)
        y 1 (y * 2)
    group
        x > 5
        y
    print y
More complex s-expression patterns turn out to have regular representations in sweet-expressions. For example, the typical structure:
(cond
    ((condition1) (result1))
    ((condition2) (result2))
    ...
    (t (otherwiseresult)))

can be expressed the same way in sweet-expressions. But if you use I-expression indentation, they can become staggeringly more obvious to the uninitiated:
cond
    group
        (condition1)
        result1
    group
        (condition2)
        result2
    group
        t
        otherwiseresult
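The role of "group" above can be modeled with a small tree transform. In this illustrative Python sketch (using nested lists, not the real reader), a list headed by the symbol group becomes a plain list of its children, rather than a call to a function named group:

```python
# Sketch: (group a b) denotes the plain list (a b), not a function call.
# This is how indentation-based sweet-expressions express "a list that
# does not start with an operation name", as in cond clauses.
def degroup(tree):
    if not isinstance(tree, list):
        return tree
    if tree and tree[0] == 'group':
        return [degroup(e) for e in tree[1:]]
    return [degroup(e) for e in tree]

# The cond example: each group clause becomes ((condition) result).
assert degroup(['cond', ['group', 'c1', 'r1'], ['group', 't', 'other']]) \
       == ['cond', ['c1', 'r1'], ['t', 'other']]
```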
Since the infix operations are “merged” if they are all the same, this works as it would in mathematics in a sweet-expression:
if (0 <= x <= 10)
    (dostuff)    ; x is between 0 and 10 inclusive.
    (dontstuff)  ; x is not between 0 and 10 inclusive.
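The "merging" of a chain of identical operators maps directly onto the prefix form, since Lisp comparison operators already accept more than two arguments. A tiny illustrative sketch:

```python
# Sketch: a same-operator chain (a op b op c ...) merges into a single
# prefix call (op a b c ...). Assumes the caller has already verified
# that the list alternates operand, op, operand, ... with one repeated op.
def merge_chain(xs):
    return [xs[1]] + xs[0::2]

# (0 <= x <= 10)  ==>  (<= 0 x 10), which is standard Lisp.
assert merge_chain([0, '<=', 'x', '<=', 10]) == ['<=', 0, 'x', 10]
```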
In the definition of Sweet-expressions above, any "(" disables indentation processing, but name-prefix and infix processing continue to work. This is mostly compatible with existing Lisp code, but not completely. Name-prefixing is unlikely to cause a problem in well-formatted code, but expressions that look like infix expressions can certainly occur, and that could be a problem. Basically, it'd be very nice to be able to disable infix processing in some reasonable way.
I thought of one variation, which I describe here, and then explain its problems.
One solution is to modify sweet-expressions (as described above) so that, if the first non-whitespace character of a newly-read block is "(", more of sweet-expressions could be disabled. We could disable infix processing, at least. Of course, once you do that, why not just disable everything and process it as a traditional s-expression? The advantage of doing this is that you can call the "original" read routine, with all of its local extensions. The rules would then get a little more complex to explain, but you also get massive backwards compatibility. If the first non-whitespace character is a special character (mainly ";", "'", and ","), process them and try again. Otherwise, after that initial character, normal sweet-expression processing occurs... so an embedded "(" in expressions like load("name") or if (n < 3) would work normally.
There are disadvantages to this approach, too. One sad thing is that parentheses-surrounded infix calculations would no longer work at the topmost level, e.g., "(3 + 4)" would work inside a block, but not as the topmost block. You could enter "3 + 4", but that would require two "Enters" if we keep the rule that "multiple terms on the initial line mean wait for more lines". The user could use name-prefixing to call any function that returns its first parameter, e.g., if "calc" is a function that returns its first parameter, then "calc(3 + 4)" could do the calculation. An alternative for the "usual case" would be to add one more rule: if there are 3 or more terms on the first line, and it matches the infix pattern, then return immediately after the first line is entered - so "3 + 4" now works on the command line. I like that idea; it seems unlikely to happen by accident, and it's even simpler than "(3 + 4)". You'd still need a "calc()" function if on the initial line you have an infix operation whose first parameter is also infix, e.g., "(5 * 6) + 3" won't work - you'd have to use "calc(5 * 6) + 3". Such special-casing is not as "clean", unfortunately. The advantages in backwards-compatibility and ease-of-use are significant, though, so I'm leaning towards that change.
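The proposed "return immediately if the first line matches the infix pattern" rule can be sketched as a simple check on the tokens of the first line. This Python sketch is illustrative (real tokenization is more involved); the operator regex is the punctuation pattern given earlier:

```python
# Sketch: should the reader return after the first line, instead of
# waiting for more indented lines? Yes, if the line has 3+ terms and
# every even-positioned term is an infix operator.
import re

INFIX_OP = re.compile(r'^([+\-*/<>=&|]{1,4}|:|\|\|)$')

def complete_on_first_line(line):
    terms = line.split()
    if len(terms) < 3 or len(terms) % 2 == 0:
        return False
    return all(INFIX_OP.match(t) for t in terms[1::2])

assert complete_on_first_line("3 + 4")           # returns immediately
assert not complete_on_first_line("define x")    # waits for more lines
assert not complete_on_first_line("f x y")       # not an infix pattern
```

Note the rule deliberately does not fire for "defun f (x)"-style lines, so multi-line definitions still read as before.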
So here's how the rules would look:
The blank-line rule is needed to make interactive processing pleasant. I think a space-and-tab only line should be considered the same as an immediate-return blank line; you could make lines with only spaces and tabs have a different meaning, but that's certain to cause mysterious problems (since they'd look the same). Lines which include only a ;-prefixed comment should be completely ignored, even if they have less indentation, so that they can be included without ending the block.
Note: If the first line was indented, but the second line is not indented at all, then the previous line is processed and returned, with the second line unconsumed. This deals with weird formats like:
    hello
(more stuff....)

Such formats were probably intended to be interpreted as traditional s-expressions (they make no sense as multi-line sweet-expressions), so we'll interpret the first line as a traditional format to increase backwards compatibility.
The rules are complex, though. What's disturbing is that "(3 + 4) * 5" works completely differently from "3 + (4 * 5)"; that's the sort of inconsistency that makes a notation a burden instead of a pleasure. One complication is that the "read" function needs to return when it's read an expression, and ideally doesn't store internal state about whitespace already read. That means that recording the indentation of the initial block should be avoided, if it's reasonable to do so... otherwise we have to record that information.
In short, striving for backwards-compatibility, ease-of-use at the command line, ease-of-use for programming, readability, and full list processing without special-casing a lot of syntax involves tradeoffs... my goal is to find the "best" trade.
There are times when you don't want infix processing (or want to control it specially). Also, it's clearly possible that reading an existing s-expression, but using infix-interpretation, could occasionally silently change the meaning of that expression. And that is obviously of concern to some.
In discussions with Alan Manuel Gloria, another idea popped up: really using multiple different grouping/list markers. The original discussion noted using [] for grouping/list initialization instead of (). Basically, [...] would start a new list and turn off indentation, but continue name-prefix and infix processing. In contrast, () would at least disable infix processing.
Although the original discussions used [...], I tried out various experiments and think that {...} is a better choice. The problem is that [...] is used in mathematics, and many programming languages including Python and Haskell, to mean a literal list. But this is exactly what we don't mean - instead, we want a "grouping" symbol that means that we specially interpret what's inside (in particular, that we interpret infix operators). In contrast, all languages with C-like syntaxes (C, C++, Java, C#) have trained folks to read {..} as possibly meaning a block, and in mathematics {...} can denote sets (again, special interpretation used for what's inside).
So, for the moment, let's presume that unprefixed (...) will indicate "no infix" (since this is much more backwards-compatible with s-expressions). There are actually two alternative ways to interpret (), if it's to be distinct from {} grouping. One is to say that () disables only infix, but you still have name-prefixing, and you can switch back to infix inside by using {}. Another is that () switches to pure s-expression processing.
There's a big advantage to saying that () switches to pure s-expression processing (disabling indentation, infix, and name-prefix processing). If you define it that way, the sweet-expression reader can immediately call the original built-in (read) routine instead. That's important - while implementing sweet-expressions, I've found that there's a lot of implementation-unique read processing that you have to try to simulate if you want ALL the functionality of the underlying system. (E.g., standard Scheme doesn't have |..| constants, but some implementations do, so to read stuff for their Scheme you have to implement these new oddities.) That makes it essentially impossible to write a portable implementation of sweet-expressions that retains the capabilities of the underlying system. As a side-benefit, most s-expression files would be readable as-is... which I admit is NOT a 'most important' criterion, but it'd be nice to have.
On the other hand, sometimes you'd like to say "no infix or indent processing", but be able to use name-prefixing, and switch to infix inside it. Guess what - there's a symbol pair already with that meaning: [..]. So [f g(x) + h(x)] is interpreted as (f (g x) + (h x)), without any infix operator in sight. Let's say that [..] has this meaning as long as it's not name-prefixed; that way, we can later give expressions like name[i] another meaning. If we use [...] to mean "list, no infix" and {...} to mean "list, with infix", you can then write sweet-expressions without (...) at all. This means that we could then give the reader a choice of how to handle (...)... as grouping, like {}, or as no infix, as []. The latter is more backwards-compatible, and so I'd expect that to be the default; the former is much more similar to other languages.
I think it'd be nicer to start with "infix as the default" (maybe a callable parameter if you want something else). That's a reasonable default simply because MOST people are more comfortable with it. Since ANY open paren disables the infix until its matching paren, users that use ordinary s-expressions might not even notice sweet-expressions. It's not like normal Lispers type in "3 + 4" and expect it to work :-). So it won't harm those who want s-expression prefix, but it will help those who want infix. That might make sweet-expressions more likely to be accepted; you could enable them, and yet for most people they wouldn't even notice the difference (except for the addition of new abilities).
The rules are simple: When unprefixed, use {} for grouping, which disables only indentation; use [...] to disable indentation and infix; use () to disable everything (infix, name-prefix, and indentation). A prefixed {} or () is a function call, and only disables indentation. We'll reserve prefixed [...] for future use. This use of {} instead of () for grouping infix operators like {4 + {3 * 2}} is _slightly_ nonstandard, but not terribly so, and lets us be completely backwards-compatible with s-expressions. That's pretty compelling.
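Those rules for unprefixed brackets can be summarized as a small mode table. This Python sketch is purely illustrative, a compact restatement of the rules in the paragraph above:

```python
# Which sweet-expression features stay active inside each UNPREFIXED
# bracket type, per the proposed rules. (A prefixed {} or () is a
# function call and only disables indentation; prefixed [] is reserved.)
BRACKET_MODES = {
    '{': {'infix': True,  'name_prefix': True,  'indentation': False},
    '[': {'infix': False, 'name_prefix': True,  'indentation': False},
    '(': {'infix': False, 'name_prefix': False, 'indentation': False},
}

assert BRACKET_MODES['{']['infix']              # {4 + 3} is infix
assert BRACKET_MODES['[']['name_prefix']        # [f g(x)] still name-prefixes
assert not BRACKET_MODES['(']['name_prefix']    # () is pure s-expression
```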
What about name-prefixed forms? Even if (f x) disables infix processing, that does not mean that f(x) must do the same. It's consistent in one sense if it does, but f{x} looks really funny while f(x) looks very conventional. I'm presuming, for the moment, that unprefixed (...) will really only be used for backwards-compatibility.
So now we're back to nice forms, without any calc()-like irregularities:
3 + 4
3 + {5 * 6}         ; just use {} instead of ().
{3 + 5} * 6         ; just use {} instead of (). No special cases.
{3 + 4} * f(x + y)  ; Function calls work fine.
(+ 3 4) * f{x + y}  ; this works too (!)
f{3 + 4}            ; Function calls support infix, is (f (+ 3 4))
f(3 + 4)            ; Proposal: prefixed () keeps infix, so it's (f (+ 3 4))
[1 + 2]             ; Here's a safe way to describe (1 + 2).
[f 3 + 4]           ; Here's (f 3 + 4).
(f 3 + 4)           ; Same thing.
There's another interesting benefit too. This would mean that sweet-expressions would be full of {...} when infix is used... which would make them even easier to tell apart from s-expressions.
Interestingly, this also makes it easy to use special "infix" macros (often written as "nfx"). Just tell users that "for advanced infix processing, use (nfx ....) and then follow these infix rules (which can then support precedence levels, defining infix operators, etc.)". So now you can have a choice: (1) a built-in "base" system doesn't need to be told about operators, precedence, or associativity, but you have to group all different operators using [] and you can only use punctuation as infix operators... and (2) advanced "nfx-like" functions, but one where you have to define all that stuff.
In theory, a program written using Scheme R6RS might use [...] to mean the same as (...). But I bet almost no one actually DOES that.
This switch to using {...} everywhere for grouping, and (...) for straight s-expressions, is very tempting. The factorial example becomes:
defun factorial {n}          ; Parameters can be indented, but need not be
    if {n <= 1}              ; Supports infix, prefix, & function <=(n 1)
        1                    ; This has no parameters, so it's an atom.
        n * factorial(n - 1) ; Function(...) notation supported; Function{...} okay too.
It's easy to accidentally use parens when you meant to use {...}, particularly in expressions like {4 + {3 * 2}}. But it's still tempting to make this change, since this makes the entire notation even more backwards-compatible, as well as giving it an easy (and clear) way to disable infix.
Since this gives an easy way to disable infix, this does raise the possibility of allowing "and" and "or" as built-in infix operators. After all, now infix operators are easy to disable. On the one hand, it's horribly inconsistent to have "and" and "or" built-in as infix, since all other infix operators are punctuation. On the other hand, "and" and "or" are essentially supported by all major Lisps. Also, these two operators are often treated specially; provers often treat them specially, and some Lisps treat them as short-circuiting operations (so in some sense they are already very special). A typical dilemma: You don't want many special cases, but if a special case really makes something easy to use, it may be worth it. And of course, once "and" and "or" are built-in, along with relational operators and + - * /, perhaps some built-in precedence does make sense.
In the end, the best approach may be to choose the "most easily changed" decision. If precedence is not supported at first, then adding it later is easy (since any written code won't use precedence). On the other hand, removing precedence (or fixing precedence levels) is hard/impossible. Similarly, if () disables sweet-expressions entirely, then code won't be written inside () that uses sweet-expressions... so it won't be a big deal to add that functionality later.
It's really too bad about (...). In any other language, (...) merely groups expressions, but if we're serious about backwards-compatibility, then an unprefixed (...) needs to turn off infix processing inside it. If you don't need backwards compatibility, accepting unprefixed (...) for simply grouping (without disabling infix) sure looks nicer. It's too bad that there is a big trade-off here, I'd like the benefits of both.
If there's a really major risk that a misplaced space would immediately cause your deep space probe to crash, perhaps a Lisp-like notation isn't the best idea for your application. Even regular s-expressions have similar problems; a space anywhere in an s-expression separates parameters, and there's little compile-time checking of the kind expected for such circumstances. Lisp-like languages don't do a lot of compile-time checking in general (of the kind expected for an "any mistake is disastrous" application). Specialized notations that make it hard to make that kind of mistake, and easier to detect every mistake at compile time, might be better (Ada comes to mind). But in many circumstances, that's not the primary need; if you need to do a pile of list processing, including processing/analysis of language, a list processing language makes sense. For example, analysis of a critically-important program might be a perfectly fine use for sweet-expressions.
Another problem is when there are multiple ways to express something, it can be hard to choose between them. In particular, you can use name-prefixing or indentation to implement a function call, as well as traditional s-expressions - which do you choose? And when displaying a list, which do you choose? I think "name-prefixing for short expressions, and indentation when their parameter lists get long", is a reasonable rule of thumb when writing code where you intend to use it as a function call. If it's just a list of information, a traditional s-expression shows it clearly. I would probably use traditional s-expressions to write the list back out; that is clear, and can help people avoid misunderstandings and misuse.
There are lots of other options, but I only have so much time. In any case, I’ve investigated a lot of options, and I think I’ve found a plausible combination. What I have here is a paper design that looks like it should work.
I think the next step is to establish a project where people can discuss this, and fix it based on those comments. Then, after a period of discussion, create some FLOSS code to implement it that is widely-usable (say under the MIT, BSD-new, or LLGPL license). One of the first things to implement is a pretty-printer for this format, so we can see what big programs look like in this format (and see its problems). Once the format is updated through experience coding it, I would want the resulting programs to be available to all. Multiple implementations will be necessary (for Scheme, Common Lisp, etc.) and they will need to support several different usage scenarios:
My current thought is to develop it in a subset of Scheme, which can then be easily translated to ACL2, Common Lisp, and so on (Scheme has far fewer operations, and is much stricter about types such as booleans, so it's easier to translate Scheme to something else than the other way around).
Finally, after experimentation and testing in the real world, the final form of sweet-expressions could be set and released to the world.
I've created a SourceForge project named "readable" - the project's role is to discuss ways to make s-expressions more readable, create at least one such notation (sweet-expressions), and promulgate FLOSS code to support readable representations of s-expressions. I could possibly create a "sweetexpr" project specifically for working on sweet-expressions, or work via the Scheme SRFI process.
I recently encountered the new language BitC, a language devised to support low-level programs (e.g., operating system kernels) that are easy to mathematically prove correct. (I learned about this while writing my paper High Assurance (for Security or Safety) and Free-Libre / Open Source Software (FLOSS)... with Lots on Formal Methods.) BitC combines ideas from Scheme and ML, including an emphasis on functional programming and tail recursion, with low-level capabilities like those of C. In particular, it’s designed so that you do not need to do heap allocation (e.g., “malloc” or “new”) or garbage collection to create useful programs, a property that functional programming languages generally do not have. This is very interesting work, and BitC version 0.9+ uses an S-expression-like notation. The reason it does, instead of using a more common notation, is that the authors want to make it easy to prove programs correct. One way to prove a program correct is to use reasoning where program fragments are repeatedly replaced by other fragments -- and since s-expressions are very regular, they simplify the process. The BitC developers hope to reuse some of the capabilities of ACL2, which is written in Common Lisp and expects s-expressions as input.
The problem is that s-expressions are still not easy for humans to read, and BitC’s creators are well aware of that. Also, while Lisp programmers routinely “write programs to write programs”, here the application is different -- generally a theorem prover is doing the rewriting, and displaying LOTS of program-generated text. This means humans will be doing lots of reading, possibly of long, complicated structures; what you want is input and output that is simple for a human to read and unambiguous. What’s more, this is a new language, not Lisp, so staying compatible with Lisp for compatibility’s sake is unnecessary. Indeed, the BitC folks had already strayed from the strict s-expression camp; they had included a few abbreviations, like an infix “:” to indicate types (the abbreviations quietly transform to regular s-expressions). Keeping the power of s-expressions (e.g., the regularity of processing) would be an advantage for BitC.
In an email I received from Jonathan Shapiro, he noted that “I share your concern about using an [s-expression-based] syntax [that users may find it hard to use]. The problem is that we have a bunch of very nasty experience in Isabelle/HOL, TWELF, and friends with the alternative...”, in part because “All known examples for quoting programs so that they can be manipulated as terms in a prover are a complete disaster if the [programming language] is infix.” He continued, “When we get around to dealing with the logic language, it will become clear that treating program fragments as terms is vital. In LISP this is quite natural: ’(eval 1). ... ACL2 exploits a devious pun between applicative functions and logic terms... [defining a function] simultaneously [admits] admitting a procedure and a [mathematical pure] function... Because [this pun] is maintained so successfully, godelization requires no explicit syntax... Our experience with parsing infix languages as terms has been that (a) it is truly dismal, and (b) it is incomprehensible to the developer... even the people at UT could not make IACL2 work well... One reason that we did NOT adopt the ‘{’ syntax for vectors was the anticipation that we might use it for infix terms in some future implementation. I’ve been very careful to preserve the possibility of an equivalent infix syntax. It doesn’t mean that we would have to drop the existing lisp-ish syntax.” He also doesn’t want to spend time devising syntax, since that is not the focus of his work.
Note that this application needs to be able to manipulate fragments of program code, repeatedly -- a traditional Lisp strength, and a traditional reason to use s-expressions. Yet, we want the s-expressions to be easier to read!
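To make that point concrete, here is a toy Python sketch (my own example, not taken from BitC or ACL2) of the kind of fragment rewriting being described: with a program held as nested lists, a "replace this fragment by that fragment" rule such as constant folding takes only a few lines, precisely because the representation is so regular.

```python
# Hypothetical rewriter: fold (op number number) subexpressions into
# their values, everywhere in an s-expression tree held as nested lists.
OPS = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}

def fold(sexp):
    """Recursively replace constant arithmetic with its result."""
    if not isinstance(sexp, list):
        return sexp
    sexp = [fold(x) for x in sexp]  # rewrite fragments bottom-up
    if (len(sexp) == 3 and sexp[0] in OPS
            and all(isinstance(x, int) for x in sexp[1:])):
        return OPS[sexp[0]](sexp[1], sexp[2])
    return sexp

# fold(['+', 1, ['*', 2, 3]])  →  7
```

The same bottom-up traversal pattern works for any fragment-replacement rule; a theorem prover's rewriting engine is (conceptually) this loop applied with a large rule set until a fixed point is reached.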
The power of metaprogramming is well-known, and there's a lot of literature about it. To cite just one example, Jurgen Jordanus Vinju's "Analysis and Transformation of Source Code by Parsing and Rewriting" discusses some of the things you can do when you use programs to analyze/modify programs. There is a legion of other references.
Whether or not the BitC folks pick up these ideas immediately, I think finding a way to make s-expressions more accessible is an interesting puzzle, one that would be great to solve. I suspect there are many people who might be interested in picking up an improvement, if it really was an improvement. The ACL2 folks have the same issues, as do many others. So it'd be great to solve it once, and then anyone who wants to use the solution can do so.
Of course, making programming languages readable is a much more general topic, and there's a lot of literature on that. The literature on parsing is also extensive. One paper on parsing that I should mention is "Haskell, do you read me?: constructing and composing efficient top-down parsers at runtime", Proceedings of the first ACM SIGPLAN symposium on Haskell, Victoria, BC, Canada, pp. 63-74, 2008. This is implemented by "ChristmasTree": "Changing Haskell's Read Implementation Such That by Manipulating Abstract Syntax Trees it Reads Expressions Efficiently".
You might want to look at my paper High Assurance (for Security or Safety) and Free-Libre / Open Source Software (FLOSS)... with Lots on Formal Methods.
You can see David A. Wheeler’s home page, http://www.dwheeler.com.