I-expressions: Detailed Specification
David A. Wheeler
2008-01-06

SRFI-49 (http://srfi.schemers.org/srfi-49/srfi-49.html dated 2005/05/29)
provides a pretty good system for indentation, but there are some issues
with it.  The spec has a few errors, and the BNF productions don't
include much information on the whitespace-handling (which may explain
why the sample implementation has bugs in handling comments in certain
constructs).  Instead, much of the whitespace handling is described
only informally.  In addition, the sample code isn't obviously related
to the BNF productions, so it's difficult to be confident that the code
is correct even if its known bugs were fixed.

So, below are step-by-step transforms of the
SRFI-49 rules.  The first one is a mild "fix-up" of the SRFI rules;
the second takes the first and adds whitespace rules that are
(mostly) implicit in the text, as well as proposing a way to deal
with "initial indent".  The goal is to make the specification more
detailed, in a step-by-step fashion, and then create an implementation
that is "obviously correct" from the detailed spec.
That way, we can be much more confident that the code is correct.

===========================================================
The SRFI-49 spec productions are modified as follows:

(1) Fix spec bug: the "head" productions' "expr" are changed to "s-expr"

(2) Fix spec bug: the rule for "head-> s-expr" is changed from "(list expr)"
    to "(list $1)"

(3) Fix spec bug: the missing rule for UNQUOTE-SPLICING has been added

(4) Editorial: rules are reordered so the GROUP productions are adjacent

(5) Relaxation of rule: This version adds a "start-expr" production,
    which defines reading the first line of an expression, and adds a new
    rule to permit indented initial lines:
        start-expr -> INDENT expr DEDENT

    Rationale: The original specification completely forbade initial
    indentation, but this turns out to have been overly strict.  See below
    for details about various alternatives. This alternative was selected
    because it has the "obvious" meaning, yet does NOT require "read"
    to store hidden indentation state after returning.  Many other
    alternatives required hidden state or could be easily misleading
    (resulting in subtle bugs).  This means that if initial expressions
    are all indented, they must be separated with blank lines.

(6) Proposed bug-fix: After the FIRST line of an expression, any
    line with ONLY zero or more spaces and tabs, and no ;-comment,
    ENDS the expression.

    This is only a small change from the spec text as literally written.
    A line with zero or more horizontal whitespace characters followed by
    a ;-comment, aka an "empty line", is (still) ALWAYS ignored
    and not considered for indentation processing. In addition,
    a line with ONLY horizontal whitespace characters, aka a
    "blank line" (it has only blanks), is (still) ignored and skipped
    when reading the first line of an expression (and not considered
    for indentation processing).

    Rationale: If the spec were followed literally, interactive use
    would be quite unpleasant. That's because the results of an
    expression would NEVER be written until after the next
    expression's first line was entered, which is very confusing.
    This was clearly not the intent; with the sample implementation,
    pressing "Enter Enter" (a blank line) causes execution, which is
    much more sensible.

    The modification also states that lines with JUST spaces and tabs
    would be considered the same as a line with no characters at all.
    Otherwise, printed and displayed text could not be understood with
    certainty - a line that looked completely blank COULD mean either
    "expression completed" OR "expression continues", depending on the
    invisible whitespace indentation - leading to hard-to-debug code.
    What's worse, many tools (including editors and email clients) quietly
    remove trailing whitespace, so differentiating between lines that
    have only spaces/tabs from lines with no characters can lead to
    many quiet changes in code meaning.

    Note that the sample implementation (in the spec) does not follow the
    spec production rules - it DOES return on blank lines in
    some circumstances.  So this is considered a bug in the production rules.

    As noted in its spec, I-expressions were specifically designed
    so that they will work the same way whether they are entered
    interactively or via a file, enabling cut-and-paste
    and avoiding the complexities of a "mode" flag (which can be hard
    to get right).

    To create vertical space, just use a ;-leading comment.  Note that
    lines with only leading whitespace and ;-comments MAY, but
    NEED NOT, align with other text - so quick "FIXME" comments, or
    lengthy comments, need not match the indentation.

    *** THIS IS UNDER DISCUSSION IN THE MAILING LIST. ***
    TODO: If this proposal is acceptable (or another one found that's even
    better), reword the clarifications to merge this in, and modify the
    "more detailed spec" below to match.

(7) Proposed repair: In the production:
  head-> s-expr head
   (append $1 $2)
  the "append" should be "cons".  An s-expr need not be a list, but
  append only accepts lists, so clearly "append" cannot be correct.

  TODO: If accepted, repair below.

(8) ISSUE: QUOTE, etc., are really only the rules when the abbreviations
are followed by whitespace.  This is not made clear in the specification.
This can be tricky to implement, too, because there's no "unget-char"
function in Scheme.

(9) ISSUE: GROUP.  The "group" rules are probably not what was intended.
For example, the:
  expr -> GROUP head
and
  expr -> head
effects are (per the spec) exactly the same, yet presumably what was
INTENDED was to add an extra (...).

(10) TODO/ISSUE: What to do about "."?  What does it mean in various
circumstances?  Presumably similar to regular lists, but the spec doesn't
note this at all.  Also, the implementation transforms "." to "()"
too hastily in some cases, where it should stay as ".".
We need to make sure that anything can be represented, so there needs
to be some escape mechanism.

(11) TODO/ISSUE: These productions are not as easy to implement; it might be
better to look at the sample implementation and work out productions
from that.  (Though the sample doesn't distinguish "first terms in a line",
and that would need to change - may need to change both code and spec)
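
To make item (7) concrete, here is a minimal sketch in Python (not Scheme,
purely for illustration).  It models Scheme lists as Python lists and atoms
as strings.  The helpers "cons" and "append" below are stand-ins written for
this example only; they are not taken from SRFI-49 or its sample
implementation.

```python
def cons(first, rest):
    """Model of Scheme's cons onto a proper list: first becomes one element."""
    return [first] + rest


def append(first, rest):
    """Model of Scheme's append: first must itself be a list."""
    if not isinstance(first, list):
        raise TypeError("append: first argument is not a list")
    return first + rest


# head -> s-expr head, for the line "f (g x) y":
# $1 is the atom f, $2 is the already-parsed rest of the line.
rest_of_line = [["g", "x"], "y"]
head_with_cons = cons("f", rest_of_line)      # ["f", ["g", "x"], "y"]

# When $1 happens to be a list, append "works" but splices its elements
# into the head instead of keeping it as a single datum.
spliced = append(["g", "x"], ["y"])           # ["g", "x", "y"]
kept_whole = cons(["g", "x"], ["y"])          # [["g", "x"], "y"]
```

With an atom as $1 the append model raises an error, and with a list as $1
it silently flattens the datum; cons gives the expected head in both cases.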


Note: "group" is carefully defined so that any s-expression can be
represented with I-expressions.  In particular, "group" with no parameters
or children maps to the symbol "group", so the symbol can be represented.
"group" can escape itself, so to begin a list with "group", just prefix it
with group, i.e. "group group" maps to "(group)".  The symbol "group"
only has meaning at the beginning of a line, so expressions like
+ group 1
map to (+ group 1) as you would expect.

===========================================================
Updated SRFI-49 spec productions

The modified production rules are:

  start-expr -> expr
    $1
  start-expr -> INDENT expr DEDENT
    $2 ; Use "most consistent" approach for handling toplevel indents.

  ; Abbreviations; these are considered first, BEFORE reading as s-expr:
  expr -> QUOTE expr
   (list 'quote $2)
  expr -> QUASIQUOTE expr
   (list 'quasiquote $2)
  expr -> UNQUOTE expr
   (list 'unquote $2)
  expr -> UNQUOTE-SPLICING expr  ; ,@
   (list 'unquote-splicing $2)

  ; Note: GROUP is defined so it's only meaningful if it's the first
  ; non-abbreviation of the line; GROUP has no special effect elsewhere.
  expr -> GROUP head INDENT body DEDENT
   (append $2 $4)
  expr -> GROUP INDENT body DEDENT
   $3
  expr -> GROUP head
   (if (= (length $2) 1)
       (car $2)
     $2)
  expr -> head INDENT body DEDENT
   (append $1 $3)
  expr -> head
   (if (= (length $1) 1)
       (car $1)
     $1)

  ; "head" is what happens on ONE line, and a head sequence ends with eol.
  head-> s-expr head
   (append $1 $2)
  head-> s-expr
   (list $1)

  ; "body" is the set of children lines (from the point-of-view of head)
  body -> expr body
    (cons $1 $2)
  body ->
   '()


===========================================================
Detailed version of spec

The original spec described primarily in words what to do about whitespace.
This can make it difficult to implement with certainty, so the following
is a more detailed version of the production rules, but with whitespace
rules made explicit as part of the productions (instead of being
implicit in the English text).

Instead of having an anonymous "whitespace preprocessor", we will
define a special rule that can consume a line's initial indentation, and
compare that with the currently-active indentation.
It can then produce INDENT, DEDENT, or SAME (for the same indentation).
During processing of the start-expr rule, the "current" indentation is
the empty string, and it can start immediately (since this rule only
applies to new lines).  For the rest of the rules, INDENT, DEDENT or
SAME can only match if the currently-being-processed character is newline
or EOF.  It will match INDENT if the new line is further indented,
DEDENT for each level of dedenting, and SAME if it's the same level of
indenting.

Note that EOF can start a whole new expression, but can't be in
the middle of one.  Thus, a ' followed by EOF is not legal, but
EOF is a perfectly legal result of start-expr (it's how a correctly-terminated
file would end).

A current proposal is to treat a line with only horizontal
whitespace as if it's a line solely with newline - i.e.,
as if the horizontal whitespace didn't even exist.
After all, you can't see the difference when printing,
and typically can't see them when editing either.
This appears to be the safer alternative.
That's what the text below does.

The following is pseudocode for key functions, followed by
modified productions that add specific rules for whitespace processing
(instead of the English descriptions):

define get-leading-hspace()
   {sequence <- get/consume sequence of spaces and tabs}
   if pair?(memv(peek-char() '(NL EOF)))
      ""  ; if newline (no ;), treat as newline-by-self. Don't consume here.
      sequence ; return new indent level

Here's the whitespace processor, described
in pseudocode using sweet-expressions 0.2.
This is called when INDENT/SAME/DEDENT are being matched; we know we
have one (it begins with a newline after start-expr), but we need
to figure out which one.

define process-line-begin()
  ; Used by the productions OTHER than start-expr; consume the newline
  ; (which is the current character if you peek), and return the new
  ; indentation text.  Do NOT use this in start-expr, because start-expr
  ; need NOT begin with a newline.
  ; Use "whats-indent" to see if the indentation text returned is
  ; an INDENT, SAME, DEDENT, or an error.
  ; The following consumes the newline - and complains if it's NOT a newline
  if {read-char() char-ne? #\newline} error("process-line-begin miscalled")
  {new-indent <- get-leading-hspace()}
  cond
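    {peek-char() = #\;}   ; Comment-only line?
      get-comment()       ; Yes, ignore comment-only lines -
                          ; even their indentation is irrelevant.
      if eof-object?(peek-char())
        ""                    ; Comment ran to EOF; nothing is left to read.
        process-line-begin()  ; Recurse; that call consumes the newline.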
    #t
      new-indent
  ; Note: This won't consume "return" when we have a line with no characters,
  ; or ONLY spaces/tabs.  Instead, it will return all the way back up to
  ; start-expr.  If we're processing something, it will eventually return
  ; to complete a start-expr, which returns with the expression.
  ; If we're _already_ in start-expr, then start-expr will just consume the
  ; blank line and keep going.
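
As a cross-check of the two pseudocode functions above, here is a minimal
sketch in Python (chosen only for illustration; a real reader would be in
Scheme).  The CharStream class is a hypothetical stand-in for a port with
one character of lookahead, and EOF is modelled as the empty string.  The
sketch assumes that the newline after a comment-only line is consumed at the
top of the loop, and that a comment running into EOF yields the empty
indentation.

```python
NL = "\n"
EOF = ""  # assumption: peek()/read() return "" at end of input


class CharStream:
    """Hypothetical stand-in for a port with single-character lookahead."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else EOF

    def read(self):
        ch = self.peek()
        if ch != EOF:
            self.pos += 1
        return ch


def get_leading_hspace(stream):
    """Consume spaces/tabs; report "" if the line holds nothing else."""
    start = stream.pos
    while stream.peek() in (" ", "\t"):
        stream.read()
    if stream.peek() in (NL, EOF):
        return ""  # whitespace-only line counts as a bare newline
    return stream.text[start:stream.pos]


def process_line_begin(stream):
    """Consume a newline and return the next relevant line's indentation."""
    while True:
        if stream.read() != NL:
            raise ValueError("process-line-begin miscalled")
        new_indent = get_leading_hspace(stream)
        if stream.peek() != ";":
            return new_indent
        # Comment-only line: skip it, whatever its indentation was.
        while stream.peek() not in (NL, EOF):
            stream.read()
        if stream.peek() == EOF:
            return ""
```

Note that a blank line's own newline is left unconsumed, matching the
comment at the end of process-line-begin.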

define whats-indent(old-indent new-indent)
  cond
    {new-indent > old-indent}  return(INDENT)
    {new-indent = old-indent}  return(SAME)
    {new-indent < old-indent}  return(DEDENT)
    #t                         error("Incomparable indents")
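
The comparisons in whats-indent are not defined above for strings of spaces
and tabs.  The following Python sketch (illustrative only) assumes one
plausible reading: an indentation is "greater" when the old indentation is a
proper prefix of it, so that mixtures such as two spaces versus one tab are
incomparable and rejected.  That prefix rule is an assumption of this
sketch, not something the spec states.

```python
def whats_indent(old_indent, new_indent):
    """Classify new_indent relative to old_indent using prefix comparison."""
    if new_indent == old_indent:
        return "SAME"
    if new_indent.startswith(old_indent):
        return "INDENT"   # old is a proper prefix of new: deeper
    if old_indent.startswith(new_indent):
        return "DEDENT"   # new is a proper prefix of old: shallower
    raise ValueError("Incomparable indents")
```

This returns a single DEDENT however many levels are closed; working out
how many enclosing levels end is left to the callers, each of which knows
its own indentation.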
  

Here are the modified production rules, making whitespace explicit.
The expectation is that in the implementation, each of these nonterminals
would be implemented by a function; one of its input parameters would be
the current indentation, and each would return
(new-indentation result-so-far).  A function can compare the indentation
that it received when it started with a new-indentation returned
using whats-indent.  Note that some of the $values are changed below,
because when the whitespace symbols were inserted they caused the positions
of the other non-terminals to change.  In some cases $last is used, so that
it'd be more likely to be right as the text below was edited.

  ; Definitions of whitespace:
  eol -> comment? eol-final               ; eol = "end of line"
  comment -> ";" (not NL|EOF)*   ; Note: does not consume NL or EOF.
  eol-final -> NL | EOF
  hspace -> SPACE | TAB

  ; Start-up is special (esp. for EOF handling).
  ; INDENT-NO-NL, DEDENT-NO-NL, and SAME-NO-NL are like INDENT, DEDENT, and
  ; SAME, but they do NOT start with a newline.  Thus, they do NOT consume
  ; an initial newline.  We use these, because in start-expr we are ALREADY
  ; at the start of a new line, while in all other productions, we don't
  ; transition to a new line until we see the newline character.

  start-expr -> hspace* comment? EOF
    ; Consume this EOF, in case it's interactive - all others should be peeked
    ; Note: get hspace* using get-leading-hspace, and NOT by skipping the
    ; space (because we need to retain those characters for examination)
    ; and NOT by invoking process-line-begin (because that presumes that we're
    ; beginning with a newline, not necessarily true here)
    $last ; EOF is legal at top level (here).  Note that '<EOF> isn't legal
  start-expr -> hspace* comment? NL start-expr
    $last ; skip any initial content-free lines.
  start-expr -> INDENT-NO-NL expr DEDENT-NO-NL
    $2 ; Use "most consistent" approach for handling toplevel indents.
    ; Note that DEDENT-NO-NL does NOT consume a newline; that was already
    ; done in expr.
  start-expr -> expr SAME-NO-NL
    $1 ; this is an expression starting at the left edge.
    ; Notice that we have no way of knowing when we're "done" until we
    ; read the next line, and see that it has effectively a left-edge-start
    ; (because it's a whitespace-only line, or because it's another expr
    ; starting at the left edge).  Note that SAME-NO-NL does NOT consume a
    ; newline; that consumption was already done by expr.

  ; "expr" describes a single expression/datum.
  ; Abbreviations; these are considered first (have higher priority)
  ; than the abbreviation processing built into the nonterminal "s-expr".
  ; That way, we do NOT leave indentation processing merely because we had
  ; one of the standard abbreviations, preventing certain misleading formats.
  expr -> QUOTE hspace* SAME? expr
   (list 'quote $last)
  expr -> QUASIQUOTE hspace* SAME? expr
   (list 'quasiquote $last)
  expr -> UNQUOTE hspace* SAME? expr   ; , without a following @
   (list 'unquote $last)
  expr -> UNQUOTE-SPLICING hspace* SAME? expr  ; ,@
   (list 'unquote-splicing $last)
  ; Note: Abbreviations only accept expr at their tail; they
  ; do NOT (by intent) accept "INDENT body DEDENT" instead of expr.
  ; If they did, that would imply that abbreviations can have multiple
  ; arguments (because bodies can have more than one entry) - but they can't!
  ; Thus, if an abbreviation symbol is followed by hspace* newline,
  ; the next line (that isn't a comment-only line) must have the SAME
  ; indentation level. This prevents nonsense like a ' with two arguments.
  ; Notice that an expr is required after an abbreviation,
  ; so '<EOF> is not legal.

  ; In actual code, you can't distinguish between GROUP and head until the
  ; leading s-expr is read in.  So in the implementation:
  ;   * peek at the first character - remember if it is G/g or not.
  ;   * read in the s-expr
  ;   * if the s-expr is the symbol "group", _and_ the first peeked character
  ;     was G/g, then it is GROUP... else it is not.

  ; Note: "DEDENT after processing body" moved to body; that's
  ; easier to implement.

  expr -> GROUP hspace* head INDENT body
   (append $3 $5)
  expr -> GROUP hspace* comment? INDENT body
   $last
  expr -> GROUP hspace* head
   (if (= (length $last) 1)
       (car $last)
     $last)
  expr -> head INDENT body
   (append $1 $3)
  expr -> head  ; followed by DEDENT or SAME
   (if (= (length $1) 1)
       (car $1)
     $1)

  ; "head" describes multiple datums on one line:
  head -> s-expr hspace* head ; typically hspace+.
   (cons $1 $3) ; PROPOSED REPAIR
  head -> s-expr hspace* comment?
   (list $1)
   ; this is the terminating production (the other one recurses to "head");
   ; this production is followed by EOF or newline (newline is represented
   ; here as part of the indentation token)

  ; "body" describes the sequence of child lines.
  ; It's impossible to have the sequence INDENT DEDENT, there MUST be
  ; something in between.  Thus, when body is first called, it cannot
  ; match the "empty" rule of the I-expression spec.  Thus, we'll rewrite
  ; the rule as a body followed by body-tail, because that's
  ; easier to implement.  Also, we'll consume the DEDENT here, instead of
  ; in the caller; that's easier to implement.
  body -> expr body-tail
    (cons $1 $last)
  body-tail -> SAME body
    $2
    ; It's another line at the SAME indentation, so we have another body.
  body-tail -> DEDENT
    '()
    ; No more children; the sequence of bodies has ended.
    ; Note: It would be illegal to be INDENT; INDENT would be handled
    ; by expr, not body.

  ; s-expr is a traditional s-expr, aka datum.  It does NOT begin with
  ; ";", hspace, NL, or EOF.
  ; To implement it, the I-expression reader presumably calls on
  ; the _previous_ "read" routine for datum.
  ; When processing "expr", the special definitions for
  ; abbreviations QUOTE etc. take precedence, so that indentation
  ; processing is not accidentally disabled. But, if you're processing
  ; the later entries of "head" (i.e., datums that are NOT the first
  ; datum on the line), the s-expr reader must handle the abbreviations.
  ; Users could abuse this to make ugly-looking code, but unlike the
  ; case of the initial expr, this wouldn't confuse the indentation...
  ; once you're processing later datums on a line, if the line ends in
  ; an abbreviation, then the next line's indentation is ignored since we
  ; KNOW we're trying to read in exactly one datum.
  ; Use a style checker if you want to curb the worst abuses.
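
To show how the head, body, and body-tail productions fit together, here is
a deliberately reduced sketch in Python (illustrative only).  It makes
several simplifying assumptions that are NOT part of the spec: it works on
whole lines instead of single characters, every datum is a
whitespace-separated atom (no parenthesised s-exprs, no strings containing
";"), and GROUP and the abbreviations are left out.  The function names are
hypothetical.

```python
def deeper(new, old):
    """True when old is a proper prefix of new."""
    return new.startswith(old) and len(new) > len(old)


def read_expr(lines, pos):
    """expr -> head | head INDENT body; returns (datum, next position)."""
    indent, head = lines[pos]
    pos += 1
    body = []
    if pos < len(lines) and deeper(lines[pos][0], indent):
        child_indent = lines[pos][0]
        # body -> expr body-tail; body-tail -> SAME body | DEDENT
        while pos < len(lines) and lines[pos][0] == child_indent:
            child, pos = read_expr(lines, pos)
            body.append(child)
        if pos < len(lines) and deeper(lines[pos][0], indent):
            raise SyntaxError("dedent does not match an enclosing level")
    if body:
        return head + body, pos                    # (append head body)
    return (head[0] if len(head) == 1 else head), pos


def read_i_expr(text):
    """Read the first expression in text; None if there is none."""
    lines = []
    for raw in text.split("\n"):
        if raw.lstrip(" \t").startswith(";"):
            continue                               # comment-only: ignored
        code = raw.split(";", 1)[0]                # drop a trailing comment
        content = code.lstrip(" \t")
        if content.strip() == "":
            if lines:
                break                              # blank line ends the expr
            continue                               # skip leading blank lines
        lines.append((code[: len(code) - len(content)], content.split()))
    if not lines:
        return None
    datum, pos = read_expr(lines, 0)
    if pos < len(lines) and lines[pos][0] != "":
        raise SyntaxError("line after an expression must be at the left edge")
    return datum
```

For example, the two lines "fact" and "  5" read as ["fact", "5"], the list
form of (fact 5), while three lines indented to the same non-empty level
are rejected.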


Note: I-expressions do not provide special syntax for improper lists,
e.g., (a . b).  When you need them, just use s-expressions or cons.
A _syntax_ for this would be easy, e.g., rules like:
  head -> s-expr hspace+ "." hspace+ s-expr
However, it'd be hard to IMPLEMENT, because "." is a leading character
for many different circumstances (.9, ..., etc.), yet calling the
underlying reader might not be effective.  E.G., clisp's "read" will
fail if given a solo ".".  There doesn't seem to be a compelling need
anyway; you can use s-expressions or cons to construct these,
and there's an implementation headache to create
another syntax in I-expressions to implement them.

Other than in start-expr, "INDENT" matches newline followed by a deeper
indentation than the current one.  Similarly for SAME and DEDENT.
Thus, on non-initial lines, INDENT/SAME/DEDENT only match input that
begins with newline or EOF... and obviously, INDENT can't match an EOF.

On a match, all the matching characters (and only those) should be
consumed.  The exception is EOF: Except for the first rule of
start-expr, "peek" for EOF and don't consume it.

Note some invariants:
* You can't follow DEDENT/SAME/INDENT with any hspace - it'd be ambiguous.
  - Therefore, expr, head, and body can't begin with an hspace.

Goals:
* Inside a line, you should be able to separate s-expr with hspace+
* hspace* should be okay at the end of each line, with unchanged meaning.
  - This is handled by a "head" production

========================
Whitespace problems in original spec

Unfortunately, there are several problems in the original spec
regarding the handling of lines that contain 0+ spaces or tabs,
possibly followed by a ;-leading comment.

The original final I-expression spec says:
"Unfortunately, [in Python] the syntaxes of file input and
interactive input differs slightly...
[In I-expressions,]
Each line in a file is either empty (contains only whitepace and/or a
comment), or contains some code, preceeded by some number of space and/or tab
characters.
In the following syntax definition, this initial space, as well as linebreaks,
is not included in the rules. Instead, preceding any matching, the leading
space of each line is compared to the leading space of the last non-empty line
preceeding it, and then removed. If the line is preceeded by more space than
the last one was, the special symbol INDENT is added..."

Thus, in the original spec, a line with only horizontal whitespace,
optionally trailed by a comment (presumably a ;-comment), was ignored after
the first line of an expression.  But this says nothing about what
happens on the FIRST line of an expression; some clarification is needed.

In addition, the text above implies that in I-expressions,
the file input and interactive input formats are the same.
Yet this is improbable as stated.  The "obvious" reading of the spec
suggests that blank lines ("Enter Enter") at the end of an expression
would be ignored. But this would mean that in interactive use, the output
of a first expression would only be produced after the first line of the
second expression were entered.  This would lead to confusion like this
(where ">" is an input prompt):
> + 1 1
> + 1 2
2
> + 1 3
3

The sample implementation given in SRFI-49 didn't really follow
the spec of SRFI-49.  For example, it DID accept indentation of the
first line (though with a problematic semantic), and it DID accept
blank lines (in some cases) as ending an expression.
This suggests that the spec is not quite accurate, and needs careful
revision/clarification.

The sections below discuss two issues:
* Leading whitespace at start of expression reading
* Blank and comment-only lines after initial line of an expression.

========================
ISSUE: Leading whitespace at start of expression reading in I-expressions

I propose a specific interpretation for leading whitespace in an
indented I-expression, which I'll call the "most consistent" format.
Below is an explanation of the problem, and my proposed resolution.

Thoughts?  After fiddling with the alternatives, I'm getting very worried that
it'd be easy to type in text that would APPEAR to mean one thing, but would
ACTUALLY mean something else.  That's definitely something to avoid.
My "most consistent" proposal completely avoids that, without being quite as
strict as Python's "thou shalt always start any expression at the left edge".

First, the initial situation:  The I-expression spec revised 2005/05/29
does NOT permit an indentation at the beginning of an expression.
The sample implementation does permit them; it simply skips horizontal
whitespace on the first line (ignoring them).  The spec's completely
forbidding them is easily done, but is overly strict; the
"skip horizontal whitespace" approach has unfortunate consequences
(as discussed further below).

What SHOULD be done if the start of an expression
(I'll call that start-expr) begins with whitespace that
is followed by content (and not just a
;-comment, newline (NL), or end-of-file (EOF))? E.G.:
 start-expr -> hspace+ (not eol...)

An example should make it clear. Imagine you read this (three lines,
all indented to the same level at the TOPMOST level):
   x
   y
   z

One interpretation is that there should be 3 different results: x, y, and z.
But consider how this would be read.
You'd read in the indentation before x, and note
that as the "topmost" indentation.  Then you'd read in the indentation
before y, notice that it was the same as x's, and stop just before reading
the "y" and return with just "x".
But wait - if you did that, when you read "y" you would think that there
was _no_ indentation (the previous read consumed it), and thus z
would be further indented... returning (y z).  Oops, that can't be right.

Since essentially the dawn of Lisp in the 1950s
there has been a "read" function that reads an S-expression
from the input and returns it.
This is an extremely stable function interface, and one not easily changed
in fundamental ways.
In particular, no user of "read" expects it to _also_ return some
state - such as the indentation that was read the _last_ time read
was called - and certainly they aren't going to provide that information
back to "read" anyway.
Not only is this difficult to change for backwards-compatibility reasons,
it's not clear you should - simple interfaces are a good idea, if you can
get them, and adding such "indentation state" as a required parameter would
certainly complicate the interface.

In theory, you could "unget" all the indentation characters, so that the
next read would work correctly.
But support for this is rare; for example,
Scheme doesn't even _have_ a standard unget character function, and
the Common Lisp standard only supports one character unget (not enough!).

You could store "hidden state" inside the read function.
Problem is, character-reading is not the exclusive domain of the read function;
many other functions read characters, and they are unlikely to look at this
hidden state.
These functions tend to be low-level functions and in some implementations
are difficult to override.  So you'd probably have inconsistent values
from different reading functions, a recipe for subtle bugs.
What's more, you would have to store hidden state for each possible input
source, and this quickly becomes unmanageable in the many implementations
that support ports that are not files (such as string ports).
"Hidden state" could allow for all this, but the complications of
_implementing_ hidden state suggest that it'd be better to spec
something that does _not_ require hidden state.


Possible solutions:

1. Simplest approach: Forbid it.  It's an error if an expression doesn't
start at the left edge.  Python does this.  You could argue that the
original spec requires
this, since there's no production that accepts an initial INDENT.
The xyz example above would then be illegal.

But this is not very flexible; #2 (next) appears to be a better option.

2. Most consistent: Allow indentation on initial line (and consider that
the indentation for that expression), as long as all later lines have
a further indentation OR are on the left edge (including a blank line
ending in EOL or EOF, or a comment on the left edge).
Anything on the left edge ends the expression.
This at least LETS you indent each expression if you like,
with NO risk of misinterpretation of later lines.
The result is that you can indent the first lines of expressions; you just
have to separate the different expressions with blank lines.
To implement this:
start-expr -> INDENT expr DEDENT

The xyz example above would be illegal, and thus rejected.
However, if you inserted blank lines between x, y, and z, you'd be okay.

This is the most consistent and most flexible, and has no risk of
misinterpretation, so I propose this one.

3. Original implementation ignores hspace:
start-expr -> hspace+ start-expr
  $2

But when this is given the xyz example above, it will misleadingly
produce (x y z).  That kind of surprise seems undesirable, esp. given
that there is alternative #2.

4. Instead, could disable indent processing on initial hspace, to maximize
backwards-compatibility and simplify some command line use:
start-expr -> hspace+ s-expr
  $2

This would read in the "xyz" example as you would expect.  It would
also read in old text like this as it was originally intended:
  (define x 5) (define y 6)

However, other formats will be misinterpreted, e.g.,
  fact
    5
will be understood as the two separate expressions (requiring two reads):
  fact
  5
and not as (fact 5).

This is risky; on printouts, it might not at ALL be obvious when
expressions are indented like this - resulting in hard-to-debug code
and hidden defects.

In general, I think it's much wiser to reject text that might
be very easily misinterpreted by the reader.  So I suggest #2.
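
Alternative #2 above can be sketched as a line-grouping pass.  The Python
below is illustrative only; "split_toplevel" is a hypothetical helper, not
part of any implementation.  It assumes "further indentation" means the
first line's indentation is a proper prefix of the later line's, and it
skips comment-only lines entirely (following the detailed productions);
whether a left-edge comment should instead end the expression is left open
here.

```python
def split_toplevel(text):
    """Group physical lines into top-level expressions under alternative 2."""
    groups, current, top = [], [], None
    for line in text.split("\n"):
        content = line.lstrip(" \t")
        indent = line[: len(line) - len(content)]
        if content.startswith(";"):
            continue                      # comment-only line: ignored
        if content == "":
            if current:                   # blank line ends the expression
                groups.append(current)
                current, top = [], None
            continue
        if not current:
            current, top = [line], indent  # first line fixes the indentation
        elif indent.startswith(top) and len(indent) > len(top):
            current.append(line)           # further indented: continuation
        elif indent == "":
            groups.append(current)         # left edge: ends the expression
            current, top = [line], ""
        else:
            raise SyntaxError("line is neither further indented "
                              "nor at the left edge")
    if current:
        groups.append(current)
    return groups
```

Under these assumptions the x/y/z example is rejected as written, and is
accepted as three expressions once blank lines separate x, y, and z.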


========================
ISSUE: Blank and comment-only lines after initial line of an expression.

    *** THIS IS UNDER DISCUSSION IN THE MAILING LIST. ***

Both the original I-expression definition, and this revision,
have the following rule: After the initial line of an expression,
any line containing zero or more leading spaces and tabs,
FOLLOWED by a ";"-leading comment, is COMPLETELY IGNORED, even in the
middle of an expression.  In particular, its indentation is ignored.

The proposed change is that lines with 0+ spaces or tabs,
and no ;-leading comment, end an expression.  Note that a line
WITH just spaces and tabs is treated the same as a line with no characters.

This change is necessary for reasonable interactive use.
If lines with only zero or more horizontal whitespace were completely
ignored, as the original spec stated, then interactive use would be painful.
Results would print only after the first line of the NEXT expression
was entered.  Even the "sample implementation" in the spec didn't
actually do this.

To make interactivity pleasant, at least lines with absolutely no
characters should end an expression after the first line.  That way,
"Enter Enter" will cause an expression to be executed.

Then the question becomes, how should a line with 1 or more
spaces/tabs, and no ;-comments, be interpreted?  And should a line
with the current indentation be treated differently than if it is not?

Such lines COULD be interpreted as "continue the indentation"
if they matched the current indentation... or even if there was
at least one space/tab.  A minor argument FOR this alternative
is that it makes "pretty" vertical spacing a possibility.

But there is a serious problem with treating lines with no characters
differently from lines with only spaces and tabs:
this could lead to mysterious bugs.
With such a rule, printed or displayed text could not be understood
with certainty; a line that looked empty COULD mean either
"expression completed" OR "expression continues",
leading to hard-to-debug code.  What's worse, many tools
quietly remove trailing horizontal whitespace on a line
(including many text editors and mailers); these deletions would
generally be unnoticed, yet change the meaning of a program.

The risk of mysterious, undetectable errors in code is serious,
and one that leads at least Wheeler to recommend that space/tab-only
lines (with no comments) be treated exactly like lines with no
characters at all.  Certainly, it's sometimes valuable to create
vertical space, but this can be done using comment-only lines
(which may, but need not, have leading whitespace).

It is true that Python 2.5 in _interactive_ mode distinguishes
between lines with no characters (not even whitespace) and lines
with only whitespace.  In interactive mode, a line with no characters
ends an expression, while lines with whitespace are considered relevant
(and can continue further lines).  However, Python 2.5 in
_file-reading_ mode has different semantics - it ignores both types of
lines (they have no effect on indentation).  This difference in modes
has the problem that it can cause cutting-and-pasting from a file to
an interactive session to fail, which is unfortunate.
But it's even worse for a Lisp-based system, where it's often hard
to be certain whether a session is interactive or not -
there is no standard way of finding out.  This distinction between
interactive and non-interactive modes is very inconvenient, and
should be avoided if practical.

Note that this does NOT change how comment-only lines, preceded by
only 0+ spaces and tabs, are interpreted - these are STILL completely
ignored, and any indentation they have is considered irrelevant.
Thus, people will not need to "line up" indentation
of comments. This is useful for "FIXME" comments, or for
long comments to explain a complicated circumstance which is deeply
indented.  Since this ONLY applies to ;-comment-only lines,
blank lines (without comments) still end an expression - so
interactive use is still pleasant.  Most importantly, there's no
need for a difference between interactive and non-interactive use -
there's no need for a "mode" flag (which can be hard to get right),
and you can always cut-and-paste from a file into an interactive session.
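
The line categories discussed in this section can be summarised in a few
lines of Python (illustrative only; the function names are hypothetical).
The sketch follows the proposal as stated: a line is classified after
dropping leading spaces and tabs, so a whitespace-only line is treated
exactly like a line with no characters.

```python
def classify_line(line):
    """Classify one physical line, given without its trailing newline."""
    stripped = line.lstrip(" \t")
    if stripped == "":
        return "blank"          # no characters, or only spaces/tabs
    if stripped.startswith(";"):
        return "comment-only"   # ignored everywhere, whatever its indent
    return "content"


def ends_expression(line, is_first_line):
    """Blank lines end an expression, but only after its first line."""
    return (not is_first_line) and classify_line(line) == "blank"
```

Because classification never looks at how many spaces or tabs precede the
end of the line, tools that strip trailing whitespace cannot change the
result.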

===================
Notes about multiline comments

Note that multiline comments (e.g., #| ... |#) are NOT considered
comments by these rules. There is simply no way a
library implementation can reliably detect
such multicharacter sequences without disabling the reader's
implementation of # prefixes, because library code is limited in many
languages to single-character lookahead.  A #| will be considered
an s-expr by the indentation processor.  The following uses of #|
are known to be safe and not misleading:
  * #| _inside_ an s-expression; in this case indentation processing
    isn't happening.
  * #|...|#, where the opening "#|" is the first non-whitespace on the
    line, nothing trails the line after the closing |#, and
    the opening #| is indented the same way as the first
    non-comment-only line following the closing #|.

========================
Possible Test patterns

Below are some possible test patterns to help eliminate
errors in an implementation.

test = (first (second third?)?)?  (SPACE? semicolon - comment)?
first = abbreviation? t1 (SPACE t2 (SPACE t3)?)?
t1 = A | GROUP
t2 = B
t3 = C

second = first | SPACE first
third = SPACE first | SPACE SPACE second
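
The "first" pattern above is small enough to enumerate mechanically.  The
Python sketch below (illustrative only) does so under some assumptions that
the grammar does not spell out: the abbreviations are taken to be ' ` , and
,@, GROUP is written as the lowercase symbol "group", and an abbreviation is
concatenated directly onto t1 with no space, exactly as the pattern is
written.

```python
from itertools import product

ABBREVIATIONS = ["", "'", "`", ",", ",@"]   # "" means no abbreviation
T1 = ["A", "group"]                          # t1 = A | GROUP
TAILS = ["", " B", " B C"]                   # (SPACE t2 (SPACE t3)?)?


def first_patterns():
    """Every string matched by: abbreviation? t1 (SPACE t2 (SPACE t3)?)?"""
    return [abbrev + t1 + tail
            for abbrev, t1, tail in product(ABBREVIATIONS, T1, TAILS)]
```

This gives 5 * 2 * 3 = 30 candidate first lines; the "second" and "third"
patterns can be built by prefixing these with one or two spaces.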