src/developer/shell/parser/README.md - fuchsia - Git at Google

 # Fuchsia Shell Parser

 Provides a basic parser for the shell grammar.

 The parser is built on combinators, which means the parser is built out of other parsers.

 A parser in the source appears as a function taking a single argument of type `ParseResult` and
 returning a value of type `ParseResult`. The argument is the result of parsing a prefix before the
 current parse. As an example, consider the grammar:

 ```
 S <- A B
 ```

 Assuming we had parser functions for `A` and `B`, named `A()` and `B()` respectively, we could
 implement `S` as follows:

 ```
 ParseResult S(ParseResult prefix) {
   return B(A(prefix));
 }
 ```

 A `ParseResult` represents a position in the string. If parsing fails completely, the returned
 `ParseResult` is equal to `ParseResult::kEnd`, which tests `false` when used as a boolean.
 Generally, we hope that a failed parse will be recovered, so we will always get a valid parse from
 our grammar, but it may or may not contain error nodes where we had to recover a failure.

 `ParseResult` contains three pieces of information: the "tail" of the parse, representing the
 un-parsed data remaining after the parse, various information comprising an "error score", which
 indicates how much error recovery was necessary to make the parse succeed, and a stack, which
 contains the AST nodes we have produced.

 To understand the stack, consider our `A B` example above. After successfully parsing `S`, the stack
 might contain two nodes:

 ```
 Node::A("A") Node::B("B")
 ```

 The `ParseResult`'s `node` accessor would yield `Node::B`, as it yields the top of the stack. We can
 call `Reduce()` on a `ParseResult` to collapse the stack into a single node, so if we called
 `Reduce<Node::S>()` on our result, we would end up with one node on the stack:

 ```
 Node::S(children = { Node::A Node::B })
 ```

 `Reduce()` will consume the entire stack by default. This is inconvenient for more complicated
 parsers. To make `S` safe, we would insert a "marker node" to indicate the point in the stack where
 `Reduce()` should stop consuming. Thus our implementation looks like this:

 ```
 ParseResult S(ParseResult prefix) {
   return B(A(prefix.Mark())).Reduce<Node::S>();
 }
 ```

 This pattern is somewhat unwieldy, so we implement several combinator functions to produce it
 automatically. The `NT` ("NonTerminal") combinator takes a parser as input and automatically places
 a `Mark()` before it and a `Reduce()` after it, while the `Seq` combinator runs two parsers in
 sequence.

 ```
 ParseResult S(ParseResult prefix) {
   return NT<Node::S>(Seq(A, B));
 }
 ```

 ## Error Handling in a Nutshell

 Every parse result has an error score. The error score determines how much error correction has been
 necessary to keep the parse going. To increase the error score, there are two special combinators:
 `ErInsert` and `ErSkip`.

 `ErSkip` takes a child parser, which it runs. If that child parser parses without error, the parsed
 text is discarded and an error is inserted instead. Here's an example:

 ```
 Alt(Expression, ErSkip("Expected expression before '%MATCH%'", OnePlus(AnyCharBut(";"))));
 ```

 This parser would attempt to parse an expression, and if it can't find one, it will consume up to
 the next semicolon and insert an error message. Note the `'%MATCH%'` in the message string; this
 will be substituted for the text matched before the semicolon. The error score for the returned
 parse result will increase by the length of the text matched by the error.

 If the parser given to `ErSkip` doesn't match without error, `ErSkip` fails completely, so error
 recovery does not occur.

 `ErInsert` is more straightforward than `ErSkip`. It injects the given amount of error at the
 current parse position without moving.

 ```
 Alt(Expression, Alt(Token(";"), ErInsert("Expected semicolon", 1)));
 ```

 With this parser, if we do not see an expression, we'll return a `ParseResult` with the same
 position as the one we were given and an error score increased by 1.

 Conceptually, `ErInsert` can be thought of as inserting text into the parse stream to allow the
 parse to continue. As such, it has an overload that allows you to specify the string inserted
 instead of a length:

 ```
 Alt(Expression, Alt(Token(";"), ErInsert("Expected semicolon", ";")));
 ```

 This is for readability purposes only; the only property of the string that affects behavior is
 its length.

 ### Levenshtein Error Score Combining

 The way the error combinators affect error score has an important caveat. Consider the following:

 ```
 Seq(ErInsert("Expected a thing", 2), ErSkip("Unexpected character", AnyChar));
 ```

 The `ErInsert` adds an error score of 2, and the `ErSkip` parses a single skipped character, so it
 adds a score of 1. We would thus expect the total error score to be 3. However, it will actually be
 2.

 The reason is that the error score is a Levenshtein distance between the parse stream and the
 hypothetical parse stream which contains our modified, recovered parse. This means the error score
 increases by one for any character that is inserted, deleted, *or replaced*. We don't have a
 specific error combinator for replacement, but an insert and a skip consecutively will be merged
 into replacement if possible.

 This may be more intuitive if we consider the algorithm used to calculate the error score. The parse
 result actually maintains three error scores: an insertion error score, a deletion error score, and
 an internal error score. They are used as follows:

   * `ErSkip` increases the deletion error score
   * `ErInsert` increases the insertion error score
   * If we parse a non-zero number of characters normally, the internal error score increases by the
     maximum of the insertion and deletion error scores, and the insertion and deletion error scores
     become zero.
   * The error score of the parse result at any given moment is the internal score plus the maximum
     of the insertion and deletion scores.
	# Fuchsia Shell Parser

	Provides a basic parser for the shell grammar.

	The parser is built on combinators, which means the parser is built out of other parsers.

	A parser in the source appears as a function taking a single argument of type `ParseResult` and
	returning a value of type `ParseResult`. The argument is the result of parsing a prefix before the
	current parse. As an example, consider the grammar:

	```
	S <- A B
	```

	Assuming we had parser functions for `A` and `B`, named `A()` and `B()` respectively, we could
	implement `S` as follows:

	```
	ParseResult S(ParseResult prefix) {
	return B(A(prefix));
	}
	```

	A `ParseResult` represents a position in the string. If parsing fails completely, the returned
	`ParseResult` is equal to `ParseResult::kEnd`, which tests `false` when used as a boolean.
	Generally, we hope that a failed parse will be recovered, so we will always get a valid parse from
	our grammar, but it may or may not contain error nodes where we had to recover a failure.

	`ParseResult` contains three pieces of information: the "tail" of the parse, representing the
	un-parsed data remaining after the parse, various information comprising an "error score", which
	indicates how much error recovery was necessary to make the parse succeed, and a stack, which
	contains the AST nodes we have produced.

	To understand the stack, consider our `A B` example above. After successfully parsing `S`, the stack
	might contain two nodes:

	```
	Node::A("A") Node::B("B")
	```

	The `ParseResult`'s `node` accessor would yield `Node::B`, as it yields the top of the stack. We can
	call `Reduce()` on a `ParseResult` to collapse the stack into a single node, so if we called
	`Reduce<Node::S>()` on our result, we would end up with one node on the stack:

	```
	Node::S(children = { Node::A Node::B })
	```

	`Reduce()` will consume the entire stack by default. This is inconvenient for more complicated
	parsers. To make `S` safe, we would insert a "marker node" to indicate the point in the stack where
	`Reduce()` should stop consuming. Thus our implementation looks like this:

	```
	ParseResult S(ParseResult prefix) {
	return B(A(prefix.Mark())).Reduce<Node::S>();
	}
	```

	This pattern is somewhat unwieldy, so we implement several combinator functions to produce it
	automatically. The `NT` ("NonTerminal") combinator takes a parser as input and automatically places
	a `Mark()` before it and a `Reduce()` after it, while the `Seq` combinator runs two parsers in
	sequence.

	```
	ParseResult S(ParseResult prefix) {
	return NT<Node::S>(Seq(A, B));
	}
	```

	## Error Handling in a Nutshell

	Every parse result has an error score. The error score determines how much error correction has been
	necessary to keep the parse going. To increase the error score, there are two special combinators:
	`ErInsert` and `ErSkip`.

	`ErSkip` takes a child parser, which it runs. If that child parser parses without error, the parsed
	text is discarded and an error is inserted instead. Here's an example:

	```
	Alt(Expression, ErSkip("Expected expression before '%MATCH%'", OnePlus(AnyCharBut(";"))));
	```

	This parser would attempt to parse an expression, and if it can't find one, it will consume up to
	the next semicolon and insert an error message. Note the `'%MATCH%'` in the message string; this
	will be substituted for the text matched before the semicolon. The error score for the returned
	parse result will increase by the length of the text matched by the error.

	If the parser given to `ErSkip` doesn't match without error, `ErSkip` fails completely, so error
	recovery does not occur.

	`ErInsert` is more straightforward than `ErSkip`. It injects the given amount of error at the
	current parse position without moving.

	```
	Alt(Expression, Alt(Token(";"), ErInsert("Expected semicolon", 1)));
	```

	With this parser, if we do not see an expression, we'll return a `ParseResult` with the same
	position as the one we were given and an error score increased by 1.

	Conceptually, `ErInsert` can be thought of as inserting text into the parse stream to allow the
	parse to continue. As such, it has an overload that allows you to specify the string inserted
	instead of a length:

	```
	Alt(Expression, Alt(Token(";"), ErInsert("Expected semicolon", ";")));
	```

	This is for readability purposes only; the only property of the string that affects behavior is
	its length.

	### Levenshtein Error Score Combining

	The way the error combinators affect error score has an important caveat. Consider the following:

	```
	Seq(ErInsert("Expected a thing", 2), ErSkip("Unexpected character", AnyChar));
	```

	The `ErInsert` adds an error score of 2, and the `ErSkip` parses a single skipped character, so it
	adds a score of 1. We would thus expect the total error score to be 3. However, it will actually be
	2.

	The reason is that the error score is a Levenshtein distance between the parse stream and the
	hypothetical parse stream which contains our modified, recovered parse. This means the error score
	increases by one for any character that is inserted, deleted, or replaced. We don't have a
	specific error combinator for replacement, but an insert and a skip consecutively will be merged
	into replacement if possible.

	This may be more intuitive if we consider the algorithm used to calculate the error score. The parse
	result actually maintains three error scores: an insertion error score, a deletion error score, and
	an internal error score. They are used as follows:

	* `ErSkip` increases the deletion error score
	* `ErInsert` increases the insertion error score
	* If we parse a non-zero number of characters normally, the internal error score increases by the
	maximum of the insertion and deletion error scores, and the insertion and deletion error scores
	become zero.
	* The error score of the parse result at any given moment is the internal score plus the maximum
	of the insertion and deletion scores.