Provides a basic parser for the shell grammar.
The parser is built on combinators, which means the parser is built out of other parsers.
A parser in the source appears as a function taking a single argument of type ParseResult
and returning a value of type ParseResult
. The argument is the result of parsing a prefix before the current parse. As an example, consider the grammar:
S <- A B
Assuming we had parser functions for A
and B
, named A()
and B()
respectively, we could implement S
as follows:
ParseResult S(ParseResult prefix) { return B(A(prefix)); }
A ParseResult
represents a position in the string. If parsing fails completely, the returned ParseResult
is equal to ParseResult::kEnd
, which tests false
when used as a boolean. Generally, we hope that a failed parse will be recovered, so we will always get a valid parse from our grammar, but it may or may not contain error nodes where we had to recover a failure.
ParseResult
contains three pieces of information: the “tail” of the parse, representing the un-parsed data remaining after the parse, error statistics, which includes the number of errors and the number of characters that were successfully parsed without error, and a stack, which contains the AST nodes we have produced.
To understand the stack, consider our A B
example above. After successfully parsing S
, the stack might contain two nodes:
Node::A("A") Node::B("B")
The ParseResult
's node
accessor would yield Node::B
, as it yields the top of the stack. We can call Reduce()
on a ParseResult
to collapse the stack into a single node, so if we called Reduce<Node::S>()
on our result, we would end up with one node on the stack:
Node::S(children = { Node::A Node::B })
Reduce()
will consume the entire stack by default. This is inconvenient for more complicated parsers. To make S
safe, we would insert a “marker node” to indicate the point in the stack where Reduce()
should stop consuming. Thus our implementation looks like this:
ParseResult S(ParseResult prefix) { return B(A(prefix.Mark())).Reduce<Node::S>(); }
This pattern is somewhat unwieldy, so we implement several combinator functions to produce it automatically. The NT
(“NonTerminal”) combinator takes a parser as input and automatically places a Mark()
before it and a Reduce()
after it, while the Seq
combinator runs two parsers in sequence.
ParseResult S(ParseResult prefix) { return NT<Node::S>(Seq(A, B)); }
To mark and recover an error during parsing, there are two special combinators: ErInsert
and ErSkip
.
ErSkip
takes a child parser, which it runs. If that child parser parses without error, the parsed text is discarded and an error is inserted instead. Here's an example:
Alt(Expression, ErSkip("Expected expression before '%MATCH%'", OnePlus(AnyCharBut(";"))));
This parser would attempt to parse an expression, and if it can't find one, it will consume up to the next semicolon and insert an error message. Note the '%MATCH%'
in the message string; this will be substituted for the text matched before the semicolon. The parse result moves ahead by the length of the match, but the internal count of successfully-parsed characters (accessed via parsed_successfully()
) does not increase.
If the parser given to ErSkip
doesn't match without error, ErSkip
fails completely, so error recovery does not occur.
ErInsert
is more straightforward than ErSkip
. It injects an error at the current parse position without moving.
Alt(Expression, Alt(Token(";"), ErInsert("Expected semicolon")));