The Fuchsia shell syntax is defined as a Parsing Expression Grammar. This means alternation in the specification of the grammar is explicitly sequential; A ← B | C
will always match B
if it can. By convention, we use the syntax A ← B / C
to make it explicit that alternation is sequential.
We will use the following syntax to specify our grammar in this document:
'while'
or '|>'
Expression
or FunctionBody
⊔
indicates a sequence of one or more whitespace characters. See below.*
to indicate zero or more repetitions, +
to indicate one or more repetitions, ?
to indicate zero or one occurrences, {m,n}
to indicate between m
and n
repetitions, and {n}
to indicate exactly n
repetitions./
are sequential alternatives.&
is a zero-length match. It will not consume the text it matches and terms after it will attempt to match at the same position the zero-length term did.!
is an inverse match. This behaves as a zero-length match, but parsing fails if matching this term succeeds and vice versa.∩
are intersected terms. We define the intersected term as a term which matches the longest possible string which matches both terms. By convention, the parse tree yielded from this operation is assumed to be the parse tree of the right-hand operand term.[
and ]
delineate a character match block, as would appear in a Perl-compatible regular expression.<nl>
indicates the newline character..
is a term which matches any single character.←
as in Addition ← Multiplication '+' Multiplication
←⊔
indicates a production where between each term, and each subterm in grouped sequences, the term ⊔?
is present, but has been elided for clarity. More plainly, ←⊔
indicates a term which is whitespace-insensitive.We will assume our input is a stream of UTF-8 Characters.
We define Whitespace as follows:
⊔ ← '#' (!<nl> .)* <nl> / AnyUnicodeWhitespace+
Where AnyUnicodeWhitespace
is any single character classified as whitespace by the Unicode standard. (NOTE: Today's parser only counts space, newline, carriage return, and tab).
Note that our comment syntax is embedded in our whitespace definition:
# This line will parse entirely as whitespace.
Identifiers are defined as follows:
UnescapedIdentifier ← [a-zA-Z0-9_]+ Identifier ← ![0-9] UnescapedIdentifier
Valid identifiers might include:
foo item_0 a_Mixed_Bag
Integers are defined as follows:
Digit ← [0-9] HexDigit ← [a-fA-F0-9] DecimalInteger ← 0 !Digit / !'0' Digit+ ( '_' Digit+ )* HexInteger ← '0x' HexDigit+ ( '_' HexDigit+ )* Integer ← DecimalInteger / HexInteger
Valid integers might include:
0 12345 12_345 0x1234abcd 0x12_abcd
Strings are defined as follows:
EscapeSequence ← '\n' / '\t' / '\r' / '\' <nl> / '\\' / '\"' / '\u' HexDigit{6} StringEntity ← !( '\' / '"' / <nl> ) . / EscapeSequence NormalString ← '"' StringEntity* '"' String ← NormalString / MultiString
TODO: Define MultiString
Valid strings might include:
"The quick brown fox jumped over the lazy dog." "A newline.\nA tab\tA code point\u00264b" "String starts here \ and keeps on going"
Paths are defined as follows:
PathCharacter ← ![`&;|/\()[]{}] . PathElement ← PathCharacter+ / '\' . / '`' ( !'`' . )* '`' RootPath ← ( '/' PathElement+ )+ Path ← '.'? RootPath '/'? / '.'? '/' / '.'
Valid paths might include:
/foo /foo/bar /foo/bar/ ./foo/bar/ ./ / .
Variable declarations are defined as follows:
KWVar ← 'var' !IdentifierCharacter KWConst ← 'const' !IdentifierCharacter VariableDecl ←⊔ ( KWVar / KWConst ) Identifier '=' Expression
Valid variable declarations might include:
var foo = 4 const foo = "Ham Sandwich"
Object literals are defined as follows:
Object ←⊔ '{' ObjectBody? '}' ObjectBody ←⊔ Field ( ',' Field )* ','? Field ←⊔ ( NormalString / Identifier ) ':' SimpleExpression
Valid object literals might include:
{} { foo: 6, "bar & grill": "Open now" } { foo: { bar: 6 }, "bar & grill": "Open now" }
Addition is defined as follows:
AddSub ← Value ( [+-] Value )*
It looks as you'd expect:
a + b
Values are defined as follows:
Value ← Object / Atom Atom ← Identifer / String / Real / Integer / Path
Expressions are defined as follows:
Expression ← Addition
A program is defined as:
Program ←⊔ VariableDecl ([;&] Program)?