Fuchsia Shell Syntax

The Fuchsia shell syntax is defined as a Parsing Expression Grammar. This means alternation in the specification of the grammar is explicitly sequential; A ← B | C will always match B if it can. By convention, we use the syntax A ← B / C to make it explicit that alternation is sequential.

Grammar Specification Conventions

We will use the following syntax to specify our grammar in this document:

  • Literal tokens will be reflected with single quotes, as in 'while' or '|>'
  • Non-terminals will be camel-cased and capitalized, as in Expression or FunctionBody
  • One or more terms listed consecutively with no operator between them are referred to as a “sequence”
  • indicates a sequence of one or more whitespace characters. See below.
  • A sequence surrounded by parentheses forms a single term.
  • A term can be suffixed with * to indicate zero or more repetitions, + to indicate one or more repetitions, ? to indicate zero or one occurrences, {m,n} to indicate between m and n repetitions, and {n} to indicate exactly n repetitions.
  • Sequences separated by / are sequential alternatives.
  • A term prefixed by & is a zero-length match. It will not consume the text it matches and terms after it will attempt to match at the same position the zero-length term did.
  • A term prefixed by ! is an inverse match. This behaves as a zero-length match, but parsing fails if matching this term succeeds and vice versa.
  • Two terms joined by are intersected terms. We define the intersected term as a term which matches the longest possible string which matches both terms. By convention, the parse tree yielded from this operation is assumed to be the parse tree of the right-hand operand term.
  • [ and ] delineate a character match block, as would appear in a Perl-compatible regular expression.
  • <nl> indicates the newline character.
  • . is a term which matches any single character.
  • Productions will be indicated with as in Addition ← Multiplication '+' Multiplication
  • ←⊔ indicates a production where between each term, and each subterm in grouped sequences, the term ⊔? is present, but has been elided for clarity. More plainly, ←⊔ indicates a term which is whitespace-insensitive.

We will assume our input is a stream of UTF-8 Characters.

Whitespace

We define Whitespace as follows:

⊔ ← '#' (!<nl> .)* <nl> / AnyUnicodeWhitespace+

Where AnyUnicodeWhitespace is any single character classified as whitespace by the Unicode standard. (NOTE: Today's parser only counts space, newline, carriage return, and tab).

Note that our comment syntax is embedded in our whitespace definition:

# This line will parse entirely as whitespace.

Identifiers

Identifiers are defined as follows:

UnescapedIdentifier ← [a-zA-Z0-9_]+
Identifier ← ![0-9] UnescapedIdentifier

Valid identifiers might include:

foo
item_0
a_Mixed_Bag

Integers

Integers are defined as follows:

Digit ← [0-9]
HexDigit ← [a-fA-F0-9]
DecimalInteger ← 0 !Digit / !'0' Digit+ ( '_' Digit+ )*
HexInteger ← '0x' HexDigit+ ( '_' HexDigit+ )*
Integer ← DecimalInteger / HexInteger

Valid integers might include:

0
12345
12_345
0x1234abcd
0x12_abcd

Strings

Strings are defined as follows:

EscapeSequence ← '\n' / '\t' / '\r' / '\' <nl> / '\\' / '\"' / '\u' HexDigit{6}
StringEntity ← !( '\' / '"' / <nl> ) . / EscapeSequence
NormalString ← '"' StringEntity* '"'
String ← NormalString / MultiString

TODO: Define MultiString

Valid strings might include:

"The quick brown fox jumped over the lazy dog."
"A newline.\nA tab\tA code point\u00264b"
"String starts here \
and keeps on going"

Paths

Paths are defined as follows:

PathCharacter ← ![`&;|/\()[]{}] .
PathElement ← PathCharacter+ / '\' . / '`' ( !'`' . )* '`'
RootPath ← ( '/' PathElement+ )+
Path ← '.'? RootPath '/'? / '.'? '/' / '.'

Valid paths might include:

/foo
/foo/bar
/foo/bar/
./foo/bar/
./
/
.

Variable Declarations

Variable declarations are defined as follows:

KWVar ← 'var' !IdentifierCharacter
KWConst ← 'const' !IdentifierCharacter
VariableDecl ←⊔ ( KWVar / KWConst ) Identifier '=' Expression

Valid variable declarations might include:

var foo = 4
const foo = "Ham Sandwich"

Object literals

Object literals are defined as follows:

Object ←⊔ '{' ObjectBody? '}'
ObjectBody ←⊔ Field ( ',' Field  )* ','?
Field ←⊔ ( NormalString / Identifier  ) ':' SimpleExpression

Valid object literals might include:

{}
{ foo: 6, "bar & grill": "Open now"  }
{ foo: { bar: 6  }, "bar & grill": "Open now"  }

Addition and subtraction

Addition is defined as follows:

AddSub ← Value ( [+-] Value )*

It looks as you'd expect:

a + b

Values

Values are defined as follows:

Value ← Object / Atom
Atom ← Identifer / String / Real / Integer / Path

Expressions

Expressions are defined as follows:

Expression ← Addition

Programs

A program is defined as:

Program ←⊔ VariableDecl ([;&] Program)?