# Fuchsia Shell Syntax
The Fuchsia shell syntax is defined as a [Parsing Expression
Grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar). This means alternation in the
specification of the grammar is explicitly sequential: `A ← B | C` will *always* match `B` if it
can, even if `C` would match more of the input, and tries `C` only when `B` fails. By convention, we
use the syntax `A ← B / C` to make it explicit that alternation is sequential.
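Because a PEG commits to the first alternative that matches, the order of alternatives can change which inputs are accepted. A minimal Python sketch, assuming a parser is modeled as a function from `(text, pos)` to the end position of its match (or `None` on failure); the helpers `lit`, `seq`, `choice`, and `eof` are hypothetical names, not part of the Fuchsia shell:

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the end position on success, None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*terms: Parser) -> Parser:
    """A sequence: each term must match, starting where the previous one ended."""
    def parse(text: str, pos: int) -> Optional[int]:
        for term in terms:
            nxt = term(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return parse


def choice(*alternatives: Parser) -> Parser:
    """Sequential alternation `B / C`: the first alternative that matches wins."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end  # Commit; later alternatives are never tried.
        return None
    return parse


def eof(text: str, pos: int) -> Optional[int]:
    """`!.`: succeed only at the end of the input."""
    return pos if pos == len(text) else None


# A ← 'a' / 'ab' always takes its first alternative when it can...
a_first = choice(lit("a"), lit("ab"))
print(a_first("ab", 0))             # 1: only "a" is consumed
# ...so A !. rejects "ab" instead of falling back to 'ab'.
print(seq(a_first, eof)("ab", 0))   # None
# Listing the longer alternative first accepts it.
ab_first = choice(lit("ab"), lit("a"))
print(seq(ab_first, eof)("ab", 0))  # 2
```

Reordering alternatives is therefore a semantic change, not a cosmetic one.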
## Grammar Specification Conventions
We will use the following syntax to specify our grammar in this document:
* Literal tokens will be reflected with single quotes, as in `'while'` or `'|>'`
* Non-terminals will be camel-cased and capitalized, as in `Expression` or `FunctionBody`
* One or more terms listed consecutively with no operator between them are referred to as a
"sequence"
* `⊔` indicates whitespace: either a comment or a sequence of one or more whitespace characters. See
the Whitespace section below.
* A sequence surrounded by parentheses forms a single term.
* A term can be suffixed with `*` to indicate zero or more repetitions, `+` to indicate one or more
repetitions, `?` to indicate zero or one occurrences, `{m,n}` to indicate between `m` and `n`
repetitions, and `{n}` to indicate exactly `n` repetitions.
* Sequences separated by `/` are sequential alternatives.
* A term prefixed by `&` is a zero-length match. It will not consume the text it matches and terms
after it will attempt to match at the same position the zero-length term did.
* A term prefixed by `!` is an inverse match. This behaves as a zero-length match, but parsing fails
if matching this term succeeds and vice versa.
* Two terms joined by `∩` are intersected terms. We define the intersected term as a term which
matches the longest possible string which matches both terms. By convention, the parse tree
yielded from this operation is assumed to be the parse tree of the right-hand operand term.
* `[` and `]` delineate a character match block, as would appear in a Perl-compatible regular
expression.
* `<nl>` indicates the newline character.
* `.` is a term which matches any single character.
* Productions will be indicated with `←` as in `Addition ← Multiplication '+' Multiplication`
* `←⊔` indicates a production in which an optional whitespace term `⊔?` sits between every pair of
adjacent terms, including the subterms of grouped sequences; these `⊔?` terms are elided for
clarity. More plainly, a `←⊔` production is whitespace-insensitive.

We will assume our input is a stream of UTF-8 characters.
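The zero-length and repetition operators can be sketched in a function-based style. This is a minimal illustration, assuming a parser maps `(text, pos)` to the end of its match or `None`; `char_class` supports only an explicit list of characters (no ranges), and all helper names are hypothetical:

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]


def char_class(chars: str) -> Parser:
    """A simplified `[...]`: match one character listed in chars."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse


def any_char(text: str, pos: int) -> Optional[int]:
    """`.`: match any single character."""
    return pos + 1 if pos < len(text) else None


def and_pred(term: Parser) -> Parser:
    """`&term`: succeed without consuming input if term matches here."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if term(text, pos) is not None else None
    return parse


def not_pred(term: Parser) -> Parser:
    """`!term`: succeed without consuming input if term does not match here."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if term(text, pos) is None else None
    return parse


def star(term: Parser) -> Parser:
    """`term*`: greedily match zero or more repetitions; never backtracks."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            end = term(text, pos)
            if end is None or end == pos:  # stop on failure or an empty match
                return pos
            pos = end
    return parse


def seq(*terms: Parser) -> Parser:
    """A sequence of terms matched one after another."""
    def parse(text: str, pos: int) -> Optional[int]:
        for term in terms:
            nxt = term(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return parse


digit = char_class("0123456789")
# (!digit .)* consumes everything up to, but not including, the first digit.
up_to_digit = star(seq(not_pred(digit), any_char))
print(up_to_digit("abc123", 0))   # 3
# &digit tests for a digit without consuming it.
print(and_pred(digit)("123", 0))  # 0
print(and_pred(digit)("abc", 0))  # None
```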
### Whitespace
We define Whitespace as follows:
```
⊔ ← '#' (!<nl> .)* <nl> / AnyUnicodeWhitespace+
```
Where `AnyUnicodeWhitespace` is any single character classified as whitespace by the Unicode
standard. (NOTE: Today's parser only counts space, newline, carriage return, and tab as whitespace.)

Note that our comment syntax is embedded in our whitespace definition:
```
# This line will parse entirely as whitespace.
```
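Matching a single `⊔` can be sketched as follows. This is an illustrative Python function, not the shell's parser; `match_ws` is a hypothetical name, and the whitespace set follows the NOTE above rather than the full Unicode definition. The comment alternative is tried first, as the ordered choice requires:

```python
from typing import Optional

# Per the note above, today's parser only treats these four characters as
# whitespace; str.isspace() would be closer to AnyUnicodeWhitespace.
WHITESPACE = " \n\r\t"


def match_ws(text: str, pos: int) -> Optional[int]:
    """Match one ⊔ at pos (a comment or a whitespace run); return its end or None."""
    if text.startswith("#", pos):
        newline = text.find("\n", pos)
        # '#' (!<nl> .)* <nl>: the comment must end with a newline, which it consumes.
        return None if newline == -1 else newline + 1
    end = pos
    while end < len(text) and text[end] in WHITESPACE:
        end += 1
    return end if end > pos else None  # AnyUnicodeWhitespace+ needs at least one


print(match_ws("# comment\nvar x = 1", 0))  # 10
print(match_ws(" \t\nvar", 0))              # 3
print(match_ws("var", 0))                   # None
```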
## Identifiers
Identifiers are defined as follows:
```
IdentifierCharacter ← [a-zA-Z0-9_]
UnescapedIdentifier ← IdentifierCharacter+
Identifier ← ![0-9] UnescapedIdentifier
```
Valid identifiers might include:
```
foo
item_0
a_Mixed_Bag
```
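In Python's `re` syntax, the `!` prefix corresponds to a negative lookahead `(?!...)`, so the identifier productions collapse into one pattern. A minimal sketch; the name `is_identifier` is hypothetical:

```python
import re

# Identifier ← ![0-9] UnescapedIdentifier, where UnescapedIdentifier is [a-zA-Z0-9_]+
IDENTIFIER = re.compile(r"(?![0-9])[a-zA-Z0-9_]+")


def is_identifier(s: str) -> bool:
    """True if the whole of s is a single Identifier."""
    return IDENTIFIER.fullmatch(s) is not None


for candidate in ["foo", "item_0", "a_Mixed_Bag", "_tmp", "0item", "foo-bar"]:
    print(candidate, is_identifier(candidate))
```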
## Integers
Integers are defined as follows:
```
Digit ← [0-9]
HexDigit ← [a-fA-F0-9]
DecimalInteger ← '0' !Digit / !'0' Digit+ ( '_' Digit+ )*
HexInteger ← '0x' HexDigit+ ( '_' HexDigit+ )*
Integer ← HexInteger / DecimalInteger
```
Valid integers might include:
```
0
12345
12_345
0x1234abcd
0x12_abcd
```
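Because alternation is sequential, the order of `Integer`'s alternatives matters: if `DecimalInteger` is tried first, its lone-zero branch succeeds on the leading `0` of `0x1234abcd`, and `HexInteger` is never tried. The sketch below tries `HexInteger` first. Python's regex `|` is also ordered, and `Pattern.match` (unlike `fullmatch`) returns the first alternative that matches a prefix, so it mirrors the PEG choice here. `scan_integer` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# HexInteger is listed first; otherwise "0x1f" would stop after the "0".
INTEGER = re.compile(
    r"0x[a-fA-F0-9]+(?:_[a-fA-F0-9]+)*"  # HexInteger
    r"|0(?![0-9])"                        # DecimalInteger: a lone zero
    r"|(?!0)[0-9]+(?:_[0-9]+)*"           # DecimalInteger: no leading zero
)


def scan_integer(text: str, pos: int = 0) -> Optional[Tuple[int, int]]:
    """Match an Integer at pos; return (value, end position) or None."""
    m = INTEGER.match(text, pos)
    if m is None:
        return None
    # int(..., 0) understands the 0x prefix and '_' digit separators.
    return int(m.group(), 0), m.end()


for literal in ["0", "12345", "12_345", "0x1234abcd", "0x12_abcd"]:
    print(literal, scan_integer(literal))
```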
## Strings
Strings are defined as follows:
```
EscapeSequence ← '\n' / '\t' / '\r' / '\' <nl> / '\\' / '\"' / '\u' HexDigit{6}
StringEntity ← !( '\' / '"' / <nl> ) . / EscapeSequence
NormalString ← '"' StringEntity* '"'
String ← NormalString / MultiString
```
TODO: Define `MultiString`.

Valid strings might include:
```
"The quick brown fox jumped over the lazy dog."
"A newline.\nA tab\tA code point\u00264b"
"String starts here \
and keeps on going"
```
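The grammar fixes only the syntax of escapes, so the decoder below assumes the conventional meanings: `\n`, `\t`, and `\r` for newline, tab, and carriage return, `\u` followed by six hex digits for that code point, and a backslash before a newline as a line continuation that contributes nothing to the value. `scan_string` is a hypothetical name, and `MultiString` is not handled:

```python
from typing import List, Optional, Tuple

# Assumed meanings of each EscapeSequence; the grammar itself only fixes syntax.
# A backslash before a newline is treated as a line continuation and dropped.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\n": "", "\\": "\\", '"': '"'}
HEX_DIGITS = "0123456789abcdefABCDEF"


def scan_string(text: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    """Match a NormalString at pos; return (decoded value, end position) or None."""
    if not text.startswith('"', pos):
        return None
    pos += 1
    out: List[str] = []
    while pos < len(text):
        c = text[pos]
        if c == '"':
            return "".join(out), pos + 1
        if c == "\n":
            return None  # A raw newline is not a StringEntity.
        if c != "\\":
            out.append(c)
            pos += 1
            continue
        esc = text[pos + 1 : pos + 2]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            pos += 2
        elif esc == "u":
            digits = text[pos + 2 : pos + 8]  # '\u' HexDigit{6}
            if len(digits) != 6 or any(d not in HEX_DIGITS for d in digits):
                return None
            code_point = int(digits, 16)
            if code_point > 0x10FFFF:
                return None  # Outside the Unicode code space.
            out.append(chr(code_point))
            pos += 8
        else:
            return None  # Not a valid EscapeSequence.
    return None  # No closing quote.


example = r'"A newline.\nA tab\tA code point\u00264b"'
print(scan_string(example))
```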
## Paths
Paths are defined as follows:
```
PathCharacter ← ![`&;|/\\()[\]{}] .
PathElement ← PathCharacter+ / '\' . / '`' ( !'`' . )* '`'
RootPath ← ( '/' PathElement+ )+
Path ← '.'? RootPath '/'? / '.'? '/' / '.'
```
Valid paths might include:
```
/foo
/foo/bar
/foo/bar/
./foo/bar/
./
/
.
```
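A recognizer for `Path` can follow the productions almost one-to-one. This sketch assumes the `PathCharacter` class excludes backtick, `&`, `;`, `|`, `/`, backslash, parentheses, square brackets, and braces; the function names are hypothetical:

```python
from typing import Optional

# Characters excluded from PathCharacter (assumed reading of the class above).
SPECIAL = set("`&;|/\\()[]{}")


def path_element(text: str, pos: int) -> Optional[int]:
    """PathElement ← PathCharacter+ / '\\' . / '`' ( !'`' . )* '`'"""
    end = pos
    while end < len(text) and text[end] not in SPECIAL:
        end += 1
    if end > pos:
        return end
    if text.startswith("\\", pos) and pos + 1 < len(text):
        return pos + 2  # A backslash escapes exactly one character.
    if text.startswith("`", pos):
        close = text.find("`", pos + 1)
        return None if close == -1 else close + 1
    return None


def root_path(text: str, pos: int) -> Optional[int]:
    """RootPath ← ( '/' PathElement+ )+"""
    matched: Optional[int] = None
    while text.startswith("/", pos):
        end = path_element(text, pos + 1)
        if end is None:
            break  # Leave this '/' unconsumed, as a failed repetition would.
        while (nxt := path_element(text, end)) is not None:
            end = nxt
        pos = matched = end
    return matched


def path(text: str, pos: int = 0) -> Optional[int]:
    """Path ← '.'? RootPath '/'? / '.'? '/' / '.'; returns the end position or None."""
    start = pos + 1 if text.startswith(".", pos) else pos  # '.'?
    end = root_path(text, start)
    if end is not None:
        return end + 1 if text.startswith("/", end) else end  # '/'?
    if text.startswith("/", start):
        return start + 1
    return pos + 1 if text.startswith(".", pos) else None


for example in ["/foo", "/foo/bar", "/foo/bar/", "./foo/bar/", "./", "/", "."]:
    print(example, path(example) == len(example))
```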
## Variable Declarations
Variable declarations are defined as follows:
```
KWVar ← 'var' !IdentifierCharacter
KWConst ← 'const' !IdentifierCharacter
VariableDecl ←⊔ ( KWVar / KWConst ) Identifier '=' Expression
```
Valid variable declarations might include:
```
var foo = 4
const foo = "Ham Sandwich"
```
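The `!IdentifierCharacter` suffix is what keeps `var` from matching the start of a longer name such as `variable`. A minimal sketch, taking `IdentifierCharacter` to be `[a-zA-Z0-9_]`, the character set used by `UnescapedIdentifier`; `match_keyword` is a hypothetical helper:

```python
import re

# Assumed: IdentifierCharacter is [a-zA-Z0-9_], as in UnescapedIdentifier.
IDENTIFIER_CHARACTER = re.compile(r"[a-zA-Z0-9_]")


def match_keyword(text: str, pos: int, keyword: str) -> bool:
    """Match `keyword !IdentifierCharacter` at pos, as KWVar and KWConst do."""
    if not text.startswith(keyword, pos):
        return False
    return IDENTIFIER_CHARACTER.match(text, pos + len(keyword)) is None


print(match_keyword("var foo = 4", 0, "var"))   # True
print(match_keyword("variable = 4", 0, "var"))  # False: 'i' continues the name
```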
## Object literals
Object literals are defined as follows:
```
Object ←⊔ '{' ObjectBody? '}'
ObjectBody ←⊔ Field ( ',' Field )* ','?
Field ←⊔ ( NormalString / Identifier ) ':' SimpleExpression
```
Valid object literals might include:
```
{}
{ foo: 6, "bar & grill": "Open now" }
{ foo: { bar: 6 }, "bar & grill": "Open now" }
```
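A small recursive-descent parser shows how `ObjectBody` handles separators and the optional trailing comma. `SimpleExpression` is not defined in this section, so this sketch substitutes a hypothetical `parse_value` that accepts only nested objects, escape-free strings, and decimal integers; comments are not treated as whitespace:

```python
import re
from typing import Any, Dict, Tuple

WS = re.compile(r"[ \t\r\n]*")                 # ⊔? (comments omitted for brevity)
IDENT = re.compile(r"(?![0-9])[a-zA-Z0-9_]+")   # Identifier
STRING = re.compile(r'"([^"\\\n]*)"')           # NormalString, escapes omitted
DECIMAL = re.compile(r"0(?![0-9])|[1-9][0-9]*(?:_[0-9]+)*")


def skip_ws(text: str, pos: int) -> int:
    return WS.match(text, pos).end()


def expect(text: str, pos: int, token: str) -> int:
    if not text.startswith(token, pos):
        raise ValueError(f"expected {token!r} at offset {pos}")
    return pos + len(token)


def parse_value(text: str, pos: int) -> Tuple[Any, int]:
    """Hypothetical stand-in for SimpleExpression: an Object, string, or integer."""
    if text.startswith("{", pos):
        return parse_object(text, pos)
    m = STRING.match(text, pos)
    if m:
        return m.group(1), m.end()
    m = DECIMAL.match(text, pos)
    if m:
        return int(m.group()), m.end()
    raise ValueError(f"expected a value at offset {pos}")


def parse_field_name(text: str, pos: int) -> Tuple[str, int]:
    """( NormalString / Identifier ), tried in that order."""
    m = STRING.match(text, pos)
    if m:
        return m.group(1), m.end()
    m = IDENT.match(text, pos)
    if m:
        return m.group(), m.end()
    raise ValueError(f"expected a field name at offset {pos}")


def parse_object(text: str, pos: int = 0) -> Tuple[Dict[str, Any], int]:
    """Object ←⊔ '{' ObjectBody? '}'; returns (fields, end position)."""
    pos = skip_ws(text, expect(text, pos, "{"))
    fields: Dict[str, Any] = {}
    while not text.startswith("}", pos):
        name, pos = parse_field_name(text, pos)
        pos = skip_ws(text, expect(text, skip_ws(text, pos), ":"))
        fields[name], pos = parse_value(text, pos)
        pos = skip_ws(text, pos)
        if not text.startswith(",", pos):
            break  # No comma, so the next token must be '}'.
        pos = skip_ws(text, pos + 1)  # Allows the optional trailing ','.
    return fields, expect(text, pos, "}")


print(parse_object('{ foo: { bar: 6 }, "bar & grill": "Open now" }'))
```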
## Addition and subtraction
Addition and subtraction are defined as follows:
```
AddSub ←⊔ Value ( [+-] Value )*
```
It looks as you'd expect:
```
a + b
```
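Since `AddSub` repeats `[+-] Value` rather than nesting, a natural evaluation folds the operands from left to right, so `a - b + c` means `(a - b) + c`. The grammar does not state evaluation order, so treat that as an assumption. The sketch limits `Value` to decimal integers and variable names looked up in a hypothetical `env` dictionary:

```python
import operator
import re
from typing import Dict, List

# One token, after optional whitespace: an integer, an identifier, or + / -.
TOKEN = re.compile(r"\s*(?:(?P<num>[0-9]+)|(?P<name>[a-zA-Z_][a-zA-Z0-9_]*)|(?P<op>[+-]))")
OPS = {"+": operator.add, "-": operator.sub}


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    pos, text = 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens


def eval_add_sub(text: str, env: Dict[str, int]) -> int:
    """Evaluate Value ( [+-] Value )*, allowing whitespace between terms."""
    tokens = tokenize(text)

    def value(tok: str) -> int:
        if tok in OPS:
            raise ValueError(f"expected a Value, got {tok!r}")
        return int(tok) if tok[0].isdigit() else env[tok]

    if len(tokens) % 2 == 0:
        raise ValueError("expected Value ( [+-] Value )*")
    result = value(tokens[0])
    for i in range(1, len(tokens), 2):
        if tokens[i] not in OPS:
            raise ValueError(f"expected '+' or '-', got {tokens[i]!r}")
        result = OPS[tokens[i]](result, value(tokens[i + 1]))  # left fold
    return result


print(eval_add_sub("a + b", {"a": 2, "b": 3}))  # 5
print(eval_add_sub("10 - 3 + 2", {}))            # 9, i.e. (10 - 3) + 2
```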
## Values
Values are defined as follows:
```
Value ← Object / Atom
Atom ← Identifier / String / Real / Integer / Path
```
## Expressions
Expressions are defined as follows:
```
Expression ← AddSub
```
## Programs
A program is defined as:
```
Program ←⊔ VariableDecl ([;&] Program)?
```
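The right-recursive `( [;&] Program )?` amounts to a loop over declarations separated by `;` or `&`. A minimal sketch, assuming `Expression` is reduced to a single decimal integer or identifier; `parse_program` is hypothetical and records which separator followed each declaration without assigning it any meaning. Like the production as written, it rejects a trailing separator:

```python
import re
from typing import List, Tuple

# ⊔? VariableDecl ⊔?, with Expression simplified to an integer or an identifier.
DECL = re.compile(
    r"\s*(var|const)(?![a-zA-Z0-9_])\s*"   # KWVar / KWConst
    r"((?![0-9])[a-zA-Z0-9_]+)\s*=\s*"     # Identifier '='
    r"([0-9]+|(?![0-9])[a-zA-Z0-9_]+)\s*"  # simplified Expression
)
SEPARATOR = re.compile(r"[;&]")


def parse_program(text: str) -> List[Tuple[str, str, str, str]]:
    """Return (keyword, name, value, separator) per declaration; '' marks the last."""
    decls: List[Tuple[str, str, str, str]] = []
    pos = 0
    while True:
        m = DECL.match(text, pos)
        if m is None:
            raise ValueError(f"expected a VariableDecl at offset {pos}")
        keyword, name, value = m.groups()
        sep = SEPARATOR.match(text, m.end())
        decls.append((keyword, name, value, sep.group() if sep else ""))
        if sep is None:
            pos = m.end()
            break
        pos = sep.end()
    if pos != len(text):
        raise ValueError(f"unexpected input at offset {pos}")
    return decls


print(parse_program("var a = 4; const b = a & var c = 5"))
```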