| # Wuffs the Language |
| |
| Wuffs is an imperative, C-like language and much of it should look familiar to |
| somebody versed in any one of C, C++, D, Go, Java, JavaScript, Objective-C, |
| Rust, Swift, etc. Accordingly, this section is not an exhaustive specification, |
| merely a rough guide of where Wuffs differs. |
| |
| Like Go, [semi-colons can be omitted](https://golang.org/ref/spec#Semicolons), |
| and `wuffsfmt` will remove them. Similarly, the body of an `if` or `while` must |
| be enclosed by curly `{}`s. There is no 'dangling else' ambiguity. |
| |
| |
| ## Keywords |
| |
| 7 keywords introduce top-level concepts: |
| |
| - `const` |
| - `error` |
| - `func` |
| - `packageid` |
| - `struct` |
| - `suspension` |
| - `use` |
| |
| 2 keywords distinguish between public and private API: |
| |
| - `pri` |
| - `pub` |
| |
| 8 keywords deal with control flow within a function: |
| |
| - `break` |
| - `continue` |
| - `else` |
| - `if` |
| - `iterate` |
| - `return` |
| - `while` |
| - `yield` |
| |
| 5 keywords deal with assertions: |
| |
| - `assert` |
| - `inv` |
| - `post` |
| - `pre` |
| - `via` |
| |
| 2 keywords deal with types: |
| |
| - `nptr` |
| - `ptr` |
| |
| 1 keyword deals with local variables: |
| |
| - `var` |
| |
| TODO: categorize try, io\_bind. Also: and, or, not, as, ref, deref, false, |
| true, in, out, this, u8, u16, etc. |
| |
| |
| ## Operators |
| |
| There is no operator precedence. `a * b + c` is an invalid expression. You must |
| explicitly write either `(a * b) + c` or `a * (b + c)`. |
| |
| Some binary operators (`+`, `*`, `&`, `|`, `^`, `and`, `or`) are also |
| associative: `(a + b) + c` and `a + (b + c)` are equivalent, and can be written |
| as `a + b + c`. |
| |
| The logical operators, `&&` and `||` and `!` in C, are written as `and` and |
| `or` and `not` in Wuffs. |
| |
| TODO: ignore-overflow ops, equivalent to Swift's `&+`. |
| |
| Converting an expression `x` to the type `T` is written as `x as T`. |
| |
| |
| ## Types |
| |
| Types read from left to right: `ptr array [100] u32` is a pointer to a |
| 100-element array of unsigned 32-bit integers. `ptr` here means a non-null |
| pointer. Use `nptr` for a nullable pointer type. |
| |
| Integer types can also be refined: `var x u32[10..20]` defines a variable x |
| that is stored as 4 bytes (32 bits) and can be combined arithmetically (e.g. |
| added, compared) with other `u32`s, but whose value must be between 10 and 20 |
| inclusive. The syntax is reminiscent of Pascal's subranges, but in Wuffs, a |
| subsequent assignment like `x = 21` or even `x = y + z` is a compile time error |
| unless the right hand side can be proven to be within range. |
| |
| Types can also provide a default value, such as `u32[10..20] = 16`, especially |
| if zero is out of range. If not specified, the implicit default is zero. |
| |
| Refinement bounds may be omitted, where the base integer type provides the |
| implicit bound. `var x u8[..5]` means that `x` is between 0 and 5. `var y |
| i8[-7..]` means that `y` is between -7 and +127. `var z u32[..]` is equivalent |
| to `var z u32`. |
| |
| Refinement bounds must be constant expressions. `var x u32[..2+3]` is valid, |
| but `var x u32; var y u32[..x]` is not. Wuffs does not have dependent types. |
| Relationships such as `y <= x` are expressible as assertions (see below), but |
| not by the type system. |
| |
| |
| ## Structs |
| |
| Structs are a list of fields, enclosed in parentheses: `struct point(x i32, y |
| i32)`. The struct name may be followed by a question mark `?`, which means that |
| its methods may be coroutines. (See below). |
| |
| |
| ## Functions |
| |
| Function signatures read from left to right: `func max(x i32, y i32)(z i32)` is |
| a function that takes two `i32`s and returns one `i32`. Two pairs of |
| parentheses are required: a function that in other languages would return |
| `void` in Wuffs returns the empty struct `()`, also known as the unit type. |
| |
| When calling a function, each argument must be named. It is `m = max(x:10, |
| y:20)` and not `m = max(10, 20)`. |
| |
| The function name, such as `max`, may be followed by either an exclamation mark |
| `!` or a question mark `?` but not both. An exclamation mark means that the |
| function is impure, and may assign to things other than its local variables. A |
| question mark means that the function is impure and furthermore a coroutine: it |
| can return a (recoverable) suspension code, such as needing more input data, or |
| return a (fatal) error code, such as being invalid with respect to a file |
| format. If suspended, calling that function again will resume at the suspension |
| point, not necessarily at the top of the function body. If an error was |
| returned, calling that function again will return the same error. |
| |
| Some functions are methods, with syntax `func foo.bar(etc)(etc)`, where `foo` |
| names a struct type and `bar` is the method name. Within the function body, an |
| implicit `this` argument will point to the receiving struct. Methods can also |
| be marked as impure or coroutines. |
| |
| |
| ## Variables |
| |
| Variable declarations are [hoisted like |
| JavaScript](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/var), |
| so that all variables have function scope. Proofs are easier to work with, for |
| both humans and computers, when one variable can't shadow another variable with |
| the same name. |
| |
| |
| ## Assertions |
| |
| Assertions state a boolean typed, pure expression, such as `assert x >= y`, |
| that the compiler must prove. As a consequence, the compiler can then prove |
| that, if `x` and `y` have type `u32`, the expression `x - y` will not |
| underflow. |
| |
| An assertion applies for a particular point in the program. Subsequent |
| assignments or impure function calls, such as `x = z` or `f!()`, can invalidate |
| previous assertions. See the "Facts" section below for more details. |
| |
| Arithmetic inside assertions is performed in ideal integer math, working in the |
| integer ring ℤ. An expression like `x + y` in an assertion never overflows, |
| even if `x` and `y` have a realized (non-ideal) integer type like `u32`. |
| |
| The `assert` keyword makes an assertion as a statement, and can occur inside a |
| block of code like any other statement, such as an assignment `x = y` or an |
| `if` statement. The `pre`, `post` and `inv` keywords are similar to `assert`, but |
| apply only to `func` bodies and `while` loops. |
| |
| `pre` states a pre-condition: an assertion that must be proven on every |
| function call or on every entry to the loop. Loop entry means the initial |
| execution of the loop, plus every explicit `continue` statement that targets |
| that `while` loop, plus the implicit `continue` at the end of the loop. For |
| `while` loops, the pre-condition must be proven before the condition (the `x < |
| y` in `while x < y { etc }`) executes. |
| |
| `post` states a post-condition: an assertion that must be proven on every |
| function return (whether by an explicit `return` statement or by an implicit |
| `return` at the end of the function body) or on every loop exit (whether by an |
| explicit `break` statement or by the implicit `break` when the condition |
| evaluates false). |
| |
| For example, a `while` loop that contains no explicit `break` statements can |
| always claim the inverse of its condition as a post-condition: `while x < y, |
| post x >= y { etc }`. |
| |
| `inv` states an invariant, which is simply an assertion that is both a |
| pre-condition and post-condition. |
| |
| When a `func` or `while` has multiple assertions, they must be listed in `pre`, |
| `inv` and then `post` order. |
| |
| |
| ## Facts |
| |
| Static constraints are expressed in the type system and apply for the lifetime |
| of the variable, struct field or function argument. For example, `var x u32` |
| and `var p ptr u32` constrain `x` to be non-negative and `p` to be non-null. |
| Furthermore, `var y u32[..20]` constrains `y` to be less than or equal to `20`. |
| |
| Dynamic constraints, also known as facts, are previously seen assertions, |
| explicit or implicit, that are known to hold at a particular point in the |
| program. The set of known facts can grow and shrink over the analysis of a |
| program's statements. For example, the assignment `x = 7` can both add the |
| implicit assertion `x == 7` to the set of known facts, as well as remove any |
| other facts that mention the variable `x`, as `x`'s value might have changed. |
| TODO: define exactly when facts are dropped or updated. |
| |
| Similarly, the sequence `assert x < y; z = 3` would result in set of known |
| facts that include both the explicit assertion `x < y` and the implicit |
| assertion `z == 3`, if none of `x`, `y` and `z` alias another (e.g. they are |
| all local variables). |
| |
| Wuffs has two forms of non-sequential control flow: `if` branches (including |
| `if`, `else if`, `else if` chains) and `while` loops. |
| |
| For an `if` statement, such as `if b { etc0 } else { etc1 }`, the condition `b` |
| is a known fact inside the if-true branch `etc0` and its inverse `not b` is a |
| known fact inside the if-false branch `etc1`. After that `if` statement, the |
| overall set of known facts is the intersection of the set of known facts after |
| each non-terminating branch. A terminating branch is a non-empty block of code |
| whose final statement is a `return`, `break`, `continue` or an `if`, `else if` |
| chain where the final `else` is present and all branches terminate. TODO: also |
| allow ending in `while true`? |
| |
| For a `while` statement, such as `while b { etc }`, the set of known facts at |
| the start of the body `etc` is precisely the condition `b` plus all `pre` and |
| `inv` assertions. No other prior facts carry into the loop body, as the loop |
| can re-start coming from other points in the program (i.e. an explicit or |
| implicit `continue`) If the `while` loop makes no such `pre` or `inv` |
| assertions, no facts are known other than `b`. |
| |
| Similarly, the set of known facts after the `while` loop exits is precisely its |
| `inv` and `post` assertions, and no other. In other words, a `while` loop must |
| explicitly list its state of the world just before and just after it executes. |
| This includes facts (i.e. invariants) about variables that are not mentioned at |
| all by the `while` condition or its body but are proven before the `while` loop |
| and assumed after it. |
| |
| |
| ## Proofs |
| |
| Wuffs' assertion and bounds checking system is a proof checker, not a full SMT |
| solver or automated theorem prover. It verifies explicit annotations instead of |
| the more open-ended task of searching for implicit proof steps. This involves |
| more explicit work by the programmer, but compile times matter, so Wuffs is |
| fast (and dumb) instead of smart (and slow). |
| |
| Nonetheless, the Wuffs syntax is regular (and unlike C++, does not require a |
| symbol table to parse), so it should be straightforward to transform Wuffs code |
| to and from file formats used by more sophisticated proof engines. |
| |
| Some rules are applied automatically by the proof checker. For example, if `x |
| <= 10` and `y <= 5` are both known true, whether by a static constraint (the |
| type system) or dynamic constraint (an asserted fact), then the checker knows |
| that the expression `x + y` is bounded above by `10 + 5` and therefore will not |
| overflow a `u8` (but would overflow a `u8[..12]`). |
| |
| TODO: rigorously specify these automatic rules, when we have written more Wuffs |
| code and thus have more experience on what rules are needed to implement |
| multiple, real world image codecs. |
| |
| Other rules are built in to the proof checker but are not applied automatically |
| (see "fast... instead of smart" above). Such rules have double-quote enclosed |
| names that look a little like mathematical statements. They are axiomatic, in |
| that these rules are assumed, not proved, by the Wuffs toolchain. They are |
| typically at a higher level than e.g. Peano axioms, as Wuffs emphasizes |
| practicality over theoretical minimalism. As they are axiomatic, they endeavour |
| to only encode 'obvious' mathematical rules. For example, the rule named `"a < |
| b: a < c; c <= b"` is one expression of transitivity: the assertion `a < b` is |
| proven if both `a < c` and `c <= b` are provable, for some expression `c`. |
| Terms like `a`, `b` and `c` here are all integers in ℤ, they do not encompass |
| floating point concepts like negative zero, `NaN`s or rounding. |
| |
| Such rules are invoked by the `via` keyword. For example, `assert n_bits < 12 |
| via "a < b: a < c; c <= b"(c:width)` makes the assertion `n_bits < 12` by |
| applying that transitivity rule, where `a` is `n_bits`, `b` is `12` and `c` is |
| `width`. The trailing `(c:width)` syntax is deliberately similar to a function |
| call (recall that when calling a function, each argument must be named), but |
| the `"a < b: a < c; c <= b"` named rule is not a function-typed expression. |
| |
| TODO: specify these built-in `via` rules, again after more experience. |
| |
| |
| ## Miscellaneous Language Notes |
| |
| Labeled `break` and `continue` statements enable jumping out of loops that |
| aren't the most deeply nested. The syntax for labels is `while:label` instead |
| of Java's `label:while`, as the former is slightly easier to parse, and Wuffs |
| does not otherwise use labels for switch cases or goto targets. |
| |
| TODO: describe the built in `buf1` and `buf2` types: 1- and 2-dimensional |
| buffers of bytes, such as an I/O stream or a table of pixel data. |
| |
| |
| --- |
| |
| Updated on April 2018. |