| # Related Work |
| |
| Wuffs is an industrial project, not an academic project, and this Related Work |
| section is not as extensive or as rigorous as would be found in an academic |
| thesis or journal article. The underlying goal is usefulness, not novelty. For |
| Wuffs' users, it'd be perfectly fine to have multiple implementations of a good |
| idea, even if the later ones wouldn't earn a Ph. D. |
| |
| A number of other projects are listed below. Wuffs is not exactly like any of |
| them, for one reason or another. We're not claiming that "use a state of the |
| art theorem prover" or "write it in Rust" are unworkable approaches. They're |
| just different approaches, with different trade-offs. |
| |
| For example, Wuffs is primarily concerned about *safety* and *performance*. |
| Formal *correctness*, such as being able to mathematically express that "when |
| this function returns, that array's elements will be sorted in increasing |
| order", is less of a concern for Wuffs the Language than for other projects. |
| This is especially as higher level concerns like "correctly implements the JPEG |
| specification" are less amenable to formalization than foundational concepts |
| like searching and sorting. Even with something as mathematical and |
| conceptually simple as Quicksort, compare a two-line [Haskell |
| implementation](https://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Haskell) |
| with the essay that is an [Idris |
| implementation](https://github.com/bmsherman/blog/wiki/Quicksort-in-Idris). |
| Proving *correctness* is an admirable goal, but it is not Wuffs' goal. |
| |
| For Wuffs the Library, confidence about correctness is instead based on |
| testing, both unit testing and for e.g. media codecs, corpora of test files. A |
| small corpus is included in the Wuffs source code repository. Other, much |
| larger corpora (e.g. based on a sample of a web crawl), are not included for |
| download size and copyright reasons. |
| |
| |
| ## Static Checkers |
| |
| Wuffs is a language by itself, integrated with a compiler, not something |
| embedded in the comments of another language's program, supported by a separate |
| program checker, as the bounds check elimination affects the generated code. |
| |
| [Checked C](https://www.microsoft.com/en-us/research/project/checked-c/) has a |
| similar goal of safety, but aims to be backwards compatible with C, including |
| the idiomatic ways in which C code plays fast and loose with pointers and |
| arrays, and implicitly inserting run time checks while maintaining existing |
| data layout (e.g. the sizeof a struct type) for interoperability with other |
| systems. This is a much harder problem than Wuffs' approach of a new, simple, |
| domain-specific programming language. |
| |
| Extended Static Checker for Java (ESC/Java) and its successor |
| [OpenJML](http://www.openjml.org/), which obviously target the Java programming |
| language, similarly has to analyze a language that is more complicated than |
| Wuffs. Java is also not commonly used for writing e.g. low level image codecs, |
| as the language lacks unsigned integer types, it is garbage collected and |
| idiomatic code often allocates. |
| |
| Similarly, [Whiley](http://whiley.org/) is a programming language with extended |
| static checking. In contrast to Wuffs, its `int` type uses unbounded |
| arithmetic, instead of e.g. 32-bit twos-complement representation. That can be |
| easier for theorem provers to reason about, but this can have a performance |
| impact when such integers are used in inner loops. In theory, a compiler could |
| recognize "an `int` restricted to the range [0, 99]" as a `uint8` instead of a |
| potentially 'big integer', but even so, it may improve performance to |
| explicitly use a `uint32` here even when a `uint8` would do. |
| |
| [Spark ADA](http://libre.adacore.com/tools/spark-gpl-edition/) targets a subset |
| of the Ada programming language, which again is more complicated than Wuffs. It |
| also allows for pluggable theorem provers such as Z3, similar to |
| [Dafny](http://research.microsoft.com/dafny). This can increase programmer |
| productivity in that it can take less effort to prove more programs safe, but |
| it also affects reproducibility (when some programs that are provable in one |
| programmer's configuration are unprovable in another) and the size of the |
| trusted computing base. It's also not obvious up front what effect different |
| theorem provers, and their different heuristics, will have on compile times or |
| proof times. |
| |
| In constrast, Wuffs' proof system is fast (and dumb) instead of smart (and |
| slow). For example, [Vale (Verified Assembly Language for |
| Everest)](https://github.com/project-everest/vale) is a promising approach that |
| uses Dafny to verify implementations of cryptographic algorithms. However, |
| ["Vale: Verifying High-Performance Cryptographic Assembly |
| Code"](http://www.andrew.cmu.edu/user/bparno/papers/vale.pdf) reports in Table |
| 1 that verification times are measured in *minutes*. Wuffs aims for |
| *sub-second* compilation times, including proofs of (memory) safety. Figure 15 |
| in that paper suggests that Vale's verification times can grow exponentially in |
| some cases, and also goes on to say, "Dafny/Z3 fails to verify the AES key |
| inversion for 6 unrolled iterations and 9 unrolled iterations, indicating that |
| SMT solvers like Z3 are still occasionally unpredictable". Of course, Vale is |
| also tackling a much harder problem than Wuffs: proving correctness, not just |
| safety. |
| |
| Once again, we're not claiming that these other approaches are unworkable, or |
| not useful, just different, with different trade-offs. |
| |
| |
| ## Why a New Language? |
| |
| Simpler languages are easier to prove things about. Macros, inheritance, |
| closures, generics, operator overloading, goto's and built-in container types |
| are all useful features, but as mentioned in the top-level |
| [README](/README.md), Wuffs is not a general purpose programming language. |
| Instead, its focus is on compile-time provable memory safety, with C-like |
| performance, for CPU-intensive file format decoders. |
| |
| Nonetheless, Wuffs is an imperative language, not a functional language, |
| despite the long history of functional languages and provability. Inner loop |
| performance usage matters, and it is easier to match the optimization |
| techniques of incumbent C libraries with a C-like language than with a |
| functional language. Compared to something like |
| [F\*](https://www.fstar-lang.org/), [Idris](https://www.idris-lang.org/) or |
| [Liquid Haskell](https://ucsd-progsys.github.io/liquidhaskell-blog/), Wuffs has |
| no garbage collector overhead, as it has no garbage collector. |
| |
| Memory usage also matters, again considering per-pixel costs can be multiplied |
| by millions of pixels. It is important to represent e.g. an RGBA pixel as four |
| u8 values (or one u32), not as four naturally-sized-for-the-CPU integers or |
| four 'big integers'. |
| |
| [F\* / KreMLin](https://github.com/FStarLang/kremlin) is a subset of F\* that |
| is similar in some sense to Wuffs, in that it transpiles to C and cares about |
| both safety and performance. Unlike Wuffs, it is a functional language (with |
| dependent types), not an imperative one (with a simpler type system), and uses |
| a sophisticated theorem prover like Z3, with the same trade-offs as discussed |
| above for Spark ADA and Dafny. It also tackles a more complicated problem, in |
| attempting to prove correctness properties, not just safety properties. Further |
| reading is in ["Everest: Towards a Verified, Drop-in Replacement of |
| HTTPS"](https://project-everest.github.io/assets/snapl2017.pdf) and ["Verified |
| Low-Level Programming Embedded in F\*"](https://arxiv.org/pdf/1703.00053.pdf). |
| |
| [Cryptol](https://github.com/GaloisInc/cryptol) is another project that, like |
| Everest (including F\* / KreMLin and its HACL\* sub-project, and Dafny / Vale), |
| focuses on cryptographic algorithms, rather than Wuffs' focus on file formats, |
| and also relies on a sophisticated theorem prover like Z3. |
| |
| Once again, we're not claiming that these other approaches are unworkable, or |
| not useful, just different, with different trade-offs. |
| |
| |
| ## Why Not ATS? |
| |
| ATS (Applied Type System) is a statically typed programming language that |
| unifies implementation with formal specification. Its goals sound similar to |
| Wuffs in the abstract, but they differ in their approach. That's not to say |
| that one is better or worse than the other, they're just different. |
| |
| Graydon Hoare, on the topic of ["Extended static checking (ESC), refinement |
| types, general dependent-typed |
| languages"](https://graydon2.dreamwidth.org/253769.html) says of existing |
| approaches, including ATS, "So far, most exercises in this space have ended in |
| frustration... the complexity wall here can make other areas of computering |
| [sic] look... a bit like child's play. It gets very dense, very fast." |
| |
| In contrast, Wuffs aims to be significantly simpler. The trade-off is, of |
| course, that Wuffs is not and does not aim to be a general purpose language. |
| ATS might be more powerful and therefore placed higher in the [Blub |
| Paradox](http://www.paulgraham.com/avg.html) analogy, but the flip side of that |
| is that, unlike Lisp, ATS is much more complicated to learn and to understand, |
| perhaps dauntingly so. For example, [its |
| manual](http://ats-lang.sourceforge.net/DOCUMENT/INT2PROGINATS/HTML/book1.html) |
| mentions that types, sorts, views, viewtypes, dataviews, datatypes and |
| dataviewtypes are all similar-sounding but distinct ATS concepts. As of |
| February 2018, the [ATS home page](http://www.ats-lang.org/) says that "the |
| current implementation... [consists] of more than 180K lines of code" and that |
| "in general, one should expect to encounter many unfamiliar programming |
| concepts and features in ATS and be prepared to spend a substantial amount of |
| time on learning them." In the [words of one |
| commentator](https://www.reddit.com/r/rust/comments/7d85sj/puffs_a_new_language_for_parsing_untrusted_files/dpx0hp7/), |
| "the cognitive overhead of managing your proof-threading in ATS is much higher |
| than managing your lifetimes/borrows in Rust, which is by far the biggest cliff |
| of Rust's learning curve. Not to mention ATS is syntactically... challenging". |
| |
| On verification, [ATS' Examples](http://www.ats-lang.org/Examples.html) says |
| that "A fully verified implementaion of the [Fibonacci] function in ATS can now |
| be given... Please click |
| [here](http://www.ats-lang.org/SERVER/MYCODE/Patsoptaas_serve.php?mycode_fil=fibats) |
| if you are interested in compiling and running this example on-line." Modifying |
| that example's "N" value from 10 to 10000 yields "fibats(10000) = Infinity", |
| which is clearly incorrect. Somewhere along the way, there's been a breakdown |
| between numbers in the ideal mathematical sense (often used by proof systems) |
| and numbers in practice (whether fixed width integers or floating point) as |
| actually computed on by CPUs. Regardless of where that breakdown is in this |
| case, it doesn't build confidence that, in general, ATS programs are guaranteed |
| free of arithmetic overflow despite compiler-verified proofs. |
| |
| Once again, we're not claiming that these other approaches are unworkable, or |
| not useful, just different, with different trade-offs. |
| |
| |
| ## Why Not Rust? |
| |
| Rust is a general purpose programming language, which aims for C-like |
| performance in general. Wuffs aims for C-like performance specifically for the |
| limited problem space of bits-and-bytes level CPU-intensive computation (while |
| doing that safely). As one data point, a [GIF decoding |
| benchmark](https://github.com/google/wuffs/commit/9a463d46) measures Wuffs as |
| 4x faster than two separate Rust implementations, including the one that |
| [crates.io calls "gif"](https://crates.io/crates/gif), and one based on Nom |
| (discussed below). |
| |
| Wuffs also transpiles to standard C99 code with no dependencies, and should run |
| anywhere that has a C99 compliant C compiler. An existing C/C++ project, such |
| as the Chromium web browser, can choose to replace e.g. libpng with Wuffs PNG, |
| without needing any additional toolchains. Sure, languages like D and Rust have |
| great binary-level interop with C code, and Mozilla are reporting progress with |
| parsing media formats in Rust, but it's still an operational hurdle to grow a |
| project's build process and to assess build times and binary sizes. |
| |
| [Nom](https://github.com/Geal/nom) is a parser combinator library in Rust, |
| described in ["Writing parsers like it is |
| 2017"](http://spw17.langsec.org/papers/chifflier-parsing-in-2017.pdf). Wuffs |
| differs from nom by itself, in that Wuffs is an end to end implementation, not |
| limited to that part of a file format that is easily expressible as a formal |
| grammar. In particular, it also handles entropy encodings such as LZW (for |
| GIF), ZLIB (for PNG) and Huffman (for JPEG, TODO). Wuffs differs from nom |
| combined with other Rust code (e.g. a Rust LZW implementation) in that bounds |
| and overflow checks are not just ubiquitous but also completely eliminated at |
| compile time. |
| |
| [Kaitai Struct](http://kaitai.io/) is in a similar space, generating safe |
| parsers for multiple target programming languages from one declarative |
| specification. Again, Wuffs differs in that it is a complete (and performant) |
| end to end implementation, not just for the structured parts of a file format. |
| Repeating a point in the previous paragraph, the difficulty in decoding the GIF |
| format isn't in the regularly-expressible part of the format, it's in the LZW |
| compression. [Kaitai's GIF parser](http://formats.kaitai.io/gif/index.html) |
| returns the compressed LZW data as an opaque blob. |
| |
| Once again, we're not claiming that these other approaches are unworkable, or |
| not useful, just different, with different trade-offs. |
| |
| |
| --- |
| |
| Updated on February 2018. |