If you wish to expand on this document, or have a more experienced Rust contributor add anything else to it, please get in touch or file a bug:
https://github.com/rust-lang/rust/issues
Your concerns are probably the same as someone else's.
You may also be interested in the Rust Forge, which includes a number of interesting bits of information.
Finally, at the end of this file is a GLOSSARY defining a number of common (and not necessarily obvious!) names that are used in the Rust compiler code. If you see some funky name and you'd like to know what it stands for, check there!
Rustc consists of a number of crates, including syntax, rustc, rustc_back, rustc_trans, rustc_driver, and many more. The source for each crate can be found in a directory like src/libXXX, where XXX is the crate name.
(NB. The names and divisions of these crates are not set in stone and may change over time. For the time being, we tend towards a finer-grained division to help with compilation time, though that may change as incremental compilation improves.)
The dependency structure of these crates is roughly a diamond:
                      rustc_driver
                    /      |       \
                  /        |         \
                /          |           \
              /            v             \
    rustc_trans    rustc_borrowck   ...  rustc_metadata
              \            |            /
                \          |          /
                  \        |        /
                    \      v      /
                        rustc
                           |
                           v
                        syntax
                        /    \
                      /       \
               syntax_pos  syntax_ext
The rustc_driver crate, at the top of this lattice, is effectively the “main” function for the Rust compiler. It doesn't have much “real code”, but instead ties together all of the code defined in the other crates and defines the overall flow of execution. (As we transition more and more to the query model, however, the “flow” of compilation is becoming less centrally defined.)
At the other extreme, the rustc crate defines the common and pervasive data structures that all the rest of the compiler uses (e.g., how to represent types, traits, and the program itself). It also contains some amount of the compiler itself, although that is relatively limited.
Finally, all the crates in the bulge in the middle define the bulk of the compiler -- they all depend on rustc, so that they can make use of the various types defined there, and they export public routines that rustc_driver will invoke as needed (more and more, what these crates export are “query definitions”, but those are covered later on).
Below rustc lie various crates that make up the parser and error reporting mechanism. For historical reasons, these crates do not have the rustc_ prefix, but they are really just as much an internal part of the compiler and are not intended to be stable (though they do wind up getting used by some crates in the wild, a practice we hope to gradually phase out).
Each crate has a README.md file that describes, at a high level, what it contains, and tries to give some kind of explanation (some better than others).
The Rust compiler is in a bit of transition right now. It used to be a purely “pass-based” compiler, where we ran a number of passes over the entire program, and each did a particular check or transformation.
We are gradually replacing this pass-based code with an alternative setup based on on-demand queries. In the query model, we work backwards, executing a query that expresses our ultimate goal (e.g., “compile this crate”). This query in turn may make other queries (e.g., “get me a list of all modules in the crate”). Those queries make other queries that ultimately bottom out in the base operations, like parsing the input, running the type-checker, and so forth. This on-demand model permits us to do exciting things like only doing the minimal amount of work needed to type-check a single function. It also helps with incremental compilation. (For details on defining queries, check out src/librustc/ty/maps/README.md.)
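To make the on-demand idea concrete, here is a minimal sketch of a memoizing query engine. It is not how rustc implements queries: the Query enum, the Engine struct, and the hard-coded “source” string are all hypothetical names invented for this illustration. It only shows the shape described above, where a goal query makes further queries that bottom out in a base operation, and results are cached.

```rust
use std::collections::HashMap;

/// Hypothetical query keys (illustration only; not rustc's real queries).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Query {
    CompileCrate,
    ModuleList,
    TypeCheck(String),
    Parse,
}

/// A toy engine that caches results and records which queries actually ran.
#[derive(Default)]
struct Engine {
    cache: HashMap<Query, String>,
    executed: Vec<Query>,
}

impl Engine {
    /// Return the cached result, or run the provider and cache what it returns.
    fn get(&mut self, query: Query) -> String {
        if let Some(hit) = self.cache.get(&query) {
            return hit.clone();
        }
        let result = self.provide(&query);
        self.executed.push(query.clone());
        self.cache.insert(query, result.clone());
        result
    }

    /// The "provider": computes one query, possibly by making other queries.
    fn provide(&mut self, query: &Query) -> String {
        match query {
            // Base operation: stands in for parsing the input.
            Query::Parse => "mod a; mod b;".to_string(),
            Query::ModuleList => {
                let src = self.get(Query::Parse);
                src.split(';')
                    .filter_map(|item| item.trim().strip_prefix("mod "))
                    .collect::<Vec<_>>()
                    .join(",")
            }
            Query::TypeCheck(module) => {
                let modules = self.get(Query::ModuleList);
                if modules.split(',').any(|m| m == module.as_str()) {
                    format!("{}: ok", module)
                } else {
                    format!("{}: unknown module", module)
                }
            }
            // The "ultimate goal" query: type-check every module.
            Query::CompileCrate => {
                let modules = self.get(Query::ModuleList);
                modules
                    .split(',')
                    .map(|m| self.get(Query::TypeCheck(m.to_string())))
                    .collect::<Vec<_>>()
                    .join("; ")
            }
        }
    }
}
```

Asking a fresh Engine only for TypeCheck of module a runs three queries (Parse, ModuleList, and that one TypeCheck). A later CompileCrate reuses the cached results and only runs the work that is still missing.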
Regardless of the general setup, the basic operations that the compiler must perform are the same. The only thing that changes is whether these operations are invoked front-to-back, or on demand. In order to compile a Rust crate, these are the general steps that we take:
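- Parsing input
    - This step processes the .rs files and produces the AST (“abstract syntax tree”).
    - The AST is defined in syntax/ast.rs. It is intended to match the lexical syntax of the Rust language quite closely.
- Name resolution, macro expansion, and configuration
    - Once parsing is complete, we process the AST recursively, resolving paths and expanding macros. This same process also processes #[cfg] nodes, and hence may strip things out of the AST as well.
- Lowering to HIR
    - Once name resolution completes, we convert the AST into the HIR, or “high-level IR”. The HIR is defined in src/librustc/hir/; that module also includes the lowering code.
    - As a simple example, in the AST we preserve the parentheses that the user wrote, so ((1 + 2) + 3) and 1 + 2 + 3 parse into distinct trees, even though they are equivalent. In the HIR, however, parentheses nodes are removed, and those two expressions are represented in the same way.
- Type-checking and subsequent analyses
    - Type checking assigns a type to every HIR expression and is also responsible for resolving “type-dependent” paths, such as field accesses (x.f -- we can't know what field f is being accessed until we know the type of x) and associated type references (T::Item -- we can't know what type Item is until we know what T is).
    - Type checking creates “side-tables” (TypeckTables) that include the types of expressions, the way to resolve methods, and so forth.
- Lowering to MIR
    - Once type-checking is done, we lower the HIR into MIR (“mid-level IR”), which is used by the borrow checker.
- Translation to LLVM and LLVM optimizations
    - From MIR, we produce LLVM IR. LLVM then runs its optimizations, which produces a number of .o files (one for each “codegen unit”).
- Linking
    - Finally, those .o files are linked together.

The compiler uses a number of... idiosyncratic abbreviations and things. This glossary attempts to list them and give you a few pointers for understanding them better.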
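- AST -- the abstract syntax tree produced by the syntax crate; reflects user syntax very closely.
- DefId -- an index identifying a definition (see librustc/hir/def_id.rs). Uniquely identifies a DefPath.
- HIR -- the high-level IR, created by lowering the AST. See librustc/hir.
- HirId -- identifies a particular node in the HIR by combining a def-id with an “intra-definition offset”.
- 'gcx -- the lifetime of the global arena (see librustc/ty).
- infcx -- the inference context (see librustc/infer).
- MIR -- the mid-level IR, created after type-checking. Defined in the src/librustc/mir/ module, but much of the code that manipulates it is found in src/librustc_mir.
- obligation -- something that must be proven by the trait system; see librustc/traits.
- NodeId -- an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with HirId.
- query -- a sub-computation performed during compilation; see librustc/maps.
- provider -- the function that executes a query; see librustc/maps.
- span -- a location in the user's source code, used primarily for error reporting. See the Span datatype for more.
- substs -- the substitutions for a given generic type or item (e.g., the i32, u32 in HashMap<i32, u32>).
- tcx -- the “typing context”, the main data structure of the compiler (see librustc/ty).
- trait reference -- a trait together with values for its type parameters (see librustc/ty).
- ty -- the internal representation of a type (see librustc/ty).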
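
To illustrate the AST and HIR terms, and the parentheses example from the lowering step, here is a minimal sketch of a lowering pass that drops parenthesis nodes. The AstExpr and HirExpr types and the lower function are hypothetical, invented for this illustration; the real definitions live in syntax/ast.rs and src/librustc/hir/ and are far richer.

```rust
/// Toy AST expression: keeps a `Paren` node for parentheses the user wrote.
#[derive(Debug, PartialEq)]
enum AstExpr {
    Lit(i64),
    Add(Box<AstExpr>, Box<AstExpr>),
    Paren(Box<AstExpr>),
}

/// Toy HIR expression: there is no parenthesis node at all.
#[derive(Debug, PartialEq)]
enum HirExpr {
    Lit(i64),
    Add(Box<HirExpr>, Box<HirExpr>),
}

/// Lower a toy AST expression to toy HIR, discarding `Paren` wrappers.
fn lower(expr: &AstExpr) -> HirExpr {
    match expr {
        AstExpr::Lit(n) => HirExpr::Lit(*n),
        AstExpr::Add(lhs, rhs) => {
            HirExpr::Add(Box::new(lower(lhs)), Box::new(lower(rhs)))
        }
        // Parentheses only affect parsing, so lowering looks through them.
        AstExpr::Paren(inner) => lower(inner),
    }
}

/// Hand-built AST for `((1 + 2) + 3)`.
fn with_parens() -> AstExpr {
    let inner = AstExpr::Paren(Box::new(AstExpr::Add(
        Box::new(AstExpr::Lit(1)),
        Box::new(AstExpr::Lit(2)),
    )));
    AstExpr::Paren(Box::new(AstExpr::Add(
        Box::new(inner),
        Box::new(AstExpr::Lit(3)),
    )))
}

/// Hand-built AST for `1 + 2 + 3` (addition is left-associative).
fn without_parens() -> AstExpr {
    AstExpr::Add(
        Box::new(AstExpr::Add(
            Box::new(AstExpr::Lit(1)),
            Box::new(AstExpr::Lit(2)),
        )),
        Box::new(AstExpr::Lit(3)),
    )
}
```

The two hand-built ASTs compare as different trees, while lowering either one yields the same HirExpr value.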