blob: 99b8b0f6cf842bbb441b2ee225bb2b559a706c87 [file] [log] [blame]
==============
Build Engine
==============
This document currently only provides additional information on some aspects of
the build engine design. See the source code documentation comments for
information on the actual ``BuildEngine`` APIs.
Order-only Dependencies
=======================
The build engine supports "order-only" dependencies (in the style of Ninja)
through a "must-follow" input request or barrier. Internally, this is treated
much the same as a normal input request, in that the task is responsible for
calling ``taskMustFollow()`` on the engine during its preparation phase.
However, these input requests do not need to be scanned or persisted. This is
because the dependency is only relevant when the dependee is executing, and in
those cases the task will be responsible for dynamically supplying the
barrier. Thus, the implementation can piggy-back on the existing mechanisms for
ensuring input requests are available prior to executing the task, and simply
discard the order-only input request once satisfied.
Parallelism
===========
There are two obvious approaches to introducing parallelism into the core build
engine.
1. The first approach (which is taken currently) changes the task definition so
that it gets a callback ``inputsAvailable()``, and adds a new task API on the
``BuildEngine`` called ``taskIsComplete()`` which is allowed to be called
concurrently.
When the task gets all of its inputs, then it is responsible for dispatching
its work in a parallel fashion and informing the engine when that is
complete. The engine tracks the task as pending completion until it received
that callback.
This keeps the engine simple, as it doesn't really deal with concurrency at
all except for in specific cases that should be easy to reason about. It also
makes the engine separable from the specific concurrency model, which may
make it easier to plug in schedulers and the like (for example, it is
somewhat clear how a posix_spawn + ppoll/pselect concurrency model works in
this system).
2. The alternate approach is to change the engine so that it manages all of the
concurrency, changing the Task contract such that it stays synchronous but
can be called on any thread.
This means that all clients of the engine get concurrency automatically, but
also more intimately ties the engine implementation to the concurrency model.
Dependency Scanning
===================
The engine is designed to scan dependencies concurrently with other build
work. This introduces a fair amount of complexity into the engine, but is
important because we want to ensure that the engine is able to dispatch any
identified tasks as soon they are found.
Recursive Dependency Scanning
-----------------------------
The current design of the engine handles dependency scanning in the same manner
as other tasks, it associates a scan request with a rule and then enqueues the
actual scanning work so that it can proceed in parallel with other work.
Currently, the dependency scanning processes each input for an individual rule
in sequence, and it suspends itself when it is waiting on an individual input to
complete. This is made more complex by the need to sometimes execute a task
before being able to determine that its downstream clients need to rerun (when
the result has not changed). To manage that, the engine tracks for each task
which other tasks are waiting for it to complete as well as which tasks are
waiting for it to be scanned.
The current implementation is not yet capable of scanning multiple inputs for a
particular task concurrently, because when doing it is necessary to track
additional state about whether or not the input was *actually* requested by a
task or just a dependency present on a previous iteration. For example, consider
the case where a task depended upon **A** and **B** in the last build. If a
concurrent scan indicates the task should rerun because **B** has changed, it is
not necessarily true that **B** itself should rerun (because the task may no
longer request **B** as an input). The current implementation side-steps this
problem by scanning dependencies one at a time.