| ============== |
| Build Engine |
| ============== |
| |
| This document currently only provides additional information on some aspects of |
| the build engine design. See the source code documentation comments for |
| information on the actual ``BuildEngine`` APIs. |
| |
| Order-only Dependencies |
| ======================= |
| |
| The build engine supports "order-only" dependencies (in the style of Ninja) |
| through a "must-follow" input request or barrier. Internally, this is treated |
| much the same as a normal input request, in that the task is responsible for |
| calling ``taskMustFollow()`` on the engine during its preparation phase. |
| |
| However, these input requests do not need to be scanned or persisted. This is |
| because the dependency is only relevant when the dependee is executing, and in |
| those cases the task will be responsible for dynamically supplying the |
| barrier. Thus, the implementation can piggy-back on the existing mechanisms for |
| ensuring input requests are available prior to executing the task, and simply |
| discard the order-only input request once satisfied. |
| |
| Parallelism |
| =========== |
| |
| There are two obvious approaches to introducing parallelism into the core build |
| engine. |
| |
| 1. The first approach (which is taken currently) changes the task definition so |
| that it gets a callback ``inputsAvailable()``, and adds a new task API on the |
| ``BuildEngine`` called ``taskIsComplete()`` which is allowed to be called |
| concurrently. |
| |
| When the task gets all of its inputs, then it is responsible for dispatching |
| its work in a parallel fashion and informing the engine when that is |
| complete. The engine tracks the task as pending completion until it received |
| that callback. |
| |
| This keeps the engine simple, as it doesn't really deal with concurrency at |
| all except for in specific cases that should be easy to reason about. It also |
| makes the engine separable from the specific concurrency model, which may |
| make it easier to plug in schedulers and the like (for example, it is |
| somewhat clear how a posix_spawn + ppoll/pselect concurrency model works in |
| this system). |
| |
| 2. The alternate approach is to change the engine so that it manages all of the |
| concurrency, changing the Task contract such that it stays synchronous but |
| can be called on any thread. |
| |
| This means that all clients of the engine get concurrency automatically, but |
| also more intimately ties the engine implementation to the concurrency model. |
| |
| |
| Dependency Scanning |
| =================== |
| |
| The engine is designed to scan dependencies concurrently with other build |
| work. This introduces a fair amount of complexity into the engine, but is |
| important because we want to ensure that the engine is able to dispatch any |
| identified tasks as soon they are found. |
| |
| Recursive Dependency Scanning |
| ----------------------------- |
| |
| The current design of the engine handles dependency scanning in the same manner |
| as other tasks, it associates a scan request with a rule and then enqueues the |
| actual scanning work so that it can proceed in parallel with other work. |
| |
| Currently, the dependency scanning processes each input for an individual rule |
| in sequence, and it suspends itself when it is waiting on an individual input to |
| complete. This is made more complex by the need to sometimes execute a task |
| before being able to determine that its downstream clients need to rerun (when |
| the result has not changed). To manage that, the engine tracks for each task |
| which other tasks are waiting for it to complete as well as which tasks are |
| waiting for it to be scanned. |
| |
| The current implementation is not yet capable of scanning multiple inputs for a |
| particular task concurrently, because when doing it is necessary to track |
| additional state about whether or not the input was *actually* requested by a |
| task or just a dependency present on a previous iteration. For example, consider |
| the case where a task depended upon **A** and **B** in the last build. If a |
| concurrent scan indicates the task should rerun because **B** has changed, it is |
| not necessarily true that **B** itself should rerun (because the task may no |
| longer request **B** as an input). The current implementation side-steps this |
| problem by scanning dependencies one at a time. |