docs/buildengine.rst - third_party/swift-llbuild - Git at Google

 ==============
  Build Engine
 ==============

 This document currently only provides additional information on some aspects of
 the build engine design. See the source code documentation comments for
 information on the actual ``BuildEngine`` APIs.

 Order-only Dependencies
 =======================

 The build engine supports "order-only" dependencies (in the style of Ninja)
 through a "must-follow" input request or barrier. Internally, this is treated
 much the same as a normal input request, in that the task is responsible for
 calling ``taskMustFollow()`` on the engine during its preparation phase.

 However, these input requests do not need to be scanned or persisted. This is
 because the dependency is only relevant when the dependee is executing, and in
 those cases the task will be responsible for dynamically supplying the
 barrier. Thus, the implementation can piggy-back on the existing mechanisms for
 ensuring input requests are available prior to executing the task, and simply
 discard the order-only input request once satisfied.

 Parallelism
 ===========

 There are two obvious approaches to introducing parallelism into the core build
 engine.

 1. The first approach (which is taken currently) changes the task definition so
    that it gets a callback ``inputsAvailable()``, and adds a new task API on the
    ``BuildEngine`` called ``taskIsComplete()`` which is allowed to be called
    concurrently.

    When the task gets all of its inputs, then it is responsible for dispatching
    its work in a parallel fashion and informing the engine when that is
    complete. The engine tracks the task as pending completion until it received
    that callback.

    This keeps the engine simple, as it doesn't really deal with concurrency at
    all except for in specific cases that should be easy to reason about. It also
    makes the engine separable from the specific concurrency model, which may
    make it easier to plug in schedulers and the like (for example, it is
    somewhat clear how a posix_spawn + ppoll/pselect concurrency model works in
    this system).

 2. The alternate approach is to change the engine so that it manages all of the
    concurrency, changing the Task contract such that it stays synchronous but
    can be called on any thread.

    This means that all clients of the engine get concurrency automatically, but
    also more intimately ties the engine implementation to the concurrency model.


 Dependency Scanning
 ===================

 The engine is designed to scan dependencies concurrently with other build
 work. This introduces a fair amount of complexity into the engine, but is
 important because we want to ensure that the engine is able to dispatch any
 identified tasks as soon they are found.

 Recursive Dependency Scanning
 -----------------------------

 The current design of the engine handles dependency scanning in the same manner
 as other tasks, it associates a scan request with a rule and then enqueues the
 actual scanning work so that it can proceed in parallel with other work.

 Currently, the dependency scanning processes each input for an individual rule
 in sequence, and it suspends itself when it is waiting on an individual input to
 complete. This is made more complex by the need to sometimes execute a task
 before being able to determine that its downstream clients need to rerun (when
 the result has not changed). To manage that, the engine tracks for each task
 which other tasks are waiting for it to complete as well as which tasks are
 waiting for it to be scanned.

 The current implementation is not yet capable of scanning multiple inputs for a
 particular task concurrently, because when doing it is necessary to track
 additional state about whether or not the input was *actually* requested by a
 task or just a dependency present on a previous iteration. For example, consider
 the case where a task depended upon **A** and **B** in the last build. If a
 concurrent scan indicates the task should rerun because **B** has changed, it is
 not necessarily true that **B** itself should rerun (because the task may no
 longer request **B** as an input). The current implementation side-steps this
 problem by scanning dependencies one at a time.
	==============
	Build Engine
	==============

	This document currently only provides additional information on some aspects of
	the build engine design. See the source code documentation comments for
	information on the actual ``BuildEngine`` APIs.

	Order-only Dependencies
	=======================

	The build engine supports "order-only" dependencies (in the style of Ninja)
	through a "must-follow" input request or barrier. Internally, this is treated
	much the same as a normal input request, in that the task is responsible for
	calling ``taskMustFollow()`` on the engine during its preparation phase.

	However, these input requests do not need to be scanned or persisted. This is
	because the dependency is only relevant when the dependee is executing, and in
	those cases the task will be responsible for dynamically supplying the
	barrier. Thus, the implementation can piggy-back on the existing mechanisms for
	ensuring input requests are available prior to executing the task, and simply
	discard the order-only input request once satisfied.

	Parallelism
	===========

	There are two obvious approaches to introducing parallelism into the core build
	engine.

	1. The first approach (which is taken currently) changes the task definition so
	that it gets a callback ``inputsAvailable()``, and adds a new task API on the
	``BuildEngine`` called ``taskIsComplete()`` which is allowed to be called
	concurrently.

	When the task gets all of its inputs, then it is responsible for dispatching
	its work in a parallel fashion and informing the engine when that is
	complete. The engine tracks the task as pending completion until it received
	that callback.

	This keeps the engine simple, as it doesn't really deal with concurrency at
	all except for in specific cases that should be easy to reason about. It also
	makes the engine separable from the specific concurrency model, which may
	make it easier to plug in schedulers and the like (for example, it is
	somewhat clear how a posix_spawn + ppoll/pselect concurrency model works in
	this system).

	2. The alternate approach is to change the engine so that it manages all of the
	concurrency, changing the Task contract such that it stays synchronous but
	can be called on any thread.

	This means that all clients of the engine get concurrency automatically, but
	also more intimately ties the engine implementation to the concurrency model.


	Dependency Scanning
	===================

	The engine is designed to scan dependencies concurrently with other build
	work. This introduces a fair amount of complexity into the engine, but is
	important because we want to ensure that the engine is able to dispatch any
	identified tasks as soon they are found.

	Recursive Dependency Scanning
	-----------------------------

	The current design of the engine handles dependency scanning in the same manner
	as other tasks, it associates a scan request with a rule and then enqueues the
	actual scanning work so that it can proceed in parallel with other work.

	Currently, the dependency scanning processes each input for an individual rule
	in sequence, and it suspends itself when it is waiting on an individual input to
	complete. This is made more complex by the need to sometimes execute a task
	before being able to determine that its downstream clients need to rerun (when
	the result has not changed). To manage that, the engine tracks for each task
	which other tasks are waiting for it to complete as well as which tasks are
	waiting for it to be scanned.

	The current implementation is not yet capable of scanning multiple inputs for a
	particular task concurrently, because when doing it is necessary to track
	additional state about whether or not the input was actually requested by a
	task or just a dependency present on a previous iteration. For example, consider
	the case where a task depended upon A and B in the last build. If a
	concurrent scan indicates the task should rerun because B has changed, it is
	not necessarily true that B itself should rerun (because the task may no
	longer request B as an input). The current implementation side-steps this
	problem by scanning dependencies one at a time.