| // Copyright 2019 The LUCI Authors. |
| // |
| // Licensed under the Apache License, Version 2.0 (the "License"); |
| // you may not use this file except in compliance with the License. |
| // You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| |
| // Package luciexe documents the "LUCI Executable" protocol, and contains |
| // constants which are part of this protocol. |
| // |
| // # Summary |
| // |
| // A LUCI Executable ("luciexe") is a binary which implements a protocol to: |
| // |
| // - Pass the initial state of the 'build' from a parent process to the luciexe. |
| // - Understand the build's local system contracts (like the location of cached data). |
| // - Asynchronously update the state of the build as it runs. |
| // |
| // This protocol is recursive; A luciexe can run another luciexe such that the |
| // child's updates are reflected on the parent's output. |
| // |
| // The protocol has 3 parts: |
| // |
| // - Host Application - This sits at the top of the luciexe process |
| // hierarchy and sets up singleton environmental requirements for the whole |
| // tree. |
| // - Invocation of a luciexe binary - This invocation process occurs both |
| // for the topmost luciexe, as well as all internal invocations of other |
| // luciexe's within the process hierarchy. |
| // - The luciexe binary - The binary has a couple of responsibilities to be |
| // compatible with this protocol. Once the binary has fulfilled it's |
| // responsibilities it's free to do what it wants (i.e. actually do its |
| // task). |
| // |
| // In general, we strive where possible to minimize the complexity of the |
| // luciexe binary. This is because we expect to have a small number of 'host |
| // application' implementations, and a relatively large number of 'luciexe' |
| // implementations. |
| // |
| // # The Host Application |
| // |
| // At the root of every tree of luciexe invocations there is a 'host' |
| // application which sets up an manages all environmental singletons (like the |
| // Logdog 'butler' service, the LUCI ambient authentication service, etc.). This |
| // Host Application is responsible for intercepting and merging all |
| // 'build.proto' streams emitted within this tree of luciexes (see "Recursive |
| // Invocation"). The Host Application may choose what happens to these |
| // intercepted build.proto messages. |
| // |
| // The Host Application MUST: |
| // - Run a logdog butler service and expose all relevant LOGDOG_* environment |
| // variables such that the following client libraries can stream log data: |
| // - Golang: go.chromium.org/luci/logdog/client/butlerlib/bootstrap |
| // - Python: infra_libs.logdog.bootstrap |
| // - Hook the butler to intercept and merge build.proto streams into a single |
| // build.proto (zlib-compressed) stream. |
| // - Set up a local LUCI ambient authentication service which luciexe's can |
| // use to mint auth tokens. |
| // - Prepare an empty directory which will house tempdirs and workdirs for |
| // all luciexe invocations. The Host Application MAY clean this directory |
| // up, but it may be useful to leak it for debugging. It's permissible for |
| // the Host Application to defer this cleanup to an external process (e.g. |
| // buildbucket's agent may defer this to swarming). |
| // |
| // The Host Application MAY hook additional streams for debugging/logging; it is |
| // frequently convenient to hook the stderr/stdout streams from the top level |
| // luciexe and tee them to the Host Application's stdout/stderr. |
| // |
| // For example: the `go.chromium.org/luci/buildbucket/cmd/agent` binary forwards |
| // these merged build.proto messages to the Buildbucket service, and also |
| // uploads all streams to them to the Logdog cloud service. Other host |
| // implementations may instead choose to write all streams to disk, send them to |
| // /dev/null or render them as html. However, from the point of view of the |
| // luciexe that they run, this is transparent. |
| // |
| // Host Applications MAY implement 'backpressure' on the luciexe binaries by |
| // throttling the rate at which the Logdog butler accepts data on its various |
| // streams. However, doing this could introduce timing issues in the luciexe |
| // binaries as they try to run, so this should be done thoughtfully. |
| // |
| // If a Host Application detects a protocol violation from a luciexe within its |
| // purview, it SHOULD report the violation (in a manner of its choosing) and |
| // MUST consider the entire Build status to be INFRA_FAILURE. In addition the |
| // Host Application SHOULD attempt to kill (via process group SIGTERM/SIGKILL on |
| // *nix, and CTRL+BREAK/Terminate on windows) the luciexe hierarchy. The host |
| // application MAY provide a window of time between the initial "please stop" |
| // signal and the "you die now" signal, but this comes with the usual caveats of |
| // cleanups and deadlines (notably: best-effort clean up is just that: |
| // best-effort. It cannot be relied on to run to completion (or to run |
| // completely)). |
| // |
| // # Invocation |
| // |
| // When invoking a luciexe, the parent process has a couple responsibilities. It |
| // must: |
| // |
| // - Set $TEMPDIR, $TMPDIR, $TEMP, $TMP and $MAC_CHROMIUM_TMPDIR to all point |
| // to the same, empty directory. |
| // This directory MUST be located on the same file system as CWD. |
| // This directory MUST NOT be the same as CWD. |
| // |
| // - Set $LUCI_CONTEXT["luciexe"]["cache_dir"] to a cache dir which makes sense |
| // for the luciexe. |
| // The cache dir MAY persist/be shared between luciexe invocations. |
| // The cache dir MAY NOT be on the same filesystem as CWD. |
| // |
| // - Set the $LOGDOG_NAMESPACE to a prefix which namespaces all logdog streams |
| // generated from the luciexe. |
| // |
| // - Set the Status of initial buildbucket.v2.Build message to `STARTED`. |
| // |
| // - Set the CreateTime and StartTime of initial buildbucket.v2.Build message. |
| // |
| // - Clear following fields in the initial buildbucket.v2.Build message. |
| // |
| // - EndTime |
| // |
| // - Output |
| // |
| // - StatusDetails |
| // |
| // - Steps |
| // |
| // - SummaryMarkdown |
| // |
| // - UpdateTime |
| // |
| // The CWD is up to your application. Some contexts (like Buildbucket) will |
| // guarantee an empty CWD, but others (like recursive invocation) may explicitly |
| // share CWD between multiple luciexe's. |
| // |
| // The tempdir and workdir paths SHOULD NOT be cleaned up by the invoking |
| // process. Instead, the invoking process should defer to the Host Application |
| // to provide this cleanup, since the Host Application may be configured to leak |
| // these for debugging purposes. |
| // |
| // The invoker MUST attach the stdout/stderr to the logdog butler as text |
| // streams. These MUST be located at `$LOGDOG_NAMESPACE/std{out,err}`. |
| // Typical luciexe implementations will use these for debug logging and output, |
| // but are not required to do so. |
| // |
| // The invoker MUST write a binary-encoded buildbucket.v2.Build to the stdin of |
| // the luciexe which contains all the input parameters that the luciexe needs to |
| // know to run successfully. |
| // |
| // # The luciexe binary |
| // |
| // Once running, the luciexe MUST read a binary-encoded buildbucket.v2.Build |
| // message from stdin until EOF. |
| // |
| // As per the invoker's responsibility, the luciexe binary MAY expect the |
| // status of the initial Build message to be `STARTED` and CreateTime and |
| // StartTime are populated. It MAY also expect following fields to be empty. |
| // It MUST NOT assume other fields in the Build message are set. However, the |
| // Host Application or invoker MAY fill in other fields they think are useful. |
| // |
| // EndTime |
| // Output |
| // StatusDetails |
| // Steps |
| // SummaryMarkdown |
| // Tags |
| // UpdateTime |
| // |
| // As per the Host Application's responsibilities, the luciexe binary MAY expect |
| // the "luciexe" and "local_auth" sections of LUCI_CONTEXT to be filled. Other |
| // sections of LUCI_CONTEXT MAY also be filled. See the LUCI_CONTEXT docs: |
| // https://chromium.googlesource.com/infra/luci/luci-py/+/HEAD/client/LUCI_CONTEXT.md |
| // |
| // !!NOTE!! The paths supplied to the luciexe MUST NOT be considered stable |
| // across invocations. Do not hard-code these, and try not to rely on their |
| // consistency (e.g. for build reproducibility). |
| // |
| // # The luciexe binary - Updating the Build state |
| // |
| // A luciexe MAY update the Build state by writing to a "build.proto" Logdog |
| // stream named "$LOGDOG_NAMESPACE/build.proto". A "build.proto" Logdog stream |
| // is defined as: |
| // |
| // Content-Type: "application/luci+proto; message=buildbucket.v2.Build" |
| // Type: Datagram |
| // |
| // Additionally, a build.proto stream MAY append "; encoding=zlib" to the |
| // Content-Type (and compress each message accordingly). This is useful for when |
| // you potentially have very large builds. |
| // |
| // Each datagram MUST be a valid binary-encoded buildbucket.v2.Build message. |
| // The state of the build is defined as the last Build message sent on this |
| // stream. There's no implicit accumulation between sent Build messages. |
| // |
| // The Step.Log.Url field in the emitted Build messages can be either absolute |
| // (has a valid url scheme) or relative. If an absolute `Url` is provided, the |
| // luciexe is also responsible for supplying corresponding `ViewUrl` and the |
| // host application won't validate or make any adjustments to those urls. This |
| // allows luciexe to include logs from other sources into this build. If a |
| // relative `Url` is provided, the host application will namespace the `Url` |
| // with $LOGDOG_NAMESPACE of the build.proto stream. For example, if the host |
| // application is parsing a Build.proto in a stream named |
| // "logdog://host/project/prefix/+/something/build.proto", then a Log with a Url |
| // of "hello/world/stdout" will be transformed into: |
| // |
| // Url: logdog://host/project/prefix/+/something/hello/world/stdout |
| // ViewUrl: <implementation defined> |
| // |
| // The `ViewUrl` field in this case SHOULD be left empty, and will be filled in |
| // by the host application running the luciexe (if supplied it will be |
| // overwritten). |
| // |
| // The following Build fields will be read from the luciexe-controlled |
| // build.proto stream: |
| // |
| // EndTime |
| // Output |
| // Status |
| // StatusDetails |
| // Steps |
| // SummaryMarkdown |
| // Tags |
| // UpdateTime |
| // |
| // # The luciexe binary - Reporting final status |
| // |
| // A luciexe MUST report its success/failure by sending a Build message with |
| // a terminal `status` value before exiting. If the luciexe exits before sending |
| // a Build message with a terminal Status, the invoking application MUST |
| // interpret this as an INFRA_FAILURE status. The exit code of the luciexe |
| // SHOULD be ignored, except for advisory (logging/reporting) purposes. The host |
| // application MUST detect this case and fill in a final status of |
| // INFRA_FAILURE, but MUST NOT terminate the process hierarchy in this case. |
| // |
| // # Recursive Invocation |
| // |
| // To support recursive invocation, a luciexe MUST accept the flag: |
| // |
| // --output=path/to/file.{pb,json,textpb} |
| // |
| // The value of this flag MUST be an absolute path to a non-existent file in |
| // an existing directory. The extension of the file dictates the data format |
| // (binary, json or text protobuf). The luciexe MUST write it's final Build |
| // message to this file in the correct format. If `--output` is specified, |
| // but no Build message (or an invalid/improperly formatted Build message) |
| // is written, the caller MUST interpret this as an INFRA_FAILURE status. |
| // |
| // NOTE: JSON outputs SHOULD be written with the original proto field names, |
| // not the lowerCamelCase names; downstream users may not be using jsonpb |
| // unmarshallers to interpret the JSON data. |
| // |
| // This may need to be revised in a subsequent version of this API |
| // specification. |
| // |
| // LUCI Executables MAY invoke other LUCI Executables as sub-steps and have the |
| // Steps from the child luciexe show in the parent's Build updates. This is one |
| // of the responsibilities of the Host Application. |
| // |
| // The parent can achieve this by recording a Step S (with no children), and |
| // a Step.MergeBuild.from_logdog_stream which points to a "build.proto" stream |
| // (see "Updating the Build State"). This is called a "Merge Step", and is |
| // a directive for the host to merge Build messages from in the |
| // from_logdog_stream log here. Note that this stream name should follow the |
| // same provisions specified for "Step.Log.Url". It's the invoker's |
| // responsibility to populate $LOGDOG_NAMESPACE with the new, full, namespace. |
| // Failure to do this will result in missing logs/step data. |
| // |
| // (SIDENOTE: There's an internal proposal to improve the Logdog "butler" API so |
| // that application code only needs to handle the relative namespaces as well, |
| // which would make this much less confusing). |
| // |
| // The Host Application MUST append all steps from the child build.proto |
| // stream to the parent build as substeps of step S and copy the following |
| // fields of the child Build to the equivalent fields of step S *only if* |
| // step S has *non-final* status. It is the caller's responsibility to populate |
| // rest of the fields of step S if the caller explicitly marks the step |
| // status as final. |
| // |
| // SummaryMarkdown |
| // Status |
| // EndTime |
| // Output.Logs (appended) |
| // |
| // This rule applies recursively, i.e. the child build MAY have Merge Step(s). |
| // |
| // Each luciexe's step names should be emitted as relative names. e.g. say |
| // a build runs a sub-luciexe with the name "a|b". This sub-luciexe then runs |
| // a step "x|y|z". The top level build.proto stream will show the step |
| // "a|b|x|y|z". |
| // |
| // The graph of datagram streams MUST be a tree. This follows from the |
| // namespacing rules of the Log.Url fields; Since Log.Url fields are relative to |
| // their build's namespace, it's only possible to have a merge step point |
| // further 'down' the tree, making it impossible to create a cycle. |
| // |
| // # Recursive Invocation - Legacy ChromeOS style |
| // |
| // Note that there's an option in MergeBuild called 'legacy_global_namespace'. |
| // This mode is NOT RECOMMENDED, and was only added to support legacy ChromeOS |
| // builders which relied on this functionality prior to luciexe. |
| // |
| // When this option is turned on, output properties on the build stream will be |
| // merged into the top-level build containing the Merge Step, and it will also |
| // de-namespace the step names (i.e. a MergeStep called "Parent" whose |
| // build.proto stream contains a name "Step" would normally show on the build as |
| // "Parent|Step", will instead simply show as "Step" in this mode. This also |
| // means that these merge step processes must make sure not to emit a step which |
| // duplicates the name of one emitted in the parent build (otherwise the overall |
| // build will be invalid)). |
| // |
| // Using this functionality has the downside that properties emitted by the |
| // parent build WILL BE OVERWRITTEN by the merged build stream, if written to |
| // the same top-level property keys. In ChromeOS's case this is "OK" because the |
| // legacy recipes do a short bootstrap followed by the main payload which does |
| // all the actual stuff. |
| // |
| // There's a version of this feature which would be more-supportable, which |
| // would be to introduce namespaced outputs so that the top-level build could |
| // reflect the outputs from a given step (i.e. allow Step to actually have its |
| // own Output message). If you feel like you need this functionality, please |
| // contact ChOps Foundation team to discuss, rather than trying to use the |
| // legacy_global_namespace option. |
| // |
| // # Related Libraries |
| // |
| // For implementation-level details, please refer to the following: |
| // |
| // - "go.chromium.org/luci/luciexe" - low-level protocol details (this module). |
| // - "go.chromium.org/luci/luciexe/host" - the Host Application library. |
| // - "go.chromium.org/luci/luciexe/invoke" - luciexe invocation. |
| // - "go.chromium.org/luci/luciexe/exe" - luciexe binary helper library. |
| // |
| // # Other Client Implementations |
| // |
| // Python Recipes (https://chromium.googlesource.com/infra/luci/recipes-py) |
| // implement the LUCI Executable protocol using the "luciexe" subcommand. |
| // |
| // TODO(iannucci): Implement a luciexe binary helper in `infra_libs` analogous |
| // to go.chromium.org/luci/luciexe/exe and implement Recipes' support in terms |
| // of this. |
| // |
| // # LUCI Executables on Buildbucket |
| // |
| // Buildbucket accepts LUCI Executables as CIPD packages containing the |
| // luciexe to run with the fixed name of "luciexe". |
| // |
| // On Windows, this may be named "luciexe.exe" or "luciexe.bat". Buildbucket |
| // will prefer the first of these which it finds. |
| // |
| // Note that the `recipes.py bundle` command generates a "luciexe" wrapper |
| // script for compatibility with Buildbucket. |
| package luciexe |