
Codifier

Codifier is a bulk update tool for GN build files.

It uses operator chaining to create readable code, considerably reducing the boilerplate code that would otherwise be needed. For example, the following snippet reads a build file, then extracts a “tests” clause from which it reads the dependencies and picks one that isn't listed in a skiplist file:

  p := NewProcFromFile(bringupFilename).ScopeNamed("tests").ExtractList("deps").
         PickUnskiplisted("deps", skiplistfilename, "target")

The selected dependency is stored in the Proc p under the name “target”.

If there are any errors, they will be logged and p will be nil. If more precise error checking is needed, the chain can be shortened and p can be checked earlier.

This design allows developers to program by intention: chains of operators express intentions at a task level. The snippet above, for example, expresses the task “Find a target that's not in the skiplist.”

As shown, none of the intermediate operators' errors are checked, so the final p carries task-level semantics: p == nil means that no target outside the skiplist could be found.

Codifier is a tool for bulk updates, so this design facilitates iterative development until the bulk update succeeds. Codifier will change code, build it, run tests, and then commit the successful changes using Git. The final result is a complete CL ready for upload and review.

  p = p.DoFnWith("target", convertBootfsTests).ApplyChanges().
          WriteFile().GNFormat().Build().Test(newPackageKey, false)
  p = p.CommitOrRestore(commitMsg, amend).AddToSkiplist("target", skiplistfilename)

This snippet illustrates a few techniques. First, note that developers can write their own functions to modify the text extracted by Codifier using DoFn() and DoFnWith() operators. Next, note that the CommitOrRestore function tracks whether the build and test operators succeeded, so it can commit successful changes or remove all the changes associated with the current set of operations. In this way, Codifier can operate safely on a large codebase, always leaving the repo in a buildable and tested state.

Finally, note that after processing a gn build target, it is added to a skiplist using built-in skiplist operators. This facility is optional, but it allows Codifier to be run incrementally, skipping already-successful targets, problem targets, or any other targets that the developer doesn't want to modify. This enables rapid iteration without repeating previously successful modifications, lets Codifier make progress on the many files that can be modified even when some can't be, and lets developers focus their effort away from problem targets to maximize successful bulk updates.

Quick start

The code includes a main.go that demonstrates Codifier usage in a real-world example. The demo code was used to update ~80 bringup tests, converting bootfs_test() clauses to fuchsia_unittest_package() and moving these tests from bringup to core. In addition, because including a test in core requires modifying parent build files to include the correct child tests, the demo shows a complex parent chain update.

To run the demo:

  $ go run .
  2021/02/25 09:40:25 === === === Codifier started === === ===
  2021/02/25 09:40:25  [stop_after = 30] Stopping after 30 successful updates
  2021/02/25 09:40:25  [amend = false] Will create new commit message
  2021/02/25 09:40:25 ExtractList(): list "deps" has 113 items
  ...

Depending on the state of the codebase when you run the code, the current Codifier main may or may not find bringup tests that can be modified as intended.

Restarts

When coded as shown in the main.go demo, Codifier will build and test changes after they're made. If the build and tests succeed, the changes are committed using Git. Otherwise, the changes are reset. This design solves the problem of knowing how to undo changes that lead to failures, especially for files that have already been changed by prior successful iterations of Codifier. It also creates a review-ready CL for immediate upload after a successful Codifier run.

To use this feature effectively, be sure to start Codifier in a local repo that has no local or staged changes. Otherwise, they'll be swept into Codifier's new commits.

Also, if Codifier stops prematurely, such as by being terminated via Ctrl-C, unstaged or newly committed changes will likely remain. To continue running Codifier, you'll want to reset the local repo appropriately.

  • If there are local, unstaged changes, you can discard them with git restore .

  • If there are committed changes that you want to undo, you can git reset --hard JIRI_HEAD. This will also reset local unstaged changes.

  • If there are committed changes that you want to keep, i.e., because they're successful, then restart Codifier using go run . --amend. The --amend flag ensures that the next run of Codifier amends its Git commits onto the previous one.

Base Concepts

Codifier is designed to be simple to understand and use. There are just a few key concepts to know.

Procs

The basic container of state in Codifier is called a Proc. It holds the original text, the replacement text, and other state markers, such as the Fuchsia directory and whether a build or test has succeeded. Operators load the original text from a file or argument. The bulk of the operators extract and manipulate the text in the replacement field. Finally, the replacement field can be written to disk to complete the change. Learn more in proc.go.
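Based on the description above, a Proc's state can be sketched roughly as follows. The field names and types here are illustrative assumptions, not the actual proc.go layout:

```go
package main

import "fmt"

// Proc sketch: the fields mirror the description above, but these names
// and types are assumptions, not the real proc.go definitions.
type Proc struct {
	original    string              // text as loaded from a file or argument
	replacement string              // text that operators extract and manipulate
	fuchsiaDir  string              // Fuchsia directory used by build/test steps
	buildOK     bool                // whether a build has succeeded
	testOK      bool                // whether tests have succeeded
	store       map[string][]string // simplified; real values are string or string slice
}

func main() {
	p := &Proc{original: `group("tests") {}`, store: map[string][]string{}}
	p.replacement = p.original // operators begin from the loaded text
	fmt.Println(p.replacement == p.original)
}
```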

To maximize accuracy and safety, several operators restrict the parts of a file that other operators act on. For example, you can focus operators on a specific, named clause in a build file, such as bootfs_test("boot_test") { deps = [...] }. In Codifier, this reduced field is called a “scope”. It allows subsequent operators to extract and modify deps, for example, without accidentally modifying the deps in other clauses of the build file.

Scopes correspond to the typed and named clauses in gn build files that prefix brace-enclosed text. These are the instantiated templates in a build file.

  group("core") {
    testonly = true

    public_deps = [
      "//bundles:amlogic_hw_dependent_tests",
      "//garnet/packages/prod:cmdutils",
      "//garnet/packages/prod:scenic",
      "//garnet/packages/testing:run_test_component",
      "//src/developer/memory/monitor",    ...

In this example, there is a scope of type group with name core. The scope contains the type, the name, and all of the text between the braces. Once the scope is extracted, key/values such as testonly/true and lists such as public_deps can be extracted.

Scopes, too, can be chained, allowing developers to write code that focuses on ever narrower regions of the build file.

Learn more in scope_ops.go.

Lists

In gn, lists of values may be given like:

  action("cargo_toml_gen") {
    # Nothing in-tree should depend on Cargo files.
    visibility = [
      "//:additional_base_packages",
      "//:additional_cache_packages",
      "//:additional_universe_packages",
    ]
    ...

In Codifier, these can be extracted and managed via List operators. For the above, .ExtractList("visibility") would extract the list of strings in visibility and put them in the store under the key visibility.
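To illustrate the idea, a minimal extractor for such a list might look like the sketch below. Codifier's real list handling is more robust; this only shows the shape of the task:

```go
package main

import (
	"fmt"
	"strings"
)

// extractList sketches what an ExtractList operator must do: pull the
// quoted strings out of a `name = [ ... ]` assignment.
func extractList(src, name string) []string {
	start := strings.Index(src, name+" = [")
	if start < 0 {
		return nil // assignment not found
	}
	open := start + len(name) + len(" = [")
	end := strings.Index(src[open:], "]")
	if end < 0 {
		return nil // unterminated list
	}
	var items []string
	for _, f := range strings.Split(src[open:open+end], ",") {
		f = strings.TrimSpace(f)
		if len(f) >= 2 && strings.HasPrefix(f, `"`) && strings.HasSuffix(f, `"`) {
			items = append(items, strings.Trim(f, `"`))
		}
	}
	return items
}

func main() {
	src := `visibility = [
    "//:additional_base_packages",
    "//:additional_cache_packages",
  ]`
	fmt.Println(extractList(src, "visibility"))
	// [//:additional_base_packages //:additional_cache_packages]
}
```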

See the List functions in scope_ops.go.

The store

As a way of passing data to later operators, each Proc contains a store that maps string keys to string or string-slice values. Operators that extract values or lists hold them in the store. Developers can also use operators that set and retrieve key/value pairs in the store.

Learn more in store_ops.go.

Guidance

Given the large collection of operators and possible ways to organize them, this section provides guidance and best practices for constructing and consuming operator chains.

One chain per file

An operator chain design facilitates readability because it effectively lists the steps of a bulk update. There may be a temptation to create long chains that accomplish many changes across many files. However, as chain length increases, fewer checks can be done, complicating debugging. To keep operator chains debuggable, don't try to do too much in a single chain.

A good rule of thumb is one chain per file or target.

Keep the chains simple by using DoFn and DoFnWith

The simple pattern is: read the file, make some changes, then write the file. If complicated changes are to be made, or if they require additional changes in other files, use a DoFn or DoFnWith to encapsulate them in a function called from the chain.

  p := NewProcFromFile(bringupFilename).<make changes>.WriteFile()

If that function needs to change files, it can create its own operator chain for modifications:

  func myFunc(s string) (string, error) {
    return s + "bar", nil
  }
  p := NewProcFromFile(bringupFilename).DoFn(myFunc).WriteFile()

Each of these can include DoFn*s as needed to handle their dependent file changes:

  func myFunc(s string) (string, error) {
    return s + "bar", nil
  }

  func myFunc2(s string) (string, error) {
    q := NewProcFromFile(bringupFilename).DoFn(myFunc).WriteFile()
    if q == nil {
      return "", errors.New("myFunc2: nested chain failed")
    }
    return s + q.replacement, nil
  }

  p := NewProcFromFile(bringupFilename).DoFn(myFunc2).WriteFile()

The top-level chain, upon completion of the nested DoFn*s, can call the build, test, and commit/restore operators to complete an iteration.

  p := NewProcFromFile(bringupFilename).DoFn(myFunc2).WriteFile().
         GNFormat().Build().Test(newPackageKey, false).
         CommitOrRestore(commitMsg, amend)

See the main.go demo for an example of this pattern.

Let chains run

It's poor form to run one operator, then perform checks on the current Proc state. Such code is hard to read, defeating the purpose of simple, clear chains that accomplish specific tasks. Instead, prefer longer chains that assume each operator succeeds. Longer chains make the intent clear by inspection and let the code express the coder's intention at a task level.

All of the operators return nil on error, and all of them are safe to chain onto a Proc that is nil. When an error occurs in a long chain, it is logged, and all subsequent operators do nothing. The developer running the code can see immediately in the logs where the error occurred.
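This nil-safe chaining relies on Go's ability to call methods on a nil pointer receiver. A minimal sketch of the pattern, using toy types and operator names rather than Codifier's real ones:

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// toyProc demonstrates the nil-propagation pattern; it is not the real Proc.
type toyProc struct {
	text string
}

// Upper is a toy operator. Like Codifier's operators, it is safe to call
// on a nil receiver: it simply propagates the nil.
func (p *toyProc) Upper() *toyProc {
	if p == nil {
		return nil
	}
	return &toyProc{text: strings.ToUpper(p.text)}
}

// Fail simulates an operator hitting an error: it logs and returns nil,
// turning every subsequent operator in the chain into a no-op.
func (p *toyProc) Fail() *toyProc {
	if p == nil {
		return nil
	}
	log.Println("Fail(): simulated error")
	return nil
}

func main() {
	p := (&toyProc{text: "deps"}).Upper().Fail().Upper() // chain runs safely to the end
	fmt.Println(p == nil) // true: the error propagated through the rest of the chain
}
```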

For example, this code starts with a Proc called bootfsScope, reads a file into the store under the key “bothlist”, checks whether the target is in that list, and if not, deletes the current scope in bootfsScope, clears the store, and finally applies any changes made. In this code we don't care about the details of which step failed; we just want to know whether the task completed. If there is any error, targetBuildFileProc will be nil, so we'll return an error to the calling function.

  targetBuildFileProc = bootfsScope.ReadFileIntoList("bothlist.txt", "bothlist").
    StopIfInList(targetPath, "bothlist").Delete().ClearStore().ApplyChanges()

  if targetBuildFileProc == nil {
    return nil, errors.New("  updateDeps() error: failed to delete bootfs_test")
  }

As a bulk update tool, Codifier should succeed as often as possible, accomplishing as much work as possible despite errors. Ideally, then, when a step in a long chain fails, that iteration's set of changes fails as a whole, Codifier resets the changes, and it continues on to a new target. In this way, the developer does not spend time fixing issues that block progress; instead, Codifier makes considerable progress and leaves a log showing the errors to fix for subsequent runs.

Break up chains for control

Given the above, it's still sometimes necessary to break up chains to manage control flow and errors. For example, suppose we need to fail if a file can't be read. Unlike the example above, where we read a file and then chain additional operators after the read, here we'll stop and check:

  targetBuildFileProc := NewProcFromFile(buildFile)
  if targetBuildFileProc == nil {
    return "", fmt.Errorf("  convertBootfsTests() error: couldn't read build file %q", buildFile)
  }

Branch Procs

Because the operators act on pointers to Procs, we can create derivative changes, then decide whether to keep them. For example, here we start with a Proc that reads a file. Next, we'd like to test whether a scope like group("tests") is present in the build file. The Scope operator returns nil if the scope is not found, so we use a separate pointer to hold the result. If it's non-nil, the scope was found, and we can assign p to it and continue.

  p := NewProcFromFile(bringupFilename)
  // Use inner scope like group("tests") if present.
  groupTestsScope := p.Scope("group", "tests")
  if groupTestsScope != nil {
    p = groupTestsScope
    ...
  }
  // p is now the original scope or the inner scope.
  p = p.ExtractList("deps")

Keep CLs small

Codifier can do bulk updates of thousands of files. You may want to break these up into smaller CLs to ease reviewer burden. Just use the --stop_after flag to limit the number of successful iterations of Codifier changes.

  go run . --stop_after 20

Note that this feature is part of the demo in main.go and is not required in your main.
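The demo's flags can be wired up with the standard flag package. A sketch follows; the defaults and usage strings are assumptions matched to the log lines shown in the Quick start, not necessarily main.go's exact values:

```go
package main

import (
	"flag"
	"fmt"
)

// Flag wiring sketch. The names match the demo's log output
// ([stop_after = 30], [amend = false]); the rest is illustrative.
var (
	stopAfter = flag.Int("stop_after", 30, "stop after this many successful updates")
	amend     = flag.Bool("amend", false, "amend commits onto the previous Codifier commit")
)

func main() {
	flag.Parse()
	fmt.Printf("[stop_after = %d] [amend = %v]\n", *stopAfter, *amend)
}
```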

Reuse boilerplate from the example main.go

The original main.go in cmd/examples shows a number of uses of Codifier operators, but also lays out useful iteration code and flags. Copy it as a skeleton for your bulk update code.

Next Steps

Codifier was created to solve the bulk update problem described above, but designed so that later developers could reuse and expand the Codifier operator library. With Codifier, developers don't have to write one-off bulk update code each time the need arises. Instead, they can use the Codifier operators to write their code simply and quickly. And by building on Codifier, adding new operators or generalizing the existing ones, developers can improve the system for later bulk update projects and other developers.

  • Create a new main.go for your work, following cmd/examples/main.go.

  • Try out simple operator chains to see how they work. Break them up to add error checking and other processing where needed.

  • Study the proc.go and *_ops.go files to learn about Codifier operators.

  • Add new operators for tasks that are general enough to be needed by other developers.

  • Be sure to check in your code to the current Codifier code location, ensuring that your improvements will be available to other developers.

  • Add examples and improve the documentation.

Limitations, TODOs, and Future Work

  • Handle foreach functions. In gn syntax, foreach functions are similar to Codifier's scopes, but take two arguments. Substituting the loop variable into extracted code will allow it to be manipulated without actually looping through the loop variable's values.

  • Handle if functions. In gn syntax, if functions are similar to Codifier's scopes, but take no arguments. The extraction function should allow developers to specify whether the extracted scope should be the true clause or the false clause.

  • Handle nested values. In gn syntax, values may be nested like:

  # This group is included in every Rust test.
  group("rust_test_metadata") {
    metadata = {
      cmx_patches_data = [
        {
          sandbox = {
            services = [ "fuchsia.process.Launcher" ]
          }
        },
      ]
    }
  }

Codifier should be modified to parse these. Currently, metadata could be pulled as a value, but modifications are needed to extract a scope from a value in the store.

Other documentation

Authors