blob: 876dc9175b64c0af5b8cc8a7a04ac0ab05e3eb08 [file] [log] [blame] [view]
# Troubleshooting failures
This guide provides an overview of common scenarios encountered when
troubleshooting Compatibility Test for Fuchsia (CTF) test failures and walks
through some reasons for breakages and ways to unblock CL submission.
CTF tests assert on interactions between software frozen on a release branch and
software built on the `main` branch of `fuchsia.git`. If those assertions fail
due to an incompatibility, CL submission will be blocked.
The purpose of CTF is not to ban compatibility-breaking changes, rather it is
meant to make such breakages visible so that they can be appropriately
addressed.
## Scenarios
In each scenario, a real compatibility issue is introduced which causes a CTF
test to fail and block CL submission. The compatibility issue can be as simple
as changing the return value of an existing FIDL protocol used by a test. We
assume the following context common to each scenario (see [motivation] for more
testing scenarios):
- A FIDL client is frozen as a `ctf_fuchsia_package` on F19 called
`echo-service-tests`. This connects to a FIDL protocol and asserts on the
responses.
- [`generate_ctf_tests.gni`][generate_ctf_tests] contains a rule
`template("generate_echo-service-tests")` which merges the incoming client
package with a server package built on `main`. (see the [user guide] for
details).
- The client calls a method on the server called `Echo` which takes as input a
string and returns as output a string as follows:
```c++
auto server_proxy = connect_to_named_protocol("my_protocol.EchoService");
ASSERT_EQ(
server_proxy.echo("Hello"),
"Hello"
);
```
Suppose we want to change the behavior of echo to instead return the *lowercase*
representation of the string it is passed. We can update our test on `main` to
do the following:
```c++
ASSERT_EQ(
server_proxy.echo("Hello"),
"hello"
);
```
This passes for `ctf_in_development` tests, but when the frozen client with the
old assertion is run against this new server, the test will fail:
```bash
FAILURE: "hello" != "Hello"
```
CL submission is now blocked, and the path forward depends on the reason for
this breakage.
### Unintentional breaking change - soft transition
In this scenario, we do not intend to break compatibility. This is the case if
pre-built or out-of-tree components depend on the old behavior. Running those
components against a platform containing the changes to `Echo` will result in
unexpected behavior.
In this case, the safe thing to do is a soft transition to the new behavior:
1. Introduce a new method with the new behavior:
```fidl
protocol Echo {
// ...
@available(added=20)
EchoLowercase(struct {input string}) -> (string);
};
```
1. Mark the old method as deprecated or removed (optional):
```fidl
protocol Echo {
@available(removed=20)
Echo(/* ... */) -> (string);
// ...
};
```
1. Implement the new method in the server.
1. Change `fuchsia.git` callers to use the new method instead of the old one.
The server must support both methods until all API levels in which the old
method exists are no longer supported. In the above example, this will be when
F19 is unsupported, since the old Echo method is removed in F20 (due to
`@available(removed=20)`).
### Intentional breaking change - change test in release branch
In this scenario, the compatibility breakage is intentional. This may arise for
several reasons:
- We want to change the behavior of an API, and we accept the risk of
pre-built or out-of-tree clients breaking. This is a *true positive*
failure, which we will explicitly acknowledge.
- The behavior that changed is internal to a test, and does not represent a
breaking change of the SDK surface itself. This is a *false positive*
failure, because we caught a test incompatibility rather than an SDK
incompatibility.
In either case, the solution is to modify the release branch so that the test
will no longer fail when run against either the pre-change or post-change code
on `main`.
The change can be made as follows:
1. Check out the release branch:
```bash
fx sync-to refs/heads/releases/{{ '<var>' }}f19{{ '</var>' }}
```
1. Modify the assertion so it accepts both outputs (or comment it out):
```c++
EXPECT_THAT(
server_proxy.echo("Hello"),
AnyOf(Eq("Hello"), Eq("hello"))
);
```
1. Test that your changes will work on `main` (see below)
1. Commit and push changes:
```bash
git push origin HEAD:refs/for/releases/{{ '<var>' }}f19{{ '</var>' }}
```
1. Get CL reviewed and submit.
1. Land the blocked CL.
1. Clean up the release branch to accept only the new behavior (optional).
```c++
EXPECT_EQ(
server_proxy.echo("Hello"),
"hello"
);
```
You can test that your changes will work when applied to `main`. You need two
checkouts of Fuchsia, one synced to the release branch and one synced to `main`.
Do the following:
1. On the **release branch** checkout, build a new CTF bundle.
```bash
fx set core.x64 --with-tests //sdk/ctf
fx build
```
1. On the **`main`** checkout, build the CTF release tests.
```bash
fx set core.x64 --with-tests //sdk/ctf/release:tests
fx build
```
1. In the output directory for the **release branch**, copy the built CTF
bundle to the **`main`** repository.
```bash
cp -fR \
{{ '<var>' }}$RELEASE_BRANCH_FUCHSIA_OUT_DIR{{ '</var>' }}/cts/* \
{{ '<var>' }}$MAIN_BRANCH_FUCHSIA_DIR{{ '</var>' }}/prebuilt/ctf/{{ '<var>' }}f19{{ '</var>' }}/linux-x64/cts/
```
1. Rebuild the **`main`** checkout, repave device or restart emulator, and run
the tests.
1. Revert back to the version from CIPD once tests pass.
```bash
jiri run-hooks
```
#### Example: SDK compatibility breakage (true positive)
[This CL](https://fxrev.dev/1041178) introduced a real compatibility breakage to
`fuchsia.ui.policy.MediaButtonsListener` protocol. A new parameter was added to
the `MediaButtonsEvent` with correct versioning annotations, however, the
behavior if that parameter is left empty changed from the previous
implementation. This incompatibility was caught by the CTF test for F19:
```
../../src/ui/tests/conformance_input_tests/media-button-validator.cc:243: Failure
Expected equality of these values:
ToString(listener.events_received()[0])
Which is: "\n volume: 0\n mic_mute: 0\n pause: 0\n camera_disable: 0\n power: 0"
ToString(MakePowerEvent())
Which is: "\n volume: 0\n mic_mute: 0\n pause: 0\n camera_disable: 0\n power: 1"
```
It was determined that this change is acceptable because there are not any
pre-built clients of this protocol targeting API level 19.
CLs [1044994](https://fxrev.dev/1044994) and
[1049612](https://fxrev.dev/1049612) were cherry-picked into the F19 release
branch using the instructions above. Once the changes were rolled into the CTF
release used on `main`, the original CL was submitted without modification.
#### Example: Test harness compatibility breakage (false positive)
[This CL](https://fxrev.dev/1007583) introduced a compatibility breakage to the
`fuchsia.tracing.controller.Controller` protocol. The `StopTracing` method was
made `flexible`, and changing a method from `strict` to `flexible` is not
ABI-safe.
The WLAN hw-sim CTF test crashes under this change because it asserts that
tracing successfully stops during the test, even though this is unrelated to the
tested WLAN protocols and is not needed for correctness. This is a false
positive compatibility failure because it affects only the test harness itself.
[This CL](https://fxrev.dev/1014607) was landed to turn tracing failures into
warnings rather than fatal assertions, and this change was cherry-picked to the
F18 release branch. Once the change was rolled into the CTF release used on
`main`, the original CL was submitted without modification.
### Intentionally drop support for an API level for specific tests
In this scenario, we want to explicitly drop support for an API level on a
per-test basis. While [`version_history.json`][version_history] provides the
canonical listing of supported API levels, there are reasons we would want to
drop compatibility guarantees for specific protocols on a specific API level.
For example, a major refactor of a non-critical subsystem where breaking old
clients is acceptable.
In this case, we can take modify the thawing process to skip tests targeting the
API level we want to drop:
1. Modify [`generate_ctf_tests.gni`][generate_ctf_tests] on the `main` branch
to conditionally skip thawing the test.
```gn
template("generate_my-service-tests") {
forward_variables_from(invoker, [ "test_info" ])
if (defined(invoker.api_level) && invoker.api_level == "15") {
# Do not thaw this test for F15
not_needed([ "test_info" ])
group(target_name) {
}
} else {
// ...
}
}
```
1. Commit and submit this CL.
The above example skips thawing the entire artifact at F15. If you do not want
to skip all tests, see the previous section for how to skip individual test
cases on the release branch.
#### Example: Major refactor to diagnostics subsystem
[This CL](https://fxrev.dev/1019042) deleted support for the `DirectoryReady`
component event, which is no longer in use. Its primary use was to support
obtaining diagnostics data from components (which is now published using the
`fuchsia.inspect.InspectSink`) protocol.
Unfortunately, components built at F15 still published their diagnostics using
`DirectoryReady`, and all diagnostics CTF tests of this behavior would fail on
`main` following its removal.
Fortunately, there are no pre-built components targeting F15 for which we need
to obtain diagnostics data. A CL was submitted to [disable][disable-cl] all
diagnostics CTF tests originating in F15.
<!-- Reference links -->
[generate_ctf_tests]: https://fuchsia.googlesource.com/fuchsia/+/HEAD/sdk/ctf/build/generate_ctf_tests.gni
[motivation]: /docs/development/testing/ctf/compatibility_testing.md#motivation
[user guide]: /docs/development/testing/ctf/user_guide.md
[version_history]: https://fuchsia.googlesource.com/fuchsia/+/HEAD/sdk/version_history.json
[disable-cl]: https://fuchsia-review.googlesource.com/c/fuchsia/+/1015705/19/sdk/ctf/build/generate_ctf_tests.gni#40