blob: 09a3dd1f05f4acf0813da683c01ed85f68d440ce [file] [log] [blame] [view] [edit]
# Troubleshooting components {#troubleshooting-components}
This document contains tips for troubleshooting the following kinds of problems
when using the [component framework][doc-intro]:
- [Errors from the capability routing static analyzer] (#static-analyzer)
- [Error when trying to use a capability from the namespace](#troubleshoot-use)
- [Test does not start](#troubleshoot-test)
## Errors from the capability routing static analyzer {#static-analyzer}
You can run the capability routing static analyzer on a host with this command:
```
ffx scrutiny verify routes
```
The static analyzer will soon also run in CQ, so you may see CQ failures attributed
to capability routing errors.
The static analyzer currently supports [directory][doc-directory] and
[protocol][doc-protocol] capabilities. For each `use` of one of these capabilities,
the analyzer attempts to walk the [capability route][doc-routing] to its source via
a chain of valid `offer` and `expose` declarations.
If the walk fails, then the analyzer returns an error containing the following:
- The [moniker][doc-monikers] of the component instance which attempted to `use`
the capability.
- The name of the capability as stated in the `use` declaration.
- A description of the problem (e.g., a missing `offer`) and the moniker of the
component instance where the problem was detected.
For example, this error indicates that the component instance `/core/system-metrics-logger`
attempted to `use` the `config-data` capability from its parent instance `/core`, but
`/core` did not offer the capability to `/core/system-metrics-logger`:
```json5
{
"capability": "config-data",
"error": "no offer declaration for `/core` with name `config-data`",
"using_node": "/core/system-metrics-logger"
}
```
In this case, if you did want `/core/system-metrics-logger` to be able to use the `config-data`
capability, you could fix the error by updating the `/core` component manifest to include
an `offer` of the `config-data` capability to `/core/system-metrics-logger`. If the use was
unnecessary, you could fix the error by removing the `use` declaration from
`/core/system-metrics-logger`'s component manifest.
For directory capabilities, the analyzer also reports an error if a component instance
attempts to `use` a directory with broader rights than were offered. The error contains the
moniker of the component instance which first (counting from the source of the capability)
offered narrower rights. To fix such an error, you could either `use` the capability with
the narrower rights, or loosen the restriction placed by the source component if that is
appropriate.
One particular type of routing failure is not reported as an error: if a capability
route passes through a component instance which is not present in the build, then the
CQ-blocking analyzer ignores the route. This situation is expected to occur in some
cases. However, if you are trying to debug a problem and want to check if your capability
route might be unintentionally broken, you can enter the Scrutiny shell with `ffx scrutiny`
and then run
```
verify.capability_routes --capability_types directory protocol --response_level warning
```
(possibly omitting either `directory` or `protocol` depending on which capability type
you'd like to analyze). In addition to any errors, you will see warnings like the following:
```json5
{
"capability": "fuchsia.sessionmanager.Startup",
"using_node": "/startup",
"warning": "failed to find component: `no node found with path `/core/session-manager`"
}
```
This example warning indicates that the `/startup` component instance declared a `use` of the
`fuchsia.sessionmanager.Startup` capability, but that this capability was routed through a
component instance `/core/session-manager` that was not included in the build.
## Got an error when trying to use a capability from the namespace {#troubleshoot-use}
Sometimes, when connecting to a capability such as a [protocol][doc-protocol],
[service][doc-service], or [directory][doc-directory] in your
[namespace][glossary.namespace], the channel returns an error when you try to
use it. For example, consider the following snippet:
```rust
use fuchsia_component::client;
use log::info;
...
let echo = client::connect_to_protocol::<fidl_fuchsia_echo::EchoMarker>().expect("error connecting to echo");
if let Some(err) = echo.echo_string(Some("Hippos rule!")).await {
info!("Echo failed: {}", err);
}
```
In this Rust example, the code connects to the `Echo` protocol in the namespace
through the usual means, by calling the `connect_to_protocol` API in the
`fuchsia_component` crate. This call should succeed as long as the protocol was
mapped into the component's namespace by a `use` declaration in the component's
[manifest][doc-manifests]:
```json5
use: [
{ protocol: "/svc/fuchsia.echo.Echo" },
...
],
```
However, when the `connect_to_protocol` call returns successfully, it does not
necessarily mean the protocol will be available. If it's not available, the
usual symptom is that a call to the protocol over the channel fails. The
snippet above checks for this and logs the error.
There are a few conditions that can cause these errors:
- [Channel was closed after connecting to a capability in the namespace](#troubleshoot-use-routing)
- [Component fails to start](#troubleshoot-use-start)
- [Component terminated or closed the channel](#troubleshoot-use-terminated)
### Channel was closed after connecting to a capability in the namespace {#troubleshoot-use-routing}
When a protocol or service is opened in the namespace, or a directory in the
namespace is used for the first time, component manager will perform
[capability routing][doc-routing] to find the source of the capability. It's
possible that routing will fail if one of the component manifests in the
routing path was configured incorrectly. For example, it's possible that an
offer or expose declaration is missing from some component in the path, or one
of the components in the chain could not be resolved.
There are a couple ways to check if a routing failure was the cause of channel
closure:
- Check for an [epitaph][doc-epitaphs] on the closed channel.
- Check the component manager logs with `fx log --only component_manager`
- Run `ffx scrutiny verify routes` on the host in order to statically check
protocol and directory capability routes, and see
[Errors from the capability routing static analyzer](#static-analyzer)
for more information about the results.
See [checking a closed channel](#troubleshoot-closed-channel) for details on
how to check if a channel was closed and get an epitaph if there was one.
Normally, the epitaph set for a routing failure is `ZX_ERR_UNAVAILABLE`.
For a more detailed description of the error, check the kernel debuglog. Look
for a message beginning with `ERROR: Failed to route` that contains the
requesting component's [moniker][doc-monikers]. This error should give you a
hint about what went wrong. Example:
```
[component_manager] ERROR: Failed to route protocol `/svc/fuchsia.echo.Echo`
from component `/core:0/echo_client:0`: A `use from realm` declaration was
found at `/echo_client:0` for `/svc/fuchsia.echo.Echo`, but no matching
`offer` declaration was found in the parent
```
Depending on where the component runs the log may be tagged as belonging to the
component, for example `[my_component]` instead of `[component_manager]`. For a
self-contained example of failed routing that demonstrates the content of this
section, refer to
[//examples/components/routing_failed][example-routing-failed].
### Component fails to start {#troubleshoot-use-start}
It's possible that the capability was [routed](#troubleshoot-use-routing)
successfully, but something went wrong when the [runner][doc-runners] tried to
start the component. Here's a couple ways this can happen:
- The [`program`][doc-manifests-program] declaration was misconfigured. For
example, the binary's path was spelled incorrectly.
- The binary or some other resource needed to start the component was not
included in its [package][doc-packages].
When this happens, the runner closes the channel with a `PEER_CLOSED` status,
with no epitaph. See [checking a closed channel](#troubleshoot-closed-channel)
for details on how to check if a channel was closed and get an epitaph if there
was one.
Note that just from the state of the channel, it's impossible to distinguish
whether the runner failed to start the component, or the [component terminated
or closed the channel itself](#troubleshoot-use-terminated).
For a more detailed description of the error, check the logs. The log to check
depends on the runner:
- For the ELF runner, check the component manager logs with `fx log --only
component_manager`
- For other runners, check the [logs][doc-logs] of the runner component. You
can do this by running `fx log --tag <runner-name>`.
The form of the error message is runner-dependent. For the ELF runner, look for a message starting
with `ERROR: Failed to start component`:
```
[component_manager] ERROR: Failed to start component
`fuchsia-pkg://fuchsia.com/components-routing-failed-example#meta/echo_server_bad.cm`: unable to
load component with url
"fuchsia-pkg://fuchsia.com/components-routing-failed-example#meta/echo_server_bad.cm": error
loading executable: "reading object at "bin/routing_failed_echo_server_oops" failed: A FIDL
client's channel was closed: PEER_CLOSED"
```
In this case, the component failed to start because its binary was not present.
For an example of a component that failed to start due to a misconfigured
component manifest, refer to
[//examples/components/routing_failed][example-routing-failed].
### Component terminated or closed the channel {#troubleshoot-use-terminated}
If you have verified that [routing succeeded](#troubleshoot-routing) and the
[component started successfully](#troubleshoot-use-start), then the final
possibility is that the source component closed the channel itself. This can
happen while the component was running, or can be a side effect of the
component terminating.
If the component terminated because it crashed, you can look for a crash report
in `fx log` that starts like this:
```
[00177.191] 01775:02371> crashsvc: exception received, processing
[00177.191] 01775:02371> <== fatal : process echo_client.cm[21090] thread initial-thread[21092]
<stack trace follows...>
```
Note that you'll see name of the component manifest in the dump (this is
actually the process name).
If the component closed the channel itself, there's no universal way to debug
if this happened. You can look in the component's [logs][doc-logs], or in the
case of a protocol capability, search the source code for the name of the
source code in a language-appropriate format. For example, for the
`fuchsia.Echo` protocol in Rust, you might search for a `use` statement for
`fidl_fuchsia_echo`, then follow the identifier to where it's used.
The final possibility is that a component may have already been started by a
previous capability request, but has since terminated on its own.
### Checking if a channel was closed {#troubleshoot-closed-channel}
If a protocol channel was closed, you'll normally notice when trying to make a
call on it, if the call is awaited on. For example:
```rust
let res = echo.echo_string(Some("Hippos rule!")).await;
match res {
Ok(_) => { info!("Call succeeded!"); }
Err(fidl::Error::ClientChannelClosed { status, service_name } => {
error!("Channel to service {} was closed with status: {}", service_name, status);
}
Err(e) => {
error!("Unexpected error: {}", e);
}
};
```
If the call doesn't return a value (i.e. it is a one-way method), you'll only
get an error if the channel was closed prior to the call. However, if your
protocol pipelines a call that does return a value, you can also check that:
```rust
let (echo_resp, echo_resp_svc) = fidl::endpoints::create_proxy();
let res = echo_async.echo_string_pipelined(Some("Hippos rule!"), echo_resp_svc);
match res {
Ok(_) => {
info!("EchoString succeeded!");
}
Err(fidl::Error::ClientChannelClosed { status, service_name } => {
error!("Channel to service {} was closed with status: {}", service_name, status);
}
Err(e) => {
error!("Unexpected error: {}", e);
}
};
let res = echo_resp.get_result().await;
match res {
Ok(_) => { info!("GetResult succeeded!"); }
Err(fidl::Error::ClientChannelClosed { status, service_name } => {
error!("Channel to service {} was closed with status: {}", service_name, status);
}
Err(e) => {
error!("Unexpected error: {}", e);
}
};
```
If `echo_resp` is closed, it's likely that's indirectly because `echo_async` was closed.
In the case of [routing failure](#troubleshoot-use-routing), component manager
sets an [epitaph][doc-epitaphs] on the channel that was opened through the
namespace. You can get the epitaph on a closed channel as follows:
```rust
let stream = echo.take_event_stream();
match stream.next().await {
Some(Err(fidl::Error::ClientChannelClosed { status, .. })) => {
info!("Echo channel was closed with epitaph, probably due to \
failed routing: {}", status);
}
Some(m) => {
info!("Received message other than epitaph or peer closed: {:?}", m);
}
None => {
info!("Component failed to start or Echo channel was closed by server");
}
}
```
Note: in the `echo_async` example, the epitaph would be set on `echo_async`,
not `echo_resp`.
## Test does not start {#troubleshoot-test}
You write component tests using the [Test Runner Framework][doc-trf].
Sometimes, if one of the test components is configured incorrectly, this can
result in the test failing to run.
If this happens, you'll see an error like the following from `fx test`:
```
Test suite encountered error trying to run tests: getting test cases
Caused by:
The test protocol was closed. This may mean `fuchsia.test.Suite` was not configured correctly.
Refer to: https://fuchsia.dev/fuchsia-src/development/components/v2/troubleshooting#troubleshoot-test
```
Misconfigurations can happen in a few test-specific ways:
- [The test failed to expose `fuchsia.test.Suite` to test manager](#troubleshoot-test-root)
- [The test driver failed to expose `fuchsia.test.Suite` to the root](#troubleshoot-test-routing)
- [The test driver does not use a test runner](#troubleshoot-test-runner)
If you're still seeing the same error after trying the preceding solutions, consider following
[the troubleshooting steps for using capabilities](#troubleshoot-use). The troubleshooting steps may
help fix issues from routing the `fuchsia.test.Suite` capability in integration tests.
### The test failed to expose `fuchsia.test.Suite` to test manager {#troubleshoot-test-root}
This happens when the test root fails to expose `fuchsia.test.Suite` from the
[test root][doc-trf-root]. The simple fix is to add an `expose` declaration:
```json5
// test_root.cml
expose: [
...
{
protocol: "/svc/fuchsia.test.Suite",
from: "self", // If a child component is the test driver, put `from: "#driver"`
},
],
```
### The test driver failed to expose `fuchsia.test.Suite` to the root {#troubleshoot-test-routing}
If the [test driver][doc-trf-driver] and [test root][doc-trf-root] are
different components, the test driver must also expose `fuchsia.test.Suite` to
its parent, the test root.
Make sure this is in the driver's CML:
```json5
// test_driver.cml
expose: [
...
{
protocol: "/svc/fuchsia.test.Suite",
from: "self",
},
],
```
If this is the problem, you can expect to see an error like this in the logs:
```
ERROR: Failed to route protocol `/svc/fuchsia.test.Suite` from component
`/test_manager:0/...`: An `expose from #driver` declaration was found at `/test_manager:0/...`
for `/svc/fuchsia.test.Suite`, but no matching `expose` declaration was found in the child
```
### The test driver does not use a test runner {#troubleshoot-test-runner}
The [test driver][doc-trf-driver] must use the appropriate [test
runner][doc-trf-runner] corresponding to the language and test framework the
test is written with. For example, the driver of a Rust test needs the
following declaration:
```json5
// test_driver.cml
include: [ "//src/sys/test_runners/rust/default.shard.cml" ]
```
Also, if the test driver is a child of the [test root][trf-test-root], you need
to offer it to the driver:
```json5
// test_root.cml
offer: [
{
runner: "rust_test_runner",
to: [ "#driver" ],
},
],
```
[glossary.namespace]: /docs/glossary/README.md#namespace
[doc-directory]: /docs/concepts/components/v2/capabilities/directory.md
[doc-epitaphs]: /docs/reference/fidl/language/wire-format/README.md#epitaphs
[doc-trf-driver]: /docs/concepts/testing/v2/test_runner_framework.md#test-roles
[doc-trf-root]: /docs/concepts/testing/v2/test_runner_framework.md#tests-as-components
[doc-trf-runner]: /docs/concepts/testing/v2/test_runner_framework.md#test-runners
[doc-trf]: /docs/concepts/testing/v2/test_runner_framework.md
[doc-intro]: /docs/concepts/components/v2/introduction.md
[doc-logs]: /docs/concepts/diagnostics/logs/README.md
[doc-manifests-program]: /docs/concepts/components/v2/component_manifests.md#program
[doc-manifests]: /docs/concepts/components/v2/component_manifests.md
[doc-monikers]: /docs/concepts/components/v2/monikers.md
[doc-packages]: /docs/concepts/packages/package.md
[doc-protocol]: /docs/concepts/components/v2/capabilities/protocol.md
[doc-routing]: /docs/concepts/components/v2/component_manifests.md#routing
[doc-runners]: /docs/concepts/components/v2/capabilities/runners.md
[doc-service]: /docs/concepts/components/v2/capabilities/service.md
[example-routing-failed]: /examples/components/routing_failed/README.md