blob: 21a446e4f14cb2d94b2a8699e6cabeae438c6210 [file] [log] [blame] [view]
# Why Fuchsia devices reboot
This document lists why a Fuchsia device may reboot. Some are self-explanatory
while others require some additional context.
Outline:
- [Terminology](#terminology)
- [Reboot reasons listed](#reboot-reasons)
- [Where to find reboot reasons](#where-to-find)
## Terminology {#terminology}
### Ungraceful reboot
An ungraceful reboot is a reboot that is initiated by either the kernel in
response to an error, such as a kernel panic, or performed by the hardware
without software intervention, such as a hardware watchdog timeout.
### Graceful reboot
A graceful reboot is a reboot that is initiated by a userspace process. The
process may initiate the reboot in response to an error, like when a devices
temperature is too high, but Fuchsia should have the opportunity to undergo an
orderly shutdown.
## Reboot reasons listed {#reboot-reasons}
### Kernel panic
If the kernel is unable to recover from an internal error, that error is
considered fatal and the system will reboot.
### The system runs out of memory
If the kernel detects that the amount of free physical memory falls below a
threshold, the system will reboot. The kernel does not kill processes to try to
reclaim memory before rebooting, meaning a single process could cause a
system-wide shortage of memory and force the device to reboot.
### Cold boot
If a device loses power for long enough between when it is shut down and it
boots back up, the system will determine this to be a cold boot.
### Brownout
A device browns out when its voltage dips below an acceptable threshold. This
should only occur when there is an issue with a devices power supply or its
power related hardware.
### Hardware watchdog timeout
Zircon sets up a hardware watchdog timer that will reboot the device if it is
not reset within a specified period of time.
### Software watchdog timeout
A software watchdog timer may reboot the device if someone sets one up.
### Brief loss of power
If a device loses power for a short period of time, like when a user unplugs a
device and rapidly plugs it back in, it may be unable to determine that the
reboot was cold and will consider the reboot a result of a brief power loss. It
is important to note that there is *not* a quantitative measure of what brief is
and is hardware dependent.
### User request
The end user of the device explicitly requested a reboot of the device.
### Developer request
The developer initiated the reboot, typically via a shell command or similar
interface, including within an automated test.
### System update
A component responsible for system updates must update a package, or multiple
packages, that cannot be updated ephemerally. These packages are canonically
know as base packages.
### Retry system update
A component responsible for system updates fails to apply an update, so the
device reboots to try again (or possibly revert the update).
### ZBI swap
If the Zircon boot image is swapped, the device reboots to apply the change.
### High temperature
A component responsible for power management detects that a device's temperature
is too high and the system cannot adequately reduce the device's temperature by
throttling the CPU or reducing the audio volume.
### Session failure
If the session manager is unable to restart a crashed session or a session
determines it has failed in an unrecoverable manner, the device reboots.
### Critical component failure
If a component marked `"on_terminate": "reboot"` crashed, the device reboots.
### Factory data reset
Following a data reset to the factory defaults, the device reboots.
### Root job termination
If the userspace root job is terminated, e.g., because one of its critical
processes crashes, the device reboots.
### Generic graceful
The platform can know whether the reboot was graceful, but cannot distinguish
between a software update, a user request or some higher-level component
detecting the device as overheating. All the platform knows is that the reboot
was graceful.
### Unknown
There are some scenarios in which the platform cannot determine the specific
reboot reason nor can it determine if the reboot was graceful or ungraceful.
## Where to find reboot reasons {#where-to-find}
Fuchsia exposes the reason a device last (re)booted through
[FIDL](/sdk/fidl/fuchsia.feedback/last_reboot_info.fidl) and tracks it on Cobalt
and the crash server.
#### Culprits
Reboots that at are the result of an error in a specific component have crash
signatures that attribute that component as the cause of the reboot. They follow
a general pattern of combining the reboot reason and the component deemed
responsible for the reboot, a.k.a. the culprit.
Reboot reason | **FIDL** | **Cobalt event** | **Crash signature**
:--------------------------- | :--------------------------- | :------------------------- | :------------------
Kernel panic | `KERNEL_PANIC` | `KernelPanic` | Function responsible for the crash, exactly like a userspace crash report
System running out of memory | `SYSTEM_OUT_OF_MEMORY` | `SystemOutOfMemory` | `fuchsia-oom` or `fuchsia-oom-$CULPRIT`
Cold boot | `COLD` | `Cold` | N/A\*
Brownout | `BROWNOUT` | `Brownout` | `fuchsia-brownout`
Hardware watchdog timeout | `HARDWARE_WATCHDOG_TIMEOUT` | `HardwareWatchdogTimeout` | `fuchsia-hw-watchdog-timeout`
Software watchdog timeout | `SOFTWARE_WATCHDOG_TIMEOUT` | `SoftwareWatchdogTimeout` | `fuchsia-sw-watchdog-timeout`
Brief power loss | `BRIEF POWER LOSS` | `BriefPowerLoss` | `fuchsia-brief-power-loss`
User request | `USER_REQUEST` | `UserRequest` | N/A\*
Developer request | `DEVELOPER_REQUEST` | `DeveloperRequest` | N/A\*
System update | `SYSTEM_UPDATE` | `SystemUpdate` | N/A\*
Retry system update | `RETRY_SYSTEM_UPDATE` | `RetrySystemUpdate` | `fuchsia-retry-system-update`
ZBI swap | `ZBI_SWAP` | `ZbiSwap` | N/A\*
High temperature | `HIGH_TEMPERATURE` | `HighTemperature` | `fuchsia-reboot-high-temperature`
Session failure | `SESSION_FAILURE` | `SessionFailure` | `fuchsia-session-failure`
Critical component failure | `CRITICAL_COMPONENT_FAILURE` | `CriticalComponentFailure` | `fuchsia-critical-component-failure` or `fuchsia-reboot-$CULPRIT-terminated`
Factory data reset | `FACTORY_DATA_RESET` | `FactoryDataReset` | N/A\*
Root job termination | `ROOT_JOB_TERMINATION | `RootJobTermination` | `fuchsia-root-job-termination` or `fuchsia-reboot-$CULPRIT-terminated`
Generic graceful | *graceful* field set to true | `GenericGraceful` | `fuchsia-undetermined-userspace-reboot`
Unknown | *graceful* field not set | `Unknown` | `fuchsia-reboot-log-not-parseable`
\* Not a crash. \