Why Fuchsia devices reboot

This document lists the reasons a Fuchsia device may reboot. Some reasons are self-explanatory, while others require additional context.

Terminology

Ungraceful reboot

An ungraceful reboot is a reboot that is either initiated by the kernel in response to an error, such as a kernel panic, or performed by the hardware without software intervention, such as a hardware watchdog timeout.

Graceful reboot

A graceful reboot is a reboot that is initiated by a userspace process. The process may initiate the reboot in response to an error, like when a device’s temperature is too high, but Fuchsia should have the opportunity to undergo an orderly shutdown.

Reboot reasons listed

Kernel panic

If the kernel is unable to recover from an internal error, that error is considered fatal and the system will reboot.

The system runs out of memory

If the kernel detects that the amount of free physical memory falls below a threshold, the system will reboot. The kernel does not kill processes to try to reclaim memory before rebooting, meaning a single process could cause a system-wide shortage of memory and force the device to reboot.

Cold boot

If a device loses power for long enough between when it is shut down and when it boots back up, the system will determine this to be a cold boot.

Brownout

A device browns out when its voltage dips below an acceptable threshold. This should only occur when there is an issue with a device’s power supply or its power-related hardware.

Hardware watchdog timeout

Zircon sets up a hardware watchdog timer that will reboot the device if it is not reset within a specified period of time.
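
The watchdog mechanism can be illustrated with a small simulation. This is a minimal sketch, not Zircon's implementation: the `WatchdogTimer` class, its `pet` and `expired` methods, and the 5-second timeout are hypothetical names and values chosen for illustration, and time is passed in explicitly so the example is deterministic.

```python
class WatchdogTimer:
    """Toy model of a watchdog: it expires unless it is reset in time."""

    def __init__(self, timeout_s: float, now_s: float = 0.0) -> None:
        self.timeout_s = timeout_s  # hypothetical timeout, in seconds
        self.last_pet_s = now_s     # time of the most recent reset

    def pet(self, now_s: float) -> None:
        """Reset the timer, as healthy software would do periodically."""
        self.last_pet_s = now_s

    def expired(self, now_s: float) -> bool:
        """Return True if the timer was not reset within the timeout."""
        return now_s - self.last_pet_s >= self.timeout_s


watchdog = WatchdogTimer(timeout_s=5.0)
watchdog.pet(now_s=3.0)                # reset in time
healthy = watchdog.expired(now_s=7.0)  # 4.0 s since the last reset
hung = watchdog.expired(now_s=8.5)     # 5.5 s since the last reset
```

In this model, `healthy` is `False` and `hung` is `True`. On real hardware the expiry check is performed by the watchdog itself, and expiry triggers the reboot without software intervention.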

Software watchdog timeout

If a software watchdog timer has been set up, it may reboot the device when it times out.

Brief loss of power

If a device loses power for a short period of time, like when a user unplugs a device and rapidly plugs it back in, it may be unable to determine that the reboot was cold and will consider the reboot a result of a brief power loss. Note that there is no quantitative measure of what counts as brief; the duration is hardware dependent.

User request

A user, or a component acting on behalf of a user such as SL4F or RCS, determines that a reboot is necessary.

System update

A component responsible for system updates must update a package, or multiple packages, that cannot be updated ephemerally. These packages are canonically known as base packages.

Retry system update

A component responsible for system updates fails to apply an update, so the device reboots to try again (or possibly revert the update).

ZBI swap

If the Zircon boot image is swapped, the device reboots to apply the change.

High temperature

A component responsible for power management detects that a device’s temperature is too high and the system cannot adequately reduce the device’s temperature by throttling the CPU or reducing the audio volume.

Session failure

If the session manager is unable to restart a crashed session or a session determines it has failed in an unrecoverable manner, the device reboots.

Sysmgr failure

If the system manager for legacy components (sysmgr) crashes, the device reboots.

Critical component failure

If a critical component managed by sysmgr crashes, the device reboots.

Factory data reset

Following a data reset to the factory defaults, the device reboots.

Root job termination

If the userspace root job is terminated, e.g., because one of its critical processes crashes, the device reboots.

Generic graceful

The platform can know whether the reboot was graceful, but cannot distinguish between a software update, a user request or some higher-level component detecting the device as overheating. All the platform knows is that the reboot was graceful.

Generic ungraceful

There are some scenarios in which a specific reboot reason cannot be determined. For example, the platform may not know whether the cause was a kernel panic or a watchdog timeout, but it still knows the reboot was ungraceful.

Unknown

There are some scenarios in which the platform can determine neither the specific reboot reason nor whether the reboot was graceful or ungraceful.
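
The three fallback cases (generic graceful, generic ungraceful, and unknown) correspond to the three states of the optional graceful field listed in the table under "Where to find reboot reasons". The following is a minimal sketch of that mapping; `fallback_reason` is a hypothetical helper written for illustration and is not part of any Fuchsia API.

```python
from typing import Optional


def fallback_reason(graceful: Optional[bool]) -> str:
    """Map the optional graceful flag to the fallback reboot reason.

    None models the case where the graceful field is not set.
    """
    if graceful is None:
        return "Unknown"
    return "Generic graceful" if graceful else "Generic ungraceful"
```

Modeling the field as an optional boolean keeps "not set" distinct from "set to false", which is the difference between an unknown reboot and a generic ungraceful one.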

Where to find reboot reasons

Fuchsia exposes the reason a device last (re)booted through FIDL and tracks it on Cobalt and the crash server.

| Reboot reason | FIDL | Cobalt event | Crash signature |
| --- | --- | --- | --- |
| Kernel panic | KERNEL_PANIC | KernelPanic | fuchsia-kernel-panic |
| System running out of memory | SYSTEM_OUT_OF_MEMORY | SystemOutOfMemory | fuchsia-oom |
| Cold boot | COLD | Cold | N/A* |
| Brownout | BROWNOUT | Brownout | fuchsia-brownout |
| Hardware watchdog timeout | HARDWARE_WATCHDOG_TIMEOUT | HardwareWatchdogTimeout | fuchsia-hw-watchdog-timeout |
| Software watchdog timeout | SOFTWARE_WATCHDOG_TIMEOUT | SoftwareWatchdogTimeout | fuchsia-sw-watchdog-timeout |
| Brief power loss | BRIEF_POWER_LOSS | BriefPowerLoss | fuchsia-brief-power-loss |
| User request | USER_REQUEST | UserRequest | N/A* |
| System update | SYSTEM_UPDATE | SystemUpdate | N/A* |
| Retry system update | RETRY_SYSTEM_UPDATE | RetrySystemUpdate | fuchsia-retry-system-update |
| ZBI swap | ZBI_SWAP | ZbiSwap | N/A* |
| High temperature | HIGH_TEMPERATURE | HighTemperature | fuchsia-high-temperature-reboot |
| Session failure | SESSION_FAILURE | SessionFailure | fuchsia-session-failure |
| Sysmgr failure | SYSMGR_FAILURE | SysmgrFailure | fuchsia-sysmgr-failure |
| Critical component failure | CRITICAL_COMPONENT_FAILURE | CriticalComponentFailure | fuchsia-critical-component-failure |
| Factory data reset | FACTORY_DATA_RESET | FactoryDataReset | N/A* |
| Root job termination | ROOT_JOB_TERMINATION | RootJobTermination | fuchsia-root-job-termination |
| Generic graceful | graceful field set to true | GenericGraceful | N/A* |
| Generic ungraceful | graceful field set to false | GenericUngraceful | N/A** |
| Unknown | graceful field not set | Unknown | fuchsia-reboot-log-not-parseable |

* Not a crash.
** Currently not implemented.
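
For the rows in the table that have a FIDL name, the Cobalt event name is the CamelCase form of that name, while crash signatures do not follow a single pattern and need an explicit lookup. The sketch below shows one way a tool might encode this. `cobalt_event_name`, `CRASH_SIGNATURES`, and `crash_signature` are hypothetical names, only a few rows of the table are included, and none of this is part of a Fuchsia API.

```python
from typing import Dict, Optional

# Crash signatures copied from a few rows of the table.
# None marks reasons the table lists as "N/A*" (not a crash).
CRASH_SIGNATURES: Dict[str, Optional[str]] = {
    "KERNEL_PANIC": "fuchsia-kernel-panic",
    "SYSTEM_OUT_OF_MEMORY": "fuchsia-oom",
    "HARDWARE_WATCHDOG_TIMEOUT": "fuchsia-hw-watchdog-timeout",
    "HIGH_TEMPERATURE": "fuchsia-high-temperature-reboot",
    "USER_REQUEST": None,
    "SYSTEM_UPDATE": None,
}


def cobalt_event_name(fidl_name: str) -> str:
    """Convert a FIDL-style name such as ZBI_SWAP to CamelCase (ZbiSwap)."""
    return "".join(word.capitalize() for word in fidl_name.split("_"))


def crash_signature(fidl_name: str) -> Optional[str]:
    """Look up the crash signature; raises KeyError for rows not copied here."""
    return CRASH_SIGNATURES[fidl_name]
```

For example, `cobalt_event_name("HARDWARE_WATCHDOG_TIMEOUT")` yields `HardwareWatchdogTimeout`, and `crash_signature("USER_REQUEST")` yields `None` because a user-requested reboot is not a crash.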