| # Hardware watchdog timers |
| |
| ## Overview |
| |
| A hardware watchdog timer (WDT) is a special piece of hardware responsible for |
| resetting a system in the event of a hard system-wide lockup. They are |
| frequently present in system-on-a-chip (SoC) designs, especially in SoCs |
| targeted towards smaller embedded systems applications. They can be an important |
| aspect of system design in that they can trigger a reboot of a system that has |
| locked up so completely that is has become incapable of managing mission |
| critical tasks, such as active thermal management, before the system suffers |
| irreparable damage. In addition, they can help to mitigate a poor user |
| experience in an embedded system by automatically resetting a hard-locked system |
| without user intervention. No one wants to have to deal with a locked-up device, |
| but if the worst happens, it is much more preferable that a system automatically |
| reset itself instead of forcing a user to suffer a non-responsive system until |
| they decide that it needs to be power cycled, and then have to go and physically |
| unplug the device in order to recover it. |
| |
| A watchdog timer (just "watchdog" or "WDT" for short) typically works by being |
| configured to count at a particular rate up to a threshold count. If the counter |
| reaches the threshold before being reset by software, the WDT will |
| automatically, and un-gracefully, reboot the system. The act of resetting the |
| WDT from software is commonly referred to as "petting" the watchdog. The period |
| of a hardware watchdog in a system tends to be rather large (one or more |
| seconds) as this mechanism is an absolute worst case fail-safe. The system |
| should be well and truly locked-up before the watchdog ever fires. |
| |
| ## Usage in Zircon |
| |
| A WDT in Zircon, when available, is used to protect the absolute lowest level of |
| the system. When enabled, it is nominally pet somewhere between 1/4 and 1/2 of |
| the way through it cycle by a kernel level timer, meaning that it is pet in a |
| hard IRQ context independent of threads. Petting the watchdog is not subject to |
| thread weights, deadlines, or any other scheduler behavior. For a hardware |
| watchdog to fire and reboot the system, the system needs locked up to the point |
| that timer IRQs cannot be serviced. |
| |
| As mentioned above, watchdog timers are hardware specific entities. Whether or |
| not one exists, what is it capable of doing, and specifically how to operate one |
| (when it exists), are not common to an architecture like x64 or ARM64. Given the |
| location of the "pet" operation at the absolute lowest level of the Zircon |
| kernel, it is up to the kernel to pet the dog, not any hardware specific |
| user-mode drivers. |
| |
| Because of this, it is up to the bootloader to configure the watchdog properly |
| and communicate to the kernel (through the ZBI) the important details of whether |
| or not the WDT exists, whether it is enabled, how frequently it must be pet, and |
| how to pet, enable, or disable it. A system running zircon only "has" a WDT if |
| the bootloader tells it that it does and how to operate it. While a bootloader |
| must tell the kernel how to pet the watchdog when present and enabled, it might |
| not tell the kernel how to disable it. This could be the result of either a |
| system design decision, or because the WDT cannot be disabled from the kernel. |
| |
| Typically, hardware WDTs are configured and enabled by the bootloader just |
| before control is transferred to the kernel. This way, if the kernel completely |
| locks up during startup, the WDT will reset the system. On the other side of the |
| fence, the kernel attempts to recognize and pet the WDT as early as possible in |
| the boot sequence. Later on, it will settle into a pattern of periodically |
| petting the dog once boot has progressed to the point where it is possible to |
| set timers. |
| |
| ## Methods for controlling the watchdog during development |
| |
| The vast majority of developers should never need to do anything with a watchdog |
| timer, or even be aware of it existing. For it to fire during normal operation |
| is an indication of something going rather badly wrong. In some situations, |
| however, developers may be in a situation where they need to hold off hard interrupt |
| requests (IRQs) for excessive amounts of time as part of investigating a bug, or |
| other performance issues. In these situations, it is good to know what options |
| exist for controlling the watchdog, and not getting bit at inappropriate times. |
| |
| ### Use the kernel shell extension |
| |
| If you have access to the kernel shell and the system is stable enough to boot |
| to the point where the kernel shell is accessible, you can use the shell |
| extension to manipulate the WDT. Run `k wdt help` to see a list of the |
| available commands. Run `k wdt status` to see if the kernel is aware of any |
| hardware WDTs at all, and if it is, whether the WDT is enabled or not, what the |
| nominal pet period is, and how long ago the timer was last pet. If needed, you |
| can run `k wdt disable` to disable the watchdog. You can only disable the WDT if |
| the bootloader has told the kernel how to disable the WDT. |
| |
| ### Use the kernel command line |
| |
| You can pass kernel command line arguments to control the watch dog. You can |
| send `kernel.force-watchdog-disabled=true` to tell the kernel to force disable |
| the watchdog as early as possible during the boot. This can be useful if |
| problems are causing the watchdog to fire before it gets to the point where the |
| kernel shell is easily accessible. However, this is only an option if the |
| bootloader has told the kernel how to disable the watchdog. |