[zircon][debugger] Disallow setting non-canonical rip addresses.

This CL modifies the `arch_set_general_regs` function by always checking
the `rip` for non-canonical addresses.

Without this change, callers of `zx_thread_write_state` could set the
`rip` to a non-canonical address on a suspended thread, causing the
`iretq` instruction to trigger a general protection fault.

Given that the `iretq` instruction would fail after executing the
`swapgs` instruction, the #GP exception would be handled with the `gs`
register from userspace. However, given that there was no change of
privileges, the handler will not issue another `swapgs`: causing the
handler to use the `gs` register from userspace.

Note that before, top-bit set addresses (kernel addresses) were
also disallowed. This change uses `x86_is_vaddr_canonical` instead (the
same mechanism used elsewhere in the kernel) and disallows kernel
addresses with `is_kernel_address`.

I also updated the `NoncanonicalRipAddressSyscall` test to use a real
kernel address (it was using a high address, but that was out of the
kernel address space).

I added a unit test in to test this behavior: writing a non-canonical
return address to the rip results in an invalid argument. Without
this fix, resuming that thread would crash the system.

To run the test:

```
$ fx set bringup.x64 \
    --with-base //garnet/packages/tests:zircon \
    --with-base //bundles/bringup:tests
```

And run it with `runtests -t threads-test`

BUG=50633
This vulnerability was reported by: Quarkslab

Change-Id: I09eaf28c48c4dfbb308cfa82617d6b682df31412
Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/383356
Commit-Queue: Marco Vanotti <mvanotti@google.com>
Reviewed-by: Mark Seaborn <mseaborn@google.com>
Reviewed-by: Travis Geiselbrecht <travisg@google.com>
Reviewed-by: Nick Maniscalco <maniscalco@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Reviewed-by: Kostya Kortchinsky <kostyak@google.com>
Testability-Review: Nick Maniscalco <maniscalco@google.com>
diff --git a/docs/concepts/kernel/sysret_problem.md b/docs/concepts/kernel/sysret_problem.md
index f8172d4..4fcb73f 100644
--- a/docs/concepts/kernel/sysret_problem.md
+++ b/docs/concepts/kernel/sysret_problem.md
@@ -1,25 +1,34 @@
-# Avoiding a problem with the SYSRET instruction
+# Avoiding a problem with the SYSRET and IRETQ instructions
 
-On x86-64, the kernel uses the SYSRET instruction to return from system
-calls.  We must be careful not to use a non-canonical return address with
-SYSRET, at least on Intel CPUs, because this causes the SYSRET instruction
-to fault in kernel mode, which is potentially unsafe.  (In contrast, on AMD
-CPUs, SYSRET faults in user mode when used with a non-canonical return
-address.)
+On x86-64, the kernel uses the `SYSRET` and `IRETQ` instructions to return
+from system calls and interrupts, respectively. We must be careful not to
+use a non-canonical return address in these instructions, at least on Intel
+CPUs, because this causes the instructions to fault in kernel mode, which
+is unsafe. In contrast, on AMD CPUs, `SYSRET` faults in user mode when used
+with a non-canonical return address.
 
-Usually, the lowest non-negative non-canonical address is 0x0000800000000000
+One of the problems with these instructions faulting in kernel mode is that
+the instructions occur at the end of the interrupt or syscall handling
+mechanism, after the `gs` register has been swapped from the kernel
+`x86_percpu` variable to a value that is controlled by userspace. When an
+exception occurs in kernel mode, the `gs` register is not changed because
+it assumes that the current `gs` register belongs to the kernel. This would
+lead to the kernel handling the fault using a `x86_percpu` structure
+controlled by the user and could easily lead to kernel code execution.
+
+Usually, the lowest non-negative non-canonical address is `0x0000800000000000`
 (== 1 << 47).  One way that a user process could cause the syscall return
 address to be non-canonical is by mapping a 4k executable page immediately
-below that address (at 0x00007ffffffff000), putting a SYSCALL instruction
-at the end of that page, and executing the SYSCALL instruction.
+below that address (at `0x00007ffffffff000`), putting a `SYSCALL` instruction
+at the end of that page, and executing the `SYSCALL` instruction.
 
 To avoid this problem:
 
 * We disallow mapping a page when the virtual address of the following page
   will be non-canonical.
 
-* We disallow setting the RIP register to a non-canonical address using
-  [`zx_thread_write_state()`] when the address would be used with SYSRET.
+* We disallow setting the `RIP` register to a non-canonical address using
+  [`zx_thread_write_state()`].
 
 For more background, see "A Stitch In Time Saves Nine: A Case Of Multiple
 OS Vulnerability", Rafal Wojtczuk
diff --git a/zircon/kernel/arch/x86/debugger.cc b/zircon/kernel/arch/x86/debugger.cc
index da8673f..b1a5808 100644
--- a/zircon/kernel/arch/x86/debugger.cc
+++ b/zircon/kernel/arch/x86/debugger.cc
@@ -219,21 +219,21 @@
   if (!x86_is_vaddr_canonical(in->gs_base))
     return ZX_ERR_INVALID_ARGS;
 
+  // fxbug.dev/50633: Disallow setting RIP to a non-canonical address, to
+  // prevent returning to such addresses using the SYSRET or IRETQ
+  // instructions. See docs/concepts/kernel/sysret_problem.md.
+  //
+  // The code also restricts the RIP to userspace addresses. There is no use
+  // case for setting the RIP to a kernel address.
+  if (!x86_is_vaddr_canonical(in->rip) || is_kernel_address(in->rip))
+    return ZX_ERR_INVALID_ARGS;
+
   DEBUG_ASSERT(thread->arch_.suspended_general_regs.gregs);
   switch (GeneralRegsSource(thread->arch_.general_regs_source)) {
     case GeneralRegsSource::Iframe:
       x86_fill_in_iframe_from_gregs(thread->arch_.suspended_general_regs.iframe, in);
       break;
     case GeneralRegsSource::Syscall: {
-      // Disallow setting RIP to a non-canonical address, to prevent
-      // returning to such addresses using the SYSRET instruction.
-      // See docs/sysret_problem.md.  Note that this check also
-      // disallows canonical top-bit-set addresses, but allowing such
-      // addresses is not useful and it is simpler to disallow them.
-      uint8_t addr_width = x86_linear_address_width();
-      uint64_t noncanonical_addr = ((uint64_t)1) << (addr_width - 1);
-      if (in->rip >= noncanonical_addr)
-        return ZX_ERR_INVALID_ARGS;
       x86_fill_in_syscall_from_gregs(thread->arch_.suspended_general_regs.syscall, in);
       break;
     }
diff --git a/zircon/system/utest/core/threads/register-set.cc b/zircon/system/utest/core/threads/register-set.cc
index 1c96610..2e6902b 100644
--- a/zircon/system/utest/core/threads/register-set.cc
+++ b/zircon/system/utest/core/threads/register-set.cc
@@ -53,8 +53,9 @@
                  (1 << 21);   // ID: used for testing for CPUID support
 
   // Set these to canonical addresses to avoid an error.
-  regs->fs_base = 0;
-  regs->gs_base = 0;
+  regs->fs_base = 0x0;
+  regs->gs_base = 0x0;
+  regs->rip = 0x0;
 #elif defined(__aarch64__)
   // Only set the 4 flag bits that are readable and writable by the
   // instructions "msr nzcv, REG" and "mrs REG, nzcv".
diff --git a/zircon/system/utest/core/threads/threads.cc b/zircon/system/utest/core/threads/threads.cc
index c1dcc35..61375159 100644
--- a/zircon/system/utest/core/threads/threads.cc
+++ b/zircon/system/utest/core/threads/threads.cc
@@ -1317,7 +1317,7 @@
 // because if the kernel returns to that address using SYSRET, that can
 // cause a fault in kernel mode that is exploitable.  See
 // sysret_problem.md.
-TEST(Threads, NoncanonicalRipAddress) {
+TEST(Threads, NoncanonicalRipAddressSyscall) {
 #if defined(__x86_64__)
   zx_handle_t event;
   ASSERT_EQ(zx_event_create(0, &event), ZX_OK);
@@ -1338,7 +1338,7 @@
   // Example addresses to test.
   uintptr_t noncanonical_addr = ((uintptr_t)1) << (x86_linear_address_width() - 1);
   uintptr_t canonical_addr = noncanonical_addr - 1;
-  uint64_t kKernelAddr = 0xffff800000000000;
+  uint64_t kKernelAddr = 0xffffff8000000000UL;
 
   zx_thread_state_general_regs_t regs_modified = regs;
 
@@ -1376,6 +1376,49 @@
 #endif
 }
 
+// Test that zx_thread_write_state() does not allow setting RIP to a
+// non-canonical address for a thread that was suspended inside an interrupt,
+// because if the kernel returns to that address using IRET, that can
+// cause a fault in kernel mode that is exploitable.
+// See docs/concepts/kernel/sysret_problem.md
+TEST(Threads, NoncanonicalRipAddressIRETQ) {
+#if defined(__x86_64__)
+  // Example addresses to test.
+  uintptr_t noncanonical_addr = ((uintptr_t)1) << (x86_linear_address_width() - 1);
+  uintptr_t kernel_addr = 0xffffff8000000000UL;
+
+  // canonical address that is safe to resume the thread to.
+  uintptr_t canonical_addr = reinterpret_cast<uintptr_t>(&spin_address);
+
+  auto test_rip_value = [&](uintptr_t address, zx_status_t expected) {
+    zx_thread_state_general_regs_t func_regs;
+    RegisterReadSetup<zx_thread_state_general_regs_t> setup;
+    setup.RunUntil(&spin_with_general_regs, &func_regs, reinterpret_cast<uintptr_t>(&spin_address));
+
+    zx_thread_state_general_regs_t regs;
+    ASSERT_OK(zx_thread_read_state(setup.thread_handle(), ZX_THREAD_STATE_GENERAL_REGS, &regs,
+                                   sizeof(regs)));
+
+    regs.rip = address;
+    EXPECT_EQ(expected, zx_thread_write_state(setup.thread_handle(), ZX_THREAD_STATE_GENERAL_REGS,
+                                              &regs, sizeof(regs)));
+
+    // Resume and re-suspend the thread. Even if the zx_thread_write_state
+    // returns an error but sets the registers, we still want to observe the
+    // crash. Note that there is no guarantee that it would happen, as the
+    // thread might get supsended before it even resumes execution.
+    setup.Resume();
+    setup.Suspend();
+  };
+
+  test_rip_value(canonical_addr, ZX_OK);
+
+  test_rip_value(noncanonical_addr, ZX_ERR_INVALID_ARGS);
+  test_rip_value(kernel_addr, ZX_ERR_INVALID_ARGS);
+
+#endif  // defined(__x86_64__)
+}
+
 // Test that, on ARM64, userland cannot use zx_thread_write_state() to
 // modify flag bits such as I and F (bits 7 and 6), which are the IRQ and
 // FIQ interrupt disable flags.  We don't want userland to be able to set