[kernel][thread] Consistent lock acquisition order in thread creation

This resolves a lock ordering violation where the ThreadDispatcher acquires its
lock before calling into the ProcessDispatcher. It is an ordering violation as
the ProcessDispatcher already has a code path, in KillAllThreadsLocked, where it
calls into the ThreadDispatcher.

The change here is to invert the logic of ThreadDispatcher::Start such that the
ProcessDispatcher, and not the ThreadDispatcher, is responsible for driving the
operation and acquiring the first lock. This inversion works because neither
the ProcessDispatcher nor the ThreadDispatcher can fail after performing its
initial checks. As such, although the checks now happen in a different order,
the state updates still happen under the same overall conditions, and are done
with their respective locks held, so they are still atomic from the point of
view of any observers.
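
As a rough illustration of the inverted ordering, here is a minimal standalone
sketch. It is not the actual kernel code: the Process and Thread classes,
StartThread, StartFromProcess and Kill are hypothetical stand-ins, and
std::mutex stands in for the kernel lock types. The point is only that both
the start path and the kill path take the process lock first and the thread
lock second.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a thread object; not the real ThreadDispatcher.
class Thread {
 public:
  enum class State { kInitial, kRunning, kDead };

  // Called by Process::StartThread with the process lock already held.
  // The state check is the only step that can fail; the update cannot.
  bool StartFromProcess() {
    std::lock_guard<std::mutex> guard(lock_);
    if (state_ != State::kInitial) return false;
    state_ = State::kRunning;
    return true;
  }

  // Called by Process::KillAllThreads with the process lock already held.
  void Kill() {
    std::lock_guard<std::mutex> guard(lock_);
    state_ = State::kDead;
  }

  State state() {
    std::lock_guard<std::mutex> guard(lock_);
    return state_;
  }

 private:
  std::mutex lock_;
  State state_ = State::kInitial;
};

// Hypothetical stand-in for a process object; not the real ProcessDispatcher.
class Process {
 public:
  // The process drives the start: process lock first, then the thread lock.
  bool StartThread(Thread* thread) {
    std::lock_guard<std::mutex> guard(lock_);
    if (dead_) return false;                         // process-side check
    if (!thread->StartFromProcess()) return false;   // thread-side check
    threads_.push_back(thread);                      // process-side update
    return true;
  }

  // Same order as StartThread: process lock first, then each thread lock.
  void KillAllThreads() {
    std::lock_guard<std::mutex> guard(lock_);
    dead_ = true;
    for (Thread* thread : threads_) thread->Kill();
  }

  std::size_t thread_count() {
    std::lock_guard<std::mutex> guard(lock_);
    return threads_.size();
  }

 private:
  std::mutex lock_;
  bool dead_ = false;
  std::vector<Thread*> threads_;
};
```

In this sketch a caller would write process.StartThread(&thread) instead of
asking the thread to start itself, so no code path takes the thread lock and
then the process lock.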

Without this change, building and running with enable_lock_dep=true results in
a consistent lock order violation during early user startup.

Test: Kernel unit tests and e2e tests

Change-Id: Ida54022e00bd42fb060a44adad3739de29930f5c
