ChainLocks and ChainLockTransactions in the kernel

What is a “ChainLock”?

ChainLocks are a type of synchronization building block similar to a SpinLock.

Exclusive locks generally have ordering requirements which prevent deadlock. For example, if there exists a place in code where two locks (A, and B) are acquired in the order A then B, then there cannot exist any place in code where the locks are acquired in the order B then A which can execute concurrently with the first sequence. Failure to follow this requirement leads to deadlock potential.

ChainLocks are exclusive locks whose design includes a built-in protocol which can be used to obtain a set of locks in an arbitrary order, even when there might exist other places in the code where an overlapping set of locks might be obtained in a different order (which could traditionally lead to deadlock). They were introduced as part of the effort to remove the kernel's global thread_lock and reduce overall SpinLock contention.

How do they work?

Internally, the state of a ChainLock is represented by a 64-bit unsigned integer, referred to as the lock's “Token”. A reserved sentinel value of 0 is used to indicate that the lock is in the unlocked state.

To acquire a set of ChainLocks (S), a user must first generate a Token from a global monotonically increasing token generator. This Token can be thought of as a type of priority or ticket. Lower internal integer values (Tokens generated earlier in the sequence) have a “higher priority” than Tokens with a higher internal integer value. This generated token must be the token used to obtain all of the locks in S.

When a thread A attempts to obtain a ChainLock L with a token T, a compare and exchange of the lock's state with T is performed, expecting the state of the lock to be the unlocked sentinel value. If this succeeds (the state value was “unlocked”), the lock has been acquired and A can continue to execute.

If the compare and exchange operation fails, some thread, B, currently holds the lock using some Token, X. In this situation, there are three possibilities.

  1. T < X : A is “higher priority” than B. A should simply continue to attempt to obtain the lock, as it would have done with a SpinLock, until it finally succeeds.
  2. T > X : A is “lower priority” than B. A now needs to back off, dropping all of the ChainLocks it currently holds. It should then wait a short amount of time before re-attempting the operation. This prevents deadlock in the case where B is concurrently attempting to obtain a member of S which is already held by A while A is attempting to acquire L.
  3. T == X : A is attempting to re-acquire L, which it already holds, and A == B. This indicates that A is following a lock cycle, and would have otherwise deadlocked. Typically, this would be considered a fatal error; however, there are places in kernel code where this can happen because of a request made by user-mode code, and the kernel needs to detect the situation and implement proper policy to deal with the errant user-mode program instead of triggering a panic.

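To make the protocol concrete, here is a minimal, self-contained sketch of a single acquisition attempt. All of the names here (Token, GenerateToken, ExampleChainLock, TryAcquire) are illustrative stand-ins, not the kernel's actual identifiers.

#include <atomic>
#include <cstdint>

using Token = uint64_t;
constexpr Token kUnlockedSentinel = 0;  // reserved value: the lock is not held

// Global, monotonically increasing token generator.  Tokens generated earlier
// (lower values) are "higher priority".  Starts at 1 to skip the sentinel.
inline Token GenerateToken() {
  static std::atomic<Token> gen{1};
  return gen.fetch_add(1, std::memory_order_relaxed);
}

enum class AcquireResult { kOk, kRetry, kBackoff, kCycle };

struct ExampleChainLock {
  std::atomic<Token> state{kUnlockedSentinel};

  // One attempt to acquire the lock using token t, mapping directly to the
  // three possibilities described above.
  AcquireResult TryAcquire(Token t) {
    Token observed = kUnlockedSentinel;
    if (state.compare_exchange_strong(observed, t, std::memory_order_acquire)) {
      return AcquireResult::kOk;       // lock was unlocked; we now hold it.
    }
    if (t < observed) {
      return AcquireResult::kRetry;    // case 1: we are higher priority; keep spinning.
    }
    if (t > observed) {
      return AcquireResult::kBackoff;  // case 2: drop held locks, wait, then retry.
    }
    return AcquireResult::kCycle;      // case 3: t == observed; we already hold this lock.
  }

  void Release() { state.store(kUnlockedSentinel, std::memory_order_release); }
};
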
See also //zircon/system/ulib/concurrent/models/chainlock for a TLA+ model of the locking protocol. This model shows how the protocol avoids deadlock while also guaranteeing that all threads will eventually obtain their sets of locks, regardless of how the sets overlap, or the order in which the locks are obtained.

How are they used in the kernel?

With the removal of the global thread_lock, new locks needed to be introduced to replace the global lock's functionality. A number of new lock classes were introduced; the primary ones are:

  1. A ChainLock for each Thread instance (the “thread's lock”).
  2. A ChainLock for each WaitQueue instance.
  3. A SpinLock for each Scheduler instance (the Scheduler's “queue_lock”).

Because of profile inheritance, Threads and WaitQueues have relationships which form “profile inheritance graphs”. Adding and removing edges in these global profile inheritance graphs requires holding sets of locks at the same time, and user requests which cause these graph manipulations have the potential to produce concurrent acquisition sequences which could result in an ephemeral cycle and deadlock, were it not for the back-off protocol implemented by the ChainLocks. This was the primary motivation for the initial introduction of the ChainLock; however, other reasons for using them have shown up during the process of breaking up the global thread_lock.

Read-Only vs. Read-Write access.

A general rule for the use of ChainLocks in the kernel goes something like this:

  1. In order to perform some operation X, a thread may need to hold some set of locks S.
  2. S may not be completely known initially. For example, if a thread T is to block in an owned wait queue W, we know that we must hold both T's and W's locks, but we do not yet know if W has an owner which also needs to be locked.
  3. For operations involving a set of locks S, the general rule is that the state of an object O protected by O.lock may be examined if O.lock is held, but no mutations of any of the objects involved in X may take place until all of the locks in S have been obtained exclusively.

So, if we wanted to block a thread T in a wait queue W, we would:

  1. Generate a token A.
  2. Obtain T.lock using A.
  3. Obtain W.lock using A.
  4. Check to see if W has an owner, and if so, obtain W.owner.lock using A.
  5. Check to see if W.owner is blocked in a WaitQueue, and if so, obtain W.owner.blocking_queue.lock using A.
  6. Repeat this process until we hold all of the locks in the priority inheritance chain.
  7. Now that all of the locks are held, add the edge between T and W, and propagate the profile inheritance consequences through the PI chain to the target node.
  8. Drop all of the locks.

If, at any point, attempting to obtain a lock results in a back-off, all of the currently held locks must be dropped and the sequence restarted from step #2. Note that we do not want to regenerate A when we back off and retry. Keeping our original token A guarantees that we will eventually become the most important lock operation in the system, and will be able to obtain all of our locks without needing to back off.

ChainLockTransactions

While ChainLocks can be a generalized tool, in the kernel, they are used to solve a specific class of problems where operations on objects protected by a set of locks need to be performed, but the order in which the locks are obtained is not always guaranteed to be the same. The ergonomics of attempting to manually manage ChainLocks and Tokens can become burdensome in general, and also lead to situations where it can be difficult to implement common static and dynamic assertions related to lock ownership.

In order to make this easier to use and less error prone overall, the kernel introduces a new helper object called the ChainLockTransaction (or CLT).

What is a ChainLockTransaction, logically speaking?

Logically speaking, a ChainLockTransaction represents a sequence of operations similar to the “block T in W” sequence previously described. At the start of the transaction, a Token (A) is generated, and the transaction is in the “non-finalized” state. The transaction becomes the “active transaction”, and its generated token A is used to acquire the set of required locks. Once all of the locks are acquired, the transaction becomes “finalized” (more on this later) and no new ChainLocks may be obtained. Finally, any required mutations take place, the locks are dropped, and the transaction is complete.

What is a ChainLockTransaction, practically speaking?

In code, CLTs are RAII style objects which define the scope of an operation involving a set of locks (henceforth, a transaction). They are always instantiated on the stack at the start of a transaction. It is an error to ever instantiate one in a global scope, a static local scope, or on the heap. For a given thread, there may only ever be one active CLT at a time, and a pointer to this CLT is stored in the CPU's percpu data structure.

There are a number of rules in the kernel which need to be followed when using ChainLocks, and CLTs are meant to provide a way to both easily follow those rules, as well as to enforce that the rules are being followed, either statically or dynamically (using DEBUG_ASSERTs). These rules include:

  1. The token, A, used during a transaction must be the same token used for all of the locks involved in the transaction. CLTs (and the kernel's specializations of them) guarantee this by having the CLT generate A, and by not allowing any modification of the token after it has been generated. Additionally, the kernel version of a ChainLock does not provide a way to pass a token during an Acquire operation, instead demanding that the token be provided by the currently active CLT registered in the percpu data structure.
  2. Like SpinLocks, no interrupt processing or context switching can take place while any ChainLocks are held. CLTs help to enforce these requirements by providing specialized versions which can automatically disable interrupts and preemption as appropriate, based on the scope of the transaction. The fact that no context switching or IRQ handling can take place during a CLT is also what makes it possible to store the pointer to the currently active transaction in the percpu data structure, making the transaction available outside of the immediate scope in which it was instantiated.
  3. There may only ever be a single CLT active at once; overlapping transactions are illegal. CLTs help enforce this rule in two different ways, statically and dynamically. First, a global static analysis token (the chainlock_transaction_token) is introduced, and the instantiation of a CLT will TA_ACQUIRE this token. Note that this token is not actually a lock, and “acquiring” it is a no-op from an instruction execution perspective. It does, however, give the static analyzer the ability to understand that we are in a currently active CLT, so any attempt to start another transaction results in a compile time error, as it looks to the analyzer like an attempt to acquire the same lock twice. Additionally, methods can use this token to statically demand that we either are or are not involved in a CLT, as appropriate (a sketch of this annotation pattern follows this list). Second, we sometimes need to dynamically enforce that we are or are not in an active CLT. The percpu registration of the CLT allows us to check this at runtime, and additionally allows us to implement an AssertHeld style check on the chainlock_transaction_token, meaning that we can use the dynamic check to satisfy the static analyzer in situations where it would otherwise not be possible.
  4. ChainLocks may only be obtained during the pre-finalized stage of a CLT. After the CLT becomes finalized, no new locks may be obtained. CLTs enforce this requirement by tracking internally whether or not they have entered the finalized state, and triggering a panic if someone attempts to acquire a ChainLock after the transaction has been finalized. Note that this is additional CLT state which is not strictly needed, and can be disabled in production builds if desired.
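
The static analysis pattern from rule #3 might be sketched as follows. The macro spellings (TA_CAP, TA_SCOPED_CAP, TA_ACQ, TA_REL, TA_REQ, TA_EXCL) and the ExampleTransaction type are illustrative stand-ins, not the kernel's exact CLT declarations.

#include <zircon/compiler.h>  // assumed home of the TA_* thread annotation macros

// Static analysis "capability" which is never actually locked at runtime; it
// exists only so the analyzer can track whether a transaction is active.
struct TA_CAP("token") ChainLockTransactionTokenType {};
extern ChainLockTransactionTokenType chainlock_transaction_token;

class TA_SCOPED_CAP ExampleTransaction {
 public:
  // Constructing a transaction "acquires" the token (a runtime no-op).  A
  // second, overlapping transaction in the same call chain now looks like a
  // double acquisition to the analyzer and fails to compile.
  ExampleTransaction() TA_ACQ(chainlock_transaction_token);
  ~ExampleTransaction() TA_REL();
};

// Methods can statically demand that a transaction is, or is not, active.
void RequiresActiveTransaction() TA_REQ(chainlock_transaction_token);
void RequiresNoActiveTransaction() TA_EXCL(chainlock_transaction_token);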

Other practical uses of ChainLockTransactions

Dynamic assertion that a ChainLock is currently held.

There are several places in the kernel where we need to be able to demonstrate to the static analyzer that a particular lock is held, but where it is not possible to do so from existing static annotations. Kernel SpinLocks can currently do this by invoking a method on the lock called AssertHeld, which is annotated with a TA_ASSERT static annotation. The state of a kernel SpinLock is an integer whose value is zero if the lock is not held, and the 1s-indexed CPU id of the CPU which holds the lock in the case that it is held. A SpinLock's AssertHeld method simply needs to DEBUG_ASSERT that its state equals current_cpu_number + 1. The current CPU (when interrupts are off and context switching is disabled) is a constant property of the thread of execution, and can be fetched from any context to establish whether or not a given SpinLock instance is currently held.
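
As a rough sketch (with arch_curr_cpu_num and DEBUG_ASSERT standing in for the kernel's actual helpers, and no static annotations shown), the check reduces to something like:

#include <atomic>
#include <cassert>
#include <cstdint>

#define DEBUG_ASSERT(x) assert(x)   // stand-in for the kernel's DEBUG_ASSERT
uint32_t arch_curr_cpu_num();       // stand-in; stable while IRQs are disabled

struct ExampleSpinLock {
  // 0 when unlocked, otherwise the holder's CPU number + 1 (1s-indexed).
  std::atomic<uint32_t> state{0};

  void AssertHeld() const {
    // Because the holding CPU's number is a property of the current thread of
    // execution, any scope running on that CPU can verify ownership without
    // having a token or any other context plumbed in.
    DEBUG_ASSERT(state.load(std::memory_order_relaxed) == arch_curr_cpu_num() + 1);
  }
};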

The state of a ChainLock, however, is the Token which was used to obtain it. Generally speaking, this is not available to every scope unless the token was explicitly plumbed into the scope, passed either by parameter or lambda capture.

The CLT, as used in the kernel, removes this restriction. To hold any ChainLocks at all requires that we have an active CLT, and that active CLT holds our current token. This allows us to implement our own AssertHeld from any scope, even those where the token would not otherwise be immediately available.
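
A sketch of the equivalent ChainLock check, assuming a hypothetical ActiveClt() accessor standing in for the lookup of the transaction registered in the percpu data structure:

#include <atomic>
#include <cassert>
#include <cstdint>

#define DEBUG_ASSERT(x) assert(x)   // stand-in for the kernel's DEBUG_ASSERT

// Hypothetical view of the active transaction; the real lookup goes through
// the percpu data structure.
struct ActiveCltView { uint64_t token; };
const ActiveCltView* ActiveClt();

struct ExampleChainLock {
  // 0 when unlocked, otherwise the Token used to acquire the lock.
  std::atomic<uint64_t> state{0};

  void AssertHeld() const {
    // Any ChainLock we hold must have been acquired with the active
    // transaction's token, so comparing the lock's state against that token
    // proves ownership without plumbing the token into this scope.
    const ActiveCltView* clt = ActiveClt();
    DEBUG_ASSERT(clt != nullptr);
    DEBUG_ASSERT(state.load(std::memory_order_relaxed) == clt->token);
  }
};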

Common places where this pattern shows up include:

  1. Places where lambda functions are used, especially where they are injected into a method in a functional programming pattern. The “migrate” function for a thread is a good example of this.
  2. Places involved in context switching, especially when the context switch is to a newly created thread's Trampoline function. Lock handoff from the old thread to the new thread needs to take place at this point in time (which is a bit of a tricky process to begin with), but the Trampoline function has no static way to demonstrate that it holds the locks which need to be traded. CLTs provide the way to do this, even though no Tokens were explicitly plumbed.

Quantitative contention analysis.

Quantitative analysis of how much time was lost to contention on a SpinLock instance is relatively straightforward. If an attempt to acquire a SpinLock fails the first time, the current time is observed and stored. When the lock is finally obtained, the amount of time from the first contention to the eventual acquisition can be computed, and the amount of time spent spinning for the lock instance can be recorded (typically via kernel tracing).

ChainLocks are a bit different. Talking about the “time spent spinning” on a ChainLock is difficult, because when a lock is contended, the result is sometimes that the back-off protocol needs to be invoked. All of the locks involved in the transaction need to be dropped and reacquired. During re-acquisition, we may end up contending on a different lock involved in the transaction, or the same lock, or no lock at all. None of this time would be properly accounted for if we simply attempted to track the time from first conflict to subsequent acquisition of the same individual lock.

In general, it does not make a lot of sense to talk about the “time spent contending” for an individual ChainLock. The CLT, however, gives us a useful analog of this measurement. Instead of talking about the contention time for an individual lock, we can talk about the time spent obtaining all of the locks involved in a transaction if any contention event was experienced along the way.

CLTs are integrated with the kernel's version of ChainLocks to record a contention start time the very first time any ChainLock contention event happens during a non-finalized transaction. When all of the locks are finally obtained, the transaction becomes finalized, and the time between when the first contention event happened and when the transaction was finalized can be recorded as the overall “spin contention time” for the transaction.
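
The mechanism amounts to something like the following sketch, where CurrentTicks and RecordSpinContention are hypothetical stand-ins for the kernel's clock and tracing hooks:

#include <cstdint>

class ExampleCltTiming {
 public:
  // Invoked by the ChainLock implementation the first time any acquisition in
  // this (non-finalized) transaction is contested.
  void OnContention() {
    if (contention_start_ticks_ == 0) {
      contention_start_ticks_ = CurrentTicks();
    }
  }

  // Invoked once all of the locks are held.  If any contention was observed,
  // the time from the first contention event to finalization is recorded as
  // the transaction's overall "spin contention time".
  void Finalize() {
    if (contention_start_ticks_ != 0) {
      RecordSpinContention(CurrentTicks() - contention_start_ticks_);
    }
  }

 private:
  static uint64_t CurrentTicks();                 // hypothetical monotonic clock
  static void RecordSpinContention(uint64_t dt);  // hypothetical trace hook
  uint64_t contention_start_ticks_ = 0;
};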

This functionality can be enabled/disabled at compile time, dropping completely out of the build when not needed.

Implementing proper Relax behavior.

As noted earlier, when a CLT encounters a situation where a back-off is required, it must drop all locks and wait a small amount of time before trying again. At a minimum, we want to arch::Yield at this point in time, but sometimes we should go further.

For example, if we had to disable both preemption and interrupts at the start of our transaction, we probably do not want to keep them disabled during our back off period as this risks holding off the servicing of time critical interrupts and threads during acquisition.

The specialized versions of the CLT (instantiated on the stack by users) provide a better option. CLT instances provide a Relax() method which can be used to implement the proper behavior automatically. In the simplest case, where interrupts and preemption were already appropriately disabled, all the CLT needs to do is arch::Yield. In cases where the CLT instance is managing the IRQ/preempt state, it can (see the sketch after this list):

  1. Unregister the current transaction as the active transaction.
  2. Restore the preempt/IRQ state. If we are preempted or take an interrupt at this point in time, our per-cpu state properly reflects the fact that we are not in a currently active transaction and hold no ChainLocks.
  3. arch::Yield()
  4. Undo step #2, disabling preemption/IRQs as appropriate.
  5. Re-register the CLT as the active transaction.
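
A sketch of that sequence, using hypothetical helpers (SaveAndDisableIrqPreempt, RestoreIrqPreempt, RegisterActiveClt, UnregisterActiveClt) to stand in for the percpu and architecture plumbing:

namespace arch { void Yield(); }      // provided by the kernel

// Hypothetical stand-ins for percpu registration and IRQ/preempt state.
uint64_t SaveAndDisableIrqPreempt();
void RestoreIrqPreempt(uint64_t saved);
void RegisterActiveClt(void* clt);
void UnregisterActiveClt(void* clt);

struct ExampleIrqSaveClt {
  uint64_t saved_state;

  void Relax() {
    UnregisterActiveClt(this);        // 1) no longer the active transaction
    RestoreIrqPreempt(saved_state);   // 2) if we are preempted or interrupted
                                      //    now, percpu state correctly shows no
                                      //    active transaction and no held locks
    arch::Yield();                    // 3) brief pause before retrying
    saved_state = SaveAndDisableIrqPreempt();  // 4) disable preemption/IRQs again
    RegisterActiveClt(this);          // 5) re-register and resume the transaction
  }
};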

Users do not need to explicitly implement the proper behavior; the type of the instantiated transaction handles this for them.

Enforcing lock count consistency checks.

There are places in code where we would like to DEBUG_ASSERT things about the number of ChainLocks currently held. For example, at the start of a RescheduleCommon operation, we expect exactly one ChainLock to be held: that of the currently running thread. The dynamic assertion capability of CLTs allows us to assert that we hold the current thread's lock, but we'd like to go further and assert that this is the only ChainLock held.

Other examples of where we want to make a check like this are during a Relax operation, and at the end of a transaction (when the CLT instance destructs). At both of these points in time, we want to be able to assert that we hold no ChainLocks at all.

CLTs give us a tool to perform this style of consistency check. They are hooked into the kernel specific implementation of ChainLocks in a way which allows them to monitor Acquire/Release operations, and can maintain a count of the number of locks currently held. This takes extra state in the CLT (the count of locks held), but like all of the debug checks made possible by CLTs, it can be disabled at compile time and completely drop out of the build when not needed.
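
A sketch of that bookkeeping, with OnAcquire/OnRelease standing in for the hooks in the kernel's ChainLock implementation and assert() standing in for DEBUG_ASSERT:

#include <cassert>
#include <cstdint>

#define DEBUG_ASSERT(x) assert(x)   // stand-in for the kernel's DEBUG_ASSERT

class ExampleCltLockCount {
 public:
  // Called by the ChainLock implementation on each Acquire/Release performed
  // while this transaction is active.
  void OnAcquire() { ++locks_held_; }
  void OnRelease() {
    DEBUG_ASSERT(locks_held_ > 0);
    --locks_held_;
  }

  // e.g. RescheduleCommon expects exactly one ChainLock (the current thread's).
  void AssertNumLocksHeld(uint32_t expected) const {
    DEBUG_ASSERT(locks_held_ == expected);
  }

  // e.g. Relax() and the CLT destructor expect zero ChainLocks held.
  ~ExampleCltLockCount() { DEBUG_ASSERT(locks_held_ == 0); }

 private:
  uint32_t locks_held_ = 0;
};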

Types of ChainLockTransactions.

As noted earlier, multiple different versions of ChainLockTransaction are provided to make it easier for users to satisfy preemption and IRQ enabled/disabled requirements. All of these specific types of CLT are derived from a common ChainLockTransaction class. This base class defines the common CLT interface, and a ChainLockTransaction* is the type of pointer which is stored in the percpu data structure when a transaction is active. Users, however, may never instantiate a base ChainLockTransaction object. Instead, the following specific transaction types are available for use:

  1. ChainLockTransactionNoIrqSave. This transaction does not manipulate either preempt or IRQ enabled/disabled state. Instead, it simply asserts that IRQs are disabled when it is instantiated, and implements a simple arch::Yield if/when the transaction is Relaxed.
  2. ChainLockTransactionIrqSave. Automatically saves and restores the IRQ disabled state. Also automatically restores IRQ state and un/re-registers the active transaction during a Relax operation.
  3. ChainLockTransactionPreemptDisableAndIrqSave. The same as ChainLockTransactionIrqSave, but goes one step further, automatically disabling preemption, and then IRQs.
  4. ChainLockTransactionEagerReschedDisableAndIrqSave. The same as ChainLockTransactionPreemptDisableAndIrqSave, but goes even further, disabling eager rescheduling in addition to preemption.

A usage example.

Let's take a look at a simple (but heavily commented) example of using ChainLockTransactions and ChainLocks. For our simplified scenario, let's pretend that we want to block a Thread in an OwnedWaitQueue, but we do not have to deal with ownership reassignment.

extern ChainLock::LockResult AcquireDownstream(OwnedWaitQueue& owq)
  TA_REQ(chainlock_transaction_token, owq.lock_);

extern void LinkAndPropagate(Thread& t, OwnedWaitQueue& owq)
  TA_REQ(chainlock_transaction_token, t.lock_, owq.lock_);

extern void ReleaseDownstream(OwnedWaitQueue& owq)
  TA_REQ(chainlock_transaction_token) TA_REL(owq.lock_);

extern ChainLock::LockResult BlockInScheduler(Thread& t)
  TA_REQ(chainlock_transaction_token, t.lock_);

void BlockInQueue(OwnedWaitQueue& owq) {
  using LockResult = ChainLock::LockResult;
  Thread* const current_thread = Thread::Current::Get();

  // Create our transaction.  This will "acquire" the chainlock_transaction
  // token, which is a static requirement for acquiring any ChainLocks.  It will
  // also handle disabling preemption and interrupts for us.  Additionally, it
  // will ensure that a consistent token is used throughout the transaction.
  ChainLockTransactionPreemptDisableAndIrqSave clt("BlockInQueue");

  // Enter into our retry loop.  We will cycle through this loop any time we
  // need to back off, automatically relaxing the transaction as we do.  The
  // relax operation will dynamically assert that we have properly released all
  // of our locks before we relax.
  for (;; clt.Relax()) {
    // "Unconditionally" acquire the current thread's lock.  An "unconditional"
    // acquisition of a ChainLock means that the acquire operation will continue
    // to attempt to obtain the lock, even if it is held by a higher priority
    // transaction.  This is legal _only_ because we know that, at this point
    // in time, we hold no other locks, and therefore we cannot be "in the way" of
    // any other concurrently executing CLTs.  IOW - we have no locks to drop,
    // and can simply keep trying.
    //
    // The advantage of unconditional acquisition here is that we can acquire
    // the lock using a traditional RAII style guard which will handle both
    // releasing the lock for us, as well as statically establishing that we
    // hold the lock.
    UnconditionalChainLockGuard guard{current_thread->lock_};

    if (const LockResult res = owq.lock_.Acquire(); res != LockResult::kOk) {
      // Cycles should be impossible here.  The only possible non-ok result
      // should be "Backoff".
      DEBUG_ASSERT(res == LockResult::kBackoff);

      // All we need to do here is continue.  We only hold the current thread's
      // lock, and our guard will automatically drop it before our for loop
      // automatically relaxes the transaction for us.
      continue;
    }

    // Now that we have successfully obtained the lock, assert that we just
    // acquired it. AssertAcquired is similar to AssertHeld, but goes a step
    // further.  Not only are we asserting that we hold it for static analysis
    // purposes, we are asserting that we just _acquired_ the lock as well,
    // meaning that we _must_ release the lock in our scope as well.
    owq.lock_.AssertAcquired();

    // If this lock has an owner, grab the downstream locks as well.  If this
    // fails, drop our locks and start over.
    if (const LockResult res = AcquireDownstream(owq); res != LockResult::kOk) {
      // Same as before, cycles should be impossible here.
      DEBUG_ASSERT(res == LockResult::kBackoff);

      // Drop the queue lock, and continue which will drop the thread lock and
      // relax the transaction for us.  We cannot forget to drop the queue lock
      // because of the AssertAcquired we executed before.
      owq.lock_.Release();
      continue;
    }

    // We have all of the locks we need!  Finalize our transaction (which will
    // automatically handle logging any contention statistics if enabled).
    clt.Finalize();

    // Link the thread and the queue, and propagate the PI consequences.  The
    // requirement that we hold both the thread's lock and the queue's lock is
    // statically enforced, while the fact that the downstream lock chain needs
    // to be held can be dynamically enforced thanks to our active CLT.
    LinkAndPropagate(*current_thread, owq);

    // Now that we have the bookkeeping handled, we can release all of the locks
    // at and downstream of the queue we are blocking in.
    ReleaseDownstream(owq);

    // Finally, block in the scheduler. This requires that our current thread's
    // lock be held, but no other ChainLocks.  We statically require that the
    // current thread's lock be held, and the scheduler's code will dynamically
    // assert that this is the only lock held.  Our thread's lock will
    // automatically be released and reacquired before and after the block
    // operation.
    BlockInScheduler(*current_thread);

    // Finally, break out of our loop.  Our guard will automatically release our
    // (now re-acquired) thread's lock.
    break;
  }

  // Exit the scope, destroying our CLT in the process.  During destruction, we
  // will automatically:
  //
  // 1) Check to make sure we no longer hold any ChainLocks.
  // 2) Check to make sure that the transaction was properly finalized.
  // 3) Unregister the transaction as the active transaction.
  // 4) Restore the previous preempt/IRQ enabled/disabled state.
}