| # ChainLocks and ChainLockTransactions in the kernel |
| |
| ## What is a "ChainLock"? |
| |
| ChainLocks are a type of synchronization building block similar to a SpinLock. |
| |
| Exclusive locks generally have ordering requirements which prevent |
| deadlock. For example, if there exists a place in code where two locks (`A`, |
| and `B`) are acquired in the order `A` then `B`, then there cannot exist any |
| place in code where the locks are acquired in the order `B` then `A` which can |
| execute concurrently with the first sequence. Failure to follow this |
| requirement leads to deadlock potential. |
| |
ChainLocks are exclusive locks whose design includes a built-in protocol which
| can be used to obtain a set of locks in an arbitrary order, even when there |
| might exist other places in the code where an overlapping set of locks might be |
| obtained in a different order (which could traditionally lead to deadlock). |
| They were introduced as part of the effort to remove the kernel's global |
| `thread_lock` and reduce overall SpinLock contention. |
| |
| ## How do they work? |
| |
| Internally, the state of a ChainLock is represented by a 64 bit unsigned |
| integer, referred to as the lock "Token". A reserved sentinel value of 0 is |
| used to indicate that the lock is in the unlocked state. |
| |
| To acquire a set of ChainLocks (`S`), a user must first generate a Token from a |
| global monotonically increasing token generator. This Token can be thought of |
| as a type of priority or ticket. Lower internal integer values (Tokens |
generated earlier in the sequence) have a "higher priority" than Tokens with a
| higher internal integer value. This generated token must be the token used to |
| obtain *all* of the locks in `S`. |
| |
| When a thread `A` attempts to obtain a ChainLock `L` with a token `T`, a compare |
| and exchange of the lock's state with `T` is performed, expecting the state of |
| the lock to be the unlocked sentinel value. If this succeeds (the state value |
| was "unlocked"), the lock has been acquired and `A` can continue to execute. |
| |
If the compare and exchange fails, it implies that some thread, `B`, currently
holds the lock using some Token, `X`. In this situation, there are three
possibilities:
| |
| 1) `T` < `X` : `A` is "higher priority" than `B`. `A` should simply continue to |
| attempt to obtain the lock, as it would have done with a SpinLock, until it |
| finally succeeds. |
| 2) `T` > `X` : `A` is "lower priority" than `B`. `A` now needs to back-off, |
| dropping _all_ of the ChainLocks it currently holds. It should then wait a |
   short amount of time before re-attempting the operation. This prevents
   deadlock in the case where `B` is concurrently attempting to obtain a member
   of `S` which is already held by `A` while `A` is attempting to acquire `L`.
| 3) `T` == `X` : `A` is attempting to re-acquire `L`, which it already holds, and |
| `A` == `B`. This indicates that `A` is following a lock cycle, and would have |
| otherwise deadlocked. Typically, this would be considered to be a fatal |
| error, however there are places in kernel code where this can happen because |
| of a request made by user-mode code, and the kernel needs to detect the |
| situation and implement proper policy to deal with the errant user-mode |
| program instead of triggering a panic. |
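
To make the three outcomes concrete, here is a minimal sketch of the per-lock
acquire step. The `SketchChainLock` class, its methods, and its `Result` values
are illustrative names only, not the kernel's actual ChainLock API:

```c++
#include <atomic>
#include <cstdint>

// A minimal sketch of the per-lock acquire step described above.  A token
// value of 0 is the reserved "unlocked" sentinel.
class SketchChainLock {
 public:
  enum class Result { kOk, kRetry, kBackoff, kCycle };
  static constexpr uint64_t kUnlocked = 0;

  Result TryAcquire(uint64_t token) {
    uint64_t observed = kUnlocked;
    // Attempt to swap our token in, expecting the lock to be unlocked.
    if (state_.compare_exchange_strong(observed, token,
                                       std::memory_order_acquire)) {
      return Result::kOk;       // The lock was free; we now hold it.
    }
    if (token < observed) {
      return Result::kRetry;    // Case 1: we out-prioritize the holder; keep trying.
    }
    if (token > observed) {
      return Result::kBackoff;  // Case 2: drop all held locks, wait, then retry.
    }
    return Result::kCycle;      // Case 3: we already hold this lock with this token.
  }

  void Release() { state_.store(kUnlocked, std::memory_order_release); }

 private:
  std::atomic<uint64_t> state_{kUnlocked};
};
```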
| |
| See also //zircon/system/ulib/concurrent/models/chainlock for a TLA+ model of |
| the locking protocol. This model shows how the protocol avoids deadlock while |
| also guaranteeing that all threads will eventually obtain their sets of locks, |
| regardless of how the sets overlap, or the order in which the locks are |
| obtained. |
| |
| ## How are they used in the kernel? |
| |
With the removal of the global `thread_lock`, new locks needed to be introduced
in order to replace the global lock's functionality. While a number of new lock
classes were introduced, the primary ones are:
| |
| 1) A ChainLock for each `Thread` instance (the "thread's lock"). |
| 2) A ChainLock for each `WaitQueue` instance. |
3) A SpinLock for each Scheduler instance (the Scheduler's "queue_lock").
| |
Because of profile inheritance, `Thread`s and `WaitQueue`s have relationships
which form "profile inheritance graphs". Adding edges to and removing edges from
the global profile inheritance graphs requires holding sets of locks at the same
time, and user requests which cause these graph manipulations have the potential
to produce concurrent sequences of acquisition which could result in an
ephemeral cycle and deadlock were it not for the back-off protocol implemented
by the ChainLocks. This was the primary motivation for the initial introduction
of the ChainLock; however, other reasons for using them have shown up during the
process of breaking up the global `thread_lock` (the GTL).
| |
| ### Read-Only vs. Read-Write access. |
| |
A general rule for the use of ChainLocks in the kernel goes something like this:
| |
| 1) In order to perform some operation `X`, a thread may need to hold some set of |
| locks `S`. |
2) `S` may not be completely known initially. For example, if a thread `T` is to
| block in an owned wait queue `W`, we know that we must hold both `T` and |
| `W`'s locks, but we do not yet know if `W` has an owner which also needs to |
| be locked. |
| 3) For operations involving a set of locks `S`, the general rule is that the |
| state of an object `O` protected by `O.lock` may be examined if `O.lock` is |
| held, but *no* mutations of any of the objects involved in `X` may take place |
| until *all* of the locks in `S` have been obtained exclusively. |
| |
| So, if we wanted to block a thread `T` in a wait queue `W`, we would: |
| |
| 1) Generate a token `A`. |
| 2) Obtain `T.lock` using `A`. |
| 3) Obtain `W.lock` using `A`. |
| 4) Check to see if `W` has an owner, and if so, obtain `W.owner.lock` using `A` |
| 5) Check to see if `W.owner` is blocked in a WaitQueue, and if so, obtain |
| `W.owner.blocking_queue.lock` using `A`. |
| 6) Repeat this process until we hold all of the locks in the priority |
| inheritance chain. |
| 7) Now that all of the locks are held, add the edge between `T` and `W`, and |
| propagate the profile inheritance consequences through the PI chain to the |
| target node. |
| 8) Drop all of the locks. |
| |
If, at any point, attempting to obtain a lock results in a back-off, all of the
currently held locks must be dropped and the sequence restarted from step #2.
Note that we *do not* want to regenerate `A` when we back off and retry.
Keeping our original token `A` guarantees that we eventually become the highest
priority lock operation in the system, and will be able to obtain all of our
locks without needing to back off.
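
Below is a rough sketch of this retry loop for a two-lock set, reusing the
illustrative `SketchChainLock` from earlier; `GenerateTokenSketch()` stands in
for the global token generator. In real kernel code, the ChainLockTransaction
machinery described next manages this pattern for you.

```c++
// Illustrative only: locking a two-element set (e.g. T.lock then W.lock) with
// a single token, backing off and retrying with the same token as needed.
extern uint64_t GenerateTokenSketch();  // hypothetical token generator

void LockBothSketch(SketchChainLock& t_lock, SketchChainLock& w_lock) {
  const uint64_t token = GenerateTokenSketch();
  for (;;) {
    // We hold no locks yet, so it is safe to simply keep trying the first one.
    while (t_lock.TryAcquire(token) != SketchChainLock::Result::kOk) {
      arch::Yield();
    }
    // For the second lock: kRetry means we out-prioritize the holder and may
    // keep trying; kBackoff means we must drop everything and start over.
    SketchChainLock::Result res;
    while ((res = w_lock.TryAcquire(token)) == SketchChainLock::Result::kRetry) {
      arch::Yield();
    }
    if (res == SketchChainLock::Result::kOk) {
      return;  // Both locks are held; mutations may now proceed.
    }
    // Back off, keeping the same token so that our priority relative to newer
    // transactions only ever improves across retries.
    t_lock.Release();
    arch::Yield();
  }
}
```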
| |
| ## ChainLockTransactions |
| |
| While ChainLocks can be a generalized tool, in the kernel, they are used to solve |
| a specific class of problems where operations on objects protected by a set of |
| locks need to be performed, but the order in which the locks are obtained is not |
| always guaranteed to be the same. The ergonomics of attempting to manually |
| manage ChainLocks and Tokens can become burdensome in general, and also lead to |
| situations where it can be difficult to implement common static and dynamic |
| assertions related to lock ownership. |
| |
| In order to make this easier to use and less error prone overall, the kernel |
| introduces a new helper object called the ChainLockTransaction (or CLT). |
| |
| ### What is a ChainLockTransaction, logically speaking? |
| |
| Logically speaking, a ChainLockTransaction represents a sequence of operations |
| similar to the "block `T` in `W`" sequence previously described. At the start |
| of the transaction, `A` is generated, and the transaction is in the |
| "non-finalized" state. The transaction becomes the "active transaction", and |
| its generated token `A` is used to acquire the set of required locks. Once all |
| of the locks are acquired, the transaction becomes "finalized" (more on this |
| later) and no new chain locks may be obtained. Finally, any required mutations |
| take place, the locks are dropped, and the transaction is complete. |
| |
| ### What is a ChainLockTransaction, practically speaking? |
| |
| In code, CLTs are RAII style objects which define the scope of an operation |
| involving a set of locks (henceforth, a transaction). They are always |
| instantiated on the stack at the start of a transaction. It is an error to ever |
| instantiate one in a global scope, a static local scope, or on the heap. For a |
| given thread, there may only ever be one active CLT at a time, and a pointer to |
| this CLT is stored in the CPU's `percpu` data structure. |
| |
| There are a number of rules in the kernel which need to be followed when using |
| ChainLocks, and CLTs are meant to provide a way to both easily follow those |
| rules, as well as to enforce that the rules are being followed, either |
| statically or dynamically (using DEBUG_ASSERTs). These rules include: |
| |
1) The token, `A`, used during a transaction must be the same token used for all
   of the locks involved in the transaction. CLTs and the kernel's ChainLock
   specialization guarantee this by having the CLT generate `A`, and by not
   allowing any modification of this token after it has been generated.
   Additionally, the kernel version of a ChainLock does not provide a way to
   pass a token during an `Acquire` operation, instead demanding that the token
   be provided by the currently active CLT registered in the percpu data
   structure.
| 2) Like SpinLocks, no interrupt processing or context switching can take place |
| while any ChainLocks are held. CLTs help to enforce these requirements by |
   providing specialized versions which can automatically disable interrupts and
| preemption as appropriate based on the scope of the transaction. The fact |
| that no context switching or IRQ handling can take place during a CLT is |
| also what makes it possible to store the pointer to the currently active |
| transaction in the percpu data structure, making the transaction available |
| outside of the immediate scope the transaction was instantiated in. |
| 3) There may only ever be a single CLT active at once. Overlapping transactions |
   are illegal. CLTs help enforce this rule in two different ways, both
   statically and dynamically. A global static analysis token (the
| `chainlock_transaction_token`) is introduced, and the instantiation of a CLT |
| will `TA_ACQUIRE` this token. Note that this token is not actually a lock, |
| and "acquiring" it is a no-op from an instruction execution perspective. That |
| said, it gives the static analyzer the ability to understand that we are in a |
| currently active CLT, and any attempt to start another transaction will |
| result in a compile time error, as it would look to the analyzer like we are |
| attempting to acquire the same lock twice. Additionally, methods can use |
| this token to statically demand that we either are or are not involved in a |
| CLT, as appropriate. Second, we sometimes may need to dynamically enforce |
| that we are or are not in an active CLT. The percpu registration of the CLT |
| allows us to check this at runtime, and additionally allows us to implement |
| an `AssertHeld` style check on the `chainlock_transaction_token`, meaning |
| that we can use the dynamic check to satisfy the static analyzer in |
| situations where it would otherwise not be possible. |
4) ChainLocks may only be obtained in the non-finalized stage of a CLT. After
   the CLT becomes finalized, no new locks may be obtained. CLTs can be used to
   enforce this requirement by tracking internally whether or not they have
| entered the finalized state, and triggering a panic in the case that someone |
| attempts to acquire a ChainLock after the transaction was finalized. Note |
| that this is additional CLT state which is not strictly needed, and can be |
| disabled in production builds if desired. |
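
As a rough illustration of how the static analysis token in rule #3 can be
used, the hypothetical declarations below show how functions can statically
demand that callers either are, or are not, inside an active transaction. The
function names here are made up for this example.

```c++
// Hypothetical declarations, illustrative only.  TA_REQ and TA_EXCL are the
// kernel's thread-safety-analysis annotation macros.

// May only be called from inside an active transaction: callers must have
// "acquired" chainlock_transaction_token (and must hold t.lock_).
void MutateThreadState(Thread& t) TA_REQ(chainlock_transaction_token, t.lock_);

// May only be called when no transaction is active.  Instantiating a new CLT
// inside this function is legal, while calling it from within an active
// transaction is a compile time error, as it would look to the analyzer like
// acquiring the token twice.
void StartNewOperation(Thread& t) TA_EXCL(chainlock_transaction_token);
```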
| |
| ### Other practical uses of ChainLockTransactions |
| #### Dynamic assertion that a ChainLock is currently held. |
| |
| There are several places in the kernel where we need to be able to demonstrate |
| to the static analyzer that a particular lock is held, but where it is not |
| possible to do so from existing static annotations. Kernel SpinLocks can |
| currently do this by invoking a method on the lock called `AssertHeld` which is |
annotated with a `TA_ASSERT` static annotation. The state of a kernel SpinLock
is an integer whose value is zero if the lock is not held, and the 1-indexed
CPU id of the CPU which holds the lock in the case that it is held. A
| SpinLock's `AssertHeld` method simply needs to `DEBUG_ASSERT` that its state |
| equals `current_cpu_number + 1`. The current CPU (when interrupts are off and |
| context switching is disabled) is a constant property of the thread of |
| execution, and can be fetched from any context to establish whether or not a |
| given SpinLock instance is currently held. |
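
For illustration, a minimal sketch of this check; the names below are
stand-ins, and the kernel's real method also carries the static annotations
described above.

```c++
// Sketch of the SpinLock held-check described above.
extern uint32_t CurrentCpuNumSketch();  // hypothetical: 0-indexed id of this CPU

void SpinLockAssertHeldSketch(const std::atomic<uint32_t>& state) {
  // A state of 0 means "unlocked"; otherwise it is (holding CPU's id + 1).
  DEBUG_ASSERT(state.load(std::memory_order_relaxed) == CurrentCpuNumSketch() + 1);
}
```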
| |
| The state of a ChainLock, however, is the Token which was used to obtain it. |
| Generally speaking, this is not available to every scope unless the token was |
| explicitly plumbed into the scope, passed either by parameter or lambda capture. |
| |
| The CLT, as used in the kernel, removes this restriction. To hold any |
| ChainLocks at all requires that we have an active CLT, and that active CLT holds |
| our current token. This allows us to implement our own AssertHeld from any |
| scope, even those where the token would not otherwise be immediately available. |
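
A minimal sketch of what such an assertion might look like, assuming a
hypothetical accessor for the active transaction and its token:

```c++
// Illustrative only; these are hypothetical stand-ins for the kernel's CLT
// and ChainLock machinery.
struct SketchTransaction {
  uint64_t token;
};

// The currently active transaction is always reachable via the percpu pointer.
extern const SketchTransaction* ActiveTransactionSketch();

void AssertChainLockHeldSketch(const std::atomic<uint64_t>& lock_state) {
  const SketchTransaction* active = ActiveTransactionSketch();
  DEBUG_ASSERT(active != nullptr);
  // "Held by us" means the lock's state is exactly the active transaction's
  // token; no token needed to be explicitly plumbed into this scope.
  DEBUG_ASSERT(lock_state.load(std::memory_order_relaxed) == active->token);
}
```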
| |
| Common places where this pattern shows up include: |
| |
1) Places where lambda functions are used, especially where they are injected
   into a method in a functional programming pattern. The "migrate" function for a
| thread is a good example of this. |
| 2) Places involved in context switching, especially when the context switch is |
| to a newly created thread's Trampoline function. Lock handoff from the old |
| thread to the new thread needs to take place at this point in time (which is |
| a bit of a tricky process to begin with), but the Trampoline function has no |
| static way to demonstrate that it holds the locks which need to be traded. |
| CLTs provide the way to do this, even though no Tokens were explicitly |
| plumbed. |
| |
| #### Quantitative contention analysis. |
| |
| Quantitative analysis of how much time was lost to contention on a SpinLock |
| instance is relatively straightforward. If an attempt to acquire a SpinLock |
| fails the first time, the current time is observed and stored. When the lock is |
| finally obtained, the amount of time from the first contention to the eventual |
| acquisition can be computed, and the amount of time spent spinning for the lock |
| instance can be recorded (typically via kernel tracing). |
| |
| ChainLocks are a bit different. Talking about the "time spent spinning" on a |
| ChainLock is difficult, because when a lock is contended, the result is |
| sometimes that the back-off protocol needs to be invoked. All of the locks |
| involved in the transaction need to be dropped and reacquired. During |
| re-acquisition, we may end up contending on a different lock involved in the |
| transaction, or the same lock, or no lock at all. None of this time would be |
| properly accounted for if we simply attempted to track the time from first |
| conflict to subsequent acquisition of the same individual lock. |
| |
In general, it does not make a lot of sense to talk about the "time spent
contending" on an individual ChainLock. The CLT, however, gives us a useful analog
| of this measurement. Instead of talking about the contention time for an |
| individual lock, we can talk about the time spent obtaining all of the locks |
| involved in a transaction if _any_ contention event was experienced along the |
| way. |
| |
CLTs are integrated with the kernel's version of ChainLocks to record a
| contention start time the very first time any ChainLock contention event happens |
| during a non-finalized transaction. When all of the locks are finally obtained, |
| the transaction becomes finalized, and the time between when the first |
| contention event happened and when the transaction was finally finalized can be |
| recorded as the overall "spin contention time" for the transaction. |
| |
| This functionality can be enabled/disabled at compile time, dropping completely |
| out of the build when not needed. |
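
The following is a rough sketch of that accounting; the hook and helper names
are hypothetical stand-ins for how the kernel ChainLock and CLT integration
might be wired up.

```c++
// Sketch of the contention accounting described above.
extern int64_t CurrentTicksSketch();                   // hypothetical monotonic time source
extern void RecordSpinContentionSketch(int64_t ticks); // hypothetical trace hook

struct ContentionSketch {
  int64_t contention_start_ticks = 0;  // 0 == no contention observed yet
};

// Called from the ChainLock acquire path whenever contention is observed while
// the transaction is still non-finalized.
void OnContention(ContentionSketch& clt) {
  if (clt.contention_start_ticks == 0) {
    clt.contention_start_ticks = CurrentTicksSketch();
  }
}

// Called when the transaction is finalized, i.e. once all of the locks are held.
void OnFinalized(const ContentionSketch& clt) {
  if (clt.contention_start_ticks != 0) {
    // The overall "spin contention time" for the whole transaction.
    RecordSpinContentionSketch(CurrentTicksSketch() - clt.contention_start_ticks);
  }
}
```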
| |
| #### Implementing proper `Relax` behavior. |
| |
| As noted earlier, when a CLT encounters a situation where a back-off is |
| required, it must drop all locks and wait a small amount of time before trying |
| again. At a minimum, we want to `arch::Yield` at this point in time, but |
| sometimes we should go further. |
| |
| For example, if we had to disable both preemption and interrupts at the start of |
| our transaction, we probably do not want to keep them disabled during our back |
| off period as this risks holding off the servicing of time critical interrupts |
| and threads during acquisition. |
| |
The specialized versions of the CLT (instantiated on the stack by users)
provide a better option. CLT instances provide a `Relax()` method which can
be used to implement the proper behavior automatically. In the simplest case,
where interrupts and preemption were already appropriately disabled, all the
| CLT needs to do is `arch::Yield`. In cases where the CLT instance is managing |
| the IRQ/Preempt state, it can: |
| |
| 1) Unregister the current transaction as the active transaction. |
| 2) Restore the preempt/IRQ state. If we are preempted or take an interrupt at |
| this point in time, our per-cpu state properly reflects the fact that we are |
| not in a currently active transaction and hold no ChainLocks. |
| 3) `arch::Yield()` |
| 4) Undo step #2, disabling preemption/IRQs as appropriate. |
| 5) Re-register the CLT as the active transaction. |
| |
Users do not need to explicitly implement the proper behavior; the type of the
instantiated transaction handles this for them.
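
A rough sketch of those steps, using hypothetical helper names for the percpu
registration and interrupt state management (`arch::Yield()` is the yield hint
mentioned above):

```c++
// Illustrative sketch only; all of these helpers are hypothetical names.
extern void UnregisterActiveTransactionSketch();
extern void RegisterActiveTransactionSketch();
extern uint64_t DisableInterruptsSketch();
extern void RestoreInterruptStateSketch(uint64_t state);

void RelaxSketch(uint64_t& saved_irq_state) {
  UnregisterActiveTransactionSketch();           // 1) no longer the active transaction
  RestoreInterruptStateSketch(saved_irq_state);  // 2) re-enable IRQs (and preemption)
  arch::Yield();                                 // 3) give other work a chance to run
  saved_irq_state = DisableInterruptsSketch();   // 4) re-disable as appropriate
  RegisterActiveTransactionSketch();             // 5) become the active transaction again
}
```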
| |
| #### Enforcing lock count consistency checks. |
| |
| There are places in code where we would like to DEBUG_ASSERT things about the |
| number of ChainLocks currently held. For example, at the start of a |
| `RescheduleCommon` operation, we expect exactly one ChainLock to be held, that |
| of the currently running thread. The dynamic assertion capability of CLTs |
| allows us to assert that we hold the current thread's lock, but we'd like to go |
| further and assert that this is the _only_ chain lock held. |
| |
Other examples of where we want to make a check like this are during a `Relax`
operation, and at the end of a transaction (when the CLT instance destructs).
| At both of these points in time, we want to be able to assert that we hold |
| no ChainLocks at all. |
| |
| CLTs give us a tool to perform this style of consistency check. They are hooked |
| into the kernel specific implementation of ChainLocks in a way which allows them |
| to monitor Acquire/Release operations, and can maintain a count of the number of |
| locks currently held. This takes extra state in the CLT (the count of locks |
| held), but like all of the debug checks made possible by CLTs, they can be |
| disabled at compile time and completely drop out of the build when not needed. |
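
A rough sketch of the bookkeeping and the style of check it enables
(hypothetical names):

```c++
// Illustrative sketch of CLT lock-count bookkeeping.
class LockCountSketch {
 public:
  // Invoked from the kernel ChainLock's Acquire/Release paths.
  void OnAcquire() { ++locks_held_; }
  void OnRelease() {
    DEBUG_ASSERT(locks_held_ > 0);
    --locks_held_;
  }

  // e.g. RescheduleCommon expects exactly 1 (the current thread's lock), while
  // Relax() and CLT destruction expect exactly 0.
  void AssertNumLocksHeld(uint32_t expected) const {
    DEBUG_ASSERT(locks_held_ == expected);
  }

 private:
  uint32_t locks_held_ = 0;
};
```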
| |
| ### Types of ChainLockTransactions. |
| |
| As noted earlier, multiple different versions of ChainLockTransaction are |
| provided to make it easier for users to satisfy preemption and IRQ |
| enabled/disabled requirements. All of these specific types of CLT are derived |
| from a common `ChainLockTransaction` class. This base class defines the common |
| CLT interface, and a `ChainLockTransaction*` is the type of pointer which is |
| stored in the percpu data structure when a transaction is active. Users, |
however, may never instantiate a base ChainLockTransaction object. Instead, the
following specific types of transactions are available for use.
| |
| 1) `ChainLockTransactionNoIrqSave`. This transaction does not manipulate either |
| preempt or IRQ enabled/disabled state. Instead, it simply asserts that IRQs |
| are disabled when it is generated, and implements a simple `arch::Yield` |
| if/when the transaction is `Relax`ed. |
| 2) `ChainLockTransactionIrqSave`. Automatically saves and restores the IRQ |
| disabled state. Also automatically restores IRQ state and un/re-registers |
| the active transaction during a `Relax` operation. |
3) `ChainLockTransactionPreemptDisableAndIrqSave`. The same as
   `ChainLockTransactionIrqSave`, but goes one step further, automatically
   disabling preemption, and then IRQs.
4) `ChainLockTransactionEagerReschedDisableAndIrqSave`. The same as
   `ChainLockTransactionPreemptDisableAndIrqSave`, but goes even further,
   disabling eager rescheduling in addition to preemption.
| |
| ### A usage example. |
| |
| Let's take a look at a simple (but heavily commented) example of using |
| ChainLockTransactions and ChainLocks. For our simplified scenario, let's |
| pretend that we want to block a `Thread` in an `OwnedWaitQueue` but we do not |
| have to deal with ownership reassignment. |
| |
| ```c++ |
| extern ChainLock::LockResult AcquireDownstream(OwnedWaitQueue& owq) |
| TA_REQ(chainlock_transaction_token, owq.lock_); |
| |
| extern void LinkAndPropagate(Thread& t, OwnedWaitQueue& owq) |
| TA_REQ(chainlock_transaction_token, t.lock_, owq.lock_); |
| |
| extern void ReleaseDownstream(OwnedWaitQueue& owq) |
| TA_REQ(chainlock_transaction_token) TA_REL(owq.lock_); |
| |
| extern ChainLock::LockResult BlockInScheduler(Thread& t) |
| TA_REQ(chainlock_transaction_token, t.lock_); |
| |
| void BlockInQueue(OwnedWaitQueue& owq) { |
| using LockResult = ChainLock::LockResult; |
| Thread* const current_thread = Thread::Current::Get(); |
| |
  // Create our transaction. This will "acquire" the chainlock_transaction_token,
  // which is a static requirement for acquiring any ChainLocks. It will also
  // handle disabling preemption and interrupts for us. Additionally, it will
  // ensure that a consistent token is used throughout the transaction.
| ChainLockTransactionPreemptDisableAndIrqSave clt("BlockInQueue"); |
| |
| // Enter into our retry loop. We will cycle through this loop any time we |
| // need to back off, automatically relaxing the transaction as we do. The |
| // relax operation will dynamically assert that we have properly released all |
| // of our locks before we relax. |
| for (;; clt.Relax()) { |
| // "Unconditionally" acquire the current thread's lock. An "unconditional" |
| // acquisition of a ChainLock means that the acquire operation will continue |
| // to attempt to obtain the lock, even if it is held by a higher priority |
| // transaction. This is legal _only_ because we know that at this point in |
| // time that we hold no other locks, therefore we cannot be "in the way" of |
| // any other concurrently executing CLTs. IOW - we have no locks to drop, |
| // and can simply keep trying. |
| // |
    // The advantage of unconditional acquisition here is that we can acquire the
| // lock using a traditional RAII style guard which will handle both |
| // releasing the guard for us, as well as establishing that we hold the lock |
| // statically. |
| UnconditionalChainLockGuard guard{current_thread->lock_}; |
| |
| if (const LockResult res = owq.lock_.Acquire(); res != LockResult::kOk) { |
| // Cycles should be impossible here. The only possible non-ok result |
| // should be "Backoff". |
| DEBUG_ASSERT(res == LockResult::kBackoff); |
| |
| // All we need to do here is continue. We only hold the current thread's |
| // lock, and our guard will automatically drop it before our for loop |
| // automatically relaxes the transaction for us. |
| continue; |
| } |
| |
| // Now that we have successfully obtained the lock, assert that we just |
| // acquired it. AssertAcquired is similar to AssertHeld, but goes a step |
| // further. Not only are we asserting that we hold it for static analysis |
| // purposes, we are asserting that we just _acquired_ the lock as well, |
| // meaning that we _must_ release the lock in our scope as well. |
| owq.lock_.AssertAcquired(); |
| |
| // If this lock has an owner, grab the downstream locks as well. If this |
| // fails, drop our locks and start over. |
| if (const LockResult res = AcquireDownstream(owq); res != LockResult::kOk) { |
| // Same as before, cycles should be impossible here. |
| DEBUG_ASSERT(res == LockResult::kBackoff); |
| |
| // Drop the queue lock, and continue which will drop the thread lock and |
| // relax the transaction for us. We cannot forget to drop the queue lock |
| // because of the AssertAcquired we executed before. |
| owq.lock_.Release(); |
| continue; |
| } |
| |
| // We have all of the locks we need! Finalize our transaction (which will |
| // automatically handle logging any contention statistics if enabled). |
| clt.Finalize(); |
| |
| // Link the thread and the queue, and propagate the PI consequences. The |
| // requirement that we hold both the thread's lock and the queue's lock is |
| // statically enforced, while the fact that the downstream lock chain needs |
| // to be held can be dynamically enforced thanks to our active CLT. |
| LinkAndPropagate(*current_thread, owq); |
| |
| // Now that we have the bookkeeping handled, we can release all of the locks |
| // at and downstream of the queue we are blocking in. |
| ReleaseDownstream(owq); |
| |
| // Finally, block in the scheduler. This requires that our current thread's |
| // lock be held, but no other ChainLocks. We statically require that the |
| // current thread's lock be held, and the scheduler's code will dynamically |
| // assert that this is the only lock held. Our thread's lock will |
    // automatically be released and reacquired before and after the block
| // operation. |
| BlockInScheduler(*current_thread); |
| |
| // Finally, break out of our loop. Our guard will automatically release our |
| // (now re-acquired) thread's lock. |
| break; |
| } |
| |
| // Exit the scope, destroying our CLT in the process. During destruction, we |
| // will automatically: |
| // |
  // 1) Check to make sure that we no longer hold any ChainLocks.
| // 2) Check to make sure that the transaction was properly finalized. |
| // 3) Unregister the transaction as the active transaction. |
| // 4) Restore the previous preempt/IRQ enabled/disabled state. |
| } |
| ``` |