Lock validation is a technique for checking the consistency of locking behavior in a program to find potential deadlock hazards. This document discusses relevant aspects of the static and dynamic approaches to lock validation and presents the foundation for the runtime lock validation library available in Zircon and Fuchsia.
Lock validation may be performed either statically or dynamically. The following summarizes the important differences between static and dynamic approaches to lock validation:
Static validation is typically performed at compile time by analyzing the call graphs produced by the compiler or other source-level processor. With this approach it is necessary to instrument the code and locking primitives with annotations to inform the validator about which types represent locks and which rules to apply (or not) to the code that uses the lock types.
The benefits of static validation include early detection of issues at build time, deterministic validation results, and zero runtime overhead. This combination of properties make it attractive to always enable static validation, ensuring that locking issues are often caught before code makes it into the build, without impacting the performance of the build artifacts.
Static validation also has some down sides. One problem is that static validation requires correct, consistent application of a variety of annotations to both locks and code to provide useful results. This can cause maintenance issues unless diligent code review standards are implemented. Another issue is that static validation has limited visibility and can be fooled by conditional paths, dynamic dispatch, move semantics, and lock dependencies that span compilation units.
Dynamic validation is performed online at runtime by observing the relationships between locks as the code executes. With this approach it is generally sufficient to instrument just the locking primitives and acquire/release operations to provide the information required for validation.
The benefits of dynamic validation include simpler instrumentation (from the user's perspective) and potentially greater visibility into the actual runtime behavior of the program. This makes dynamic validation useful in large code bases, where it may not be possible for static validation to see the full set of possible lock interactions.
The main downsides of dynamic validation are runtime overhead and execution coverage requirements. Because dynamic validation must track lock interactions at runtime, each acquire and release incurs a non-zero execution cost to update tracking data, in addition to the memory overhead of the tracking data itself. Runtime tracking also has the consequence that code paths that are not executed cannot be analyzed by the validator. This may increase the burden on the developer and QA to ensure sufficient execution coverage if that is not already a project requirement.
The job of the lock validator is to determine whether or not the lock invariants of the program hold. The primary invariant is the order between two or more locks: all paths in a program that acquire two or more locks must do so in an order consistent with every other path involving two or more of the same locks to avoid the potential for deadlock. Environments that deal with hardware interrupts, such as embedded systems and kernels, have an additional ordering invariant to avoid interrupt-induced deadlocks. These invariants are illustrated in the following subsections.
The simplest form of inversion occurs when a program has two locks that are both acquired sequentially with inconsistent orders in different paths.
For example, a program with the locks A and B and code paths P1 and P2 and the following behavior has the potential for deadlock:
Path P1 acquires and releases the locks in the sequence:
Path P2 acquires and releases the locks in the inverted sequence:
With the right interleaving, perhaps due to both paths executing concurrently on different threads, a deadlock occurs when path P1 holds lock A and blocks waiting for lock B, while path P2 holds lock B and blocks waiting for lock A.
Inversion may also occur between more than two locks and paths. This kind of inversion is much harder to recognize through manual inspection because each pair of locks involved may appear to be correctly ordered in every path involving just the pairs, and yet a potential deadlock may still exist given overall ordering of the locks.
For example, a program with the locks A, B, and C; paths P1, P2, and P3; with the following behavior has the potential for deadlock:
Path P1 acquires and releases the locks in the sequence:
Path P2 acquires and releases the locks in the sequence:
Path P3 acquires and releases the locks in the sequence:
With the right interleaving of paths P1, P2, and P3 a deadlock occurs as each path acquires the lock at the first step and waits for the lock at the second step. In practice this situation may be compounded by the existence of many different paths that produce the same pairwise lock sequences.
In systems that deal with hardware interrupts the ordering between irq-safe and non-irq-safe locks is critical: a non-irq-safe lock must never be acquired while holding an irq-safe lock to prevent indirect lock inversions. Irq-safe locks preserve ordering between irq and non-irq context; a consistent order of two or more irq-safe locks is guaranteed to be safe for paths running in both irq and non-irq context. The same is not true for non-irq-safe locks. The reason for this is that non-irq-safe locks permit irq handlers to effectively insert the locks acquired by the handler at arbitrary points in the interrupted task's lock sequences.
For example, a system with non-irq-safe lock A and irq-safe lock Birq; paths P1, P2, and irq path Pirq; with the following behavior has the potential for deadlock:
Path P1 on CPU1 acquires and releases the lock in sequence:
Path Pirq on CPU1 acquires and releases the lock in sequence:
Path P2 on CPU2 acquires and releases the locks in sequence:
With the right interleaving of paths P1, P2, and Pirq a deadlock occurs as Pirq attempts to acquire Birq while P2 holds Birq and blocks waiting for A. This is an indirect lock inversion: Pirq effectively inserts an acquire/release sequence of Birq in the middle of the acquire/release sequence of A in path P1, which is inconsistent with the lock sequence for the same locks in path P2.
The invariants discussed in the previous section can be validated using a finite directed graph. The directed graph tracks the identity and order of locks as the analysis traverses the code paths. Such a graph can be built either by traversing the call graphs generated by a compiler or source-level processor (static analysis) or by observing the ordering of locks during program execution (dynamic analysis). This section introduce the process in abstract terms that apply to either approach, in preparation for developing a concrete dynamic analysis strategy later on.
In the most general terms, building a directed graph from a code path requires maintaining a list of actively held locks as the path is traversed: a node representing a lock is added to the list whenever the lock is acquired and removed from the list whenever the lock is released. In addition to maintaining the active list, a directed edge is added to the graph from a vertex representing the newly acquired lock to each vertex representing a lock already in the list.
This section illustrates a directed graph approach to detect a basic two-lock inversion.
Recall from the earlier example a program with the locks A and B; code paths P1 and P2; and the following behavior:
Path P1 acquires and releases the locks in the sequence:
Path P2 acquires and releases the locks in the inverted sequence:
Starting with path P1 we define and update the directed graph.
Let L1 be the ordered active list of locks held by path P1.
Let G = (V, E) be the directed graph, having the set of vertices V representing observed locks and the set of directed edges between vertices E.
Initial state:
L1 | V | E |
---|---|---|
() | {} | {} |
After P1 step 1:
L1 | V | E |
---|---|---|
(A) | {A} | {} |
This step adds lock A to the active list and introduces a vertex for the same lock to the directed graph. Since there are no other locks in the active list no edges are added.
After P1 step 2:
L1 | V | E |
---|---|---|
(A, B) | {A, B} | {(B, A)} |
This step adds lock B to the active list and also introduces a corresponding vertex to the graph. This time the active list does contain a lock, so an edge from the new lock to the existing lock is added to the graph. This edge represents that lock B now depends on lock A preceding it in any other path that involves both locks.
After P1 step 3:
L1 | V | E |
---|---|---|
(A) | {A, B} | {(B, A)} |
Lock B is removed from the active list. No updates to the graph.
After P1 step 4:
L1 | V | E |
---|---|---|
() | {A, B} | {(B, A)} |
Lock A is removed from the active list. No updates to the graph.
Let L2 be the active list of locks held by P2.
Initial state:
L2 | V | E |
---|---|---|
() | {A, B} | {(B, A)} |
In this case the initial state is the final state from path P1.
After P2 step 1:
L2 | V | E |
---|---|---|
(B) | {A, B} | {(B, A)} |
This step adds lock B to the active list. As there are no other locks in the active list no edges are added to the graph. Since B already has a vertex in the graph there is also no change to V.
After P2 step 2:
L2 | V | E |
---|---|---|
(B, A) | {A, B} | {(B, A), (A, B)} |
This step adds lock A to the active list. Since this lock already has a vertex there is no change to V. However, because there is a lock in the active list an edge from the new lock to the existing lock is added to the graph. With this new edge the graph now forms a cycle between vertices A and B, indicating that ordering between these locks is not consistent between the two paths considered thus far and that a potential deadlock exists.
This section illustrates a directed graph approach to detect a circular dependency inversion using previously discussed example from the invariants section. This illustration is somewhat abbreviated due to the similarity to the previous illustration.
Consider a program with the locks A, B, and C and paths P1, P2, and P3 and the following behavior:
Path P1 acquires and releases the locks in the sequence:
Path P2 acquires and releases the locks in the sequence:
Path P3 acquires and releases the locks in the sequence:
Let L1 be the ordered active list of locks held by path P1.
Let G = (V, E) be the directed graph, having the set of vertices V representing observed locks and the set of directed edges between vertices E.
Initial state:
L1 | V | E |
---|---|---|
() | {} | {} |
After P1 step 1:
L1 | V | E |
---|---|---|
(A) | {A} | {} |
After P1 step 2:
L1 | V | E |
---|---|---|
(A, B) | {A, B} | {(B, A)} |
After P1 step 3:
L1 | V | E |
---|---|---|
(A) | {A, B} | {(B, A)} |
After P1 step 4:
L1 | V | E |
---|---|---|
() | {A, B} | {(B, A)} |
Let L2 be the ordered active list of locks held by path P2.
Initial state:
L2 | V | E |
---|---|---|
() | {A, B} | {(B, A)} |
After P2 step 1:
L2 | V | E |
---|---|---|
(B) | {A, B} | {(B, A)} |
After P2 step 2:
L2 | V | E |
---|---|---|
(B, C) | {A, B, C} | {(B, A), (C, B)} |
This step adds lock C to the active list and also introduces a corresponding vertex to the graph. The active list contains the lock B, so an edge is added from C to B.
After P2 step 3:
L2 | V | E |
---|---|---|
(B) | {A, B, C} | {(B, A), (C, B)} |
After P2 step 4:
L2 | V | E |
---|---|---|
() | {A, B, C} | {(B, A), (C, B)} |
Let L3 be the ordered active list of locks held by path P3.
Initial state:
L3 | V | E |
---|---|---|
() | {A, B, C} | {(B, A), (C, B)} |
After P3 step 1:
L3 | V | E |
---|---|---|
(C) | {A, B, C} | {(B, A), (C, B)} |
After P3 step 2:
L3 | V | E |
---|---|---|
(C, A) | {A, B, C} | {(B, A), (C, B), (A, C)} |
This step adds lock A to the active list. The active list contains the lock C, so an edge is added from A to C. With this new edge the graph now forms a cycle in the vertices (A, B, C), indicating a circular dependency and the potential for deadlock if paths P1, P2, and P3 are interleaved in the right way.
This section illustrates a directed graph approach to detect irq-safe order violations using the previously discussed example from the invariants section.
Recall the example system with non-irq-safe lock A and irq-safe lock Birq; paths P1, P2, and irq path Pirq; with the following behavior:
Path P1 acquires and releases the lock in sequence:
Path Pirq acquires and releases the lock in sequence:
Path P2 acquires and releases the locks in sequence:
Let L1 be the ordered active list of locks held by path P1.
Let G = (V, E) be the directed graph, having the set of vertices V representing observed locks and the set of directed edges between vertices E.
Initial state:
L1 | V | E |
---|---|---|
() | {} | {} |
After P1 step 1:
L1 | V | E |
---|---|---|
(A) | {A} | {} |
After P1 step 2:
L1 | V | E |
---|---|---|
() | {A} | {} |
Let Lirq be the ordered active list of locks held by path Pirq.
Initial state:
Lirq | V | E |
---|---|---|
() | {A} | {} |
After Pirq step 1:
Lirq | V | E |
---|---|---|
(Birq) | {A, Birq} | {} |
After Pirq step 2:
Lirq | V | E |
---|---|---|
() | {A, Birq} | {} |
Let L2 be the ordered active list of locks held by path P2.
Initial state:
L2 | V | E |
---|---|---|
() | {A} | {} |
After P2 step 1:
L2 | V | E |
---|---|---|
(Birq) | {A, Birq} | {} |
After P2 step 2:
L2 | V | E |
---|---|---|
(Birq, A) | {A, Birq} | {(A, Birq)} |
This step adds lock A to the active list. The active list contains lock Birq, so an edge is added from A to Birq. Because this is an edge from a non-irq-safe lock to an irq-safe lock the irq-safe ordering invariant is violated and a potential deadlock exists.
This section develops a concrete strategy to implement a directed graph validator, based on the analysis techniques of the previous section.
The implementation strategy has the following goals:
In the analysis earlier in this document, locks are considered abstractly with the implication that the tracked objects are individual instances of locks. While tracking individual instances produces correct results, it has several consequences that might be avoided:
A key observation is that locks that serve identical functions should follow the same ordering rules, regardless of the number of instances.
Consider the following types with lock members and an operation that mutates both types:
struct Foo { Mutex lock; int data; GUARDED_BY(lock); }; struct Bar { Mutex lock; int data; GUARDED_BY(lock); }; void Swap(Foo* foo, Bar* bar) { foo->lock.Acquire(); bar->lock.Acquire(); int temp = foo->data; foo->data = bar->data; bar->data = temp; bar->Release(); foo->Release(); }
Since operation Swap
may operate on any instance of Foo
and any instance of Bar
it follows that Swap
establishes an order between the locks of all instances of Foo
and Bar
; failure to apply this order consistently in other parts of a program could result in a deadlock when the same instances of Foo
and Bar
are locked concurrently in different orders.
Note that it is possible to intentionally or unintentionally segregate different collections of Foo
and Bar
such that instances locked in different orders never overlap. This is still dangerous however, because seemingly innocuous changes to the inputs, structure, or timing of the program could defeat the segregation and introduce a potential deadlock. This problem can be avoided entirely by treating all instances of Foo
and Bar
equivalently and applying the same ordering rules throughout the program.
Ensuring universal ordering throughout the program can be achieved by tracking classes of locks instead of lock instances: each lock member in each type represents a unique lock class. The relationships between each lock class can be tracked and analyzed using the same directed graph techniques as with individual locks.
Tracking lock classes has the following benefits:
Tracking lock classes introduces additional ordering considerations when locking multiple locks of the same class. Because individual instances are not tracked it is necessary to take additional steps to ensure consistency when multiple locks of the same class must be acquired at the same time.
Nesting locks of the same class is necessary when a hierarchical or other ordered data structure has locks in each node and more than one per-node lock must be held at a time. In this situation the data structure or access pattern must provide a stable ordering that is used to guarantee ordering of the locks.
Validation of nestable lock classes requires only that the external order is recorded in the active locks list for each nestable lock and compared when new locks of the same class are added to the list. A consequence of this design is that other lock classes may not be interspersed between nested locks of the same class, only wholly before or after a collection of nested locks.
For example, non-nestable lock classes A and B, and nestable lock class N may be interspersed like this:
A, N0, N1, ... Nn, B
But not like this:
A, N0, B, N1, ... Nn or A, N0, N1, B, ... Nn or ... etc
In most situations this is a reasonable constraint, as interspersing other locks within a nested structure with arbitrary depth is likely to result in inversions as the structure is updated at runtime. On the other hand, in situations where nesting is bounded to a few levels it may be more effective to define separate lock classes for each level instead of using a nested class -- in this case other locks may be allowed at a specific level following normal lock ordering rules.
It is difficult to generalize lock ordering between locks of the same class without an externally provided order when the locks are acquired at different times. It is possible however, to provide an ordering guarantee when acquiring multiple locks at the same time, without temporal separation. In this situation the locks may be ordered by address, guaranteeing that any path that acquires the same set locks produces a consistent locking order.
For example, consider an operation F(Sa, Sb) that operates on two instances of structure S, each with a lock of class L and, as part of the operation F must lock both locks.
If instance S0 is ordered in memory before instance S1 then the locks have the same relative ordering as their containing instances. We can consider the locks to have the subclasses L0 and L1 respectively.
A lock ordering problem arises if we perform the operation with different orders:
F(S0, S1) and F(S1, S0)
Without intervention these produce the inverted lock sequences:
L0, L1 and L1, L0
Since F has simultaneous access to both locks at the same time, it is possible to order the locks by address, resulting in a consistent lock sequence regardless of the original order of the arguments.
Now suppose we add two more lock classes to the sequence: class A acquired before operation F and class B acquired after operation F. The resulting lock sequence is:
A, L0, L1, B
Note that this looks similar to the nested lock class sequence diagram in the previous section. It is in fact the same situation, only the ordering of locks is provided by address rather than an external order. This means that the same bookkeeping in the active threads list can be used for both situations.
This section discusses implementation details for tracking lock classes and concrete processing techniques to detect potential deadlocks.
Each lock class has a statically allocated node in the directed graph representing all locks belonging to that class. Each node has the following data structures:
Each lock class node has a hash set that tracks the edges from the lock class to the lock classes ordered before it.
TODO: Add implementation details of the hash set.
Each lock class node has a parent pointer used to track nodes that are connected in cycles in the directed graph. This permits reporting cycles that have been previously by the loop detection algorithm without fully re-traversing the graph.
TODO: Add implementation details of the disjoint set structure.
Each thread maintains a thread-local list of the locks it currently holds.
TODO: Add implementation details of the thread-local lock list.
Whenever a new edge is added to the directed graph, the loop detection thread is triggered to traverse the graph to find circular dependencies involving more than two locks. Tarjan's strongly connected sets algorithm is an efficient choice, with worst case complexity of O(|E| + |V|). This algorithm is stable even when traversing a graph that is updated concurrently by other threads.
TODO: Add implementation details of the loop detection thread.