blob: 897b86a00d69d19407b2e7b775f071c3d06e28e9 [file] [log] [blame] [view]
# Debugging lock dependency cycles
Lock dependency cycles are a common source of deadlocks. This guide provides
instructions for detecting, debugging, and resolving lock dependency cycles.
## Rust
Rust programs on Fuchsia can use [`fuchsia_sync`] for their locks to benefit
from additional runtime checks that detect access patterns that can deadlock.
These checks rely on the [`tracing_mutex`] crate to detect cycles between lock
acquisitions across different threads.
### Adopting fuchsia_sync
To start using `fuchsia_sync` in your code, follow these steps:
1. Add `//src/lib/fuchsia-sync` to your `deps`.
1. Replace `std::sync::Mutex` in your code with `fuchsia_sync::Mutex`.
1. Replace `std::sync::RwLock` with `fuchsia_sync::RwLock`.
1. Remove any error handling for poisoned locks, as `fuchsia_sync` does not
support [lock poisoning].
### Enabling cycle checks
These checks are enabled in `fuchsia_sync` by default in debug builds.
You can manually enable them in balanced or release builds by setting a GN arg:
```sh
fx set ... --args=fuchsia_sync_detect_lock_cycles=true
```
Note: This instrumentation has significant performance and memory overhead
that will impact the usability of a device.
If a lock cycle is detected you will see a panic message like this:
```
thread 'main' (1) panicked at ../../third_party/rust_crates/forks/tracing-mutex-0.3.2/src/reporting.rs:
Found cycle in mutex dependency graph:
disabled backtrace
stack backtrace:
...
```
See the next section for instructions on how to enable backtraces.
### Printing cycle backtraces
`tracing-mutex` will always print a backtrace for the panicking thread that
would actually trigger a deadlock, but it is often useful to also know what
other lock acquisitions are a part of a cycle.
The instrumentation will collect and print these additional backtraces when the
`RUST_BACKTRACE` environment variable is set to `1`. Note that this comes with
a large performance overhead on top of the instrumentation's baseline overhead.
For an ELF component, include this shard in your component manifest to collect
backtraces for all lock acquisitions and print relevant ones when a deadlock
is detected:
```json5
{
include: [ "//src/lib/fuchsia-sync/meta/enable_rust_backtrace.shard.cml" ],
// ...
}
```
### Suppressing panics
You can suppress panics from lock cycles by calling
```rs
fuchsia_sync::suppress_lock_cycle_panics();
```
Warning: This should be used with caution as it hides real deadlocks.
## Ensuring consistent lock access order
This section lists some strategies that you can use to prevent deadlocks once
this instrumentation has identified a cycle.
### Example
Consider the following code:
```rs
fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
let mut foo = foo.lock();
let mut bar = bar.lock();
foo.do_thing();
bar.do_thing();
}
fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
let mut bar = bar.lock();
let mut foo = foo.lock();
foo.do_other_thing();
bar.do_other_thing();
}
fn main() {
let foo = Mutex::new(...);
let bar = Mutex::new(...);
let first = std::thread::spawn(|| do_thing_to_both(foo, bar));
let second = std::thread::spawn(|| do_other_thing_to_both(foo, bar));
first.join().unwrap();
second.join().unwrap();
}
```
This code will deadlock in scenarios where events occur in the following order:
1. `first` acquires `foo`
2. `second` acquires `bar`
3. `first` attempts to acquire `bar` but it is held by `second`
4. `second` attempts to acquire `foo` but it is held by `first`
Steps (3) and (4) will block without any thread able to wake them, leading to
a deadlock. `tracing-mutex` will panic with a message indicating that a cycle
has been detected.
Depending on the synchronization requirements of the locks in your use case, you
may be able to avoid this cycle in a couple of ways.
### Removing overlapping lock acquisitions
The simplest way to prevent a lock acquisition from participating in a cycle is
to release the lock before acquiring the next one. This is useful if the values
guarded by the two locks don't actually require their modifications to be
synchronized.
The above example can be fixed by updating the code as follows:
```rs
fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
{
let mut foo = foo.lock();
foo.do_thing();
}
{
let mut bar = bar.lock();
bar.do_thing();
}
}
fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
{
let mut bar = bar.lock();
bar.do_other_thing();
}
{
let mut foo = foo.lock();
foo.do_other_thing();
}
}
// ...
```
By releasing each lock before acquiring the next one, we ensure that no thread
can starve any other thread indefinitely.
This will allow modifications to the two variables to be interleaved but that is
acceptable in many situations.
### Aligning lock access order
In cases where it's important for accesses to two or more locks to be
synchronized, you must ensure that all threads acquire the locks in the exact
same order every time.
In the simplified example you could achieve this by swapping the order the locks
are acquired in `do_other_thing_to_both()`:
```rs
fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
// This order is the same as the original example.
let mut foo = foo.lock();
let mut bar = bar.lock();
foo.do_thing();
bar.do_thing();
}
fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
// Now the code acquires the locks in the same order as do_thing_to_both().
let mut foo = foo.lock();
let mut bar = bar.lock();
foo.do_other_thing();
bar.do_other_thing();
}
// ...
```
By always locking `foo` before locking `bar`, you ensure that all threads
acquire the locks in the same order and prevent them from forming a cycle and
deadlocking.
#### Asserting the correct acquisition order
When possible, acquire locks in their intended order early in their lifecycle.
This informs future readers and the cycle instrumentation of the correct
acquisition order, ensuring that panic messages have source locations pointing
to the callsite with incorrect usage.
Limit these extra lock acquisitions to builds with `debug_assertions` enabled
to avoid any performance penalty in release builds.
In the simple case of two locks, this means acquiring both locks in the desired
order shortly after they are created. For example:
```rs
fn main() {
let foo = Mutex::new(...);
let bar = Mutex::new(...);
// foo should always be acquired before bar if they need to overlap.
#[cfg(debug_assertions)]
{
let _foo = foo.lock();
let _bar = bar.lock();
}
// ...
}
```
This will ensure that panics will come from code where `bar` is acquired before
`foo`, regardless of the exact order of the logic under test.
[`fuchsia_sync`]: https://fuchsia-docs.firebaseapp.com/rust/fuchsia_sync/index.html
[`tracing_mutex`]: https://fuchsia-docs.firebaseapp.com/rust/tracing_mutex/index.html
[lock poisoning]: https://doc.rust-lang.org/std/sync/struct.Mutex.html#poisoning