|  | //! Propagates assignment destinations backwards in the CFG to eliminate redundant assignments. | 
|  | //! | 
|  | //! # Motivation | 
|  | //! | 
|  | //! MIR building can insert a lot of redundant copies, and Rust code in general often tends to move | 
|  | //! values around a lot. The result is a lot of assignments of the form `dest = {move} src;` in MIR. | 
|  | //! MIR building for constants in particular tends to create additional locals that are only used | 
|  | //! inside a single block to shuffle a value around unnecessarily. | 
|  | //! | 
//! LLVM by itself is not good enough at eliminating these redundant copies (e.g. see
|  | //! <https://github.com/rust-lang/rust/issues/32966>), so this leaves some performance on the table | 
|  | //! that we can regain by implementing an optimization for removing these assign statements in rustc | 
|  | //! itself. When this optimization runs fast enough, it can also speed up the constant evaluation | 
|  | //! and code generation phases of rustc due to the reduced number of statements and locals. | 
|  | //! | 
|  | //! # The Optimization | 
|  | //! | 
|  | //! Conceptually, this optimization is "destination propagation". It is similar to the Named Return | 
|  | //! Value Optimization, or NRVO, known from the C++ world, except that it isn't limited to return | 
|  | //! values or the return place `_0`. On a very high level, independent of the actual implementation | 
|  | //! details, it does the following: | 
|  | //! | 
|  | //! 1) Identify `dest = src;` statements with values for `dest` and `src` whose storage can soundly | 
|  | //!    be merged. | 
|  | //! 2) Replace all mentions of `src` with `dest` ("unifying" them and propagating the destination | 
|  | //!    backwards). | 
|  | //! 3) Delete the `dest = src;` statement (by making it a `nop`). | 
|  | //! | 
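//! For example, in the following made-up, MIR-like sketch, `_2` (the `src`) can be merged into
//! `_3` (the `dest`):
//!
//! ```ignore (syntax-highlighting-only)
//! _2 = f();
//! _3 = move _2;
//! _4 = g(move _3);
//! ```
//!
//! Replacing `_2` with `_3` turns the middle statement into the self-assignment `_3 = move _3`,
//! which is then turned into a `nop`, leaving `_3 = f(); _4 = g(move _3);`.
//!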
|  | //! Step 1) is by far the hardest, so it is explained in more detail below. | 
|  | //! | 
|  | //! ## Soundness | 
|  | //! | 
|  | //! We have a pair of places `p` and `q`, whose memory we would like to merge. In order for this to | 
|  | //! be sound, we need to check a number of conditions: | 
|  | //! | 
|  | //! * `p` and `q` must both be *constant* - it does not make much sense to talk about merging them | 
|  | //!   if they do not consistently refer to the same place in memory. This is satisfied if they do | 
|  | //!   not contain any indirection through a pointer or any indexing projections. | 
|  | //! | 
|  | //! * `p` and `q` must have the **same type**. If we replace a local with a subtype or supertype, | 
|  | //!   we may end up with a different vtable for that local. See the `subtyping-impacts-selection` | 
|  | //!   tests for an example where that causes issues. | 
|  | //! | 
|  | //! * We need to make sure that the goal of "merging the memory" is actually structurally possible | 
|  | //!   in MIR. For example, even if all the other conditions are satisfied, there is no way to | 
|  | //!   "merge" `_5.foo` and `_6.bar`. For now, we ensure this by requiring that both `p` and `q` are | 
|  | //!   locals with no further projections. Future iterations of this pass should improve on this. | 
|  | //! | 
|  | //! * Finally, we want `p` and `q` to use the same memory - however, we still need to make sure that | 
|  | //!   each of them has enough "ownership" of that memory to continue "doing its job." More | 
|  | //!   precisely, what we will check is that whenever the program performs a write to `p`, then it | 
|  | //!   does not currently care about what the value in `q` is (and vice versa). We formalize the | 
//!   notion of "does not care what the value in `q` is" by checking the [liveness] of `q`.
|  | //! | 
|  | //!   Because of the difficulty of computing liveness of places that have their address taken, we do | 
//!   not even attempt to do it. Any place that is part of a local that has its address taken is
//!   excluded from the optimization.
|  | //! | 
//! The first three conditions are simple structural requirements on the `Assign` statements that
//! can be trivially checked. The last requirement however is more difficult and costly to check.
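//!
//! As an illustration of the last requirement, consider the following made-up, MIR-like sketch,
//! in which `_1` and `_2` must *not* be merged:
//!
//! ```ignore (syntax-highlighting-only)
//! _1 = foo();
//! _2 = move _1;     // candidate: merge `_1` into `_2`
//! _1 = bar();       // writes `_1` while `_2` is still live...
//! use_both(_2, _1); // ...because `_2` is read here
//! ```
//!
//! Merging them would make the write `_1 = bar();` clobber the value that the later use of `_2`
//! still needs.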
|  | //! | 
|  | //! ## Future Improvements | 
|  | //! | 
|  | //! There are a number of ways in which this pass could be improved in the future: | 
|  | //! | 
|  | //! * Merging storage liveness ranges instead of removing storage statements completely. This may | 
|  | //!   improve stack usage. | 
|  | //! | 
//! * Allow merging locals into places with projections, e.g. `_5` into `_6.foo`.
|  | //! | 
|  | //! * Liveness analysis with more precision than whole locals at a time. The smaller benefit of this | 
|  | //!   is that it would allow us to dest prop at "sub-local" levels in some cases. The bigger benefit | 
|  | //!   of this is that such liveness analysis can report more accurate results about whole locals at | 
|  | //!   a time. For example, consider: | 
|  | //! | 
|  | //!   ```ignore (syntax-highlighting-only) | 
|  | //!   _1 = u; | 
|  | //!   // unrelated code | 
|  | //!   _1.f1 = v; | 
|  | //!   _2 = _1.f1; | 
|  | //!   ``` | 
|  | //! | 
|  | //!   Because the current analysis only thinks in terms of locals, it does not have enough | 
|  | //!   information to report that `_1` is dead in the "unrelated code" section. | 
|  | //! | 
|  | //! * Liveness analysis enabled by alias analysis. This would allow us to not just bail on locals | 
|  | //!   that ever have their address taken. Of course that requires actually having alias analysis | 
|  | //!   (and a model to build it on), so this might be a bit of a ways off. | 
|  | //! | 
|  | //! * Various perf improvements. There are a bunch of comments in here marked `PERF` with ideas for | 
|  | //!   how to do things more efficiently. However, the complexity of the pass as a whole should be | 
|  | //!   kept in mind. | 
|  | //! | 
|  | //! ## Previous Work | 
|  | //! | 
//! A [previous attempt][attempt 1] at implementing an optimization like this turned out to cause a
//! significant regression in compiler performance. Fixing the regressions introduced a lot of
|  | //! undesirable complexity to the implementation. | 
|  | //! | 
|  | //! A [subsequent approach][attempt 2] tried to avoid the costly computation by limiting itself to | 
|  | //! acyclic CFGs, but still turned out to be far too costly to run due to suboptimal performance | 
|  | //! within individual basic blocks, requiring a walk across the entire block for every assignment | 
//! found within the block. For the `tuple-stress` benchmark, which has 458745 statements in a
//! single block, this proved prohibitively expensive.
|  | //! | 
|  | //! [Another approach after that][attempt 3] was much closer to correct, but had some soundness | 
|  | //! issues - it was failing to consider stores outside live ranges, and failed to uphold some of the | 
|  | //! requirements that MIR has for non-overlapping places within statements. However, it also had | 
|  | //! performance issues caused by `O(l² * s)` runtime, where `l` is the number of locals and `s` is | 
|  | //! the number of statements and terminators. | 
|  | //! | 
|  | //! Since the first attempt at this, the compiler has improved dramatically, and new analysis | 
//! frameworks have been added that should make this approach viable without restricting it to
//! certain classes of CFGs:
|  | //! - rustc now has a powerful dataflow analysis framework that can handle forwards and backwards | 
|  | //!   analyses efficiently. | 
//! - Layout optimizations for coroutines, which are very similar in spirit to what this
//!   optimization does, have been added to improve code generation for async/await.
|  | //! | 
|  | //! Also, rustc now has a simple NRVO pass (see `nrvo.rs`), which handles a subset of the cases that | 
|  | //! this destination propagation pass handles, proving that similar optimizations can be performed | 
|  | //! on MIR. | 
|  | //! | 
|  | //! ## Pre/Post Optimization | 
|  | //! | 
|  | //! It is recommended to run `SimplifyCfg` and then `SimplifyLocals` some time after this pass, as | 
|  | //! it replaces the eliminated assign statements with `nop`s and leaves unused locals behind. | 
|  | //! | 
|  | //! [liveness]: https://en.wikipedia.org/wiki/Live_variable_analysis | 
|  | //! [attempt 1]: https://github.com/rust-lang/rust/pull/47954 | 
|  | //! [attempt 2]: https://github.com/rust-lang/rust/pull/71003 | 
|  | //! [attempt 3]: https://github.com/rust-lang/rust/pull/72632 | 
|  |  | 
|  | use rustc_data_structures::fx::{FxIndexMap, IndexEntry, IndexOccupiedEntry}; | 
|  | use rustc_index::bit_set::DenseBitSet; | 
|  | use rustc_index::interval::SparseIntervalMatrix; | 
|  | use rustc_middle::bug; | 
|  | use rustc_middle::mir::visit::{MutVisitor, PlaceContext, Visitor}; | 
|  | use rustc_middle::mir::{ | 
|  | Body, HasLocalDecls, InlineAsmOperand, Local, LocalKind, Location, Operand, PassWhere, Place, | 
|  | Rvalue, Statement, StatementKind, TerminatorKind, dump_mir, traversal, | 
|  | }; | 
|  | use rustc_middle::ty::TyCtxt; | 
|  | use rustc_mir_dataflow::Analysis; | 
|  | use rustc_mir_dataflow::impls::MaybeLiveLocals; | 
|  | use rustc_mir_dataflow::points::{DenseLocationMap, PointIndex, save_as_intervals}; | 
|  | use tracing::{debug, trace}; | 
|  |  | 
|  | pub(super) struct DestinationPropagation; | 
|  |  | 
|  | impl<'tcx> crate::MirPass<'tcx> for DestinationPropagation { | 
|  | fn is_enabled(&self, sess: &rustc_session::Session) -> bool { | 
|  | // For now, only run at MIR opt level 3. Two things need to be changed before this can be | 
|  | // turned on by default: | 
|  | //  1. Because of the overeager removal of storage statements, this can cause stack space | 
//     regressions. This opt is not the place to fix this, though; it's a more general
|  | //     problem in MIR. | 
|  | //  2. Despite being an overall perf improvement, this still causes a 30% regression in | 
|  | //     keccak. We can temporarily fix this by bounding function size, but in the long term | 
|  | //     we should fix this by being smarter about invalidating analysis results. | 
|  | sess.mir_opt_level() >= 3 | 
|  | } | 
|  |  | 
|  | fn run_pass(&self, tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) { | 
|  | let def_id = body.source.def_id(); | 
|  | let mut candidates = Candidates::default(); | 
|  | let mut write_info = WriteInfo::default(); | 
|  | trace!(func = ?tcx.def_path_str(def_id)); | 
|  |  | 
|  | let borrowed = rustc_mir_dataflow::impls::borrowed_locals(body); | 
|  |  | 
|  | let live = MaybeLiveLocals.iterate_to_fixpoint(tcx, body, Some("MaybeLiveLocals-DestProp")); | 
|  | let points = DenseLocationMap::new(body); | 
|  | let mut live = save_as_intervals(&points, body, live.analysis, live.results); | 
|  |  | 
|  | // In order to avoid having to collect data for every single pair of locals in the body, we | 
|  | // do not allow doing more than one merge for places that are derived from the same local at | 
|  | // once. To avoid missed opportunities, we instead iterate to a fixed point - we'll refer to | 
|  | // each of these iterations as a "round." | 
|  | // | 
|  | // Reaching a fixed point could in theory take up to `min(l, s)` rounds - however, we do not | 
|  | // expect to see MIR like that. To verify this, a test was run against `[rust-lang/regex]` - | 
// the average MIR body saw 1.32 full iterations of this loop. The most that was hit was 30
|  | // for a single function. Only 80/2801 (2.9%) of functions saw at least 5. | 
|  | // | 
|  | // [rust-lang/regex]: | 
|  | //     https://github.com/rust-lang/regex/tree/b5372864e2df6a2f5e543a556a62197f50ca3650 | 
|  | let mut round_count = 0; | 
|  | loop { | 
|  | // PERF: Can we do something smarter than recalculating the candidates and liveness | 
|  | // results? | 
|  | candidates.reset_and_find(body, &borrowed); | 
|  | trace!(?candidates); | 
|  | dest_prop_mir_dump(tcx, body, &points, &live, round_count); | 
|  |  | 
|  | FilterInformation::filter_liveness( | 
|  | &mut candidates, | 
|  | &points, | 
|  | &live, | 
|  | &mut write_info, | 
|  | body, | 
|  | ); | 
|  |  | 
|  | // Because we only filter once per round, it is unsound to use a local for more than | 
|  | // one merge operation within a single round of optimizations. We store here which ones | 
|  | // we have already used. | 
|  | let mut merged_locals: DenseBitSet<Local> = | 
|  | DenseBitSet::new_empty(body.local_decls.len()); | 
|  |  | 
|  | // This is the set of merges we will apply this round. It is a subset of the candidates. | 
|  | let mut merges = FxIndexMap::default(); | 
|  |  | 
|  | for (src, candidates) in candidates.c.iter() { | 
|  | if merged_locals.contains(*src) { | 
|  | continue; | 
|  | } | 
|  | let Some(dest) = candidates.iter().find(|dest| !merged_locals.contains(**dest)) | 
|  | else { | 
|  | continue; | 
|  | }; | 
|  |  | 
|  | // Replace `src` by `dest` everywhere. | 
|  | merges.insert(*src, *dest); | 
|  | merged_locals.insert(*src); | 
|  | merged_locals.insert(*dest); | 
|  |  | 
|  | // Update liveness information based on the merge we just performed. | 
|  | // Every location where `src` was live, `dest` will be live. | 
|  | live.union_rows(*src, *dest); | 
|  | } | 
|  | trace!(merging = ?merges); | 
|  |  | 
|  | if merges.is_empty() { | 
|  | break; | 
|  | } | 
|  | round_count += 1; | 
|  |  | 
|  | apply_merges(body, tcx, merges, merged_locals); | 
|  | } | 
|  |  | 
|  | trace!(round_count); | 
|  | } | 
|  |  | 
|  | fn is_required(&self) -> bool { | 
|  | false | 
|  | } | 
|  | } | 
|  |  | 
|  | #[derive(Debug, Default)] | 
|  | struct Candidates { | 
|  | /// The set of candidates we are considering in this optimization. | 
|  | /// | 
|  | /// We will always merge the key into at most one of its values. | 
|  | /// | 
|  | /// Whether a place ends up in the key or the value does not correspond to whether it appears as | 
/// the lhs or rhs of any assignment. As a matter of fact, the places in here might never appear
/// in an assignment at all. For example, if we see an assignment like this:
|  | /// | 
|  | /// ```ignore (syntax-highlighting-only) | 
|  | /// _1.0 = _2.0 | 
|  | /// ``` | 
|  | /// | 
/// we will still report that we would like to merge `_1` and `_2`, in an attempt to allow us to
|  | /// remove that assignment. | 
|  | c: FxIndexMap<Local, Vec<Local>>, | 
|  |  | 
/// A reverse index of the `c` set; if the `c` set contains `a => b` (i.e. `b` appears in `c[a]`),
/// then this contains `b => a`.
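/// For example (illustrative), if `c` contains `_1 => [_2, _5]` and `_3 => [_2]`, then `reverse`
/// contains `_2 => [_1, _3]` and `_5 => [_1]`.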
|  | // PERF: Possibly these should be `SmallVec`s? | 
|  | reverse: FxIndexMap<Local, Vec<Local>>, | 
|  | } | 
|  |  | 
|  | ////////////////////////////////////////////////////////// | 
|  | // Merging | 
|  | // | 
|  | // Applies the actual optimization | 
|  |  | 
|  | fn apply_merges<'tcx>( | 
|  | body: &mut Body<'tcx>, | 
|  | tcx: TyCtxt<'tcx>, | 
|  | merges: FxIndexMap<Local, Local>, | 
|  | merged_locals: DenseBitSet<Local>, | 
|  | ) { | 
|  | let mut merger = Merger { tcx, merges, merged_locals }; | 
|  | merger.visit_body_preserves_cfg(body); | 
|  | } | 
|  |  | 
|  | struct Merger<'tcx> { | 
|  | tcx: TyCtxt<'tcx>, | 
|  | merges: FxIndexMap<Local, Local>, | 
|  | merged_locals: DenseBitSet<Local>, | 
|  | } | 
|  |  | 
|  | impl<'tcx> MutVisitor<'tcx> for Merger<'tcx> { | 
|  | fn tcx(&self) -> TyCtxt<'tcx> { | 
|  | self.tcx | 
|  | } | 
|  |  | 
|  | fn visit_local(&mut self, local: &mut Local, _: PlaceContext, _location: Location) { | 
|  | if let Some(dest) = self.merges.get(local) { | 
|  | *local = *dest; | 
|  | } | 
|  | } | 
|  |  | 
|  | fn visit_statement(&mut self, statement: &mut Statement<'tcx>, location: Location) { | 
|  | match &statement.kind { | 
|  | // FIXME: Don't delete storage statements, but "merge" the storage ranges instead. | 
|  | StatementKind::StorageDead(local) | StatementKind::StorageLive(local) | 
|  | if self.merged_locals.contains(*local) => | 
|  | { | 
|  | statement.make_nop(); | 
|  | return; | 
|  | } | 
|  | _ => (), | 
|  | }; | 
|  | self.super_statement(statement, location); | 
|  | match &statement.kind { | 
|  | StatementKind::Assign(box (dest, rvalue)) => { | 
|  | match rvalue { | 
|  | Rvalue::CopyForDeref(place) | 
|  | | Rvalue::Use(Operand::Copy(place) | Operand::Move(place)) => { | 
|  | // These might've been turned into self-assignments by the replacement | 
|  | // (this includes the original statement we wanted to eliminate). | 
|  | if dest == place { | 
|  | debug!("{:?} turned into self-assignment, deleting", location); | 
|  | statement.make_nop(); | 
|  | } | 
|  | } | 
|  | _ => {} | 
|  | } | 
|  | } | 
|  |  | 
|  | _ => {} | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | ////////////////////////////////////////////////////////// | 
|  | // Liveness filtering | 
|  | // | 
// This section enforces the final bullet point of the Soundness section above: the liveness requirement
|  |  | 
|  | struct FilterInformation<'a, 'tcx> { | 
|  | body: &'a Body<'tcx>, | 
|  | points: &'a DenseLocationMap, | 
|  | live: &'a SparseIntervalMatrix<Local, PointIndex>, | 
|  | candidates: &'a mut Candidates, | 
|  | write_info: &'a mut WriteInfo, | 
|  | at: Location, | 
|  | } | 
|  |  | 
// We first implement some utility functions for removing candidates according to
|  | // different needs. Throughout the liveness filtering, the `candidates` are only ever accessed | 
|  | // through these methods, and not directly. | 
|  | impl Candidates { | 
|  | /// Collects the candidates for merging. | 
|  | /// | 
/// This is responsible for enforcing the first three bullet points of the Soundness section.
|  | fn reset_and_find<'tcx>(&mut self, body: &Body<'tcx>, borrowed: &DenseBitSet<Local>) { | 
|  | self.c.clear(); | 
|  | self.reverse.clear(); | 
|  | let mut visitor = FindAssignments { body, candidates: &mut self.c, borrowed }; | 
|  | visitor.visit_body(body); | 
|  | // Deduplicate candidates. | 
|  | for (_, cands) in self.c.iter_mut() { | 
|  | cands.sort(); | 
|  | cands.dedup(); | 
|  | } | 
|  | // Generate the reverse map. | 
|  | for (src, cands) in self.c.iter() { | 
|  | for dest in cands.iter().copied() { | 
|  | self.reverse.entry(dest).or_default().push(*src); | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | /// Just `Vec::retain`, but the condition is inverted and we add debugging output | 
|  | fn vec_filter_candidates( | 
|  | src: Local, | 
|  | v: &mut Vec<Local>, | 
|  | mut f: impl FnMut(Local) -> CandidateFilter, | 
|  | at: Location, | 
|  | ) { | 
|  | v.retain(|dest| { | 
|  | let remove = f(*dest); | 
|  | if remove == CandidateFilter::Remove { | 
|  | trace!("eliminating {:?} => {:?} due to conflict at {:?}", src, dest, at); | 
|  | } | 
|  | remove == CandidateFilter::Keep | 
|  | }); | 
|  | } | 
|  |  | 
|  | /// `vec_filter_candidates` but for an `Entry` | 
|  | fn entry_filter_candidates( | 
|  | mut entry: IndexOccupiedEntry<'_, Local, Vec<Local>>, | 
|  | p: Local, | 
|  | f: impl FnMut(Local) -> CandidateFilter, | 
|  | at: Location, | 
|  | ) { | 
|  | let candidates = entry.get_mut(); | 
|  | Self::vec_filter_candidates(p, candidates, f, at); | 
if candidates.is_empty() {
|  | // FIXME(#120456) - is `swap_remove` correct? | 
|  | entry.swap_remove(); | 
|  | } | 
|  | } | 
|  |  | 
|  | /// For all candidates `(p, q)` or `(q, p)` removes the candidate if `f(q)` says to do so | 
|  | fn filter_candidates_by( | 
|  | &mut self, | 
|  | p: Local, | 
|  | mut f: impl FnMut(Local) -> CandidateFilter, | 
|  | at: Location, | 
|  | ) { | 
|  | // Cover the cases where `p` appears as a `src` | 
|  | if let IndexEntry::Occupied(entry) = self.c.entry(p) { | 
|  | Self::entry_filter_candidates(entry, p, &mut f, at); | 
|  | } | 
|  | // And the cases where `p` appears as a `dest` | 
|  | let Some(srcs) = self.reverse.get_mut(&p) else { | 
|  | return; | 
|  | }; | 
|  | // We use `retain` here to remove the elements from the reverse set if we've removed the | 
|  | // matching candidate in the forward set. | 
|  | srcs.retain(|src| { | 
|  | if f(*src) == CandidateFilter::Keep { | 
|  | return true; | 
|  | } | 
|  | let IndexEntry::Occupied(entry) = self.c.entry(*src) else { | 
|  | return false; | 
|  | }; | 
|  | Self::entry_filter_candidates( | 
|  | entry, | 
|  | *src, | 
|  | |dest| { | 
|  | if dest == p { CandidateFilter::Remove } else { CandidateFilter::Keep } | 
|  | }, | 
|  | at, | 
|  | ); | 
|  | false | 
|  | }); | 
|  | } | 
|  | } | 
|  |  | 
|  | #[derive(Copy, Clone, PartialEq, Eq)] | 
|  | enum CandidateFilter { | 
|  | Keep, | 
|  | Remove, | 
|  | } | 
|  |  | 
|  | impl<'a, 'tcx> FilterInformation<'a, 'tcx> { | 
|  | /// Filters the set of candidates to remove those that conflict. | 
|  | /// | 
|  | /// The steps we take are exactly those that are outlined at the top of the file. For each | 
|  | /// statement/terminator, we collect the set of locals that are written to in that | 
|  | /// statement/terminator, and then we remove all pairs of candidates that contain one such local | 
|  | /// and another one that is live. | 
|  | /// | 
|  | /// We need to be careful about the ordering of operations within each statement/terminator | 
/// here. Many statements might write to and read from more than one place, and we need to
/// consider them all. The strategy for doing this is as follows: we first gather all the places
/// that are written to within the statement/terminator via `WriteInfo`. Then, we use the liveness
/// analysis from *before* the statement/terminator (in the control flow sense) to eliminate
/// candidates - this is because we want to conservatively treat a pair of locals that is both
/// read and written in the statement/terminator as conflicting, and the liveness analysis before
/// the statement/terminator will correctly report locals that are read in the
/// statement/terminator as live. We are additionally conservative by treating all written-to
/// locals as also being read from.
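///
/// For example (illustrative): for a statement `_3 = Add(copy _1, const 5)`, only `_3` is recorded
/// as a write, but `_1` is live before the statement (it is read by it), so a candidate pair of
/// `_3` and `_1` would be conservatively removed at this point.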
|  | fn filter_liveness( | 
|  | candidates: &mut Candidates, | 
|  | points: &DenseLocationMap, | 
|  | live: &SparseIntervalMatrix<Local, PointIndex>, | 
|  | write_info: &mut WriteInfo, | 
|  | body: &Body<'tcx>, | 
|  | ) { | 
|  | let mut this = FilterInformation { | 
|  | body, | 
|  | points, | 
|  | live, | 
|  | candidates, | 
// We don't actually store anything at this scope; we just keep things here to be able
// to reuse the allocation.
|  | write_info, | 
|  | // Doesn't matter what we put here, will be overwritten before being used | 
|  | at: Location::START, | 
|  | }; | 
|  | this.internal_filter_liveness(); | 
|  | } | 
|  |  | 
|  | fn internal_filter_liveness(&mut self) { | 
|  | for (block, data) in traversal::preorder(self.body) { | 
|  | self.at = Location { block, statement_index: data.statements.len() }; | 
|  | self.write_info.for_terminator(&data.terminator().kind); | 
|  | self.apply_conflicts(); | 
|  |  | 
|  | for (i, statement) in data.statements.iter().enumerate().rev() { | 
|  | self.at = Location { block, statement_index: i }; | 
|  | self.write_info.for_statement(&statement.kind, self.body); | 
|  | self.apply_conflicts(); | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | fn apply_conflicts(&mut self) { | 
|  | let writes = &self.write_info.writes; | 
|  | for p in writes { | 
|  | let other_skip = self.write_info.skip_pair.and_then(|(a, b)| { | 
|  | if a == *p { | 
|  | Some(b) | 
|  | } else if b == *p { | 
|  | Some(a) | 
|  | } else { | 
|  | None | 
|  | } | 
|  | }); | 
|  | let at = self.points.point_from_location(self.at); | 
|  | self.candidates.filter_candidates_by( | 
|  | *p, | 
|  | |q| { | 
|  | if Some(q) == other_skip { | 
|  | return CandidateFilter::Keep; | 
|  | } | 
|  | // It is possible that a local may be live for less than the | 
// duration of a statement. This happens in the case of function
|  | // calls or inline asm. Because of this, we also mark locals as | 
|  | // conflicting when both of them are written to in the same | 
|  | // statement. | 
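// For example (illustrative): for a call terminator
// `_1 = f(move _2)`, both `_1` and `_2` end up in `writes`
// (the destination and the moved argument), so a candidate
// pair of `_1` and `_2` is removed here even if neither
// local is live after the call.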
|  | if self.live.contains(q, at) || writes.contains(&q) { | 
|  | CandidateFilter::Remove | 
|  | } else { | 
|  | CandidateFilter::Keep | 
|  | } | 
|  | }, | 
|  | self.at, | 
|  | ); | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | /// Describes where a statement/terminator writes to | 
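///
/// For example (illustrative): for `_1 = Add(move _2, move _3)`, `writes` would be
/// `[_1, _2, _3]`, since moved-from operands are conservatively treated as writes; for
/// `_1 = copy _2` (with both locals being ordinary temporaries), `writes` would be `[_1]` and
/// `skip_pair` would be `Some((_1, _2))`.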
|  | #[derive(Default, Debug)] | 
|  | struct WriteInfo { | 
|  | writes: Vec<Local>, | 
|  | /// If this pair of locals is a candidate pair, completely skip processing it during this | 
|  | /// statement. All other candidates are unaffected. | 
|  | skip_pair: Option<(Local, Local)>, | 
|  | } | 
|  |  | 
|  | impl WriteInfo { | 
|  | fn for_statement<'tcx>(&mut self, statement: &StatementKind<'tcx>, body: &Body<'tcx>) { | 
|  | self.reset(); | 
|  | match statement { | 
|  | StatementKind::Assign(box (lhs, rhs)) => { | 
|  | self.add_place(*lhs); | 
|  | match rhs { | 
|  | Rvalue::Use(op) => { | 
|  | self.add_operand(op); | 
|  | self.consider_skipping_for_assign_use(*lhs, op, body); | 
|  | } | 
|  | Rvalue::Repeat(op, _) => { | 
|  | self.add_operand(op); | 
|  | } | 
|  | Rvalue::Cast(_, op, _) | 
|  | | Rvalue::UnaryOp(_, op) | 
|  | | Rvalue::ShallowInitBox(op, _) => { | 
|  | self.add_operand(op); | 
|  | } | 
|  | Rvalue::BinaryOp(_, ops) => { | 
|  | for op in [&ops.0, &ops.1] { | 
|  | self.add_operand(op); | 
|  | } | 
|  | } | 
|  | Rvalue::Aggregate(_, ops) => { | 
|  | for op in ops { | 
|  | self.add_operand(op); | 
|  | } | 
|  | } | 
|  | Rvalue::WrapUnsafeBinder(op, _) => { | 
|  | self.add_operand(op); | 
|  | } | 
|  | Rvalue::ThreadLocalRef(_) | 
|  | | Rvalue::NullaryOp(_, _) | 
|  | | Rvalue::Ref(_, _, _) | 
|  | | Rvalue::RawPtr(_, _) | 
|  | | Rvalue::Len(_) | 
|  | | Rvalue::Discriminant(_) | 
|  | | Rvalue::CopyForDeref(_) => {} | 
|  | } | 
|  | } | 
|  | // Retags are technically also reads, but reporting them as a write suffices | 
|  | StatementKind::SetDiscriminant { place, .. } | 
|  | | StatementKind::Deinit(place) | 
|  | | StatementKind::Retag(_, place) => { | 
|  | self.add_place(**place); | 
|  | } | 
|  | StatementKind::Intrinsic(_) | 
|  | | StatementKind::ConstEvalCounter | 
|  | | StatementKind::Nop | 
|  | | StatementKind::Coverage(_) | 
|  | | StatementKind::StorageLive(_) | 
|  | | StatementKind::StorageDead(_) | 
|  | | StatementKind::BackwardIncompatibleDropHint { .. } | 
|  | | StatementKind::PlaceMention(_) => {} | 
|  | StatementKind::FakeRead(_) | StatementKind::AscribeUserType(_, _) => { | 
|  | bug!("{:?} not found in this MIR phase", statement) | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | fn consider_skipping_for_assign_use<'tcx>( | 
|  | &mut self, | 
|  | lhs: Place<'tcx>, | 
|  | rhs: &Operand<'tcx>, | 
|  | body: &Body<'tcx>, | 
|  | ) { | 
|  | let Some(rhs) = rhs.place() else { return }; | 
|  | if let Some(pair) = places_to_candidate_pair(lhs, rhs, body) { | 
|  | self.skip_pair = Some(pair); | 
|  | } | 
|  | } | 
|  |  | 
|  | fn for_terminator<'tcx>(&mut self, terminator: &TerminatorKind<'tcx>) { | 
|  | self.reset(); | 
|  | match terminator { | 
|  | TerminatorKind::SwitchInt { discr: op, .. } | 
|  | | TerminatorKind::Assert { cond: op, .. } => { | 
|  | self.add_operand(op); | 
|  | } | 
|  | TerminatorKind::Call { destination, func, args, .. } => { | 
|  | self.add_place(*destination); | 
|  | self.add_operand(func); | 
|  | for arg in args { | 
|  | self.add_operand(&arg.node); | 
|  | } | 
|  | } | 
|  | TerminatorKind::TailCall { func, args, .. } => { | 
|  | self.add_operand(func); | 
|  | for arg in args { | 
|  | self.add_operand(&arg.node); | 
|  | } | 
|  | } | 
|  | TerminatorKind::InlineAsm { operands, .. } => { | 
|  | for asm_operand in operands { | 
|  | match asm_operand { | 
|  | InlineAsmOperand::In { value, .. } => { | 
|  | self.add_operand(value); | 
|  | } | 
|  | InlineAsmOperand::Out { place, .. } => { | 
|  | if let Some(place) = place { | 
|  | self.add_place(*place); | 
|  | } | 
|  | } | 
|  | // Note that the `late` field in `InOut` is about whether the registers used | 
|  | // for these things overlap, and is of absolutely no interest to us. | 
|  | InlineAsmOperand::InOut { in_value, out_place, .. } => { | 
|  | if let Some(place) = out_place { | 
|  | self.add_place(*place); | 
|  | } | 
|  | self.add_operand(in_value); | 
|  | } | 
|  | InlineAsmOperand::Const { .. } | 
|  | | InlineAsmOperand::SymFn { .. } | 
|  | | InlineAsmOperand::SymStatic { .. } | 
|  | | InlineAsmOperand::Label { .. } => {} | 
|  | } | 
|  | } | 
|  | } | 
|  | TerminatorKind::Goto { .. } | 
|  | | TerminatorKind::UnwindResume | 
|  | | TerminatorKind::UnwindTerminate(_) | 
|  | | TerminatorKind::Return | 
|  | | TerminatorKind::Unreachable { .. } => (), | 
|  | TerminatorKind::Drop { .. } => { | 
|  | // `Drop`s create a `&mut` and so are not considered | 
|  | } | 
|  | TerminatorKind::Yield { .. } | 
|  | | TerminatorKind::CoroutineDrop | 
|  | | TerminatorKind::FalseEdge { .. } | 
|  | | TerminatorKind::FalseUnwind { .. } => { | 
|  | bug!("{:?} not found in this MIR phase", terminator) | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | fn add_place(&mut self, place: Place<'_>) { | 
|  | self.writes.push(place.local); | 
|  | } | 
|  |  | 
|  | fn add_operand<'tcx>(&mut self, op: &Operand<'tcx>) { | 
|  | match op { | 
|  | // FIXME(JakobDegen): In a previous version, the `Move` case was incorrectly treated as | 
|  | // being a read only. This was unsound, however we cannot add a regression test because | 
|  | // it is not possible to set this off with current MIR. Once we have that ability, a | 
|  | // regression test should be added. | 
|  | Operand::Move(p) => self.add_place(*p), | 
|  | Operand::Copy(_) | Operand::Constant(_) => (), | 
|  | } | 
|  | } | 
|  |  | 
|  | fn reset(&mut self) { | 
|  | self.writes.clear(); | 
|  | self.skip_pair = None; | 
|  | } | 
|  | } | 
|  |  | 
|  | ///////////////////////////////////////////////////// | 
|  | // Candidate accumulation | 
|  |  | 
/// If the pair of places is being considered for merging, returns the candidate pair of locals
/// that would be merged in order to accomplish this.
|  | /// | 
|  | /// The contract here is in one direction - there is a guarantee that merging the locals that are | 
/// returned by this function would result in an assignment between the inputs becoming a
|  | /// self-assignment. However, there is no guarantee that the returned pair is actually suitable for | 
|  | /// merging - candidate collection must still check this independently. | 
|  | /// | 
|  | /// This output is unique for each unordered pair of input places. | 
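///
/// For example (illustrative): for the assignment `_0 = move _3`, where `_0` is the return place
/// and `_3` is an ordinary temporary, this returns `Some((_3, _0))` - `_3` is the local that would
/// be removed (merged into `_0`), since `_0` is required and cannot be removed.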
|  | fn places_to_candidate_pair<'tcx>( | 
|  | a: Place<'tcx>, | 
|  | b: Place<'tcx>, | 
|  | body: &Body<'tcx>, | 
|  | ) -> Option<(Local, Local)> { | 
let (mut a, mut b) = if a.projection.is_empty() && b.projection.is_empty() {
|  | (a.local, b.local) | 
|  | } else { | 
|  | return None; | 
|  | }; | 
|  |  | 
|  | // By sorting, we make sure we're input order independent | 
|  | if a > b { | 
|  | std::mem::swap(&mut a, &mut b); | 
|  | } | 
|  |  | 
|  | // We could now return `(a, b)`, but then we miss some candidates in the case where `a` can't be | 
|  | // used as a `src`. | 
|  | if is_local_required(a, body) { | 
|  | std::mem::swap(&mut a, &mut b); | 
|  | } | 
|  | // We could check `is_local_required` again here, but there's no need - after all, we make no | 
|  | // promise that the candidate pair is actually valid | 
|  | Some((a, b)) | 
|  | } | 
|  |  | 
|  | struct FindAssignments<'a, 'tcx> { | 
|  | body: &'a Body<'tcx>, | 
|  | candidates: &'a mut FxIndexMap<Local, Vec<Local>>, | 
|  | borrowed: &'a DenseBitSet<Local>, | 
|  | } | 
|  |  | 
|  | impl<'tcx> Visitor<'tcx> for FindAssignments<'_, 'tcx> { | 
|  | fn visit_statement(&mut self, statement: &Statement<'tcx>, _: Location) { | 
|  | if let StatementKind::Assign(box ( | 
|  | lhs, | 
|  | Rvalue::CopyForDeref(rhs) | Rvalue::Use(Operand::Copy(rhs) | Operand::Move(rhs)), | 
|  | )) = &statement.kind | 
|  | { | 
|  | let Some((src, dest)) = places_to_candidate_pair(*lhs, *rhs, self.body) else { | 
|  | return; | 
|  | }; | 
|  |  | 
|  | // As described at the top of the file, we do not go near things that have | 
|  | // their address taken. | 
|  | if self.borrowed.contains(src) || self.borrowed.contains(dest) { | 
|  | return; | 
|  | } | 
|  |  | 
|  | // As described at the top of this file, we do not touch locals which have | 
|  | // different types. | 
|  | let src_ty = self.body.local_decls()[src].ty; | 
|  | let dest_ty = self.body.local_decls()[dest].ty; | 
|  | if src_ty != dest_ty { | 
|  | // FIXME(#112651): This can be removed afterwards. Also update the module description. | 
|  | trace!("skipped `{src:?} = {dest:?}` due to subtyping: {src_ty} != {dest_ty}"); | 
|  | return; | 
|  | } | 
|  |  | 
|  | // Also, we need to make sure that MIR actually allows the `src` to be removed | 
|  | if is_local_required(src, self.body) { | 
|  | return; | 
|  | } | 
|  |  | 
|  | // We may insert duplicates here, but that's fine | 
|  | self.candidates.entry(src).or_default().push(dest); | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
/// Some locals are part of the function's interface and cannot be removed.
|  | /// | 
|  | /// Note that these locals *can* still be merged with non-required locals by removing that other | 
|  | /// local. | 
|  | fn is_local_required(local: Local, body: &Body<'_>) -> bool { | 
|  | match body.local_kind(local) { | 
|  | LocalKind::Arg | LocalKind::ReturnPointer => true, | 
|  | LocalKind::Temp => false, | 
|  | } | 
|  | } | 
|  |  | 
|  | ///////////////////////////////////////////////////////// | 
|  | // MIR Dump | 
|  |  | 
|  | fn dest_prop_mir_dump<'tcx>( | 
|  | tcx: TyCtxt<'tcx>, | 
|  | body: &Body<'tcx>, | 
|  | points: &DenseLocationMap, | 
|  | live: &SparseIntervalMatrix<Local, PointIndex>, | 
|  | round: usize, | 
|  | ) { | 
|  | let locals_live_at = |location| { | 
|  | let location = points.point_from_location(location); | 
|  | live.rows().filter(|&r| live.contains(r, location)).collect::<Vec<_>>() | 
|  | }; | 
|  | dump_mir(tcx, false, "DestinationPropagation-dataflow", &round, body, |pass_where, w| { | 
|  | if let PassWhere::BeforeLocation(loc) = pass_where { | 
|  | writeln!(w, "        // live: {:?}", locals_live_at(loc))?; | 
|  | } | 
|  |  | 
|  | Ok(()) | 
|  | }); | 
|  | } |