Auto merge of #62072 - eddyb:generator-memory-index, r=tmandry

rustc: correctly transform memory_index mappings for generators.

Fixes #61793, closes #62011 (previous attempt at fixing #61793).

During #60187, I made the mistake of suggesting that the (re-)computation of `memory_index` in `ty::layout`, after generator-specific logic split/recombined fields, be done from the `offsets` of those fields (which needed to be computed anyway), rather than from the `memory_index` itself.

`memory_index` maps each field to its in-memory order index, which ranges over the same `0..n` values as the fields themselves, making it a bijective mapping, and more specifically a permutation (indeed, it's the permutation resulting from field reordering optimizations).
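
For concreteness, here is a minimal standalone sketch (toy values, not rustc code) of a `memory_index` and the `inverse_memory_index` it corresponds to:

```rust
fn main() {
    // Hypothetical 3-field struct whose fields were reordered in memory:
    // memory_index[field] = memory slot, inverse_memory_index[slot] = field.
    let memory_index: Vec<u32> = vec![1, 2, 0];
    let mut inverse_memory_index = vec![0u32; memory_index.len()];
    for (field, &slot) in memory_index.iter().enumerate() {
        inverse_memory_index[slot as usize] = field as u32;
    }
    // In memory, the fields appear in the order: 2, 0, 1.
    assert_eq!(inverse_memory_index, vec![2, 0, 1]);
}
```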

Each field has a unique "memory index", so sorting fields by it, even with an unstable sort, cannot put them in the wrong order. Offsets don't have that property: ZSTs do not increase the offset, so several fields can share an offset, and sorting fields by offset alone can (and did) produce the wrong order.
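
To illustrate that failure mode, a minimal sketch with made-up offsets (not rustc code): a ZST and a non-ZST field share offset 0, so sorting by offset ties, while sorting by `memory_index` is always unambiguous:

```rust
fn main() {
    // Field 0 is a hypothetical ZST, field 1 a u32; both start at byte 0.
    let offsets: Vec<u64> = vec![0, 0];
    // memory_index says the ZST logically comes *after* the u32 in memory.
    let memory_index: Vec<u32> = vec![1, 0];

    // Sorting fields by offset alone: both keys are 0, so an unstable sort
    // is free to return either [0, 1] or [1, 0].
    let mut by_offset: Vec<u32> = (0..offsets.len() as u32).collect();
    by_offset.sort_unstable_by_key(|&i| offsets[i as usize]);

    // Sorting by memory_index: every key is distinct, so the result is fixed.
    let mut by_memory_index: Vec<u32> = (0..memory_index.len() as u32).collect();
    by_memory_index.sort_unstable_by_key(|&i| memory_index[i as usize]);
    assert_eq!(by_memory_index, vec![1, 0]);
}
```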

Instead of going back to sorting based on (slices/subsets of) `memory_index`, or special-casing ZSTs to make sorting based on offsets produce the right results (presumably), as #62011 does, I opted to drop sorting altogether and focus on `O(n)` operations involving *permutations*:
* a permutation is easily inverted (see the `invert_mapping` `fn`)
  * an `inverse_memory_index` was already employed in other parts of the `ty::layout` code (that is, a mapping from memory order to field indices)
  * inverting twice produces the original permutation, so you can invert, modify, and invert again, if it's easier to modify the inverse mapping than the direct one
* you can modify/remove elements in a permutation, as long as the result remains dense (i.e. using every integer in `0..len`, without gaps)
  * for splitting a `0..n` permutation into disjoint `0..x` and `x..n` ranges, you can pick the elements based on a `i < x` / `i >= x` predicate, and for the latter, also subtract `x` to compact the range to `0..n-x`
  * in the general case, for taking an arbitrary subset of the permutation, you need a renumbering from that subset to a dense `0..subset.len()` - but notably, this is still `O(n)`!
* you can merge permutations, as long as the result remains disjoint (i.e. each element is unique)
  * for concatenating two `0..n` and `0..m` permutations, you can renumber the elements in the latter to `n..n+m` (both the split above and this concatenation are demonstrated in the sketch after this list)
* some of these operations can be combined, and an inverse mapping (be it a permutation or not) can still be used instead of a forward one by changing the "domain" of the loop performing the operation
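
As a minimal standalone sketch (toy permutation, not rustc code) of the split and concatenation operations above, using the same shape of `invert_mapping` as the one added in this PR:

```rust
// Invert a bijective mapping, i.e. `invert(map)[y] = x` if `map[x] = y`.
fn invert_mapping(map: &[u32]) -> Vec<u32> {
    let mut inverse = vec![0; map.len()];
    for (i, &v) in map.iter().enumerate() {
        inverse[v as usize] = i as u32;
    }
    inverse
}

fn main() {
    // Toy permutation over 5 fields, in inverse (memory order -> field) form:
    // the memory slots hold fields 3, 1, 4, 0, 2 respectively.
    let inverse: Vec<u32> = vec![3, 1, 4, 0, 2];

    // Split the fields into "a" (0..3) and "b" (3..5), preserving memory order;
    // the "b" half is renumbered (by subtracting 3) so it stays dense.
    let b_start: u32 = 3;
    let inverse_b: Vec<u32> =
        inverse.iter().filter_map(|&i| i.checked_sub(b_start)).collect();
    let inverse_a: Vec<u32> =
        inverse.iter().copied().filter(|&i| i < b_start).collect();
    assert_eq!(inverse_a, vec![1, 0, 2]);
    assert_eq!(inverse_b, vec![0, 1]); // fields 3 and 4, renumbered to 0 and 1

    // Each half only refers to its own fields, so it can be safely inverted.
    let memory_index_a = invert_mapping(&inverse_a);
    let memory_index_b = invert_mapping(&inverse_b);

    // Concatenating two permutations: renumber the second by the length of the first.
    let n = memory_index_a.len() as u32;
    let concatenated: Vec<u32> = memory_index_a.iter().copied()
        .chain(memory_index_b.iter().map(|&i| i + n))
        .collect();
    assert_eq!(concatenated, vec![1, 0, 2, 3, 4]);
}
```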

I wish I had a nicer / more mathematical description of the recombinations involved, but my focus was to fix the bug (in a way which preserves information more directly than sorting would), so I may have missed potential changes to the surrounding generator layout code that would make this all more straightforward.

r? @tmandry
diff --git a/src/librustc/ty/layout.rs b/src/librustc/ty/layout.rs
index 9ce1d2e..4af26e1 100644
--- a/src/librustc/ty/layout.rs
+++ b/src/librustc/ty/layout.rs
@@ -226,6 +226,19 @@
     Prefixed(Size, Align),
 }
 
+// Invert a bijective mapping, i.e. `invert(map)[y] = x` if `map[x] = y`.
+// This is used to go between `memory_index` (source field order to memory order)
+// and `inverse_memory_index` (memory order to source field order).
+// See also `FieldPlacement::Arbitrary::memory_index` for more details.
+// FIXME(eddyb) build a better abstraction for permutations, if possible.
+fn invert_mapping(map: &[u32]) -> Vec<u32> {
+    let mut inverse = vec![0; map.len()];
+    for i in 0..map.len() {
+        inverse[map[i] as usize] = i as u32;
+    }
+    inverse
+}
+
 impl<'tcx> LayoutCx<'tcx, TyCtxt<'tcx>> {
     fn scalar_pair(&self, a: Scalar, b: Scalar) -> LayoutDetails {
         let dl = self.data_layout();
@@ -303,7 +316,9 @@
         // That is, if field 5 has offset 0, the first element of inverse_memory_index is 5.
         // We now write field offsets to the corresponding offset slot;
         // field 5 with offset 0 puts 0 in offsets[5].
-        // At the bottom of this function, we use inverse_memory_index to produce memory_index.
+        // At the bottom of this function, we invert `inverse_memory_index` to
+        // produce `memory_index` (see `invert_mapping`).
+
 
         let mut offset = Size::ZERO;
 
@@ -360,13 +375,9 @@
         // Field 5 would be the first element, so memory_index is i:
         // Note: if we didn't optimize, it's already right.
 
-        let mut memory_index;
+        let memory_index;
         if optimize {
-            memory_index = vec![0; inverse_memory_index.len()];
-
-            for i in 0..inverse_memory_index.len() {
-                memory_index[inverse_memory_index[i] as usize]  = i as u32;
-            }
+            memory_index = invert_mapping(&inverse_memory_index);
         } else {
             memory_index = inverse_memory_index;
         }
@@ -1311,18 +1322,7 @@
     ) -> Result<&'tcx LayoutDetails, LayoutError<'tcx>> {
         use SavedLocalEligibility::*;
         let tcx = self.tcx;
-        let recompute_memory_index = |offsets: &[Size]| -> Vec<u32> {
-            debug!("recompute_memory_index({:?})", offsets);
-            let mut inverse_index = (0..offsets.len() as u32).collect::<Vec<_>>();
-            inverse_index.sort_unstable_by_key(|i| offsets[*i as usize]);
 
-            let mut index = vec![0; offsets.len()];
-            for i in 0..index.len() {
-                index[inverse_index[i] as usize] = i as u32;
-            }
-            debug!("recompute_memory_index() => {:?}", index);
-            index
-        };
         let subst_field = |ty: Ty<'tcx>| { ty.subst(tcx, substs.substs) };
 
         let info = tcx.generator_layout(def_id);
@@ -1349,14 +1349,34 @@
         // get included in each variant that requested them in
         // GeneratorLayout.
         debug!("prefix = {:#?}", prefix);
-        let (outer_fields, promoted_offsets) = match prefix.fields {
-            FieldPlacement::Arbitrary { mut offsets, .. } => {
-                let offsets_b = offsets.split_off(discr_index + 1);
+        let (outer_fields, promoted_offsets, promoted_memory_index) = match prefix.fields {
+            FieldPlacement::Arbitrary { mut offsets, memory_index } => {
+                let mut inverse_memory_index = invert_mapping(&memory_index);
+
+                // "a" (`0..b_start`) and "b" (`b_start..`) correspond to
+                // "outer" and "promoted" fields respectively.
+                let b_start = (discr_index + 1) as u32;
+                let offsets_b = offsets.split_off(b_start as usize);
                 let offsets_a = offsets;
 
-                let memory_index = recompute_memory_index(&offsets_a);
-                let outer_fields = FieldPlacement::Arbitrary { offsets: offsets_a, memory_index };
-                (outer_fields, offsets_b)
+                // Disentangle the "a" and "b" components of `inverse_memory_index`
+                // by preserving the order but keeping only one disjoint "half" each.
+                // FIXME(eddyb) build a better abstraction for permutations, if possible.
+                let inverse_memory_index_b: Vec<_> =
+                    inverse_memory_index.iter().filter_map(|&i| i.checked_sub(b_start)).collect();
+                inverse_memory_index.retain(|&i| i < b_start);
+                let inverse_memory_index_a = inverse_memory_index;
+
+                // Since `inverse_memory_index_{a,b}` each only refer to their
+                // respective fields, they can be safely inverted
+                let memory_index_a = invert_mapping(&inverse_memory_index_a);
+                let memory_index_b = invert_mapping(&inverse_memory_index_b);
+
+                let outer_fields = FieldPlacement::Arbitrary {
+                    offsets: offsets_a,
+                    memory_index: memory_index_a,
+                };
+                (outer_fields, offsets_b, memory_index_b)
             }
             _ => bug!(),
         };
@@ -1386,30 +1406,51 @@
                 StructKind::Prefixed(prefix_size, prefix_align.abi))?;
             variant.variants = Variants::Single { index };
 
-            let offsets = match variant.fields {
-                FieldPlacement::Arbitrary { offsets, .. } => offsets,
+            let (offsets, memory_index) = match variant.fields {
+                FieldPlacement::Arbitrary { offsets, memory_index } => {
+                    (offsets, memory_index)
+                }
                 _ => bug!(),
             };
 
             // Now, stitch the promoted and variant-only fields back together in
             // the order they are mentioned by our GeneratorLayout.
-            let mut next_variant_field = 0;
-            let mut combined_offsets = Vec::new();
-            for local in variant_fields.iter() {
-                match assignments[*local] {
+            // Because we only use some subset (that can differ between variants)
+            // of the promoted fields, we can't just pick those elements of the
+            // `promoted_memory_index` (as we'd end up with gaps).
+            // So instead, we build an "inverse memory_index", as if all of the
+            // promoted fields were being used, but leave the elements not in the
+            // subset as `INVALID_FIELD_IDX`, which we can filter out later to
+            // obtain a valid (bijective) mapping.
+            const INVALID_FIELD_IDX: u32 = !0;
+            let mut combined_inverse_memory_index =
+                vec![INVALID_FIELD_IDX; promoted_memory_index.len() + memory_index.len()];
+            let mut offsets_and_memory_index = offsets.into_iter().zip(memory_index);
+            let combined_offsets = variant_fields.iter().enumerate().map(|(i, local)| {
+                let (offset, memory_index) = match assignments[*local] {
                     Unassigned => bug!(),
                     Assigned(_) => {
-                        combined_offsets.push(offsets[next_variant_field]);
-                        next_variant_field += 1;
+                        let (offset, memory_index) = offsets_and_memory_index.next().unwrap();
+                        (offset, promoted_memory_index.len() as u32 + memory_index)
                     }
                     Ineligible(field_idx) => {
                         let field_idx = field_idx.unwrap() as usize;
-                        combined_offsets.push(promoted_offsets[field_idx]);
+                        (promoted_offsets[field_idx], promoted_memory_index[field_idx])
                     }
-                }
-            }
-            let memory_index = recompute_memory_index(&combined_offsets);
-            variant.fields = FieldPlacement::Arbitrary { offsets: combined_offsets, memory_index };
+                };
+                combined_inverse_memory_index[memory_index as usize] = i as u32;
+                offset
+            }).collect();
+
+            // Remove the unused slots and invert the mapping to obtain the
+            // combined `memory_index` (also see previous comment).
+            combined_inverse_memory_index.retain(|&i| i != INVALID_FIELD_IDX);
+            let combined_memory_index = invert_mapping(&combined_inverse_memory_index);
+
+            variant.fields = FieldPlacement::Arbitrary {
+                offsets: combined_offsets,
+                memory_index: combined_memory_index,
+            };
 
             size = size.max(variant.size);
             align = align.max(variant.align);
diff --git a/src/librustc_target/abi/mod.rs b/src/librustc_target/abi/mod.rs
index b7ad5d8..55cb179 100644
--- a/src/librustc_target/abi/mod.rs
+++ b/src/librustc_target/abi/mod.rs
@@ -699,7 +699,16 @@
         offsets: Vec<Size>,
 
         /// Maps source order field indices to memory order indices,
-        /// depending how fields were permuted.
+        /// depending on how the fields were reordered (if at all).
+        /// This is a permutation, with both the source order and the
+        /// memory order using the same (0..n) index ranges.
+        ///
+        /// Note that during computation of `memory_index`, sometimes
+        /// it is easier to operate on the inverse mapping (that is,
+        /// from memory order to source order), and that is usually
+        /// named `inverse_memory_index`.
+        ///
+        // FIXME(eddyb) build a better abstraction for permutations, if possible.
         // FIXME(camlorn) also consider small vector  optimization here.
         memory_index: Vec<u32>
     }
diff --git a/src/test/ui/async-await/issue-61793.rs b/src/test/ui/async-await/issue-61793.rs
new file mode 100644
index 0000000..bccdf01
--- /dev/null
+++ b/src/test/ui/async-await/issue-61793.rs
@@ -0,0 +1,19 @@
+// This testcase used to ICE in codegen due to inconsistent field reordering
+// in the generator state, claiming a ZST field was after a non-ZST field,
+// while those two fields were at the same offset (which is impossible).
+// That is, memory ordering of `(X, ())`, but offsets of `((), X)`.
+
+// compile-pass
+// edition:2018
+
+#![feature(async_await)]
+#![allow(unused)]
+
+async fn foo<F>(_: &(), _: F) {}
+
+fn main() {
+    foo(&(), || {});
+    async {
+        foo(&(), || {}).await;
+    };
+}