[InstCombine] Improve simplify demanded bits worklist management

This fixes a small mistake from D72944: The worklist add should
happen before assigning the new operand, not after.

In case an actual replacement happens, the old operand needs to
be added for DCE. If no actual replacement happens, then old/new
are the same, so it doesn't matter.

This drops one iteration from the annotated test case.
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 25a73c8..a017abf 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -87,9 +87,9 @@
   Value *NewVal = SimplifyDemandedUseBits(U.get(), DemandedMask, Known,
                                           Depth, I);
   if (!NewVal) return false;
-  U = NewVal;
-  // Add the simplified instruction back to the worklist.
+  // Add the old operand back to the worklist.
   Worklist.addValue(U.get());
+  U = NewVal;
   return true;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/2010-11-01-lshr-mask.ll b/llvm/test/Transforms/InstCombine/2010-11-01-lshr-mask.ll
index 7f28260..ed32bb1 100644
--- a/llvm/test/Transforms/InstCombine/2010-11-01-lshr-mask.ll
+++ b/llvm/test/Transforms/InstCombine/2010-11-01-lshr-mask.ll
@@ -1,4 +1,4 @@
-; RUN: opt -instcombine -S < %s | FileCheck %s
+; RUN: opt -instcombine -instcombine-infinite-loop-threshold=3 -S < %s | FileCheck %s
 
 ; <rdar://problem/8606771>
 define i32 @main(i32 %argc) {