tree c5dc711af74746cfd3f0240c96bb989bb592881d
parent 55924d11095df25ab25c405fadfe93d0a46f82eb
author wmi <wmi@google.com> 1503354819 -0700
committer Victor Costan <pwnall@chromium.org> 1503618852 -0700

Add a loop alignment directive to work around a performance regression.

We found that the upstream LLVM change rL310792 degraded the zippy
benchmark by ~3%. Performance analysis showed that the regression was a
side effect of that change: it incidentally changed the loop alignment
(from 32 bytes to 16 bytes), which increased branch mispredictions. The
regression was reproducible on several Intel microarchitectures,
including Sandy Bridge, Haswell, and Skylake. Sadly we
still don't have a good understanding of the internals of the Intel
branch predictor and cannot explain why branch mispredictions increase
when the loop alignment changes, so we cannot make a real fix here. The
workaround in this patch is to add a directive that aligns the hot loop
to 32 bytes, which restores the performance. This unblocks the flip of
the default compiler to LLVM.
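
For illustration only, a minimal sketch of this kind of directive,
assuming a GCC-compatible compiler targeting x86-64. SumBytes is a
hypothetical function, not the loop touched by this patch; ".p2align 5"
is the GNU assembler directive that pads to a 2^5 = 32-byte boundary.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from the patch: sums a byte buffer.
std::uint32_t SumBytes(const std::uint8_t* data, std::size_t n) {
  std::uint32_t sum = 0;
#if defined(__GNUC__) && defined(__x86_64__)
  // Pad the instruction stream to a 32-byte boundary at this point.
  // Assumption: the compiler emits the loop head right after this
  // statement; the directive is a placement hint, not a guarantee.
  asm volatile(".p2align 5");
#endif
  for (std::size_t i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}
```

The directive does not change what the function computes; it only
inserts padding ahead of the loop, so any benefit has to be confirmed by
benchmarking on the target microarchitecture.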
