Add a loop alignment directive to work around a performance regression.

We found that LLVM upstream change rL310792 degraded the zippy benchmark
by ~3%. Performance analysis showed the regression was caused by a side
effect of that change: an incidental loop alignment change (from 32
bytes to 16 bytes) increased branch mispredictions, causing the
regression. The regression was reproducible on several Intel
microarchitectures, including Sandy Bridge, Haswell, and Skylake. Sadly,
we still do not have a good understanding of the internals of Intel's
branch predictor and cannot explain why branch mispredictions increase
when the loop alignment changes, so we cannot make a real fix here. The
workaround in this patch is to add a directive that aligns the hot loop
to 32 bytes, which restores the performance. This is in order to unblock
flipping the default compiler to LLVM.
diff --git a/snappy.cc b/snappy.cc
index 23f948f..fd519e5 100644
--- a/snappy.cc
+++ b/snappy.cc
@@ -685,6 +685,13 @@
         }
 
     MAYBE_REFILL();
+    // Add a loop alignment directive. Without this directive, we observed
+    // significant performance degradation on several Intel architectures
+    // in the snappy benchmark built with LLVM. The degradation was caused
+    // by increased branch misprediction.
+#if defined(__clang__) && defined(__x86_64__)
+    asm volatile (".p2align 5");
+#endif
     for ( ;; ) {
       const unsigned char c = *(reinterpret_cast<const unsigned char*>(ip++));