  1. measurereadlatency.h
  2. measurereadlatency_aarch64.S
  3. measurereadlatency_ppc64le.S
  4. measurereadlatency_x86.asm
  5. measurereadlatency_x86.S
  6. measurereadlatency_x86_64.asm
  7. measurereadlatency_x86_64.S

Assembly primitives

This folder holds functions that, for various reasons, we could only implement in assembly.


When should something be implemented here rather than in C/C++, maybe with inline assembly?

  • Need precise control over the instructions executed, e.g. choosing exactly when a parameter is read from the stack into a register.
  • Need implementation to stay instruction-for-instruction the same, e.g. comparing cycle counts for reads across identical machines using different compilers.
  • MSVC x64 doesn't support inline assembly, and we've seen MSVC x86 reorder instructions in violation of semantics, e.g. reordering LFENCE and RDTSC relative to each other.



MeasureReadLatency is the core primitive for cache timing side-channel attacks. And it should be pretty easy! Conceptually it's just four steps:

  1. Read timestamp T_before from some platform-specific timestamp counter
  2. Read from memory
  3. Read timestamp T_after
  4. Return T_after - T_before
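As a rough illustration only, the four steps can be written naively in C. This is a sketch, not the implementation in this folder: the names read_timestamp_ns and naive_measure_read_latency are hypothetical, and POSIX clock_gettime stands in for a platform-specific timestamp counter. Nothing here stops the compiler or CPU from reordering the steps.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a platform-specific timestamp counter
 * (assumes a POSIX system with CLOCK_MONOTONIC). */
static uint64_t read_timestamp_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Naive version of the four steps. No fences or serialization, so the
 * read is not guaranteed to happen between the two timestamps. */
static uint64_t naive_measure_read_latency(const volatile char *address) {
  uint64_t t_before = read_timestamp_ns(); /* 1. read T_before */
  char value = *address;                   /* 2. read from memory */
  uint64_t t_after = read_timestamp_ns();  /* 3. read T_after */
  (void)value;
  return t_after - t_before;               /* 4. return the difference */
}
```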

The hard part is ensuring that the memory read actually happens entirely between (1) and (3), and that it's the only thing that happens between those two points. Otherwise we're measuring more (or less) than just the memory read.

Some ways things could go wrong due to out-of-order execution:

  • The memory read happens before we read T_before
  • The memory read isn't finished before we read T_after
  • The read of T_before migrates up ahead of prior instructions
  • Some previously-issued memory operation is in-flight and completes between the reads of T_before and T_after, e.g. a cache flush or unrelated write.

We use a lot of serializing instructions and memory fences to avoid these. See the per-platform implementations for more details.
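To make the idea of fencing concrete, here is a hedged sketch of where fences might sit on x86_64, written with GCC/Clang intrinsics. It is not a copy of the per-platform implementations, which remain the authoritative versions. The function name and the SKETCH_HAVE_X86_FENCES macro are hypothetical, and the fence placement is one plausible arrangement rather than the project's. On other architectures the sketch falls back to an unfenced POSIX clock read so that it still compiles.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <x86intrin.h>
#define SKETCH_HAVE_X86_FENCES 1
#endif

/* Hypothetical sketch; the real primitive is written in assembly. */
static uint64_t fenced_measure_read_latency(const volatile char *address) {
#ifdef SKETCH_HAVE_X86_FENCES
  _mm_mfence(); /* intended to let earlier memory operations complete */
  _mm_lfence(); /* intended to stop the T_before read from moving up */
  uint64_t t_before = __rdtsc();
  _mm_lfence(); /* intended to stop the load from starting early */
  char value = *address; /* the timed read */
  _mm_lfence(); /* intended to make the load finish before T_after */
  uint64_t t_after = __rdtsc();
  _mm_lfence();
  (void)value;
  return t_after - t_before;
#else
  /* No fences here: other architectures need their own barriers. */
  struct timespec before, after;
  clock_gettime(CLOCK_MONOTONIC, &before);
  char value = *address;
  clock_gettime(CLOCK_MONOTONIC, &after);
  (void)value;
  return (uint64_t)(after.tv_sec - before.tv_sec) * 1000000000u +
         (uint64_t)after.tv_nsec - (uint64_t)before.tv_nsec;
#endif
}
```

Because a compiler is free to schedule intrinsics differently from how they are written, a C version like this cannot give the instruction-for-instruction guarantee that the assembly files do.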

It's also possible that we get unlucky and MeasureReadLatency gets preempted by the OS while performing the timed read, which might mean we return a very high latency value for a read that hit L1 cache. There's not a lot we can do about that; we leave the job of repeating the measurement and dealing with outliers as an exercise for the caller.
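One way a caller might handle outliers is to collect several samples and take the median, so that a single preempted measurement does not skew the result. The sketch below is only an assumption about what a caller could do; compare_u64 and median_latency are hypothetical helpers, not part of this folder.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values, ascending. */
static int compare_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a;
  uint64_t y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* Sorts the samples in place and returns the median. For an even count
 * it returns the lower of the two middle values. Returns 0 if count is 0. */
static uint64_t median_latency(uint64_t *samples, size_t count) {
  if (count == 0) {
    return 0;
  }
  qsort(samples, count, sizeof samples[0], compare_u64);
  return samples[(count - 1) / 2];
}
```

With samples of 40, 42, 5000000, 41 and 39, the median is 41: the one huge value from a preempted read is ignored.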

Platform and toolchain quirks

Decorators (underscore prefix)

Some assembly files define the symbol Function. Others define _Function. This is an artifact of different platform ABIs: some ABIs add a leading underscore to C symbols so they exist in a separate namespace from (undecorated) pure-assembler symbols.


  • Windows (Portable Executable or PE format) decorates __cdecl C functions with a leading underscore, except on 64-bit platforms. (ref)
  • macOS (Mach-O format) decorates C functions with a leading underscore. (ref, see pg. 18, “Searching for Symbols”)
  • Linux (ELF) does not decorate C function names: “External C symbols have the same names in C and object files' symbol tables” (ref, see pg. 4-22)
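The three rules above can be summarized with predefined compiler macros. This is an illustrative sketch: C_SYMBOL_PREFIX and decorated_name are hypothetical names, and the platform checks assume the usual __APPLE__, _WIN32 and _WIN64 macros are the right way to tell the targets apart.

```c
#include <stdio.h>
#include <string.h>

/* Leading underscore on macOS and 32-bit Windows; none elsewhere
 * (e.g. Linux/ELF and 64-bit Windows). */
#if defined(__APPLE__) || (defined(_WIN32) && !defined(_WIN64))
#define C_SYMBOL_PREFIX "_"
#else
#define C_SYMBOL_PREFIX ""
#endif

/* Writes the object-file symbol name expected for the C function c_name
 * into out. Returns 1 on success, 0 if the buffer is too small. */
static int decorated_name(const char *c_name, char *out, size_t out_size) {
  int n = snprintf(out, out_size, "%s%s", C_SYMBOL_PREFIX, c_name);
  return n >= 0 && (size_t)n < out_size;
}
```

An assembly file has to define whichever name decorated_name would produce for the target, which is why some files define Function and others _Function.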

Assemblers and syntax

We need to implement the same function across two toolchains (GCC/Clang and MSVC) and four architectures (x86, x86_64, aarch64, and ppc64le). Thankfully, the matrix is sparse: we only need to support MSVC for x86 and x86_64. (At least for now.)

Microsoft Macro Assembler (MASM), the assembler in the MSVC toolchain, is not an easy tool to work with. Unlike the GCC/Clang assemblers, it:

  • Can only use Intel syntax on x86/x86_64
  • Can't preprocess code before assembling it
  • Introduces comments with COMMENT or ; instead of /* */ or //

To make it easier to share as much of the implementations as possible, we start off writing the GCC/Clang implementations in .S (assembly-with-preprocessing) files with C-style comments. We use the .intel_syntax noprefix directive so that we're writing instructions in a format MASM will be able to read. Finally, we run that code through gcc -E -P (preprocess only, without linemarkers) to remove comments and formatting, and put the result in the .asm file for MASM.

If we add more assembly implementations, we may end up automating this process more or moving to NASM.

Useful references