Make UNALIGNED_LOAD16/32 on ARMv7 go through an explicitly unaligned struct,
to keep the compiler from coalescing multiple loads into a single load
instruction (which only works for aligned accesses).

A typical example where GCC would coalesce:

  uint8* p = ...;
  uint32 a = UNALIGNED_LOAD32(p);
  uint32 b = UNALIGNED_LOAD32(p + 4);
  uint32 c = a | b;
