| HACCKernels: A Benchmark for HACC's Particle Force Kernels |
| |
| The Hardware/Hybrid Accelerated Cosmology Code (HACC), a cosmology N-body-code |
| framework, is designed to run efficiently on diverse computing architectures |
| and to scale to millions of cores and beyond. The gravitational force is the |
| only significant force between particles at cosmological scales, and, in HACC, |
| this force is divided into two components: a long-range component and a |
| short-range component. The long-range component is handled using a distributed |
| grid-based solver, and the short-range component is handled by more-direct |
| particle-particle computations. On many systems, a tree-based multipole |
| approximation is used to further reduce the computational complexity of the |
| short-range force. The innermost computation is a direct N^2 particle-particle |
| force calculation of the short-range part of the gravitational force. This |
| innermost calculation consumes most of the simulation time, is compute bound, |
| and is what this benchmark represents. |
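
The direct N^2 calculation described above can be sketched roughly as follows. This is only an illustration, not the benchmark's source: the Particles layout, the function name short_range_accel, the cutoff rmax2, the softening eps2, and the polynomial coefficients are all hypothetical. It also assumes that the short-range force has the form (s + eps2)^(-3/2) - poly(s) with s = r^2, and it uses units in which G = 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle storage (an assumption for
// this sketch; the benchmark's own data structures may differ).
struct Particles {
  std::vector<float> x, y, z, mass;
};

struct Accel {
  float ax, ay, az;
};

// Direct particle-particle short-range acceleration on particle i.
// Assumed force law: f(s) = (s + eps2)^(-3/2) - poly(s), with s = r^2,
// applied only for 0 < s < rmax2. The entries of poly (lowest order
// first) are placeholders, not HACC's fitted coefficients.
Accel short_range_accel(const Particles& p, std::size_t i,
                        const std::vector<float>& poly,
                        float rmax2, float eps2) {
  Accel a{0.0f, 0.0f, 0.0f};
  for (std::size_t j = 0; j < p.x.size(); ++j) {
    const float dx = p.x[j] - p.x[i];
    const float dy = p.y[j] - p.y[i];
    const float dz = p.z[j] - p.z[i];
    const float s = dx * dx + dy * dy + dz * dz;
    if (s == 0.0f || s >= rmax2) continue;  // skip self and out-of-range pairs

    // Evaluate the polynomial in s with Horner's rule.
    float g = 0.0f;
    for (std::size_t k = poly.size(); k-- > 0;) g = g * s + poly[k];

    const float t = s + eps2;
    const float f = p.mass[j] * (1.0f / (t * std::sqrt(t)) - g);
    a.ax += f * dx;
    a.ay += f * dy;
    a.az += f * dz;
  }
  return a;
}
```

Calling short_range_accel once per particle gives the N^2 cost. In this sketch, the length of poly plays the role of the kernel order.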
| |
| Because this innermost force calculation is algorithmically isolated from the |
| overall scale of the problem, the parameters do not need to be adjusted to |
| represent the workload at different machine scales (e.g., petascale or |
| exascale). |
| |
| For more information on HACC, see: |
| |
| Salman Habib, et al. HACC: Simulating Sky Surveys on State-of-the-Art |
| Supercomputing Architectures. New Astronomy Volume 42, January 2016, pp. 49-65. |
| http://doi.org/10.1016/j.newast.2015.06.003 |
| https://arxiv.org/abs/1410.2805 |
| |
| The benchmark can be compiled using cmake (or make directly using |
| Makefile.simple) and then run like this: |
| |
| $ ./HACCKernels |
| Maximum OpenMP Threads: 1 |
| Iterations: 2000 |
| Gravity Short-Range-Force Kernel (4th Order): 26307.2 -122.385 -1369.32: 4.45269 s |
| Gravity Short-Range-Force Kernel (5th Order): 26297.5 -123.056 -1368.67: 4.51347 s |
| Gravity Short-Range-Force Kernel (6th Order): 26297.6 -123.225 -1368.66: 4.8256 s |
| |
| The three numbers printed for each kernel are the accelerations in each |
| direction, accumulated over all particles in the last iteration. They depend |
| on the total number of iterations and are printed as a diagnostic. They should |
| be similar for all polynomial kernel orders. |
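
One way to compare the diagnostic values across kernel orders is a component-wise relative check, sketched below. The helper name similar and the choice of tolerance are assumptions made for illustration. The benchmark itself does not define what counts as "similar".

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Returns true when every component of a and b agrees to within rel_tol,
// measured relative to the larger magnitude of the pair.
// rel_tol is a user-chosen threshold, not a value taken from the benchmark.
bool similar(const Vec3& a, const Vec3& b, double rel_tol) {
  for (int d = 0; d < 3; ++d) {
    const double scale = std::max(std::fabs(a[d]), std::fabs(b[d]));
    if (std::fabs(a[d] - b[d]) > rel_tol * scale) return false;
  }
  return true;
}
```

For the sample output above, the 4th- and 5th-order values agree under this check at a tolerance of 1e-2 but not at 1e-3. The second component differs by roughly 0.5%.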
| |
| If you'd like the benchmark only to display deterministic output (i.e. |
| omitting information on the number of threads, timing, and the like), then |
| define the preprocessor symbol VERIFICATION_OUTPUT_ONLY when compiling. |
| You can enable this option when configuring by passing |
| -DVERIFICATION_OUTPUT_ONLY=ON to cmake. |
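
A guard of this kind is typically written with #ifndef, as in the minimal sketch below. The function report_line is hypothetical and is not taken from the benchmark's source. It only illustrates how defining VERIFICATION_OUTPUT_ONLY could drop the non-deterministic timing from a result line.

```cpp
#include <sstream>
#include <string>

// Builds one result line: the label and the three diagnostic sums are
// always included; the timing is appended only when
// VERIFICATION_OUTPUT_ONLY is not defined at compile time.
std::string report_line(const std::string& label,
                        double ax, double ay, double az,
                        double seconds) {
  std::ostringstream out;
  out << label << ": " << ax << " " << ay << " " << az;
#ifndef VERIFICATION_OUTPUT_ONLY
  out << ": " << seconds << " s";
#else
  (void)seconds;  // timing intentionally omitted from deterministic output
#endif
  return out.str();
}
```

Compiled with VERIFICATION_OUTPUT_ONLY defined, the same call returns only the label and the three sums, so output from different runs can be compared directly.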
| |
| Compared to the older HACCmk procurement benchmark |
| (https://asc.llnl.gov/CORAL-benchmarks/#haccmk), this benchmark: |
| |
| * More closely matches the parallelization scheme used by the production code. |
| * Uses a more-realistic distribution of interaction-list lengths and |
|   out-of-bounds particles. |
| * Includes 4th-, 5th-, and 6th-order kernels. |
| |
| For more information, contact: Hal Finkel <hfinkel@anl.gov> |
| |
| |