 HACCKernels: A Benchmark for HACC's Particle Force Kernels

 The Hardware/Hybrid Accelerated Cosmology Code (HACC), a cosmology N-body-code
 framework, is designed to run efficiently on diverse computing architectures
 and to scale to millions of cores and beyond. The gravitational force is the
 only significant force between particles at cosmological scales, and, in HACC,
 this force is divided into two components: a long-range component and a
 short-range component. The long-range component is handled using a distributed
 grid-based solver, and the short-range component is handled by more-direct
 particle-particle computations. On many systems, a tree-based multipole
 approximation is used to further reduce the computational complexity of the
 short-range force. The inner-most computation is a direct N^2 particle-particle
 force calculation of the short-range part of the gravitational force. This
 inner-most calculation consumes most of the simulation time, is compute
 bound, and is what this benchmark represents.
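
The direct particle-particle calculation described above can be sketched
roughly as follows. This is an illustrative sketch, not the benchmark's
source: it assumes the short-range force is a softened Newtonian term minus a
polynomial in r^2 that stands in for the part already resolved by the grid
solver. The names Particle, Accel, evalPoly, and shortRangeAccel, and all
parameter values, are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle record: position and mass.
struct Particle {
  float x, y, z, mass;
};

// Hypothetical accumulator for one particle's acceleration.
struct Accel {
  float ax, ay, az;
};

// Evaluate a polynomial in s with Horner's rule; coeffs[0] is the constant
// term. The coefficients passed in are placeholders, not HACC's fitted values.
inline float evalPoly(const std::vector<float>& coeffs, float s) {
  float acc = 0.0f;
  for (std::size_t i = coeffs.size(); i-- > 0;) {
    acc = acc * s + coeffs[i];
  }
  return acc;
}

// Direct sum of the short-range acceleration on `target` from every particle
// in `neighbors`. Calling this once per target gives the N^2 cost.
Accel shortRangeAccel(const Particle& target,
                      const std::vector<Particle>& neighbors,
                      const std::vector<float>& coeffs,
                      float rMax2,        // squared cutoff radius (assumed)
                      float softening2) { // squared softening length (assumed)
  Accel a{0.0f, 0.0f, 0.0f};
  for (const Particle& p : neighbors) {
    const float dx = p.x - target.x;
    const float dy = p.y - target.y;
    const float dz = p.z - target.z;
    const float r2 = dx * dx + dy * dy + dz * dz;

    // Softened Newtonian term minus the polynomial approximation.
    float f = std::pow(r2 + softening2, -1.5f) - evalPoly(coeffs, r2);

    // Particles beyond the cutoff contribute nothing; a select instead of an
    // early exit keeps the loop body uniform.
    f = (r2 < rMax2) ? p.mass * f : 0.0f;

    a.ax += f * dx;
    a.ay += f * dy;
    a.az += f * dz;
  }
  return a;
}
```

In this sketch, the length of the coeffs vector sets the polynomial order, and
a neighbor outside the cutoff radius is processed but adds zero force.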

 Because this inner-most force calculation is algorithmically isolated from the
 overall scale of the problem, the parameters don't need to be adjusted to
 represent the workload on different machine scales (e.g. petascale or
 exascale).

 For more information on HACC, see:

 Salman Habib, et al. HACC: Simulating Sky Surveys on State-of-the-Art
 Supercomputing Architectures. New Astronomy Volume 42, January 2016, pp. 49-65.
 http://doi.org/10.1016/j.newast.2015.06.003
 https://arxiv.org/abs/1410.2805

 The benchmark can be compiled using cmake (or make directly using
 Makefile.simple) and then run like this:

 $ ./HACCKernels
 Maximum OpenMP Threads: 1
 Iterations: 2000
 Gravity Short-Range-Force Kernel (4th Order): 26307.2 -122.385 -1369.32: 4.45269 s
 Gravity Short-Range-Force Kernel (5th Order): 26297.5 -123.056 -1368.67: 4.51347 s
 Gravity Short-Range-Force Kernel (6th Order): 26297.6 -123.225 -1368.66: 4.8256 s

 As a diagnostic, the benchmark prints the acceleration in each direction,
 accumulated over all particles in the last iteration. These values depend on
 the total number of iterations and should be similar for all polynomial
 kernel orders.
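
A diagnostic of this kind could be computed and compared along the following
lines. This is only a sketch: the names Accel, accumulateAccel, and
roughlyEqual are hypothetical, and the relative tolerance is an assumption,
since the README says only that the values should be "similar".

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-particle acceleration record.
struct Accel {
  float ax, ay, az;
};

// Sum each acceleration component over all particles. Accumulating in double
// reduces rounding error when many float values are added.
std::array<double, 3> accumulateAccel(const std::vector<Accel>& accels) {
  std::array<double, 3> total{{0.0, 0.0, 0.0}};
  for (const Accel& a : accels) {
    total[0] += a.ax;
    total[1] += a.ay;
    total[2] += a.az;
  }
  return total;
}

// Return true when every component of `a` and `b` agrees to within the given
// relative tolerance (an assumed notion of "similar").
bool roughlyEqual(const std::array<double, 3>& a,
                  const std::array<double, 3>& b,
                  double relTol) {
  for (std::size_t i = 0; i < 3; ++i) {
    const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
    if (std::fabs(a[i] - b[i]) > relTol * scale) {
      return false;
    }
  }
  return true;
}
```

Applied to the sample output above, the 4th- and 5th-order results agree
component by component at a relative tolerance of 1e-2.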

 If you'd like the benchmark only to display deterministic output (i.e.
 omitting information on the number of threads, timing, and the like), then
 define the preprocessor symbol VERIFICATION_OUTPUT_ONLY when compiling.
 You can enable this option when configuring by passing
 -DVERIFICATION_OUTPUT_ONLY=ON to cmake.
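
One way a preprocessor symbol like this can suppress non-deterministic output
is sketched below. The function formatResult is hypothetical and is not taken
from the benchmark's source; it only illustrates the conditional-compilation
pattern.

```cpp
#include <sstream>
#include <string>

// Build one result line. The timing is appended only when
// VERIFICATION_OUTPUT_ONLY is not defined, so that verification builds emit
// the same text on every run.
std::string formatResult(const std::string& kernelName,
                         double ax, double ay, double az,
                         double seconds) {
  std::ostringstream out;
  out << kernelName << ": " << ax << " " << ay << " " << az;
#ifndef VERIFICATION_OUTPUT_ONLY
  out << ": " << seconds << " s";
#else
  (void)seconds; // unused in verification builds
#endif
  return out.str();
}
```

Compiling this sketch with -DVERIFICATION_OUTPUT_ONLY on the compiler command
line would drop the timing suffix from every line.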

 Compared to the older HACCmk procurement benchmark
 (https://asc.llnl.gov/CORAL-benchmarks/#haccmk), this benchmark:

  * More closely matches the parallelization scheme used by the production code.
  * Uses a more-realistic distribution of interaction-list lengths and
    out-of-bounds particles.
  * Includes 4th-, 5th-, and 6th-order kernels.

 For more information, contact: Hal Finkel <hfinkel@anl.gov>