 %==========================================================================
 %==========================================================================

 Code Description

 A. General description:

 SMG2000 is a parallel semicoarsening multigrid solver for the linear
 systems arising from finite difference, finite volume, or finite
 element discretizations of the diffusion equation,

   \grad \cdot ( D \grad u ) + \sigma u = f

 on logically rectangular grids.  The code solves both 2D and 3D
 problems with discretization stencils of up to 9-point in 2D and up to
 27-point in 3D.  See the following paper for details on the algorithm
 and its parallel implementation/performance:

   P. N. Brown, R. D. Falgout, and J. E. Jones,
     "Semicoarsening multigrid on distributed memory machines",
     SIAM Journal on Scientific Computing, 21 (2000), pp. 1823-1834.
     Also available as LLNL technical report UCRL-JC-130720.

 The driver provided with SMG2000 builds linear systems for the special
 case of the above equation,

   - cx u_xx - cy u_yy - cz u_zz = (1/h)^2 ,         (in 3D)
   - cx u_xx - cy u_yy           = (1/h)^2 ,         (in 2D)

 with Dirichlet boundary conditions of u = 0, where h is the mesh
 spacing in each direction.  Standard finite differences are used to
 discretize the equations, yielding 5-pt. and 7-pt. stencils in 2D and
 3D, respectively.
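
 As an illustration of the discretization described above, here is a
 minimal sketch (not the driver's actual code) of the 7-pt. stencil
 that standard second-order central differences give for the 3D
 equation once both sides are multiplied by h^2, so that the
 right-hand side becomes 1.  The function name `stencil7_coeffs' and
 the ordering of the entries are assumptions made for this example.

```c
/* Hypothetical helper: fill the 7-point stencil for
 *   - cx u_xx - cy u_yy - cz u_zz = (1/h)^2
 * after scaling both sides by h^2 (the right-hand side becomes 1).
 * Assumed entry order: center, west, east, south, north, bottom, top. */
void stencil7_coeffs(double cx, double cy, double cz, double coeff[7])
{
    coeff[0] = 2.0 * (cx + cy + cz);  /* center (diagonal) entry */
    coeff[1] = -cx;                   /* (i-1, j, k) */
    coeff[2] = -cx;                   /* (i+1, j, k) */
    coeff[3] = -cy;                   /* (i, j-1, k) */
    coeff[4] = -cy;                   /* (i, j+1, k) */
    coeff[5] = -cz;                   /* (i, j, k-1) */
    coeff[6] = -cz;                   /* (i, j, k+1) */
}
```

 For an interior grid point the seven entries sum to zero; rows next
 to the Dirichlet boundary drop the entries that would reach outside
 the grid, since u = 0 there.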

 To determine when the solver has converged, the driver currently uses
 the relative-residual stopping criterion,

   ||r_k||_2 / ||b||_2 < tol

 with tol = 10^-6.
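
 The test above can be sketched in C as follows.  This is a serial
 illustration only, not the code used by the solver; the function
 names are hypothetical.  The comparison is done on squared norms,
 which is equivalent for tol > 0 and avoids both the square root and
 a division by ||b||_2.

```c
#include <stddef.h>

/* Squared Euclidean norm of a vector of length n. */
static double norm2_squared(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i] * v[i];
    return sum;
}

/* Returns 1 if ||r||_2 / ||b||_2 < tol, else 0.
 * Evaluated as ||r||^2 < tol^2 * ||b||^2; a zero right-hand side is
 * therefore always reported as "not converged". */
int relative_residual_converged(const double *r, const double *b,
                                size_t n, double tol)
{
    return norm2_squared(r, n) < tol * tol * norm2_squared(b, n);
}
```

 In a parallel run each process would hold only part of r and b, so
 the two sums would have to be combined across processes (for
 example with MPI_Allreduce) before the comparison.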

 This solver can serve as a key component for achieving scalability in
 radiation diffusion simulations.

 B. Coding:

 SMG2000 is written in ISO-C.  It is an SPMD code which uses MPI.
 Parallelism is achieved by data decomposition.  The driver provided
 with SMG2000 achieves this decomposition by simply subdividing the
 grid into logical P x Q x R (in 3D) chunks of equal size.
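
 A decomposition of this kind can be sketched as below.  The rank
 ordering (first coordinate varying fastest) and both function names
 are assumptions made for illustration; the driver's own mapping may
 differ.

```c
/* Hypothetical mapping from an MPI rank to coordinates (p, q, r) in a
 * P x Q x R process grid, assuming ranks are numbered with p varying
 * fastest, then q, then r.  R is implied by the number of ranks. */
void rank_to_coords(int rank, int P, int Q, int *p, int *q, int *r)
{
    *p = rank % P;
    *q = (rank / P) % Q;
    *r = rank / (P * Q);
}

/* First global index owned along one direction when every process
 * holds an equal chunk of n points in that direction. */
int chunk_lower_index(int coord, int n)
{
    return coord * n;
}
```

 With a 2 x 2 x 2 topology, for instance, rank 5 maps to
 (p, q, r) = (1, 0, 1) under this ordering.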

 C. Parallelism:

 SMG2000 is a highly synchronous code.  The communication and
 computation patterns exhibit the surface-to-volume relationship
 common to many parallel scientific codes.  Hence, parallel efficiency
 is largely determined by the size of the data "chunks" mentioned
 above, and the speed of communications and computations on the
 machine.  SMG2000 is also memory-access bound, doing only about 1-2
 computations per memory access, so memory-access speeds will also have
 a large impact on performance.
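
 The surface-to-volume relationship can be made concrete with a small
 calculation.  The sketch below uses the face area of a chunk as a
 rough proxy for the data exchanged with neighbors and the number of
 points in the chunk as a proxy for local work; this is a simplified
 model assumed here, not a measurement of SMG2000, and the function
 name is hypothetical.

```c
/* Ratio of face points (a proxy for communication) to all points in
 * an nx x ny x nz chunk (a proxy for computation). */
double surface_to_volume(int nx, int ny, int nz)
{
    double surface = 2.0 * ((double)nx * ny + (double)ny * nz
                            + (double)nx * nz);
    double volume = (double)nx * ny * nz;
    return surface / volume;
}
```

 For a cubic chunk of edge length n the ratio is 6/n, so doubling the
 edge length halves the relative cost of communication in this model.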

 %==========================================================================
 %==========================================================================

 Files in this Distribution

 NOTE: The SMG2000 code is derived directly from the hypre library, a large
 linear solver library that is being developed in the Center for Applied
 Scientific Computing (CASC) at LLNL.

 In the smg2000 directory the following files are included:

 COPYRIGHT_and_DISCLAIMER
 HYPRE_config.h
 Makefile
 Makefile.include

 The following subdirectories are also included:

 docs
 krylov
 struct_ls
 struct_mv
 test
 utilities

 In the 'docs' directory the following files are included:

 smg2000.readme

 In the 'krylov' directory the following files are included:

 HYPRE_pcg.c
 Makefile
 krylov.h
 pcg.c

 In the 'struct_ls' directory the following files are included:

 HYPRE_struct_ls.h
 HYPRE_struct_pcg.c
 HYPRE_struct_smg.c
 Makefile
 coarsen.c
 cyclic_reduction.c
 general.c
 headers.h
 pcg_struct.c
 point_relax.c
 semi_interp.c
 semi_restrict.c
 smg.c
 smg.h
 smg2_setup_rap.c
 smg3_setup_rap.c
 smg_axpy.c
 smg_relax.c
 smg_residual.c
 smg_setup.c
 smg_setup_interp.c
 smg_setup_rap.c
 smg_setup_restrict.c
 smg_solve.c
 struct_ls.h

 In the 'struct_mv' directory the following files are included:

 HYPRE_struct_grid.c
 HYPRE_struct_matrix.c
 HYPRE_struct_mv.h
 HYPRE_struct_stencil.c
 HYPRE_struct_vector.c
 Makefile
 box.c
 box_algebra.c
 box_alloc.c
 box_neighbors.c
 communication.c
 communication_info.c
 computation.c
 grow.c
 headers.h
 hypre_box_smp_forloop.h
 project.c
 struct_axpy.c
 struct_copy.c
 struct_grid.c
 struct_innerprod.c
 struct_io.c
 struct_matrix.c
 struct_matrix_mask.c
 struct_matvec.c
 struct_mv.h
 struct_scale.c
 struct_stencil.c
 struct_vector.c

 In the 'test' directory the following files are included:

 Makefile
 smg2000.c

 In the 'utilities' directory the following files are included:

 HYPRE_utilities.h
 Makefile
 general.h
 hypre_smp_forloop.h
 memory.c
 memory.h
 mpistubs.c
 mpistubs.h
 random.c
 threading.c
 threading.h
 timer.c
 timing.c
 timing.h
 utilities.h
 version

 %==========================================================================
 %==========================================================================

 Building the Code

 SMG2000 uses a simple Makefile system for building the code.  All
 compiler and link options are set by modifying the file
 'smg2000/Makefile.include' appropriately.  This file is then included
 in each of the following makefiles:

   krylov/Makefile
   struct_ls/Makefile
   struct_mv/Makefile
   test/Makefile
   utilities/Makefile

 To build the code, first modify the 'Makefile.include' file
 appropriately, then type (in the smg2000 directory)

   make

 Other available targets are

   make clean        (deletes .o files)
   make veryclean    (deletes .o files, libraries, and executables)

 To configure the code to run with:
 1 - OpenMP only, add '-DHYPRE_USING_OPENMP -DHYPRE_SEQUENTIAL' to
     the 'INCLUDE_CFLAGS' line in the 'Makefile.include' file and
     use a valid OpenMP compiler.
 2 - OpenMP with MPI, add '-DHYPRE_USING_OPENMP -DTIMER_USE_MPI'
     to the 'INCLUDE_CFLAGS' line in the 'Makefile.include' file
     and use a valid OpenMP compiler and MPI library.
 3 - MPI only, add '-DTIMER_USE_MPI' to the 'INCLUDE_CFLAGS' line
     in the 'Makefile.include' file and use a valid MPI library.

 %==========================================================================
 %==========================================================================

 Optimization and Improvement Challenges

 This code is memory-access bound.  We believe it would be very
 difficult to obtain "good" cache reuse with an optimized version of
 the code.

 %==========================================================================
 %==========================================================================

 Parallelism and Scalability Expectations

 SMG2000 has been run on the following platforms:

  Blue-Pacific         - up to 1000 procs
  Red                  - up to 3150 procs
  Compaq cluster       - up to 64 procs
  Sun Sparc Ultra 10's - up to 4 machines

 Consider increasing both problem size and number of processors in tandem.
 On scalable architectures, time-to-solution for SMG2000 will initially
 increase, then it will level off at a modest number of processors,
 remaining roughly constant for larger numbers of processors.  Iteration
 counts will also increase slightly for small to modest sized problems,
 then level off at a roughly constant number for larger problem sizes.

 For example, we get the following results for a 3D problem with
 cx = 0.1, cy = 1.0, and cz = 10.0, for a problem distributed on
 a logical P x Q x R processor topology, with fixed local problem
 size per processor given as 35x35x35:

  "P x Q x R"  procs  "iters"     "setup time"    "solve time"
   1x1x1           1     6          1.681680        23.255241
   2x2x2           8     6          3.738600        32.262907
   3x3x3          27     6          6.601194        41.341892
   6x6x6         216     7         12.310776        46.672215
   8x8x8         512     7         18.968893        50.051737
   10x10x10     1000     7         18.890876        54.094806
   14x15x15     3150     8         30.635085        62.725305

 These results were obtained on ASCI Red.
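
 One common way to summarize a table like this is a weak-scaling
 efficiency: the time on one process divided by the time on N
 processes, with the local problem size held fixed.  The helper below
 is a sketch with a hypothetical name; the metric itself is not
 defined in this document.

```c
/* Weak-scaling efficiency: time on one process divided by the time on
 * many processes for the same per-process problem size.  A value of
 * 1.0 would mean perfectly flat time-to-solution. */
double weak_scaling_efficiency(double t_one, double t_many)
{
    return t_one / t_many;
}
```

 Applied to the solve times in the first and last rows above
 (23.255241 and 62.725305), this gives roughly 0.37.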

 %==========================================================================
 %==========================================================================

 Running the Code

 The driver for SMG2000 is called `smg2000', and is located in the
 smg2000/test subdirectory.  Type

    mpirun -np 1 smg2000 -help

 to get usage information.  This prints out the following:

 Usage: .../smg2000/test/smg2000 [<options>]

   -n <nx> <ny> <nz>    : problem size per block
   -P <Px> <Py> <Pz>    : processor topology
   -b <bx> <by> <bz>    : blocking per processor
   -c <cx> <cy> <cz>    : diffusion coefficients
   -v <n_pre> <n_post>  : number of pre and post relaxations
   -d <dim>             : problem dimension (2 or 3)
   -solver <ID>         : solver ID (default = 0)
                          0 - SMG
                          1 - CG with SMG precond
                          2 - CG with diagonal scaling
                          3 - CG

 All of the arguments are optional.  The most important options for the
 SMG2000 compact application are the `-n' and `-P' options.  The `-n'
 option allows one to specify the local problem size per MPI process,
 and the `-P' option specifies the process topology on which to run.
 The global problem size will be <Px>*<nx> by <Py>*<ny> by <Pz>*<nz>.
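
 The arithmetic behind these two options can be sketched as follows.
 The function names are hypothetical, and the assumption that the
 number of MPI processes must equal <Px>*<Py>*<Pz> is inferred from
 the process counts in the scaling table rather than stated by the
 usage message.

```c
/* Number of MPI processes implied by a -P <Px> <Py> <Pz> topology
 * (assumed to be the product of the three dimensions). */
int required_processes(int Px, int Py, int Pz)
{
    return Px * Py * Pz;
}

/* Global grid extent along one direction: number of processes in that
 * direction times the local size given with -n. */
long global_extent(int P, int n)
{
    return (long)P * n;
}
```

 For the 14x15x15 topology with a 35x35x35 local problem, this gives
 3150 processes and a global grid of 490 by 525 by 525 points.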

 When running with OpenMP, the number of threads used per MPI process
 is controlled via the OMP_NUM_THREADS environment variable.

 %==========================================================================
 %==========================================================================

 Timing Issues

 If using MPI, the whole code is timed using the MPI timers.  If not using
 MPI, standard system timers are used.  Timing results are printed to
 standard out, and are divided into "Setup Phase" times and "Solve Phase"
 times. Timings for a few individual routines are also printed out.
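
 The choice between MPI timers and system timers could be made at
 compile time along the following lines.  This is only a sketch of
 the idea, keyed on the TIMER_USE_MPI define described in the build
 section; it is not the contents of the timer sources in the
 utilities directory, and the function names are hypothetical.

```c
#include <time.h>
#ifdef TIMER_USE_MPI
#include <mpi.h>
#endif

/* Wall-clock seconds: the MPI timer when built with -DTIMER_USE_MPI,
 * otherwise the standard C library time() (1-second resolution). */
double wall_seconds(void)
{
#ifdef TIMER_USE_MPI
    return MPI_Wtime();
#else
    return (double)time(NULL);
#endif
}

/* CPU seconds consumed by this process so far. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

 A phase would be timed by calling each function before and after the
 phase and reporting the differences as wall clock and cpu clock time.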

 %==========================================================================
 %==========================================================================

 Memory Needed

 SMG2000 is a memory intensive code, and its memory needs are somewhat
 complicated to describe.  For the 3D problems discussed in this
 document, memory requirements are roughly 54 times the local problem
 size times the size of a double plus some overhead for storing ghost
 points, etc. in the code.  The overhead required by this version of
 the SMG code grows essentially like the logarithm of the problem size.
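
 The leading term of this estimate is easy to compute.  The sketch
 below (hypothetical function name) returns 54 times the local
 problem size times the size of a double, and deliberately leaves out
 the ghost-point and other overhead, which this document does not
 quantify.

```c
#include <stddef.h>

/* Rough per-process memory estimate in bytes for a 3D run:
 *   54 * (nx * ny * nz) * sizeof(double).
 * Ghost-point and other overhead is NOT included. */
size_t smg_memory_estimate_bytes(size_t nx, size_t ny, size_t nz)
{
    return 54 * nx * ny * nz * sizeof(double);
}
```

 For a 35x35x35 local problem and 8-byte doubles, this comes to
 18,522,000 bytes per process before overhead.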

 %==========================================================================
 %==========================================================================

 About the Data

 SMG2000 does not read in any data. All control is on the command line.

 %==========================================================================
 %==========================================================================

 Expected Results

 Consider the following run:

   mpirun -np 1 smg2000 -n 12 12 12 -c 2.0 3.0 40

 This is what SMG2000 prints out:

    Running with these driver parameters:
      (nx, ny, nz)    = (12, 12, 12)
      (Px, Py, Pz)    = (1, 1, 1)
      (bx, by, bz)    = (1, 1, 1)
      (cx, cy, cz)    = (2.000000, 3.000000, 40.000000)
      (n_pre, n_post) = (1, 1)
      dim             = 3
      solver ID       = 0
    =============================================
    Struct Interface:
    =============================================
    Struct Interface:
      wall clock time = 0.005627 seconds
      cpu clock time  = 0.010000 seconds

    =============================================
    Setup phase times:
    =============================================
    SMG Setup:
      wall clock time = 0.330096 seconds
      cpu clock time  = 0.330000 seconds

    =============================================
    Solve phase times:
    =============================================
    SMG Solve:
      wall clock time = 0.686244 seconds
      cpu clock time  = 0.480000 seconds


    Iterations = 4
    Final Relative Residual Norm = 8.972097e-07

 The relative residual norm may differ from machine to machine or
 compiler to compiler, but only very slightly (say, in the 6th or 7th
 decimal place).  Also, the code should generate nearly
 identical results for a given problem, independent of the data
 distribution.  The only part of the code that does not guarantee
 bitwise identical results is the inner product used to compute norms.
 In practice, the above residual norm has remained the same.
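
 When checking a run against the output above, the final relative
 residual norm can be compared with a relative tolerance rather than
 for exact equality.  The helper below is a sketch; its name and the
 choice of tolerance are assumptions, not part of SMG2000.

```c
/* Returns 1 if two residual norms agree to within rel_tol, measured
 * relative to the larger of the two magnitudes; returns 0 otherwise. */
int norms_agree(double a, double b, double rel_tol)
{
    double diff = a > b ? a - b : b - a;
    double mag_a = a < 0.0 ? -a : a;
    double mag_b = b < 0.0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;
    return diff <= rel_tol * scale;
}
```

 For example, a norm of 8.972095e-07 agrees with the reference value
 8.972097e-07 at rel_tol = 1e-5, while 9.1e-07 does not.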

 %==========================================================================
 %==========================================================================

 Release and Modification Record

 LLNL code release number: UCRL-CODE-2000-022

 (c) 2000   The Regents of the University of California

 See the file COPYRIGHT_and_DISCLAIMER for a complete copyright notice,
 contact person, and disclaimer.