MultiSource/Benchmarks/ASC_Sequoia/sphot/sphot_read_v0.9.1.txt - third_party/llvm-test-suite - Git at Google


   SPhot - Monte Carlo Transport Code

 *Privacy & Legal Notice* <http://www.llnl.gov/disclaimer.html>
 ------------------------------------------------------------------------


     Contents

     * Code Description
     * Files in this Distribution
     * Building the Code
     * Running the Code
     * Timing Issues
     * Expected Results

 ------------------------------------------------------------------------


     Code Description


       A. General Description

 SPhot is a 2D photon transport code. Photons are born in hot matter and
 tracked through a spherical domain that is cylindrically symmetric on a
 logically rectilinear, 2D mesh. Many application codes at LLNL implement
 some form of this basic algorithm. The version implemented here exploits
 both MPI and OpenMP parallelism. The code is written in Fortran77.

 Monte Carlo transport solves the Boltzmann transport equation by
 directly mimicking the behavior of photons as they are born in hot
 matter, move through and scatter in different materials, are absorbed or
 escape from the problem domain.

 The logically rectilinear, 2D mesh in which particles are tracked, is
 internally generated. The mesh is small enough that a complete copy of
 the mesh will not only fit on each node in a parallel machine, but also
 fit into cache memory in most modern CPUs. Thus, this benchmark does not
 stress memory access.

 Particles are born with an energy and direction that are determined by
 using random numbers to sample from appropriate distributions. Likewise,
 scattering and absorption are also modeled by randomly sampling cross
 sections. The random number generator used is implemented in the code
 using integer arithmetic in a way that makes the resulting pseudo-random
 number sequence portably reproducible across different machines.

 The use of random numbers makes the code's output (edit variables)
 "noisy." This noise is a direct result of the discrete nature of the
 simulation. The level of the noise can be reduced by increasing the
 number of particles that are used in the simulation. Unfortunately, the
 level of noise in the answer decreases only very slowly with increasing
 computational effort. The noise is inversely proportional to the square
 root of the number of particles. If the noise is to be reduced to 1% of
 the value in a given simulation, it is necessary to run 10,000 times as
 many particles. Thus, high-quality (low-noise) simulations can become
 very computationally expensive. Parallelism is an obvious way to
 increase the number of particles.


       B. Coding

 With the exception of a single routine (rdinput.f) that includes
 Fortran90 derived type usage, SPhot is coded entirely in Fortran77.
 Parallelism is implemented through the use of both MPI and OpenMP
 libraries.


       C. Parallelization

 SPhot falls into the category of "embarrassingly parallel" applications.
 Every CPU that is employed in the computation works on a local copy of
 the 2D mesh (most likely stored in cache), generates its own random
 numbers and performs its own particle trackings. For the most part,
 tasks do not exchange data with each other. The minimal communications
 that do occur take place between the "master" MPI task and all other MPI
 tasks for the purposes of distributing input data, updating global
 variables and collecting timing statistics. Additionally, several
 synchronization constructs are necessitated at both the MPI and OpenMP
 level.

 Parallelism in SPhot is implemented as a mixed-mode/hybrid model. It is
 required that both MPI and OpenMP be used during execution, however
 there is much flexibility for the user to choose how many MPI tasks and
 OpenMP threads are used. In the most general sense, the MPI parallelism
 is designed to provide communication between separate, distributed
 memory machines, which may or may not be SMPs. Each machine hosts an MPI
 task that oversees the OpenMP threaded execution of SPhot on that
 machine, and facilitates required communications with MPI tasks on other
 machines. However, having only one MPI task per machine is entirely up
 to the user. Multiple MPI tasks can be run on a machine if this is
 desired. Or, if the machine architecture employs global address space,
 only a single MPI task may be required for the entire execution.

 The OpenMP parallelism is strictly "on-node" (shared memory) execution
 for SMP machines. Every OpenMP thread is "owned" by a single MPI task
 with a ratio of /n:1/, where /n/ varies with the user-specified number
 of OpenMP tasks. As with the number of MPI tasks, there is no explicit
 specification within SPhot for the number of OpenMP threads which must
 be used. Instead, these are specified by the user through the use of
 environment variables or command line flags at run time. In this manner,
 the same code can adapt to a diverse set of architectures and conditions.

 ------------------------------------------------------------------------


     Files in This Distribution

 The following files are included in this distribution of SPhot:

       Subdirectory 	Files 	Comments
         	Makefile 	Main makefile for SPhot
       opac.txt 	Required opacity library
       input.dat 	Input file
       bin 	(none) 	Initially empty - used to keep objects after
       compilations
       includes 	geomz.inc
       globals.inc
       params.inc
       pranf.inc
       randseed.inc
       shared.inc
       times.inc 	Required include files
       src 	Makefile.src
       allocdyn.f
       copyglob.f
       copypriv.f
       copyseed.f
       execute.f
       genmesh.f
       genstats.f
       genxsec.f
       interp.f
       iranfeven.f
       iranfodd.f
       plnkut.f
       pranf.f
       ranf.f
       ranfatok.f
       ranfk.f
       ranfkbinary.f
       ranfmodmult.f
       rans.f
       rdinput.f
       rdopac.f
       second.f
       seedranf.f
       sphot.f
       thom.f
       writeout.f
       wroutput.f
       zonevols.f 	All Fortran source files and required makefile

 ------------------------------------------------------------------------


     Building the Code

 Building SPhot is fairly simple and straightforward:

    1. *Perform Makefile Modifications*

       Two makefiles are used to build the code:

       Makefile
       src/Makefile.src

       Both of these makefiles will require manual modifications to
       several parameters prior to attempting to build the code. The
       parameters which require manual modification are clearly marked
       with a "***Modifications Required*** notice, and include such
       things as choice of compiler, compiler flags, library directories
       and libraries. Note that with the exception of the rdinput.f file,
       all source files are Fortran77. The rdinput.f routine uses Fortran
       90 derived data type syntax and will therefore require a compiler
       that can handle such.

       It is important to mention that this code is implemented with both
       MPI and OpenMP. Specifying the necessary platform dependent
       parameters to enable both of these is essential.

       Additionally, both makefiles permit the specification of C
       language parameters, even though SPhot itself does not use any C
       files. The reason for this is merely convenience in case there is
       a desire to introduce C language profilers, timing tools or
       something similar.

    2. *Modify Global Parameters*

       SPhot has a number of parameters which are "hardcoded" in its
       various include files. In most cases, these do not require
       modification. However, it is quite possible that for larger runs,
       several of these parameters may be too small and require
       modification. Each of these is discussed in the table below.

       Parameter 	Where Located 	Instructions
       *maxruns* 	includes/params.inc 	Establishes the maximum value for
       the Nruns input file parameter. Must be at least equal to (Number
       of MPI tasks * Number of OpenMP threads). Default value is 65537.
       Requires source code to be recompiled if modified.
       *MaxStreams* 	includes/pranf.inc 	Must be equal to or greater than
       the previous maxruns parameter. Must always be an odd number.
       Default value is 65537. Requires source code to be recompiled if
       modified.
       *maxMPItasks* 	includes/times.inc 	Establishes the maximum number
       of MPI tasks. Default value is 16384. Requires source code to be
       recompiled if modified.
       *maxThreadsPerMPItask* 	includes/times.inc 	Establishes the
       maximum number of OpenMP threads per MPI task. Default value is
       128. Requires source code to be recompiled if modified.

    3. *Make*

       In the main directory, simply issue the make command. This will
       invoke the primary Makefile, which in turn will invoke the
       src/Makefile.src makefile. The files in the src subdirectory will
       then be compiled and their objects copied to the bin directory.
       The sphot executable will be created and copied into the main
       directory.

    4. *Cleanup*

       Invoking the command make clean in the main directory will remove
       all object files in the bin subdirectory and the sphot executable.

 ------------------------------------------------------------------------


     Running the Code

 Several steps are required to execute SPhot. These are listed below.

    1. *Set the Nruns Input File Parameter*

       The distribution input file input.dat specifies a number of
       parameters that control the execution of SPhot. It is usually
       necessary to change only one of these - the Nruns parameter. Nruns
       represents the "number of runs" that should be conducted. It must
       be evenly divisible by the (Number of MPI tasks * Number of OpenMP
       threads). To change Nruns, simply edit the input.dat file. Nruns
       appears as the single integer value on the first line of the file.

       Note that the default maximum value for Nruns is 65537. If this
       value is too small and must be increased, then an include file
       must be modified and the source recompiled. See the Building the
       Code  section for additional instructions.

       The three tables below show:

           * The raw format of input.dat upon distribution
           * A description of each input.dat parameter
           * Examples of how the Nruns parameter may be changed.

       *Raw format of input.dat file* (Nruns parameter in blue)


         *16*

 BENCHMARK CASE NO THOMS, NO RR/SPLIT: 49X40, BWGT=3.12e+14

          1         0

       4000         0         0         0         0         0         0

         49        40         2

   1.00e-10  7.50e-01  1.00e-12  1.05e+00

   1.80e+02  1.00e-03  5.00e-01  3.12e+12

          1       980

        981      1960

          3 4.000e-01 1.000e+03 2.000e+04

          2 5.000e-01 1.000e+02 1.000e+03

          0

       *Description of input.dat parameters*
       *

 Field    Format   Comments

 Name

       *


 Nruns     i4     Number of runs. Should be evenly divided by the

                  number of MPI tasks and OpenMP threads.

 title     20a4   Title information

 ilib      i10    opacity library format: 0=binary 1=formatted

 illib     i10    bin library usage:

                  0=use old bin.lib 1=form new bin.lib

 npart     i10    # of particles

 igroup    i10    energy bins (0=12, 1-12=1, 13=ross.mean)

 ixopec    i10    opacity (0=library, 1=input,2=data)

 isorst    i10    source (1=uniform in sphere, 2=plankian)

 irr       i10    r-roulette/split (0=none, 1=impt, 2=size)

 ithom     i10    thomson scattering (0=not used, 1=used)

 icross    i10    print cross sections (0= no, 1= yes)

 naxl      i10    number of axial meshes

 nradl     i10    number of radial meshes

 nreg      i10    number of material regions

 dtol      e10.2  tolerance to cell boundaries (cm)

 wcut      e10.2  low weight cutoff

 tcen      e10.2  time to census (sec)

 xmult     e10.2  weight mult. for russian roulette

 axl       e10.2  portion of sphere analyzed (degrees)

 radl      e10.2  sphere radius (cm)

 opec      e10.2  input opacity (1/cm)

 bwgt      e10.2  bundle weight (kev)

 matb,mat  i10    material region begin/end pair.  One pair for each of

                  nreg materials

 mtl       i10    Material - one value for each of nreg materials.

                  1=h 2=sio2 3=dt 4=c

 atrat     e10.3  Atomic Ratio - one value for each of nreg materials

 dns       e10.3  Density - one value for each of nreg materials

 tmp       e10.3  Temperature - one value for each of nreg materials

 prtflg    i5     Print flag to select level of output( see summary page for detail)

       *Examples of how the Nruns parameter might be modified*
       Number of MPI Tasks 	Number of OpenMP
       Threads per MPI task 	Possible Values for NRuns
       1 	1 	1, 2, 3...
       1 	16 	16, 32, 64...
       1 	1000 	1000, 2000, 3000...
       16 	4 	64, 128, 256...
       256 	4 	1024, 2048, 3072...
       1000 	1 	1000, 2000, 3000...

    2. *Specify the Number of MPI Tasks*

       The method for doing this will vary from platform to platform but
       will be typical for a given platform. For example, an IBM RS/6000
       SP platform may use POE environment variables, such as MP_PROCS,
       MP_NODES or MP_TASKS_PER_NODE. Alternately, the Compaq Tru64 Alpha
       platform may use the prun -n command.

       Note that the optimal number of MPI tasks to use on a platform
       will also vary, a determination which is left up to the user. In
       no case should the number of MPI tasks exceed the total number of
       processors/cpus available.

       The default maximum number of MPI tasks is 1024. If this value is
       too small, then an include file must be modified and the source
       recomplied. See the Building the Code section for additional instructions.

    3. *Specify the Number of OpenMP Threads per MPI Task*

       Specifying the number of OpenMP threads is done according to the
       OpenMP standard by using either the platform default (usually the
       number of cpus on a machine), or explicitly with the
       OMP_NUM_THREADS environment variable. As with the number of MPI
       tasks, determining the optimal number of OpenMP threads is left up
       to the user, and will vary from platform to platform.

       The default maximum number of OpenMP threads per MPI task is 128.
       If this value is too small, then an include file must be modified
       and the source recomplied. See the Building the Code
       section for additional instructions.

    4. *Invoke the Executable*

       This will vary from platform to platform also, especially if a
       "batch scheduling" system is used to run jobs. In the most simple
       case, issuing a command similar to that shown below would work:

         prompt> sphot <cr>

         where:

       sphot is the executable

       It is assumed that the input.dat file and opac.txt file are in the
       same directory where sphot is invoked.

 ------------------------------------------------------------------------


     Timing Issues

 All timings are obtained through the use of the portable MPI wall clock
 timing routine, MPI_Wtime(). Instrumentation occurs in several locations
 within the code, most importantly around the computational loop. Some
 instrumentations are trivial and one is actually unnecessary (the timing
 of the allocdyn.f routine).

 The execution time for one "run" of SPhot is platform dependent, but in
 any case does not require much time--usually only "seconds", and
 certainly much less than a minute on all current architectures tested.
 Most architectures tested demonstrated execution times of 10-20 seconds
 per run. Compiler optimization should improve execution times in most
 cases.

 Every task/thread maintains its own timings, which are collected and
 maintained globally by the "master" MPI task (rank=0). In this sense,
 the instrumentation of SPhot actually imposes some overhead. As the
 number of MPI tasks increases, so does the ratio of communication time
 to execution time. Since SPhot requires so little execution time per run
 to begin with, MPI overhead may gradually reduce the application's
 scalability for many-task runs. The associated MPI_Barrier calls
 required for synchronization (most notably in copypriv.f) will also
 incur additional overhead as the number of MPI tasks increase.

 The most important timing parameters are the loop speedup, program
 speedup and efficiency computations that appear at the end of the
 application's output (example output is available in the "Expected
 Results" section). These values reflect the scalability of the
 application on a given platform.

 Please see the Summary writeup for information about timing data to be
 reported.

 ------------------------------------------------------------------------

 Expected Results

 Please see the Summary writeup for information about expected results and
 reporting of data.


 Version 0.9.1

 For more information about this page, contact:
 Tom Spelce, spelce1@llnl.gov

 <http://www.llnl.gov/>

 *UCRL-MI-144211*
 September 19, 2001

	SPhot - Monte Carlo Transport Code

	Privacy & Legal Notice <http://www.llnl.gov/disclaimer.html>
	------------------------------------------------------------------------


	Contents

	* Code Description
	* Files in this Distribution
	* Building the Code
	* Running the Code
	* Timing Issues
	* Expected Results

	------------------------------------------------------------------------


	Code Description


	A. General Description

	SPhot is a 2D photon transport code. Photons are born in hot matter and
	tracked through a spherical domain that is cylindrically symmetric on a
	logically rectilinear, 2D mesh. Many application codes at LLNL implement
	some form of this basic algorithm. The version implemented here exploits
	both MPI and OpenMP parallelism. The code is written in Fortran77.

	Monte Carlo transport solves the Boltzmann transport equation by
	directly mimicking the behavior of photons as they are born in hot
	matter, move through and scatter in different materials, are absorbed or
	escape from the problem domain.

	The logically rectilinear, 2D mesh in which particles are tracked, is
	internally generated. The mesh is small enough that a complete copy of
	the mesh will not only fit on each node in a parallel machine, but also
	fit into cache memory in most modern CPUs. Thus, this benchmark does not
	stress memory access.

	Particles are born with an energy and direction that are determined by
	using random numbers to sample from appropriate distributions. Likewise,
	scattering and absorption are also modeled by randomly sampling cross
	sections. The random number generator used is implemented in the code
	using integer arithmetic in a way that makes the resulting pseudo-random
	number sequence portably reproducible across different machines.

	The use of random numbers makes the code's output (edit variables)
	"noisy." This noise is a direct result of the discrete nature of the
	simulation. The level of the noise can be reduced by increasing the
	number of particles that are used in the simulation. Unfortunately, the
	level of noise in the answer decreases only very slowly with increasing
	computational effort. The noise is inversely proportional to the square
	root of the number of particles. If the noise is to be reduced to 1% of
	the value in a given simulation, it is necessary to run 10,000 times as
	many particles. Thus, high-quality (low-noise) simulations can become
	very computationally expensive. Parallelism is an obvious way to
	increase the number of particles.


	B. Coding

	With the exception of a single routine (rdinput.f) that includes
	Fortran90 derived type usage, SPhot is coded entirely in Fortran77.
	Parallelism is implemented through the use of both MPI and OpenMP
	libraries.


	C. Parallelization

	SPhot falls into the category of "embarrassingly parallel" applications.
	Every CPU that is employed in the computation works on a local copy of
	the 2D mesh (most likely stored in cache), generates its own random
	numbers and performs its own particle trackings. For the most part,
	tasks do not exchange data with each other. The minimal communications
	that do occur take place between the "master" MPI task and all other MPI
	tasks for the purposes of distributing input data, updating global
	variables and collecting timing statistics. Additionally, several
	synchronization constructs are necessitated at both the MPI and OpenMP
	level.

	Parallelism in SPhot is implemented as a mixed-mode/hybrid model. It is
	required that both MPI and OpenMP be used during execution, however
	there is much flexibility for the user to choose how many MPI tasks and
	OpenMP threads are used. In the most general sense, the MPI parallelism
	is designed to provide communication between separate, distributed
	memory machines, which may or may not be SMPs. Each machine hosts an MPI
	task that oversees the OpenMP threaded execution of SPhot on that
	machine, and facilitates required communications with MPI tasks on other
	machines. However, having only one MPI task per machine is entirely up
	to the user. Multiple MPI tasks can be run on a machine if this is
	desired. Or, if the machine architecture employs global address space,
	only a single MPI task may be required for the entire execution.

	The OpenMP parallelism is strictly "on-node" (shared memory) execution
	for SMP machines. Every OpenMP thread is "owned" by a single MPI task
	with a ratio of /n:1/, where /n/ varies with the user-specified number
	of OpenMP tasks. As with the number of MPI tasks, there is no explicit
	specification within SPhot for the number of OpenMP threads which must
	be used. Instead, these are specified by the user through the use of
	environment variables or command line flags at run time. In this manner,
	the same code can adapt to a diverse set of architectures and conditions.

	------------------------------------------------------------------------


	Files in This Distribution

	The following files are included in this distribution of SPhot:

	Subdirectory Files Comments
	Makefile Main makefile for SPhot
	opac.txt Required opacity library
	input.dat Input file
	bin (none) Initially empty - used to keep objects after
	compilations
	includes geomz.inc
	globals.inc
	params.inc
	pranf.inc
	randseed.inc
	shared.inc
	times.inc Required include files
	src Makefile.src
	allocdyn.f
	copyglob.f
	copypriv.f
	copyseed.f
	execute.f
	genmesh.f
	genstats.f
	genxsec.f
	interp.f
	iranfeven.f
	iranfodd.f
	plnkut.f
	pranf.f
	ranf.f
	ranfatok.f
	ranfk.f
	ranfkbinary.f
	ranfmodmult.f
	rans.f
	rdinput.f
	rdopac.f
	second.f
	seedranf.f
	sphot.f
	thom.f
	writeout.f
	wroutput.f
	zonevols.f All Fortran source files and required makefile

	------------------------------------------------------------------------


	Building the Code

	Building SPhot is fairly simple and straightforward:

	1. Perform Makefile Modifications

	Two makefiles are used to build the code:

	Makefile
	src/Makefile.src

	Both of these makefiles will require manual modifications to
	several parameters prior to attempting to build the code. The
	parameters which require manual modification are clearly marked
	with a "*Modifications Required* notice, and include such
	things as choice of compiler, compiler flags, library directories
	and libraries. Note that with the exception of the rdinput.f file,
	all source files are Fortran77. The rdinput.f routine uses Fortran
	90 derived data type syntax and will therefore require a compiler
	that can handle such.

	It is important to mention that this code is implemented with both
	MPI and OpenMP. Specifying the necessary platform dependent
	parameters to enable both of these is essential.

	Additionally, both makefiles permit the specification of C
	language parameters, even though SPhot itself does not use any C
	files. The reason for this is merely convenience in case there is
	a desire to introduce C language profilers, timing tools or
	something similar.

	2. Modify Global Parameters

	SPhot has a number of parameters which are "hardcoded" in its
	various include files. In most cases, these do not require
	modification. However, it is quite possible that for larger runs,
	several of these parameters may be too small and require
	modification. Each of these is discussed in the table below.

	Parameter Where Located Instructions
	maxruns includes/params.inc Establishes the maximum value for
	the Nruns input file parameter. Must be at least equal to (Number
	of MPI tasks * Number of OpenMP threads). Default value is 65537.
	Requires source code to be recompiled if modified.
	MaxStreams includes/pranf.inc Must be equal to or greater than
	the previous maxruns parameter. Must always be an odd number.
	Default value is 65537. Requires source code to be recompiled if
	modified.
	maxMPItasks includes/times.inc Establishes the maximum number
	of MPI tasks. Default value is 16384. Requires source code to be
	recompiled if modified.
	maxThreadsPerMPItask includes/times.inc Establishes the
	maximum number of OpenMP threads per MPI task. Default value is
	128. Requires source code to be recompiled if modified.

	3. Make

	In the main directory, simply issue the make command. This will
	invoke the primary Makefile, which in turn will invoke the
	src/Makefile.src makefile. The files in the src subdirectory will
	then be compiled and their objects copied to the bin directory.
	The sphot executable will be created and copied into the main
	directory.

	4. Cleanup

	Invoking the command make clean in the main directory will remove
	all object files in the bin subdirectory and the sphot executable.

	------------------------------------------------------------------------


	Running the Code

	Several steps are required to execute SPhot. These are listed below.

	1. Set the Nruns Input File Parameter

	The distribution input file input.dat specifies a number of
	parameters that control the execution of SPhot. It is usually
	necessary to change only one of these - the Nruns parameter. Nruns
	represents the "number of runs" that should be conducted. It must
	be evenly divisible by the (Number of MPI tasks * Number of OpenMP
	threads). To change Nruns, simply edit the input.dat file. Nruns
	appears as the single integer value on the first line of the file.

	Note that the default maximum value for Nruns is 65537. If this
	value is too small and must be increased, then an include file
	must be modified and the source recompiled. See the Building the
	Code section for additional instructions.

	The three tables below show:

	* The raw format of input.dat upon distribution
	* A description of each input.dat parameter
	* Examples of how the Nruns parameter may be changed.

	Raw format of input.dat file (Nruns parameter in blue)



	16

	BENCHMARK CASE NO THOMS, NO RR/SPLIT: 49X40, BWGT=3.12e+14

	1 0

	4000 0 0 0 0 0 0

	49 40 2

	1.00e-10 7.50e-01 1.00e-12 1.05e+00

	1.80e+02 1.00e-03 5.00e-01 3.12e+12

	1 980

	981 1960

	3 4.000e-01 1.000e+03 2.000e+04

	2 5.000e-01 1.000e+02 1.000e+03

	0

	Description of input.dat parameters
	*

	Field Format Comments

	Name

	*



	Nruns i4 Number of runs. Should be evenly divided by the

	number of MPI tasks and OpenMP threads.

	title 20a4 Title information

	ilib i10 opacity library format: 0=binary 1=formatted

	illib i10 bin library usage:

	0=use old bin.lib 1=form new bin.lib

	npart i10 # of particles

	igroup i10 energy bins (0=12, 1-12=1, 13=ross.mean)

	ixopec i10 opacity (0=library, 1=input,2=data)

	isorst i10 source (1=uniform in sphere, 2=plankian)

	irr i10 r-roulette/split (0=none, 1=impt, 2=size)

	ithom i10 thomson scattering (0=not used, 1=used)

	icross i10 print cross sections (0= no, 1= yes)

	naxl i10 number of axial meshes

	nradl i10 number of radial meshes

	nreg i10 number of material regions

	dtol e10.2 tolerance to cell boundaries (cm)

	wcut e10.2 low weight cutoff

	tcen e10.2 time to census (sec)

	xmult e10.2 weight mult. for russian roulette

	axl e10.2 portion of sphere analyzed (degrees)

	radl e10.2 sphere radius (cm)

	opec e10.2 input opacity (1/cm)

	bwgt e10.2 bundle weight (kev)

	matb,mat i10 material region begin/end pair. One pair for each of

	nreg materials

	mtl i10 Material - one value for each of nreg materials.

	1=h 2=sio2 3=dt 4=c

	atrat e10.3 Atomic Ratio - one value for each of nreg materials

	dns e10.3 Density - one value for each of nreg materials

	tmp e10.3 Temperature - one value for each of nreg materials

	prtflg i5 Print flag to select level of output( see summary page for detail)

	Examples of how the Nruns parameter might be modified
	Number of MPI Tasks Number of OpenMP
	Threads per MPI task Possible Values for NRuns
	1 1 1, 2, 3...
	1 16 16, 32, 64...
	1 1000 1000, 2000, 3000...
	16 4 64, 128, 256...
	256 4 1024, 2048, 3072...
	1000 1 1000, 2000, 3000...

	2. Specify the Number of MPI Tasks

	The method for doing this will vary from platform to platform but
	will be typical for a given platform. For example, an IBM RS/6000
	SP platform may use POE environment variables, such as MP_PROCS,
	MP_NODES or MP_TASKS_PER_NODE. Alternately, the Compaq Tru64 Alpha
	platform may use the prun -n command.

	Note that the optimal number of MPI tasks to use on a platform
	will also vary, a determination which is left up to the user. In
	no case should the number of MPI tasks exceed the total number of
	processors/cpus available.

	The default maximum number of MPI tasks is 1024. If this value is
	too small, then an include file must be modified and the source
	recomplied. See the Building the Code section for additional instructions.

	3. Specify the Number of OpenMP Threads per MPI Task

	Specifying the number of OpenMP threads is done according to the
	OpenMP standard by using either the platform default (usually the
	number of cpus on a machine), or explicitly with the
	OMP_NUM_THREADS environment variable. As with the number of MPI
	tasks, determining the optimal number of OpenMP threads is left up
	to the user, and will vary from platform to platform.

	The default maximum number of OpenMP threads per MPI task is 128.
	If this value is too small, then an include file must be modified
	and the source recomplied. See the Building the Code
	section for additional instructions.

	4. Invoke the Executable

	This will vary from platform to platform also, especially if a
	"batch scheduling" system is used to run jobs. In the most simple
	case, issuing a command similar to that shown below would work:

	prompt> sphot <cr>

	where:

	sphot is the executable

	It is assumed that the input.dat file and opac.txt file are in the
	same directory where sphot is invoked.

	------------------------------------------------------------------------


	Timing Issues

	All timings are obtained through the use of the portable MPI wall clock
	timing routine, MPI_Wtime(). Instrumentation occurs in several locations
	within the code, most importantly around the computational loop. Some
	instrumentations are trivial and one is actually unnecessary (the timing
	of the allocdyn.f routine).

	The execution time for one "run" of SPhot is platform dependent, but in
	any case does not require much time--usually only "seconds", and
	certainly much less than a minute on all current architectures tested.
	Most architectures tested demonstrated execution times of 10-20 seconds
	per run. Compiler optimization should improve execution times in most
	cases.

	Every task/thread maintains its own timings, which are collected and
	maintained globally by the "master" MPI task (rank=0). In this sense,
	the instrumentation of SPhot actually imposes some overhead. As the
	number of MPI tasks increases, so does the ratio of communication time
	to execution time. Since SPhot requires so little execution time per run
	to begin with, MPI overhead may gradually reduce the application's
	scalability for many-task runs. The associated MPI_Barrier calls
	required for synchronization (most notably in copypriv.f) will also
	incur additional overhead as the number of MPI tasks increase.

	The most important timing parameters are the loop speedup, program
	speedup and efficiency computations that appear at the end of the
	application's output (example output is available in the "Expected
	Results" section). These values reflect the scalability of the
	application on a given platform.

	Please see the Summary writeup for information about timing data to be
	reported.

	------------------------------------------------------------------------

	Expected Results

	Please see the Summary writeup for information about expected results and
	reporting of data.


	Version 0.9.1

	For more information about this page, contact:
	Tom Spelce, spelce1@llnl.gov

	<http://www.llnl.gov/>

	UCRL-MI-144211
	September 19, 2001