SPhot - Monte Carlo Transport Code
*Privacy & Legal Notice* <http://www.llnl.gov/disclaimer.html>
------------------------------------------------------------------------
Contents
* Code Description
* Files in this Distribution
* Building the Code
* Running the Code
* Timing Issues
* Expected Results
------------------------------------------------------------------------
Code Description
A. General Description
SPhot is a 2D photon transport code. Photons are born in hot matter and
tracked through a spherical domain that is cylindrically symmetric on a
logically rectilinear, 2D mesh. Many application codes at LLNL implement
some form of this basic algorithm. The version implemented here exploits
both MPI and OpenMP parallelism. The code is written in Fortran77.
Monte Carlo transport solves the Boltzmann transport equation by
directly mimicking the behavior of photons: they are born in hot
matter, move through and scatter in different materials, and are
eventually absorbed or escape from the problem domain.
The logically rectilinear, 2D mesh in which particles are tracked is
generated internally. The mesh is small enough that a complete copy
not only fits on each node of a parallel machine, but also fits into
cache memory on most modern CPUs. Thus, this benchmark does not
stress memory access.
Particles are born with an energy and direction that are determined by
using random numbers to sample from appropriate distributions. Likewise,
scattering and absorption are also modeled by randomly sampling cross
sections. The random number generator used is implemented in the code
using integer arithmetic in a way that makes the resulting pseudo-random
number sequence portably reproducible across different machines.
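To make the sampling idea concrete, the toy program below follows the
same born/scatter/absorb-or-escape pattern in one dimension. It is
purely illustrative: it is not the SPhot algorithm, the cross section
and probabilities are made-up values, and it uses the Fortran
random_number intrinsic rather than SPhot's portable generator
(free-form Fortran):
    program mini_mc
      implicit none
      integer, parameter :: npart = 100000            ! number of particle histories
      double precision, parameter :: sigma = 1.0d0    ! total cross section (1/cm), made up
      double precision, parameter :: pabs  = 0.3d0    ! absorption probability per collision, made up
      double precision, parameter :: halfw = 2.0d0    ! slab half-width (cm), made up
      integer :: ip, nabs, nesc
      double precision :: x, dir, r
      nabs = 0
      nesc = 0
      do ip = 1, npart
         x   = 0.0d0                                  ! born at the center of the slab
         dir = 1.0d0                                  ! born moving in the +x direction
         do
            call random_number(r)
            x = x + dir * (-log(1.0d0 - r)) / sigma   ! sample distance to next collision
            if (abs(x) > halfw) then
               nesc = nesc + 1                        ! photon escaped the slab
               exit
            end if
            call random_number(r)
            if (r < pabs) then
               nabs = nabs + 1                        ! photon absorbed
               exit
            end if
            call random_number(r)
            dir = sign(1.0d0, r - 0.5d0)              ! isotropic (1-D) scatter
         end do
      end do
      print *, 'absorbed =', nabs, '  escaped =', nesc
    end program mini_mc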
The use of random numbers makes the code's output (edit variables)
"noisy." This noise is a direct result of the discrete nature of the
simulation. The level of the noise can be reduced by increasing the
number of particles that are used in the simulation. Unfortunately, the
level of noise in the answer decreases only very slowly with increasing
computational effort. The noise is inversely proportional to the square
root of the number of particles. If the noise is to be reduced to 1% of
the value in a given simulation, it is necessary to run 10,000 times as
many particles. Thus, high-quality (low-noise) simulations can become
very computationally expensive. Parallelism is an obvious way to
increase the number of particles.
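Stated as a worked relation (standard Monte Carlo statistics, shown
here only to make the arithmetic explicit):
    \mathrm{noise} \propto \frac{1}{\sqrt{N}}
    \quad\Rightarrow\quad
    \frac{\mathrm{noise}_{new}}{\mathrm{noise}_{old}} = \frac{1}{100}
    \;\Rightarrow\;
    N_{new} = 100^2 \, N_{old} = 10{,}000 \, N_{old}
where N is the number of particles tracked.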
B. Coding
With the exception of a single routine (rdinput.f) that includes
Fortran90 derived type usage, SPhot is coded entirely in Fortran77.
Parallelism is implemented through the use of both MPI and OpenMP
libraries.
C. Parallelization
SPhot falls into the category of "embarrassingly parallel" applications.
Every CPU that is employed in the computation works on a local copy of
the 2D mesh (most likely stored in cache), generates its own random
numbers and performs its own particle tracking. For the most part,
tasks do not exchange data with each other. The minimal communication
that does occur takes place between the "master" MPI task and all
other MPI tasks, for the purposes of distributing input data, updating
global variables and collecting timing statistics. Additionally,
several synchronization constructs are required at both the MPI and
OpenMP levels.
Parallelism in SPhot is implemented as a mixed-mode/hybrid model. Both
MPI and OpenMP must be used during execution; however, the user has
considerable flexibility in choosing how many MPI tasks and OpenMP
threads are used. In the most general sense, the MPI parallelism is
designed to provide communication between separate, distributed
memory machines, which may or may not be SMPs. Each machine hosts an
MPI task that oversees the OpenMP threaded execution of SPhot on that
machine and handles the required communication with MPI tasks on other
machines. However, the user is not restricted to one MPI task per
machine: multiple MPI tasks can be run on a machine if desired, or, if
the machine architecture provides a global address space, a single MPI
task may be sufficient for the entire execution.
The OpenMP parallelism is strictly "on-node" (shared memory) execution
for SMP machines. Every OpenMP thread is "owned" by a single MPI task
with a ratio of /n:1/, where /n/ is the user-specified number of
OpenMP threads. As with the number of MPI tasks, SPhot places no
explicit requirement on the number of OpenMP threads that must be
used. Instead, both are specified by the user through environment
variables or command line flags at run time. In this manner, the same
code can adapt to a diverse set of architectures and conditions.
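The overall hybrid pattern can be sketched in a few lines of
self-contained, free-form Fortran. This is not taken from the SPhot
source; the run count and the loop body are stand-ins, but the
structure (MPI tasks split the runs, OpenMP threads split each task's
share, and results are reduced onto the master rank) mirrors the model
described above:
    program hybrid_sketch
      use mpi
      implicit none
      integer :: ierr, rank, ntasks, nruns, myruns, irun
      double precision :: mytally, tally
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, ntasks, ierr)
      nruns   = 64                  ! stand-in for the Nruns input parameter
      myruns  = nruns / ntasks      ! Nruns must divide evenly among the tasks
      mytally = 0.0d0
      !$omp parallel do reduction(+:mytally)
      do irun = 1, myruns
         mytally = mytally + 1.0d0  ! stand-in for tracking one run's particles
      end do
      !$omp end parallel do
      call MPI_Reduce(mytally, tally, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                      MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, 'total runs tracked =', tally
      call MPI_Finalize(ierr)
    end program hybrid_sketch
As with SPhot itself, the number of MPI tasks and the OMP_NUM_THREADS
setting chosen at run time determine how the same executable maps onto
a given machine.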
------------------------------------------------------------------------
Files in This Distribution
The following files are included in this distribution of SPhot:
Subdirectory Files Comments
Makefile Main makefile for SPhot
opac.txt Required opacity library
input.dat Input file
bin (none) Initially empty - used to hold object files after
compilation
includes geomz.inc
globals.inc
params.inc
pranf.inc
randseed.inc
shared.inc
times.inc Required include files
src Makefile.src
allocdyn.f
copyglob.f
copypriv.f
copyseed.f
execute.f
genmesh.f
genstats.f
genxsec.f
interp.f
iranfeven.f
iranfodd.f
plnkut.f
pranf.f
ranf.f
ranfatok.f
ranfk.f
ranfkbinary.f
ranfmodmult.f
rans.f
rdinput.f
rdopac.f
second.f
seedranf.f
sphot.f
thom.f
writeout.f
wroutput.f
zonevols.f All Fortran source files and required makefile
------------------------------------------------------------------------
Building the Code
Building SPhot is fairly simple and straightforward:
1. *Perform Makefile Modifications*
Two makefiles are used to build the code:
Makefile
src/Makefile.src
Both of these makefiles will require manual modification of
several parameters before the code can be built. The parameters
that require modification are clearly marked with a
"***Modifications Required***" notice, and include such things as
the choice of compiler, compiler flags, library directories and
libraries. Note that with the exception of the rdinput.f file,
all source files are Fortran77. The rdinput.f routine uses Fortran
90 derived data type syntax and will therefore require a compiler
that supports it.
It is important to mention that this code is implemented with both
MPI and OpenMP. Specifying the necessary platform dependent
parameters to enable both of these is essential.
Additionally, both makefiles permit the specification of C
language parameters, even though SPhot itself does not use any C
files. This is purely for convenience, in case C language
profilers, timing tools or similar instrumentation are introduced.
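As a purely hypothetical illustration (the variable names, compiler
wrapper and flags below are assumptions for a GNU/MPICH-style
toolchain and will not match the distributed makefiles exactly), the
kinds of settings involved look like this:
    F77     = mpif77         # MPI-aware Fortran compiler wrapper
    FFLAGS  = -O2 -fopenmp   # optimization plus OpenMP support
    LDFLAGS = -fopenmp       # link against the OpenMP runtime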
2. *Modify Global Parameters*
SPhot has a number of parameters that are "hardcoded" in its
various include files. In most cases these do not require
modification. For larger runs, however, several of these
parameters may be too small and will need to be increased. Each of
these is discussed in the table below.
Parameter Where Located Instructions
*maxruns* includes/params.inc Establishes the maximum value for
the Nruns input file parameter. Must be at least equal to (Number
of MPI tasks * Number of OpenMP threads). Default value is 65537.
Requires source code to be recompiled if modified.
*MaxStreams* includes/pranf.inc Must be equal to or greater than
the previous maxruns parameter. Must always be an odd number.
Default value is 65537. Requires source code to be recompiled if
modified.
*maxMPItasks* includes/times.inc Establishes the maximum number
of MPI tasks. Default value is 16384. Requires source code to be
recompiled if modified.
*maxThreadsPerMPItask* includes/times.inc Establishes the
maximum number of OpenMP threads per MPI task. Default value is
128. Requires source code to be recompiled if modified.
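Conceptually, each such change is an edit to a Fortran PARAMETER
statement followed by a rebuild. The exact declarations and layout
inside the actual include files may differ from this sketch:
    integer maxruns
    parameter ( maxruns = 65537 )
After increasing a value, run make again so that every routine which
includes the modified file is recompiled.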
3. *Make*
In the main directory, simply issue the make command. This will
invoke the primary Makefile, which in turn will invoke the
src/Makefile.src makefile. The files in the src subdirectory will
then be compiled and their objects copied to the bin directory.
The sphot executable will be created and copied into the main
directory.
4. *Cleanup*
Invoking the command make clean in the main directory will remove
all object files in the bin subdirectory and the sphot executable.
------------------------------------------------------------------------
Running the Code
Several steps are required to execute SPhot. These are listed below.
1. *Set the Nruns Input File Parameter*
The distribution input file input.dat specifies a number of
parameters that control the execution of SPhot. It is usually
necessary to change only one of these - the Nruns parameter. Nruns
represents the "number of runs" to be conducted. It must be evenly
divisible by the product (Number of MPI tasks * Number of OpenMP
threads). To change Nruns, simply edit the input.dat file. Nruns
appears as the single integer value on the first line of the file.
Note that the default maximum value for Nruns is 65537. If this
value is too small and must be increased, then an include file
must be modified and the source recompiled. See the Building the
Code section for additional instructions.
The three tables below show:
* The raw format of input.dat upon distribution
* A description of each input.dat parameter
* Examples of how the Nruns parameter may be changed.
*Raw format of input.dat file* (Nruns is the emphasized value on the first line)
*16*
BENCHMARK CASE NO THOMS, NO RR/SPLIT: 49X40, BWGT=3.12e+14
1 0
4000 0 0 0 0 0 0
49 40 2
1.00e-10 7.50e-01 1.00e-12 1.05e+00
1.80e+02 1.00e-03 5.00e-01 3.12e+12
1 980
981 1960
3 4.000e-01 1.000e+03 2.000e+04
2 5.000e-01 1.000e+02 1.000e+03
0
*Description of input.dat parameters*
Field Name Format Comments
Nruns i4 Number of runs. Must be evenly divisible by (Number of MPI
tasks * Number of OpenMP threads).
title 20a4 Title information
ilib i10 opacity library format: 0=binary 1=formatted
illib i10 bin library usage:
0=use old bin.lib 1=form new bin.lib
npart i10 # of particles
igroup i10 energy bins (0=12, 1-12=1, 13=ross.mean)
ixopec i10 opacity (0=library, 1=input,2=data)
isorst i10 source (1=uniform in sphere, 2=plankian)
irr i10 r-roulette/split (0=none, 1=impt, 2=size)
ithom i10 thomson scattering (0=not used, 1=used)
icross i10 print cross sections (0= no, 1= yes)
naxl i10 number of axial meshes
nradl i10 number of radial meshes
nreg i10 number of material regions
dtol e10.2 tolerance to cell boundaries (cm)
wcut e10.2 low weight cutoff
tcen e10.2 time to census (sec)
xmult e10.2 weight mult. for russian roulette
axl e10.2 portion of sphere analyzed (degrees)
radl e10.2 sphere radius (cm)
opec e10.2 input opacity (1/cm)
bwgt e10.2 bundle weight (kev)
matb,mat i10 material region begin/end pair. One pair for each of
nreg materials
mtl i10 Material - one value for each of nreg materials.
1=h 2=sio2 3=dt 4=c
atrat e10.3 Atomic Ratio - one value for each of nreg materials
dns e10.3 Density - one value for each of nreg materials
tmp e10.3 Temperature - one value for each of nreg materials
prtflg i5 Print flag to select level of output (see summary page for detail)
*Examples of how the Nruns parameter might be modified*
Number of MPI Tasks    Number of OpenMP Threads per MPI Task    Possible Values for Nruns
1 1 1, 2, 3...
1 16 16, 32, 64...
1 1000 1000, 2000, 3000...
16 4 64, 128, 256...
256 4 1024, 2048, 3072...
1000 1 1000, 2000, 3000...
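For example, for a run using 16 MPI tasks with 4 OpenMP threads per
task, the first line of input.dat could be changed from 16 to 64 (or
any multiple of 64); the remainder of the file is left unchanged:
    64
    BENCHMARK CASE NO THOMS, NO RR/SPLIT: 49X40, BWGT=3.12e+14
    ...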
2. *Specify the Number of MPI Tasks*
The method for doing this varies from platform to platform and
follows the platform's usual conventions. For example, an IBM RS/6000
SP platform may use POE environment variables, such as MP_PROCS,
MP_NODES or MP_TASKS_PER_NODE. Alternately, the Compaq Tru64 Alpha
platform may use the prun -n command.
Note that the optimal number of MPI tasks to use on a platform
will also vary, a determination which is left up to the user. In
no case should the number of MPI tasks exceed the total number of
processors/cpus available.
The default maximum number of MPI tasks is 1024. If this value is
too small, then an include file must be modified and the source
recompiled. See the Building the Code section for additional instructions.
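For example (the exact mechanism is platform dependent; these lines
merely illustrate the mechanisms mentioned above):
    prompt> setenv MP_PROCS 16       (IBM POE environment variable)
    prompt> prun -n 16 sphot         (Compaq Tru64 Alpha)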
3. *Specify the Number of OpenMP Threads per MPI Task*
Specifying the number of OpenMP threads is done according to the
OpenMP standard by using either the platform default (usually the
number of cpus on a machine), or explicitly with the
OMP_NUM_THREADS environment variable. As with the number of MPI
tasks, determining the optimal number of OpenMP threads is left up
to the user, and will vary from platform to platform.
The default maximum number of OpenMP threads per MPI task is 128.
If this value is too small, then an include file must be modified
and the source recompiled. See the Building the Code
section for additional instructions.
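For example, to request 4 OpenMP threads per MPI task:
    prompt> setenv OMP_NUM_THREADS 4        (csh/tcsh)
    prompt> export OMP_NUM_THREADS=4        (sh/ksh/bash)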
4. *Invoke the Executable*
This also varies from platform to platform, especially if a
"batch scheduling" system is used to run jobs. In the simplest
case, issuing a command similar to the one shown below would work:
prompt> sphot <cr>
where:
sphot is the executable
It is assumed that the input.dat file and opac.txt file are in the
same directory where sphot is invoked.
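Putting the pieces together, a complete parallel run on a platform
with a generic mpirun launcher (the launcher name is an assumption;
substitute the platform's own job-launch mechanism) might look like:
    prompt> export OMP_NUM_THREADS=4
    prompt> mpirun -np 16 sphot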
------------------------------------------------------------------------
Timing Issues
All timings are obtained through the use of the portable MPI wall clock
timing routine, MPI_Wtime(). Instrumentation occurs at several locations
within the code, most importantly around the computational loop. Some
of this instrumentation is trivial, and one instance (the timing of
the allocdyn.f routine) is actually unnecessary.
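The pattern is simply a pair of MPI_Wtime() calls bracketing the work
of interest, as in this small, self-contained sketch (free-form
Fortran; illustrative only, not SPhot source):
    program time_sketch
      use mpi
      implicit none
      integer :: ierr, i
      double precision :: t0, t1, work
      call MPI_Init(ierr)
      work = 0.0d0
      t0 = MPI_Wtime()              ! wall-clock time before the loop
      do i = 1, 10000000
         work = work + 1.0d0        ! stand-in for the computational loop
      end do
      t1 = MPI_Wtime()              ! wall-clock time after the loop
      print *, 'elapsed seconds =', t1 - t0, ' (work =', work, ')'
      call MPI_Finalize(ierr)
    end program time_sketch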
The execution time for one "run" of SPhot is platform dependent, but
in any case is short--usually a matter of seconds, and well under a
minute on all architectures tested.
Most architectures tested demonstrated execution times of 10-20 seconds
per run. Compiler optimization should improve execution times in most
cases.
Every task/thread maintains its own timings, which are collected and
maintained globally by the "master" MPI task (rank=0). In this sense,
the instrumentation of SPhot actually imposes some overhead. As the
number of MPI tasks increases, so does the ratio of communication time
to execution time. Since SPhot requires so little execution time per run
to begin with, MPI overhead may gradually reduce the application's
scalability for many-task runs. The associated MPI_Barrier calls
required for synchronization (most notably in copypriv.f) will also
incur additional overhead as the number of MPI tasks increases.
The most important timing parameters are the loop speedup, program
speedup and efficiency computations that appear at the end of the
application's output (example output is available in the "Expected
Results" section). These values reflect the scalability of the
application on a given platform.
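In the usual sense of these terms (the standard definitions are
sketched here; consult the Summary writeup for the exact quantities
reported in SPhot's output):
    S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
where T(p) is the measured wall-clock time using p tasks/threads,
S(p) is the speedup, and E(p) is the parallel efficiency.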
Please see the Summary writeup for information about timing data to be
reported.
------------------------------------------------------------------------
Expected Results
Please see the Summary writeup for information about expected results and
reporting of data.
Version 0.9.1
For more information about this page, contact:
Tom Spelce, spelce1@llnl.gov
<http://www.llnl.gov/>
*UCRL-MI-144211*
September 19, 2001