SingleSource/Benchmarks/Polybench/README - third_party/llvm-test-suite - Git at Google

 * * * * * * * * * * * * * * *
 * PolyBench/C 4.2.1 (beta)  *
 * * * * * * * * * * * * * * *

 Copyright (c) 2011-2016 the Ohio State University.

 Contact:
    Louis-Noel Pouchet <pouchet@cse.ohio-state.edu>
    Tomofumi Yuki <tomofumi.yuki@inria.fr>


 PolyBench is a benchmark suite of 30 numerical computations with
 static control flow, extracted from operations in various application
 domains (linear algebra computations, image processing, physics
 simulation, dynamic programming, statistics, etc.). PolyBench features
 include:
 - A single file, tunable at compile-time, used for the kernel
   instrumentation. It performs extra operations such as cache flushing
   before the kernel execution, and can set real-time scheduling to
   prevent OS interference.
 - Non-null data initialization, and live-out data dump.
 - Syntactic constructs to prevent any dead code elimination on the kernel.
 - Parametric loop bounds in the kernels, for general-purpose implementation.
 - Clear kernel marking, using pragma-based delimiters.


 PolyBench is currently available in C and in Fortran:
 - See PolyBench/C 4.2.1 for the C version
 - See PolyBench/Fortran 1.0 for the Fortran version (based on PolyBench/C 3.2)

 Available benchmarks (PolyBench/C 4.2.1)

 Benchmark	Description
 2mm		2 Matrix Multiplications (alpha * A * B * C + beta * D)
 3mm		3 Matrix Multiplications ((A*B)*(C*D))
 adi		Alternating Direction Implicit solver
 atax		Matrix Transpose and Vector Multiplication
 bicg		BiCG Sub Kernel of BiCGStab Linear Solver
 cholesky	Cholesky Decomposition
 correlation	Correlation Computation
 covariance	Covariance Computation
 deriche		Edge detection filter
 doitgen		Multi-resolution analysis kernel (MADNESS)
 durbin		Toeplitz system solver
 fdtd-2d		2-D Finite Different Time Domain Kernel
 gemm		Matrix-multiply C=alpha.A.B+beta.C
 gemver		Vector Multiplication and Matrix Addition
 gesummv		Scalar, Vector and Matrix Multiplication
 gramschmidt	Gram-Schmidt decomposition
 head-3d		Heat equation over 3D data domain
 jacobi-1D	1-D Jacobi stencil computation
 jacobi-2D	2-D Jacobi stencil computation
 lu		LU decomposition
 ludcmp		LU decomposition followed by Forward Substitution
 mvt		Matrix Vector Product and Transpose
 nussinov	Dynamic programming algorithm for sequence alignment
 seidel		2-D Seidel stencil computation
 symm		Symmetric matrix-multiply
 syr2k		Symmetric rank-2k update
 syrk		Symmetric rank-k update
 trisolv		Triangular solver
 trmm		Triangular matrix-multiply


 See the end of the README for mailing lists, instructions to use
 PolyBench, etc.

 --------------------
 * New in 4.2.1-beta:
 --------------------
  - Fix a bug in PAPI support, introduced in 4.2
  - Support PAPI 5.4.x

 -------------
 * New in 4.2:
 -------------
  - Fixed a bug in syr2k.
  - Changed the data initialization function of several benchmarks.
  - Minor updates in the documentation and PolyBench API.

 -------------
 * New in 4.1:
 -------------
  - Added LICENSE.txt
  - Fixed minor issues with cholesky both in documentation and implementation.
    (Reported by François Gindraud)
  - Simplified the macros for switching between data types. Now users
    may specify DATA_TYPE_IS_XXX where XXX is one of FLOAT/DOUBLE/INT
    to change all macros associated with data types.

 -------------
 * New in 4.0a:
 -------------
  - Fixed a bug in jacobi-1d (Reported by Sven Verdoolaege)

 -------------
 * New in 4.0:
 -------------

 This update includes many changes. Please see CHANGELOG for detailed
 list of changes. Most of the benchmarks have been edited/modified by
 Tomofumi Yuki, thanks to the feedback we have received by PolyBench
 users for the past few years.

 - Three benchmarks are out: dynprog, reg-detect, fdtd-apml.
 - Three benchmarks are in: nussinov, deriche, heat-3d.
 - Jacobi-1D and Jacobi-2D perform two time steps in one time loop
   iteration alternating the source and target fields, to avoid the
   field copy statement.
 - Almost all benchmarks have been edited to ensure the computation
   result matches the mathematical specification of the operation.
 - A major effort on documentation and harmonization of problem sizes
   and data allocations schemes.

 * Important Note:
 -----------------

 PolyBench/C 3.2 kernels had numerous implementation errors making
 their outputs to not match what is expected from the mathematical
 specification of the operation. Many of them did not influence the
 program behavior (e.g., the number and type of operations, data
 dependences, and overall control-flow was similar to the corrected
 implementation), however, some had non-negligible impact. These are
 described below.

  - adi: There was an off-by-one error, which made back substitution
    part of a pass in ADI to not depend on the forward pass, making the
    program fully tilable.
 - syrk: A typo on the loop bounds made the iteration space rectangular
    instead of triangular. This has led to additional dependences and
    two times more operations than intended.
 - trmm: A typo on the loop bounds led to the wrong half of the matrix
    being used in the computation. This led to additional dependences,
    making it harder to parallelize this kernel.
 - lu: An innermost loop was missing for the operation to be valid on
    general matrices. This cause the kernel to perform about half the
    work compared to a general implementation of LU decomposition. The
    new implementation is the generic LU decomposition.

 In addition, some of the kernels used "high-footprint" memory allocation for
 easier parallelization, where variables used in accumulation were fully
 expanded. These variables were changed to only use a scalar.


 -------------
 * New in 3.2:
 -------------

 - Rename the package to PolyBench/C, to prepare for the upcoming
   PolyBench/Fortran and PolyBench/GPU.
 - Fixed a typo in polybench.h, causing compilation problems for 5D arrays.
 - Fixed minor typos in correlation, atax, cholesky, fdtd-2d.
 - Added an option to build the test suite with constant loop bounds
   (default is parametric loop bounds)

 -------------
 * New in 3.1:
 -------------

 - Fixed a typo in polybench.h, causing compilation problems for 3D arrays.
 - Set by default heap arrays, stack arrays are now optional.

 -------------
 * New in 3.0:
 -------------

 - Multiple dataset sizes are predefined. Each file comes now with a .h
   header file defining the dataset.
 - Support of heap-allocated arrays. It uses a single malloc for the
   entire array region, the data allocated is cast into a C99
   multidimensional array.
 - One benchmark is out: gauss_filter
 - One benchmark is in: floyd-warshall
 - PAPI support has been greatly improved; it also can report the
   counters on a specific core to be set by the user.


 ----------------
 * Mailing lists:
 ----------------

 ** polybench-announces@lists.sourceforge.net:
 ---------------------------------------------

 Announces about releases of PolyBench.

 ** polybench-discussion@lists.sourceforge.net:
 ----------------------------------------------

 General discussions reg. PolyBench.


 -----------------------
 * Available benchmarks:
 -----------------------

 See utilities/benchmark_list for paths to each files.
 See doc/polybench.pdf for detailed description of the algorithms.


 ------------------------------
 * Sample compilation commands:
 ------------------------------

 ** To compile a benchmark without any monitoring:
 -------------------------------------------------

 $> gcc -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -o atax_base


 ** To compile a benchmark with execution time reporting:
 --------------------------------------------------------

 $> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_TIME -o atax_time


 ** To generate the reference output of a benchmark:
 ---------------------------------------------------

 $> gcc -O0 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_DUMP_ARRAYS -o atax_ref
 $> ./atax_ref 2>atax_ref.out


 -------------------------
 * Some available options:
 -------------------------

 They are all passed as macro definitions during compilation time (e.g,
 -Dname_of_the_option).

 ** Typical options:
 -------------------

 - POLYBENCH_TIME: output execution time (gettimeofday) [default: off]

 - MINI_DATASET, SMALL_DATASET, MEDIUM_DATASET, LARGE_DATASET,
   EXTRALARGE_DATASET: set the dataset size to be used
   [default: STANDARD_DATASET]

 - POLYBENCH_DUMP_ARRAYS: dump all live-out arrays on stderr [default: off]

 - POLYBENCH_STACK_ARRAYS: use stack allocation instead of malloc [default: off]


 ** Options that may lead to better performance:
 -----------------------------------------------

 - POLYBENCH_USE_RESTRICT: Use restrict keyword to allow compilers to
   assume absence of aliasing. [default: off]

 - POLYBENCH_USE_SCALAR_LB: Use scalar loop bounds instead of parametric ones.
   [default: off]

 - POLYBENCH_PADDING_FACTOR: Pad all dimensions of all arrays by this
   value [default: 0]

 - POLYBENCH_INTER_ARRAY_PADDING_FACTOR: Offset the starting address of
   polybench arrays allocated on the heap (default) by a multiple of
   this value [default: 0]

 - POLYBENCH_USE_C99_PROTO: Use standard C99 prototype for the functions.
   [default: off]


 ** Timing/profiling options:
 ----------------------------

 - POLYBENCH_PAPI: turn on papi timing (see below).

 - POLYBENCH_CACHE_SIZE_KB: cache size to flush, in kB [default: 33MB]

 - POLYBENCH_NO_FLUSH_CACHE: don't flush the cache before calling the
   timer [default: flush the cache]

 - POLYBENCH_CYCLE_ACCURATE_TIMER: Use Time Stamp Counter to monitor
   the execution time of the kernel [default: off]

 - POLYBENCH_LINUX_FIFO_SCHEDULER: use FIFO real-time scheduler for the
   kernel execution, the program must be run as root, under linux only,
   and compiled with -lc [default: off]


 ---------------
 * PAPI support:
 ---------------

 ** To compile a benchmark with PAPI support:
 --------------------------------------------

 $> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_PAPI -lpapi -o atax_papi


 ** To specify which counter(s) to monitor:
 ------------------------------------------

 Edit utilities/papi_counters.list, and add 1 line per event to
 monitor. Each line (including the last one) must finish with a ',' and
 both native and standard events are supported.

 The whole kernel is run one time per counter (no multiplexing) and
 there is no sampling being used for the counter value.


 ------------------------------
 * Accurate performance timing:
 ------------------------------

 With kernels that have an execution time in the orders of a few tens
 of milliseconds, it is critical to validate any performance number by
 repeating several times the experiment. A companion script is
 available to perform reasonable performance measurement of a PolyBench.

 $> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_TIME -o atax_time
 $> ./utilities/time_benchmark.sh ./atax_time

 This script will run five times the benchmark (that must be a
 PolyBench compiled with -DPOLYBENCH_TIME), eliminate the two extremal
 times, and check that the deviation of the three remaining does not
 exceed a given threshold, set to 5%.

 It is also possible to use POLYBENCH_CYCLE_ACCURATE_TIMER to use the
 Time Stamp Counter instead of gettimeofday() to monitor the number of
 elapsed cycles.


 ----------------------------------------
 * Generating macro-free benchmark suite:
 ----------------------------------------

 (from the root of the archive:)
 $> PARGS="-I utilities -DPOLYBENCH_TIME";
 $> for i in `cat utilities/benchmark_list`; do perl utilities/create_cpped_version.pl $i "$PARGS"; done

 This create for each benchmark file 'xxx.c' a new file
 'xxx.preproc.c'. The PARGS variable in the above example can be set to
 the desired configuration, for instance to create a full C99 version
 (parametric arrays):

 $> PARGS="-I utilities -DPOLYBENCH_USE_C99_PROTO";
 $> for i in `cat utilities/benchmark_list`; do perl utilities/create_cpped_version.pl $i "$PARGS"; done


 ------------------
 * Utility scripts:
 ------------------
 create_cpped_version.pl: Used in the above for generating macro free version.

 makefile-gen.pl: generates make files in each directory. Options are globally
                  configurable through config.mk at polybench root.
   header-gen.pl: refers to 'polybench.spec' file and generates header in
                  each directory. Allows default problem sizes and datatype to
                  be configured without going into each header file.

     run-all.pl: compiles and runs each kernel.
       clean.pl: runs make clean in each directory and then removes Makefile.
	* * * * * * * * * * * * * * *
	* PolyBench/C 4.2.1 (beta) *
	* * * * * * * * * * * * * * *

	Copyright (c) 2011-2016 the Ohio State University.

	Contact:
	Louis-Noel Pouchet <pouchet@cse.ohio-state.edu>
	Tomofumi Yuki <tomofumi.yuki@inria.fr>


	PolyBench is a benchmark suite of 30 numerical computations with
	static control flow, extracted from operations in various application
	domains (linear algebra computations, image processing, physics
	simulation, dynamic programming, statistics, etc.). PolyBench features
	include:
	- A single file, tunable at compile-time, used for the kernel
	instrumentation. It performs extra operations such as cache flushing
	before the kernel execution, and can set real-time scheduling to
	prevent OS interference.
	- Non-null data initialization, and live-out data dump.
	- Syntactic constructs to prevent any dead code elimination on the kernel.
	- Parametric loop bounds in the kernels, for general-purpose implementation.
	- Clear kernel marking, using pragma-based delimiters.


	PolyBench is currently available in C and in Fortran:
	- See PolyBench/C 4.2.1 for the C version
	- See PolyBench/Fortran 1.0 for the Fortran version (based on PolyBench/C 3.2)

	Available benchmarks (PolyBench/C 4.2.1)

	Benchmark Description
	2mm 2 Matrix Multiplications (alpha * A * B * C + beta * D)
	3mm 3 Matrix Multiplications ((AB)(C*D))
	adi Alternating Direction Implicit solver
	atax Matrix Transpose and Vector Multiplication
	bicg BiCG Sub Kernel of BiCGStab Linear Solver
	cholesky Cholesky Decomposition
	correlation Correlation Computation
	covariance Covariance Computation
	deriche Edge detection filter
	doitgen Multi-resolution analysis kernel (MADNESS)
	durbin Toeplitz system solver
	fdtd-2d 2-D Finite Different Time Domain Kernel
	gemm Matrix-multiply C=alpha.A.B+beta.C
	gemver Vector Multiplication and Matrix Addition
	gesummv Scalar, Vector and Matrix Multiplication
	gramschmidt Gram-Schmidt decomposition
	head-3d Heat equation over 3D data domain
	jacobi-1D 1-D Jacobi stencil computation
	jacobi-2D 2-D Jacobi stencil computation
	lu LU decomposition
	ludcmp LU decomposition followed by Forward Substitution
	mvt Matrix Vector Product and Transpose
	nussinov Dynamic programming algorithm for sequence alignment
	seidel 2-D Seidel stencil computation
	symm Symmetric matrix-multiply
	syr2k Symmetric rank-2k update
	syrk Symmetric rank-k update
	trisolv Triangular solver
	trmm Triangular matrix-multiply


	See the end of the README for mailing lists, instructions to use
	PolyBench, etc.

	--------------------
	* New in 4.2.1-beta:
	--------------------
	- Fix a bug in PAPI support, introduced in 4.2
	- Support PAPI 5.4.x

	-------------
	* New in 4.2:
	-------------
	- Fixed a bug in syr2k.
	- Changed the data initialization function of several benchmarks.
	- Minor updates in the documentation and PolyBench API.

	-------------
	* New in 4.1:
	-------------
	- Added LICENSE.txt
	- Fixed minor issues with cholesky both in documentation and implementation.
	(Reported by François Gindraud)
	- Simplified the macros for switching between data types. Now users
	may specify DATA_TYPE_IS_XXX where XXX is one of FLOAT/DOUBLE/INT
	to change all macros associated with data types.

	-------------
	* New in 4.0a:
	-------------
	- Fixed a bug in jacobi-1d (Reported by Sven Verdoolaege)

	-------------
	* New in 4.0:
	-------------

	This update includes many changes. Please see CHANGELOG for detailed
	list of changes. Most of the benchmarks have been edited/modified by
	Tomofumi Yuki, thanks to the feedback we have received by PolyBench
	users for the past few years.

	- Three benchmarks are out: dynprog, reg-detect, fdtd-apml.
	- Three benchmarks are in: nussinov, deriche, heat-3d.
	- Jacobi-1D and Jacobi-2D perform two time steps in one time loop
	iteration alternating the source and target fields, to avoid the
	field copy statement.
	- Almost all benchmarks have been edited to ensure the computation
	result matches the mathematical specification of the operation.
	- A major effort on documentation and harmonization of problem sizes
	and data allocations schemes.

	* Important Note:
	-----------------

	PolyBench/C 3.2 kernels had numerous implementation errors making
	their outputs to not match what is expected from the mathematical
	specification of the operation. Many of them did not influence the
	program behavior (e.g., the number and type of operations, data
	dependences, and overall control-flow was similar to the corrected
	implementation), however, some had non-negligible impact. These are
	described below.

	- adi: There was an off-by-one error, which made back substitution
	part of a pass in ADI to not depend on the forward pass, making the
	program fully tilable.
	- syrk: A typo on the loop bounds made the iteration space rectangular
	instead of triangular. This has led to additional dependences and
	two times more operations than intended.
	- trmm: A typo on the loop bounds led to the wrong half of the matrix
	being used in the computation. This led to additional dependences,
	making it harder to parallelize this kernel.
	- lu: An innermost loop was missing for the operation to be valid on
	general matrices. This cause the kernel to perform about half the
	work compared to a general implementation of LU decomposition. The
	new implementation is the generic LU decomposition.

	In addition, some of the kernels used "high-footprint" memory allocation for
	easier parallelization, where variables used in accumulation were fully
	expanded. These variables were changed to only use a scalar.


	-------------
	* New in 3.2:
	-------------

	- Rename the package to PolyBench/C, to prepare for the upcoming
	PolyBench/Fortran and PolyBench/GPU.
	- Fixed a typo in polybench.h, causing compilation problems for 5D arrays.
	- Fixed minor typos in correlation, atax, cholesky, fdtd-2d.
	- Added an option to build the test suite with constant loop bounds
	(default is parametric loop bounds)

	-------------
	* New in 3.1:
	-------------

	- Fixed a typo in polybench.h, causing compilation problems for 3D arrays.
	- Set by default heap arrays, stack arrays are now optional.

	-------------
	* New in 3.0:
	-------------

	- Multiple dataset sizes are predefined. Each file comes now with a .h
	header file defining the dataset.
	- Support of heap-allocated arrays. It uses a single malloc for the
	entire array region, the data allocated is cast into a C99
	multidimensional array.
	- One benchmark is out: gauss_filter
	- One benchmark is in: floyd-warshall
	- PAPI support has been greatly improved; it also can report the
	counters on a specific core to be set by the user.



	----------------
	* Mailing lists:
	----------------

	** polybench-announces@lists.sourceforge.net:
	---------------------------------------------

	Announces about releases of PolyBench.

	** polybench-discussion@lists.sourceforge.net:
	----------------------------------------------

	General discussions reg. PolyBench.



	-----------------------
	* Available benchmarks:
	-----------------------

	See utilities/benchmark_list for paths to each files.
	See doc/polybench.pdf for detailed description of the algorithms.



	------------------------------
	* Sample compilation commands:
	------------------------------

	** To compile a benchmark without any monitoring:
	-------------------------------------------------

	$> gcc -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -o atax_base


	** To compile a benchmark with execution time reporting:
	--------------------------------------------------------

	$> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_TIME -o atax_time


	** To generate the reference output of a benchmark:
	---------------------------------------------------

	$> gcc -O0 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_DUMP_ARRAYS -o atax_ref
	$> ./atax_ref 2>atax_ref.out



	-------------------------
	* Some available options:
	-------------------------

	They are all passed as macro definitions during compilation time (e.g,
	-Dname_of_the_option).

	** Typical options:
	-------------------

	- POLYBENCH_TIME: output execution time (gettimeofday) [default: off]

	- MINI_DATASET, SMALL_DATASET, MEDIUM_DATASET, LARGE_DATASET,
	EXTRALARGE_DATASET: set the dataset size to be used
	[default: STANDARD_DATASET]

	- POLYBENCH_DUMP_ARRAYS: dump all live-out arrays on stderr [default: off]

	- POLYBENCH_STACK_ARRAYS: use stack allocation instead of malloc [default: off]


	** Options that may lead to better performance:
	-----------------------------------------------

	- POLYBENCH_USE_RESTRICT: Use restrict keyword to allow compilers to
	assume absence of aliasing. [default: off]

	- POLYBENCH_USE_SCALAR_LB: Use scalar loop bounds instead of parametric ones.
	[default: off]

	- POLYBENCH_PADDING_FACTOR: Pad all dimensions of all arrays by this
	value [default: 0]

	- POLYBENCH_INTER_ARRAY_PADDING_FACTOR: Offset the starting address of
	polybench arrays allocated on the heap (default) by a multiple of
	this value [default: 0]

	- POLYBENCH_USE_C99_PROTO: Use standard C99 prototype for the functions.
	[default: off]


	** Timing/profiling options:
	----------------------------

	- POLYBENCH_PAPI: turn on papi timing (see below).

	- POLYBENCH_CACHE_SIZE_KB: cache size to flush, in kB [default: 33MB]

	- POLYBENCH_NO_FLUSH_CACHE: don't flush the cache before calling the
	timer [default: flush the cache]

	- POLYBENCH_CYCLE_ACCURATE_TIMER: Use Time Stamp Counter to monitor
	the execution time of the kernel [default: off]

	- POLYBENCH_LINUX_FIFO_SCHEDULER: use FIFO real-time scheduler for the
	kernel execution, the program must be run as root, under linux only,
	and compiled with -lc [default: off]



	---------------
	* PAPI support:
	---------------

	** To compile a benchmark with PAPI support:
	--------------------------------------------

	$> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_PAPI -lpapi -o atax_papi


	** To specify which counter(s) to monitor:
	------------------------------------------

	Edit utilities/papi_counters.list, and add 1 line per event to
	monitor. Each line (including the last one) must finish with a ',' and
	both native and standard events are supported.

	The whole kernel is run one time per counter (no multiplexing) and
	there is no sampling being used for the counter value.



	------------------------------
	* Accurate performance timing:
	------------------------------

	With kernels that have an execution time in the orders of a few tens
	of milliseconds, it is critical to validate any performance number by
	repeating several times the experiment. A companion script is
	available to perform reasonable performance measurement of a PolyBench.

	$> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_TIME -o atax_time
	$> ./utilities/time_benchmark.sh ./atax_time

	This script will run five times the benchmark (that must be a
	PolyBench compiled with -DPOLYBENCH_TIME), eliminate the two extremal
	times, and check that the deviation of the three remaining does not
	exceed a given threshold, set to 5%.

	It is also possible to use POLYBENCH_CYCLE_ACCURATE_TIMER to use the
	Time Stamp Counter instead of gettimeofday() to monitor the number of
	elapsed cycles.



	----------------------------------------
	* Generating macro-free benchmark suite:
	----------------------------------------

	(from the root of the archive:)
	$> PARGS="-I utilities -DPOLYBENCH_TIME";
	$> for i in `cat utilities/benchmark_list`; do perl utilities/create_cpped_version.pl $i "$PARGS"; done

	This create for each benchmark file 'xxx.c' a new file
	'xxx.preproc.c'. The PARGS variable in the above example can be set to
	the desired configuration, for instance to create a full C99 version
	(parametric arrays):

	$> PARGS="-I utilities -DPOLYBENCH_USE_C99_PROTO";
	$> for i in `cat utilities/benchmark_list`; do perl utilities/create_cpped_version.pl $i "$PARGS"; done



	------------------
	* Utility scripts:
	------------------
	create_cpped_version.pl: Used in the above for generating macro free version.

	makefile-gen.pl: generates make files in each directory. Options are globally
	configurable through config.mk at polybench root.
	header-gen.pl: refers to 'polybench.spec' file and generates header in
	each directory. Allows default problem sizes and datatype to
	be configured without going into each header file.

	run-all.pl: compiles and runs each kernel.
	clean.pl: runs make clean in each directory and then removes Makefile.