hyPACK-2013 : Tuning and Performance of Programs/Benchmarks Using Math Libraries
Tuning application programs with compiler optimisation techniques,
code restructuring techniques and system-tuned mathematical libraries
enhances their performance on multi-core processors. Sustaining performance and scalability
on multi-core processors as the problem size increases requires serious effort.
The system-provided tuned mathematical libraries on Intel and IBM P690 systems are discussed below.
IBM ESSL Mathematical Libraries
(a). ESSL Libraries
The performance of a computer depends on how fast the system can move data between processors and memories. The
mathematical libraries are tuned to the architecture, and the best sustained performance is obtained by combining
them with suitable compiler flags. The compilers used for Fortran and C programs are xlf and xlc, provided
on IBM AIX systems.
Besides the standard libraries, the sequential programs use the BLAS libraries and the IBM AIX ESSL libraries to
demonstrate the performance of some matrix operations using the subroutines these
libraries provide. The BLAS (Basic Linear Algebra Subprograms) are high-quality "building block" routines for
performing basic vector and matrix operations. Level 1 BLAS performs vector-vector operations, Level 2
BLAS performs matrix-vector operations, and Level 3 BLAS performs matrix-matrix operations. Because the BLAS
are efficient, portable, and widely available, they are commonly used in the development of high-quality
linear algebra software such as LINPACK and LAPACK. The BLAS are available at www.netlib.org/blas/, and further
information can be found at www.netlib.org/blas/faq.html. The ESSL libraries
provide subroutines for vector and matrix operations tuned to the IBM POWER5/POWER6 machine
architecture (shared-memory processor architecture). The operations include the solution of linear systems of
equations, dot products of vectors, and matrix-matrix multiplication. The routines are highly optimized with
the memory and cache hierarchy of the POWER4 architecture in mind, resulting in high performance for linear
algebra problems with large problem sizes.
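To make the three BLAS levels concrete, the following is a minimal sketch in plain C loops. It is not the tuned ESSL or netlib code; the names ref_axpy, ref_gemv and ref_gemm are hypothetical, and matrices are assumed to be stored column by column (Fortran order), as the BLAS expect.

```c
#include <stddef.h>

/* Level 1 (vector-vector): y = alpha*x + y. */
void ref_axpy(int n, double alpha, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y = A*x for an m-by-n matrix A stored
   column by column with leading dimension lda. */
void ref_gemv(int m, int n, const double *a, int lda,
              const double *x, double *y)
{
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            y[i] += a[i + (size_t)j * lda] * x[j];
}

/* Level 3 (matrix-matrix): C = A*B with A m-by-k, B k-by-n and
   C m-by-n, all stored column by column. */
void ref_gemm(int m, int n, int k, const double *a, int lda,
              const double *b, int ldb, double *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++)
            c[i + (size_t)j * ldc] = 0.0;
        for (int p = 0; p < k; p++) {
            double bpj = b[p + (size_t)j * ldb];
            /* innermost loop walks down a column: unit stride in memory */
            for (int i = 0; i < m; i++)
                c[i + (size_t)j * ldc] += a[i + (size_t)p * lda] * bpj;
        }
    }
}
```

For square matrices of order n, the Level 3 routine performs on the order of n^3 floating-point operations on n^2 data, which is why tuned libraries can reuse data held in cache most effectively at this level.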
For information on the ESSL libraries, see the "Engineering and Scientific Subroutine Library
for AIX Version 3 Release 3: Guide and Reference", available at the IBM ESSL Library page and at
http://www-1.ibm.com/servers/eserver/pseries/library/sp_books/essl.html
The subroutines from the BLAS and ESSL libraries used in this module are:
ddot subroutine: This Level 1 BLAS subroutine calculates the
dot product of two double-precision vectors X and Y. The leading letter d
denotes a double-precision operation. The return value is a double-precision value.
Calling sequence in Fortran:
    dot = ddot(N, DX, INCX, DY, INCY)
Calling sequence in C:
    double ddot(int N, double *DX, int INCX, double *DY, int INCY)
Arguments:
    N     Number of elements in the vector; Default=0.
    DX    Input double-precision vector X; the size of array X must be at least
          max(1,N*|INCX|).
    INCX  Specifies the storage spacing between successive elements of the vector X. A
          value of one indicates that the elements of the vector are consecutive in memory.
    DY    Input double-precision vector Y; the size of array Y must be at least
          max(1,N*|INCY|).
    INCY  Specifies the storage spacing between successive elements of the vector Y. A
          value of one indicates that the elements of the vector are consecutive in memory.
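The role of the INCX and INCY arguments is easiest to see in a plain-loop version. The sketch below is an illustration only, not the tuned library routine: ref_ddot is a hypothetical name, and the treatment of negative increments (starting from the far end of the array) follows the reference BLAS convention.

```c
/* Plain-loop dot product with BLAS-style storage spacing. */
double ref_ddot(int n, const double *dx, int incx,
                const double *dy, int incy)
{
    double sum = 0.0;
    if (n <= 0)
        return sum;
    /* With a negative increment, start at the last element used and
       walk backwards through the array. */
    int ix = (incx >= 0) ? 0 : (1 - n) * incx;
    int iy = (incy >= 0) ? 0 : (1 - n) * incy;
    for (int i = 0; i < n; i++) {
        sum += dx[ix] * dy[iy];
        ix += incx;   /* step to the next element of X */
        iy += incy;   /* step to the next element of Y */
    }
    return sum;
}
```

With INCX = 2 and INCY = 1, for example, every second element of X is paired with consecutive elements of Y. This is how a row of a column-major matrix can be passed to the routine without copying it.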
dgesv subroutine: This subroutine solves a linear system AX = B
for a square general matrix A and general matrices B and X. The leading letter
d denotes a double-precision operation. It is one of the LAPACK
subroutines provided in the IBM ESSL libraries.
Calling sequence in Fortran:
    call dgesv(N, NRHS, DA, LDA, IPIVOT, DB, LDB, INFO)
Calling sequence in C:
    void dgesv(int N, int NRHS, double *DA, int LDA, int *IPIVOT,
               double *DB, int LDB, int *INFO)
Arguments:
    N       Order of matrix A; Default=0.
    NRHS    Number of right-hand sides, equal to the number of columns of the matrix B.
            Default=0.
    DA      On entry, the N*N matrix A.
    LDA     Leading dimension of the array A as specified in a dimension or type statement.
            Default: LDA = max(1, N).
    IPIVOT  On exit, the pivot indices as computed by the DGETRF routine.
    DB      On entry, the N*NRHS right-hand side matrix B. On exit, the N*NRHS solution matrix X.
    LDB     Leading dimension of the array B as specified in a dimension or type statement.
            LDB >= max(1, N).
    INFO    Reports whether the routine completed successfully. On exit:
            INFO = 0: the subroutine completed normally.
            INFO < 0: the ith argument, where i = |INFO|, had an illegal value.
            INFO > 0: U(i,i), where i = INFO, is exactly zero and U is therefore singular. The LU
                      factorization has been completed, but the solution could not be computed.
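The argument list and the INFO convention can be illustrated with an unblocked Gaussian elimination with partial pivoting. This is a sketch under stated assumptions, not the ESSL or LAPACK implementation: ref_dgesv is a hypothetical name, arrays are column-major, the pivot indices are 1-based as in Fortran, and only a few of the argument checks are shown.

```c
#include <stddef.h>

#define A(i, j) a[(i) + (size_t)(j) * lda]
#define B(i, j) b[(i) + (size_t)(j) * ldb]

static double absval(double v) { return v < 0.0 ? -v : v; }

void ref_dgesv(int n, int nrhs, double *a, int lda, int *ipiv,
               double *b, int ldb, int *info)
{
    int minld = (n > 1) ? n : 1;
    *info = 0;
    if (n < 0)            *info = -1;   /* 1st argument illegal */
    else if (nrhs < 0)    *info = -2;   /* 2nd argument illegal */
    else if (lda < minld) *info = -4;   /* 4th argument illegal */
    else if (ldb < minld) *info = -7;   /* 7th argument illegal */
    if (*info != 0)
        return;

    /* Step 1: LU factorization with row interchanges. */
    for (int k = 0; k < n; k++) {
        int p = k;                      /* row holding the largest |A(i,k)| */
        for (int i = k + 1; i < n; i++)
            if (absval(A(i, k)) > absval(A(p, k)))
                p = i;
        ipiv[k] = p + 1;
        if (A(p, k) == 0.0) {           /* U(k,k) is exactly zero */
            if (*info == 0)
                *info = k + 1;
            continue;
        }
        if (p != k)
            for (int j = 0; j < n; j++) {
                double t = A(k, j); A(k, j) = A(p, j); A(p, j) = t;
            }
        for (int i = k + 1; i < n; i++) {
            A(i, k) /= A(k, k);         /* multiplier, kept as part of L */
            for (int j = k + 1; j < n; j++)
                A(i, j) -= A(i, k) * A(k, j);
        }
    }
    if (*info > 0)
        return;                         /* singular: B is left untouched */

    /* Step 2: permute each right-hand side, then solve with L and U. */
    for (int c = 0; c < nrhs; c++) {
        for (int k = 0; k < n; k++) {
            int p = ipiv[k] - 1;
            if (p != k) {
                double t = B(k, c); B(k, c) = B(p, c); B(p, c) = t;
            }
        }
        for (int k = 0; k < n; k++)             /* forward substitution */
            for (int i = k + 1; i < n; i++)
                B(i, c) -= A(i, k) * B(k, c);
        for (int k = n - 1; k >= 0; k--) {      /* back substitution */
            B(k, c) /= A(k, k);
            for (int i = 0; i < k; i++)
                B(i, c) -= A(i, k) * B(k, c);
        }
    }
}

#undef A
#undef B
```

In this sketch the array holding A is overwritten by the computed factors, so a caller that needs A afterwards has to keep a copy. A caller should always test INFO before using the contents of the right-hand side array.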
(b). Compilation & Execution
Compilation, Linking and Execution of Sequential Programs on PARAM Padma (IBM AIX - POWER5)
The IBM AIX cluster runs AIX OS 5.1 L. It has the following programming tools:
Compilers Available:
    XL C Compiler
    XL Fortran Compiler
    GNU C Compiler
Libraries Available:
    ESSL - BLAS Level 1, 2, 3, LAPACK, LINPACK
    ESSLSMP - Threaded versions of the ESSL libraries
    PESSL - Parallel version of the ESSL libraries for MPI BLACS
Using BLAS Libraries:
    BLAS downloadable from NetLib.org
Using BLAS/LAPACK/LINPACK Libraries:
    IBM ESSL/ESSL-SMP libraries
How to compile and link:
For more control over compiling and linking
sequential programs, use a Makefile.
A Makefile is particularly useful for programs
spread over a large number of files. The user specifies the names
of the programs and the paths of the libraries to be linked
in the Makefile.
To compile a C or Fortran program with or without the BLAS, ESSL or ESSL-SMP
libraries, edit the Makefile
following the guidelines given in it. A routine from the ESSL library is used by linking
the program with the -lessl option. The multi-threaded version of the routine is used by linking with the
ESSL-SMP library, that is, by specifying -lesslsmp instead of -lessl.
The appropriate lines containing "F77=", "FFLAGS=", "LINKFLAGS=",
"COBJECTS=", "FOBJECTS=" and "BLASLIBS=" have to be uncommented
following the guidelines given in the Makefile. One of the lines containing "COBJECTS="
has to be uncommented to compile a C program, and one of the lines containing "FOBJECTS=" has to be
uncommented to compile a Fortran program.
After editing the Makefile, type on the command line
make runc
to compile a C program, or
make runf
to compile a Fortran program.
This creates the executable runc for a C program or runf for a Fortran program.
For the Hands-On Session on the IBM AIX cluster, the application user can use this Makefile.
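Because the choice between -lessl and -lesslsmp is made at link time, the calling code itself does not change. The sketch below illustrates this with the C calling sequence of ddot listed earlier. The macro USE_ESSL and the function norm_squared are hypothetical names, and the header name essl.h is an assumption about the ESSL installation; without USE_ESSL a portable stand-in is compiled so the file also builds on a machine that lacks ESSL.

```c
#include <stddef.h>

#ifdef USE_ESSL
#include <essl.h>   /* assumption: ESSL's C header, which declares ddot */
#else
/* Portable stand-in with the same calling sequence as the library
   routine. It handles non-negative storage spacings only. */
static double ddot(int n, double *dx, int incx, double *dy, int incy)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += dx[(size_t)i * incx] * dy[(size_t)i * incy];
    return sum;
}
#endif

/* Squared Euclidean norm of x, written once against ddot. The same
   source can be linked with -lessl or with -lesslsmp. */
double norm_squared(int n, double *x)
{
    return ddot(n, x, 1, x, 1);
}
```

Under these assumptions, the file would be compiled with -DUSE_ESSL and linked with -lessl for the serial library, or with -lesslsmp for the threaded one, in the same way the Makefile switches between the two.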
How to execute:
Once the executable runc or runf has been created, run the program with the
command
./runc or ./runf
If the program is linked with ESSL-SMP library routines, it executes using multiple
threads.
The Makefile and the procedure used in the Hands-on session for linking with ESSL-SMP routines are
intended to create a multi-threaded environment using OpenMP threads. Editing the Makefile as per
its guidelines and compiling with the ESSL-SMP libraries creates runc or runf.
The number of threads is set through the environment variable
OMP_NUM_THREADS prior to execution (there must be no spaces around the = sign):
export OMP_NUM_THREADS=<number of threads>
For example, to execute runc or runf using 4 threads,
set the number of threads prior to execution with
export OMP_NUM_THREADS=4
After setting the number of threads, run the executable runc or runf
as before.
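The threaded runtime reads OMP_NUM_THREADS by itself, so a program does not need any code for this. Purely as an illustration of how an exported variable reaches the process, the sketch below reads the same variable with the standard getenv call, for instance to print the requested thread count next to timing results. The function names parse_thread_count and requested_threads are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Convert the text of OMP_NUM_THREADS to a positive thread count.
   Returns fallback when the text is missing or not a positive integer.
   Only the leading integer is read, so "4,2" yields 4. */
int parse_thread_count(const char *text, int fallback)
{
    if (text == NULL)
        return fallback;
    char *end;
    long value = strtol(text, &end, 10);
    if (end == text || value <= 0 || value > INT_MAX)
        return fallback;
    return (int)value;
}

/* Thread count requested in the environment of the running process. */
int requested_threads(int fallback)
{
    return parse_thread_count(getenv("OMP_NUM_THREADS"), fallback);
}
```

After export OMP_NUM_THREADS=4, a call such as requested_threads(1) would return 4. If the variable is not exported, the call returns the fallback value 1.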