C-DAC,Pune : High-Perf. Comp. Frontier Technologies Exploration Group and CMSD, University of Hyderabad, Technology Workshop hyPACK (October 15-18), 2013

Overview Venue : CMSD, UoH Key-Note/Invited Talks Faculty / Speakers Proceedings Downloads Past Tech. Workshops Target Audience Benefits Organisers Accommodation Local Travel Sponsors Feedback Acknowledgements Contact Home

Topics of Interest Tech. Prog. Schedule Topic : Multi-Core Topic : ARM Proc. Topic : Coprocessors Topic : GPGPUs Topic : HPC Cluster Topic : App. Kernels. Topic : Lab. Session Key-Note / Invited Talks Home

Mode-1 Multi-Core Memory Allocators OpenMP Intel TBB Pthreads Java - Threads Charm++ Prog. Message Passing (MPI) MPI - OpenMP MPI - Intel TBB MPI - Pthreads Compilers - Opt. Features Threads-Perf. Math. Lib. Threads-Prof. & Tools Threads - I/O Perf. PGAS : UPC / CAF/ GA Power & Perf. Home

Mode-2 ARM Prog. Env Benchmarks Power & Perf. Home

Mode-3 Coprocessors Arch. Software Compiler & Vect. Prog. Env. Benchmarks Power & Perf. Home

Mode-4 GPGPUs NVIDIA - CUDA/OpenCL AMD APP - OpenCL GPGPUs - OpenCL GPGPUs : Power & Perf. Home

Mode-5 HPC Cluster HPC MPI Cluster GPU Cluster - NVIDIA GPU Cluster - AMD APP Cluster - Intel Coprocessors Cluster- Power & Perf. Home

Mode-6 App. Kernels PDE Solvers : FDM/FEM Image Processing - FFT Monte Carlo Methods String Srch. Seq. Analy. Video Process. Intr. Detcn. Sys App. Power & Perf. Home

Reg. Overview Pvt. Sector Pub. Sector Govt. Acad. Staff Students Reg. On-line Reg. Accommodation Contact Home

Mode-1 Multi-Core Memory Allocators OpenMP Intel TBB Pthreads Java - Threads Charm++ Prog. Message Passing (MPI) MPI - OpenMP MPI - Intel TBB MPI - Pthreads Compiler Opt. Features Threads-Perf. Math.Lib. Threads-Prof. & Tools Threads-I/O Perf. PGAS : UPC / CAF / GA Power-Perf. Home

Programming on Multi-Core Processors Using TBB APIs

Example 2.1	Write a TBB program for vector vector multiplication.
Example 2.2	Write a TBB program for Matrix vector multiplication.
Example 2.3	Write a TBB program for Matrix matrix multiplication using blockrange2d directive.
Example 2.4	Write TBB program to obtain Solution of Poisson Equation (Partial differential Equations) by Finite Difference Method using One Dimensional Decomposition of Mesh.

Description of TBB Programs

Makefile : To compile the program.
(Download source code ; Makefile )

Example 2.1 : Vector vector multiplication using TBB.

Download source code ; tbb-vect-vect.cpp

Download source code ; mmap - tbb vect-vect multiplication (WinRAR ZIP archive)
(using mmap memory allocator )

Download source code ; malloc- tbb vect-vect multiplication (WinRAR ZIP archive)
(using malloc - memory allocator )

Objective

To write a TBB program to compute the vector-vector multiplication with TBB using block_range.

Description

This is an implementation of Vector-Vector multiplication using TBB block_range directive. A block_range is a template class provided by the library. It describes a one-dimensional iteration space over type T. parallel_for divides this entire iteration space into subspaces for each processor.

Input

Vector Size and Number Threads.

Output

Time taken for vector-vector computation.

Example 2.2 : Matrix vector multiplication using TBB.
Download source code ; tbb-mat-vect.cpp

Download source code ; mmap - tbb mat-vect multiplication (WinRAR ZIP archive)
(using mmap memory allocator )

Download source code ; malloc- tbb mat-vect multiplication (WinRAR ZIP archive)
(using malloc - memory allocator )

Objective

Write a programme for performing matrix-vector multiplication using TBB and measure the performance.

Description

In this code we are doing multiplication of matrix ma and vector vb and the resultant vector is stored in vector vc.
This example illustrates the use of parallel_for which breaks the iteration space into chunks and runs each chunk on a separate thread. Operator() function processes a chunk given by parallel_for.

It is assumed that number of columns of the matrix A and size of the vector are same.

Input

Size (nrows=ncols=vecsize) , Number of Threads

Output

Execution time in seconds required for matrix-vector multiplication.

Example 2.3 : Matrix Matrix multiplication using TBB blockranged2d directive.
Download source code ; tbb-matmat-multiply.cpp

Download source code ; mmap - tbb mat-mat multiplication (WinRAR ZIP archive)
(using mmap memory allocator )

Download source code ; malloc- tbb mat-mat multiplication (WinRAR ZIP archive)
(using malloc - memory allocator )

Objective

Write a TBB program for matrix-matrix multiplication using TBB blockranged2d directive and measure the performance.

Description

In this example we have shown uses a blocked_range2d to specify two dimensional iteration space.The blocked_range2d enables the two outermost loops of the serial version to become parallel loops.The Parallel_for recursively splits the blocked_range2d. it invokes operator() for each piece.

It is assumed that both matrix size are same.

Input

Number of threads , Size of matrix .

Output

Time taken for Matrix-Matrix computation.

Example 2.4 : Write TBB program to obtain Solution of Poisson Equation (Partial differential Equations) by Finite Difference Method using One Dimensional Decomposition of Mesh.

(Download source code : tbbpoisson)

Objective

Write a CUDA program to solve the Poisson equation with Dirichlet boundary conditions in two space dimensions by finite difference method on structured rectangular type of grid. Use Jacobi iteration method to solve the discretized equations.

Description

In the Poisson problem, the FD method is imposed a regular grid on the physical domain. It then approximate the derivative of unknown quantity u at a grid point by the ratio of the difference in u at two adjacent grid points to the distance between the grid point. In a simple situation, consider a square domain discretized by n x n grid points, as shown in the Figure 8.1(a). Assume that the grid points are numbered in a row-major order fashion from left to right and from top to bottom, as shown in the Figure 8.1(b). This ordering is called natural ordering . Given a total of n² points in the domain n x n grid, this numbering scheme labels the immediate neighbors of point i on the top, left, right, and bottom point as i - n, i -1, i +1 and i+n , respectively. Figure 8.1(b) represents partitioning of mesh using one dimensional partitioning

Figure 1(a). Finite difference grid: Five point stencil approximation Figure 1(b). One dimensional decomposition for 2-D problem, with n=7

Formulation of Poisson Problem

The Poisson problem is expressed by the equations

Lu = f(x, y) in the interior of domain [0,1] x [0,1]

Where L is Laplacian operator in two space dimensions.

      u(x,y) = g(x,y) on the boundary of the domain [0,1] x [0,1]

We define a square mesh (also called a grid) consisting of the points (x_i , y_i), given by

     x_i = i / n+1, i=0, ..., n+1,

     y_j = j / n+1, j=0, ...,n+1,

where there are n+2 points along each edge of the mesh. We will find an approximation to u(x , y) only at the points (x_i, y_j). Let u_{i, j} be the approximate solution to u at (x_i, y_j). and let h = 1/(n+1). Now, we can approximate at each of these points with the formula

   (u_i-_{1, j}+u_i,
j+₁+u_{i, j-}₁+u_i+_1,
j-4u_{i, j})/ h² = f_{i, j}.

Rewriting the above equation as
u_{i, j} = 1/4(u_i-1,
j+u_{i, j+}₁+u_{i, j-}₁+u_i+_1,
j-h²f_{i, j}),

Jacobi iteration method is employed to obtain final solution starting with initial guess solution u_i,j^k for k=0 for all mesh points u _i,j and solve the following equation iteratively until the solution is converged.

u_{i, j}^k+¹ = 1/4(u_i-_{1, j}^k+u_i,
j+₁^k+u_{i, j-}₁^k+u_i+_1,
j^k-h²f_{i, j}).

The resultant discretized banded matrix is shown in the Figure 2.

Input

The input to the problem is given as arguments in the command line. It should be given in the following format ; Suppose that the number of points in the X direction is m, the number of points in the Y direction is n then the program must be run as,

./program_name m n

CPU assigns the boundary values for the Uold and Unew vectors.

Output

CPU prints the final Unew vector which is the solution to the Poisson equation with the above stated boundary values

Centre for Development of Advanced Computing