



Multi Cores : Mixed Mode of Programming : Using MPI & OpenMP

Example 1.1

Write an MPI-OpenMP program to print "Hello World". You have to use MPI Basic Library Calls and the OpenMP PARALLEL Directive.

Example 1.2


Write an MPI-OpenMP program to compute the value of PI by numerical integration of the function f(x) = 4/(1+x^2) between the limits 0 and 1. You have to use MPI Collective Communication and Computation Library Calls, the OpenMP PARALLEL FOR Directive, and a CRITICAL section.

Example 1.3


Write an MPI-OpenMP program to calculate the Infinity norm of a matrix using block-striped partitioning with row-wise data distribution. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses.

Example 1.4


Write an MPI-OpenMP program to compute matrix-vector multiplication using a self-scheduling algorithm. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses.

Example 1.5

Write an MPI-OpenMP program to compute matrix-matrix multiplication using Checker-Board Partitioning of the input matrices (Assignment).

Example 1.6

Write an MPI-OpenMP program to solve a system of linear equations Ax = b using the Conjugate Gradient Method (Assignment).

Description of MPI-OpenMP Programs

Example 1.1 : Write an MPI-OpenMP program to print "Hello World". You have to use MPI Basic Library Calls and the OpenMP PARALLEL Directive.

(Download source code : mpi-Omp-hello-world.c / mpi-Omp-hello-world.f)
  • Objective
  • Write an MPI-OpenMP program to print "Hello World". You have to use MPI Basic Library Calls and the OpenMP PARALLEL Directive using p processes and t threads.

  • Description
  • This is a very simple program to get a feel for threads and for how to use threads with MPI. The implementation is as follows: the MPI application starts and each process creates two child threads. The main thread on each process passes an identifier to each thread along with the address of the function to execute. Each thread prints the message "Hello World from thread: <thread ID> on process: <process no.>". A minimal sketch is given after this list.

  • Input
  • None

  • Output

  • Hello World from Thread: thread No. on Process: process No.
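
The sketch below is not the downloadable source, only an illustration of the required MPI Basic Library Calls (MPI_Init, MPI_Comm_rank, MPI_Finalize) combined with the OpenMP PARALLEL directive; the two-thread team size follows the description above.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        omp_set_num_threads(2);   /* two threads per process, as described above */

        /* Every thread in the team prints its own ID and the process rank. */
        #pragma omp parallel
        printf("Hello World from Thread: %d on Process: %d\n",
               omp_get_thread_num(), rank);

        MPI_Finalize();
        return 0;
    }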


Example 1.2 : Write an MPI-OpenMP program to compute the value of PI by numerical integration of the function f(x) = 4/(1+x^2) between the limits 0 and 1. You have to use MPI Collective Communication and Computation Library Calls, the OpenMP PARALLEL FOR Directive, and a CRITICAL section.

(Download source code : mpi-omp-pie-calculation.c / mpi-omp-pie-calculation.f)
  • Objective
  • Write an MPI-OpenMP program to compute the value of PI by numerical integration of the function f(x) = 4/(1+x^2) between the limits 0 and 1. You have to use MPI Collective Communication and Computation Library Calls, the OpenMP PARALLEL FOR Directive, and a CRITICAL section on p processes and t threads.

  • Description
  • This program computes the value of PI over a given interval using numerical integration. Each process determines the number of intervals it has to calculate and creates threads accordingly. The main thread assigns an interval to each thread. Each thread calculates the value of the function over its interval and adds it to a common variable. This variable is protected by a CRITICAL section to ensure the atomicity of the updates. When all the child threads on a process finish, the variable holds the partial sum over the intervals assigned to that process. Using a collective call, MPI_Reduce, the root process accumulates the calculated value of PI. A minimal sketch is given after this list.

  • Input
  • Number of intervals.

  • Output

  • Calculated Value of PI.
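
A minimal sketch of this scheme is given below. It assumes a hard-coded number of intervals instead of reading it as input and distributes intervals cyclically across processes; the downloadable source may differ in detail.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        int n = 100000;                    /* number of intervals (assumed, not read) */
        double h, local_sum = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        h = 1.0 / (double) n;

        #pragma omp parallel
        {
            double my_sum = 0.0;           /* per-thread partial sum */

            /* Threads share the intervals this process owns (i = rank, rank+size, ...). */
            #pragma omp for
            for (int i = rank; i < n; i += size) {
                double x = h * ((double) i + 0.5);   /* midpoint of the interval */
                my_sum += 4.0 / (1.0 + x * x);
            }

            /* CRITICAL section guarantees atomic update of the shared variable. */
            #pragma omp critical
            local_sum += my_sum;
        }
        local_sum *= h;

        /* Root accumulates the partial sums into the final value of PI. */
        MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("Calculated Value of PI: %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }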



Example 1.3 : Write an MPI-OpenMP program to calculate the Infinity norm of a matrix using block-striped partitioning with row-wise data distribution. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses.

(Download source code : mpi-omp-mat-infnorm-blkstp.c / mpi-omp-mat-infnorm-blkstp.f; input file: infndata.inp)
  • Objective
  • Write an MPI-OpenMP program to calculate the Infinity norm of a matrix using block-striped partitioning with row-wise data distribution. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses on p processes and t threads.

  • Description
  • Infinity Norm of a Matrix: the row-wise infinity norm of a matrix is defined to be the maximum, over all rows, of the sums of the absolute values of the elements in a row. After the initial validity checks, each process reads the input matrix and determines the number of rows it is to operate on. Using its rank, each process determines the specific rows assigned to it. After the distribution of rows, the main thread on each process creates as many child threads as the rows it is to operate on. Each thread operates on its row, calculates the sum of the absolute values of its elements, and updates a common variable. After all the threads on a process complete their share of work, the common variable holds the maximum row sum among the rows assigned to that process. Finally, the root process determines the infinity norm using a collective MPI call, MPI_Reduce, and prints the value. A minimal sketch is given after this list.

  • Input
  • The input file holding the square matrix

  • Output
  • Infinity Norm of the Matrix Value
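
A minimal sketch of the per-process computation is given below. For brevity it generates the matrix in place of reading infndata.inp; the last rank absorbs any remainder rows.

    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>
    #include <omp.h>

    #define N 8   /* matrix order; the real program reads it from the input file */

    int main(int argc, char *argv[])
    {
        int rank, size, i, j;
        double a[N][N], row_sum, local_max = 0.0, inf_norm = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* For brevity every process fills the whole matrix; the real program
           reads it from infndata.inp. */
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = (double)(i - j);

        /* Block-striped, row-wise distribution of rows over processes. */
        int rows  = N / size;
        int first = rank * rows;
        int last  = (rank == size - 1) ? N : first + rows;

        /* j and row_sum are PRIVATE to each thread; a and local_max are SHARED. */
        #pragma omp parallel for private(j, row_sum) shared(a, local_max)
        for (i = first; i < last; i++) {
            row_sum = 0.0;
            for (j = 0; j < N; j++)
                row_sum += fabs(a[i][j]);
            #pragma omp critical
            if (row_sum > local_max)
                local_max = row_sum;
        }

        /* Root takes the maximum of the per-process maxima. */
        MPI_Reduce(&local_max, &inf_norm, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("Infinity Norm of the Matrix: %f\n", inf_norm);

        MPI_Finalize();
        return 0;
    }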


Example 1.4 : Write an MPI-OpenMP program to compute matrix-vector multiplication using a self-scheduling algorithm. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses.

(Download source code : mpi-omp-mat-vect-mult-blkstp.c / mpi-omp-mat-vect-mult-blkstp.f; input files: mdata.inp, vdata.inp)
  • Objective
  • Write an MPI-OpenMP program to compute matrix-vector multiplication using a self-scheduling algorithm. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses on p processes and t threads.

  • Description
  • In the self-scheduling algorithm, a master distributes the rows of the matrix to worker threads. The worker threads calculate the dot product of the given row and the vector. When all the workers finish their part of the work, the resultant vector is the product of the given matrix and the vector. In this implementation, each process, after the initial checks and after populating the matrix and the vector, determines the rows of the matrix it needs to operate upon. The main thread on each process creates child threads to operate upon the rows assigned to it. Each thread calculates the dot product of its row and the vector and stores it in a local array on the process. After all the child threads on a process are done, the array holds the dot products of the rows assigned to that process. Finally, a collective MPI call, MPI_Gatherv, accumulates the result vector on the root process. A minimal sketch is given after this list.

  • Input
  • The input files holding the square matrix and the vector (Number of Rows, Columns of the matrix)

  • Output
  • Output Array : Product of Matrix-Vector Multiplication.
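
A minimal sketch is given below. It fills the matrix and vector in place of reading mdata.inp and vdata.inp, and, for the sketch only, assumes at most 64 processes for the MPI_Gatherv bookkeeping arrays.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    #define N 8   /* matrix order; the real program reads mdata.inp / vdata.inp */

    int main(int argc, char *argv[])
    {
        int rank, size, i, j;
        double a[N][N], x[N], y[N], result[N], dot;
        int counts[64], displs[64];   /* assumes at most 64 processes in this sketch */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* For brevity every process fills the matrix and the vector itself. */
        for (i = 0; i < N; i++) {
            x[i] = 1.0;
            for (j = 0; j < N; j++)
                a[i][j] = (double)(i * N + j);
        }

        /* Block-striped row distribution; the last rank takes any remainder. */
        int rows  = N / size;
        int first = rank * rows;
        int count = (rank == size - 1) ? N - first : rows;

        /* Each thread computes the dot products of its rows with the vector;
           j and dot are PRIVATE, a, x and y are SHARED. */
        #pragma omp parallel for private(j, dot) shared(a, x, y)
        for (i = first; i < first + count; i++) {
            dot = 0.0;
            for (j = 0; j < N; j++)
                dot += a[i][j] * x[j];
            y[i - first] = dot;
        }

        /* Describe each process's piece, then gather them on the root. */
        for (i = 0; i < size; i++) {
            counts[i] = (i == size - 1) ? N - i * rows : rows;
            displs[i] = i * rows;
        }
        MPI_Gatherv(y, count, MPI_DOUBLE,
                    result, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0)
            for (i = 0; i < N; i++)
                printf("result[%d] = %f\n", i, result[i]);

        MPI_Finalize();
        return 0;
    }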


Example 1.5 : Write an MPI-OpenMP program to compute matrix-matrix multiplication using Checker-Board Partitioning of the input matrices.

  • Objective
  • Write an MPI-OpenMP program to compute matrix-matrix multiplication using Checker-Board Partitioning of the input matrices. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses on p processes and t threads.

  • Description
  • In checkerboard partitioning, the matrix is divided into smaller square or rectangular blocks (submatrices) that are distributed among the processes. A checkerboard partitioning splits both the rows and the columns of the matrix, so no process is assigned any complete row or column. Like striped partitioning, checkerboard partitioning can be block or cyclic. A sketch of the block layout computation is given after this list.

  • Input
  • The input file holding the two square matrices (Number of Rows, Columns of the matrix)

  • Output
  • Output Array : Product of Matrix-Matrix Multiplication.
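
As a small illustration of the block variant, the sketch below computes which submatrix each process owns on a q x q process grid. It assumes the number of processes is a perfect square and that q divides the matrix order; the full assignment additionally multiplies the blocks and communicates partial products.

    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    #define N 8   /* matrix order; the real program reads it from the input file */

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Arrange the p processes as a q x q grid; this sketch assumes p is a
           perfect square and that q divides the matrix order N. */
        int q      = (int) sqrt((double) size);
        int my_row = rank / q;            /* block-row coordinate of this process */
        int my_col = rank % q;            /* block-column coordinate */
        int blk    = N / q;               /* edge length of one block */

        /* The process owns a blk x blk submatrix: neither a complete row nor a
           complete column of the global matrix. */
        printf("Process %d owns rows %d..%d and columns %d..%d\n",
               rank, my_row * blk, my_row * blk + blk - 1,
               my_col * blk, my_col * blk + blk - 1);

        MPI_Finalize();
        return 0;
    }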


Example 1.6 : Write an MPI-OpenMP program to compute the solution of the system of linear equations [A] {x} = {b} by the Conjugate Gradient Method using p processes and t threads.

  • Objective
  • Write an MPI-OpenMP program to compute the solution of the system of linear equations [A] {x} = {b} by the Conjugate Gradient Method using p processes and t threads. You have to use MPI Collective Communication, the OpenMP PARALLEL FOR Directive, and the PRIVATE and SHARED Clauses.

  • Description
  • Iterative methods are techniques for solving systems of equations of the form [A]{x}={b} that generate a sequence of approximations to the solution vector x. In each iteration, the coefficient matrix A is used to perform a matrix-vector multiplication. The number of iterations required to solve a system of equations to a desired precision is usually data dependent; hence, the number of iterations is not known before the algorithm executes. The Jacobi method, for example, starts with an initial guess x0 for the solution vector x. This initial vector x0 is used in the appropriate equations of the Jacobi algorithm to arrive at the next approximation x1 to the solution vector. The vector x1 is then used in the same equations, and the process continues until a close enough approximation to the actual solution is found. A serial sketch of the Conjugate Gradient iteration is given after this list.

  • Input
  • The input file holding the square matrix (Number of Rows, Columns of the matrix) and the vector

  • Output
  • Output Array : Solution Array of Linear System of Equations
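
Since the hybrid version is left as an assignment, the sketch below shows only the serial Conjugate Gradient iteration on a small hard-coded symmetric positive definite system. The matrix-vector product inside the loop is the step the MPI-OpenMP version distributes over p processes and t threads.

    #include <stdio.h>
    #include <math.h>

    #define N      4        /* system size; the real program reads it from the input file */
    #define MAX_IT 1000
    #define TOL    1.0e-10

    /* y = A*x : the matrix-vector product done once per iteration; this is the
       step the MPI-OpenMP assignment parallelises. */
    static void matvec(double A[N][N], const double x[N], double y[N])
    {
        for (int i = 0; i < N; i++) {
            y[i] = 0.0;
            for (int j = 0; j < N; j++)
                y[i] += A[i][j] * x[j];
        }
    }

    static double dot(const double a[N], const double b[N])
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            s += a[i] * b[i];
        return s;
    }

    int main(void)
    {
        /* Small symmetric positive definite test system (illustrative only). */
        double A[N][N] = {{4,1,0,0}, {1,4,1,0}, {0,1,4,1}, {0,0,1,4}};
        double b[N] = {1, 2, 3, 4};
        double x[N] = {0, 0, 0, 0};          /* initial guess x0 */
        double r[N], p[N], Ap[N];

        matvec(A, x, Ap);                    /* r0 = b - A*x0, p0 = r0 */
        for (int i = 0; i < N; i++) {
            r[i] = b[i] - Ap[i];
            p[i] = r[i];
        }
        double rs_old = dot(r, r);

        for (int k = 0; k < MAX_IT && sqrt(rs_old) > TOL; k++) {
            matvec(A, p, Ap);
            double alpha = rs_old / dot(p, Ap);
            for (int i = 0; i < N; i++) {
                x[i] += alpha * p[i];        /* improve the approximation */
                r[i] -= alpha * Ap[i];       /* update the residual */
            }
            double rs_new = dot(r, r);
            for (int i = 0; i < N; i++)      /* new search direction */
                p[i] = r[i] + (rs_new / rs_old) * p[i];
            rs_old = rs_new;
        }

        for (int i = 0; i < N; i++)
            printf("x[%d] = %f\n", i, x[i]);
        return 0;
    }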


Centre for Development of Advanced Computing