C-DAC,Pune : High-Perf. Comp. Frontier Technologies Exploration Group and CMSD, University of Hyderabad, Technology Workshop hyPACK (October 15-18), 2013

Overview Venue : CMSD, UoH Key-Note/Invited Talks Faculty / Speakers Proceedings Downloads Past Tech. Workshops Target Audience Benefits Organisers Accommodation Local Travel Sponsors Feedback Acknowledgements Contact Home

Topics of Interest Tech. Prog. Schedule Topic : Multi-Core Topic : ARM Proc. Topic : Coprocessors Topic : GPGPUs Topic : HPC Cluster Topic : App. Kernels. Topic : Lab. Session Key-Note / Invited Talks Home

Mode-1 Multi-Core Memory Allocators OpenMP Intel TBB Pthreads Java - Threads Charm++ Prog. Message Passing (MPI) MPI - OpenMP MPI - Intel TBB MPI - Pthreads Compilers - Opt. Features Threads-Perf. Math. Lib. Threads-Prof. & Tools Threads - I/O Perf. PGAS : UPC / CAF/ GA Power & Perf. Home

Mode-2 ARM Prog. Env Benchmarks Power & Perf. Home

Mode-3 Coprocessors Arch. Software Compiler & Vect. Prog. Env. Benchmarks Power & Perf. Home

Mode-4 GPGPUs NVIDIA - CUDA/OpenCL AMD APP - OpenCL GPGPUs - OpenCL GPGPUs : Power & Perf. Home

Mode-5 HPC Cluster HPC MPI Cluster GPU Cluster - NVIDIA GPU Cluster - AMD APP Cluster - Intel Coprocessors Cluster- Power & Perf. Home

Mode-6 App. Kernels PDE Solvers : FDM/FEM Image Processing - FFT Monte Carlo Methods String Srch. Seq. Analy. Video Process. Intr. Detcn. Sys App. Power & Perf. Home

Reg. Overview Pvt. Sector Pub. Sector Govt. Acad. Staff Students Reg. On-line Reg. Accommodation Contact Home

• Mode-1 Multi-Core • Memory Allocators • OpenMP • Intel TBB • Pthreads • Java - Threads • Charm++ Prog. • Message Passing (MPI) • MPI - OpenMP • MPI - Intel TBB • MPI - Pthreads • Compiler Opt. Features • Threads-Perf. Math.Lib. • Threads-Prof. & Tools • Threads-I/O Perf. • PGAS : UPC / CAF / GA • Power-Perf. • Home

Programming on Multi-Core Processors Using OpenMP APIs

The OpenMP API is used for writing portable multi-threaded applications written in Fortran, C and C++ languages. The OpenMP programming model plays a key role by providing an easy method for threading applications without burdening the programmer with the complications of creating, synchronizing load balancing, and destroying threads. The OpenMP model provides a platform independent set of compiler pragmas, directives, function calls, and environment variables that explicitly instruct the compiler how and where to use the parallelism in the application. Example programs using compiler pragmas, directives, function calls, and environment variables, Compilation and execution of OpenMP programs and programs numerical and non-numerical computations are discussed.

Example 4.1

OpenMP program : Computing Kernels for Matrix Computation

(Source - References : Books Multi-threading OpenMP -[MCMTh-01], [MCMTh-02], [MCMTh-I03], [MCMTh-05], [MCMTh-09], [MCMTh-11], [MCMTh-15], [MCMTh-21], [MCBW-44], [MCOMP-01], [MCOMP-02], [MCOMP-04], [MCOMP-12], [MCOMP-19], [MCOMP-25])

Description of OpenMP Computing Kernels /Benchmarks

Example 4.1 : Performance of Matrix Computation Test Kernels on Multi-Cores :
( Download WinRAR ZIP archive:
OpenMP-In-House-Benchmarks-codes (WinRAR ZIP archive)

Objective

The main objective is to execute computing Kernels on Multi-cores and evaluate the performance on Multi-Core systems for various problem sizes and threads.

Description

This benchmark comprises of suites performing Integer / Floating-Point Numerical and Non-Numerical computations using Shared Memory Programming (OpenMP).

These suites measures the execution time of kernels of Dense Matrix Computations involving Computation of Square Matrix Norm by Row-wise/Column-wise Partitioning ,Matrix and Vector Multiplication using checkerboard algorithm & Matrix and Matrix Multiplication using self scheduling algorithm ; PI computation using Numerical Integration and Monte Carlo Method ; Solving the Linear equation Ax = b using Jacobi Method .

In this program OpenMP PARALLEL directive, and CRITICAL section is used. The CRITICAL directive specifies a region of program that must be executed by only one thread at a time. If a thread is currently executing inside a CRITICAL region and another thread reaches that CRITICAL region and attempts to execute it, it will block until the first thread exits that CRITICAL region.

Input

The suites run for problem sizes - Class A,B,C on 1/2/4/8 threads.

Output

This Multi Core Benchmark gives the performance of system in terms of Time , Memory Utilized.

Centre for Development of Advanced Computing