hyPACK-2013 Mode 1 : Software Threading : OpenMP
The OpenMP API is used for writing portable multi-threaded applications in Fortran, C and C++.
The OpenMP programming model provides an easy method for threading applications without burdening
the programmer with the complications of creating, synchronizing, load balancing, and destroying threads.
The model provides a platform-independent set of compiler pragmas, directives, function calls, and environment
variables that explicitly instruct the compiler how and where to use parallelism in the application.
Understanding this software threading model on computing platforms built from multi-core processors
plays an important role in understanding and enhancing the performance of your application.
Example programs using
compiler pragmas, directives, function calls, and environment variables; the compilation and execution
of OpenMP programs; and programs for numerical and non-numerical computations
are discussed.
List of OpenMP Programs
OpenMP programs to illustrate basic OpenMP API library calls :
Examples include some introductory
programs on the use of OpenMP pragmas, parallel directives (the for directive), threading a loop, work-sharing constructs,
function calls, reduction operations, synchronization, data handling, managing shared & private data, critical sections, and
environment variables.
Programs based on Numerical Computations (Dense Matrix Computations) using thread APIs :
Example programs on numerical integration to compute the value of "pi"
using different algorithms and OpenMP
pragmas, vector-vector multiplication, and matrix-vector
multiplication using different OpenMP pragmas/clauses.
The focus is to use different OpenMP pragmas and
understand performance issues on multi-core processors.
A minimal sketch of the pi computation is given below.
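As an illustration of the numerical category above, the following is a minimal C sketch (not one of the course codes) that computes pi by numerical integration of 4/(1+x*x) over [0,1] with a parallel for loop and a reduction clause; the step count num_steps is an arbitrary choice.
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const long num_steps = 10000000;     /* number of integration intervals (arbitrary) */
    const double step = 1.0 / (double) num_steps;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum; the reduction
       clause combines the partial sums when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;     /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi is approximately %.12f\n", step * sum);
    return 0;
}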
Non-Numerical Computations & I/O (Sorting, Searching, Producer-Consumer)
using thread APIs :
Example programs on sorting and searching algorithms, producer-consumer programs, and OpenMP I/O programs
using different OpenMP pragmas are discussed. The focus is to use
different OpenMP pragmas and understand performance issues on multi-core processors.
Test suite of Programs/Kernels and Benchmarks on Multi-Core Processors
A test suite of programs on selected dense matrix computations and sorting algorithms is discussed for multi-core processors.
Different OpenMP pragmas have been
used to understand performance issues on multi-core processors.
Introduction to OpenMP
Explicit parallelism means that the programmer explicitly specifies
parallelism in the source code using special
language constructs, compiler directives, or library function calls. If the programmer
does not explicitly specify parallelism, but lets the compiler and the
run-time support system exploit it automatically, the parallelism is implicit.
Out of the many explicit parallel programming models, the
dominant ones are the data-parallel, message-passing, and shared-variable
models.
OpenMP History
In the early 90's, vendors of shared-memory machines supplied similar,
directive-based, Fortran programming extensions to make use of the
architecture and its advantages:
-
The user would augment a serial Fortran program with directives
specifying which loops were to be parallelized.
-
The compiler would be responsible for automatically parallelizing
such loops across the SMP processors. Implementations were all functionally
similar, but were diverging (as usual) as different vendors used
different methods and implementations based on their architecture
and platform.
The first attempt at a standard was the draft for ANSI X3H5 in 1994.
It was never adopted, largely due to waning interest as distributed
memory machines became popular. The OpenMP standard specification
started in the spring of 1997, taking over where ANSI X3H5 had left
off, as newer shared memory machine architectures started to become
prevalent.
The OpenMP group believes that Pthreads is not
scalable, since it is targeted at low-end Unix SMPs, not technical
high-performance computing. It does not even have a Fortran binding.
Pthreads is low level because it uses the library approach, not
compiler directives. The library approach precludes compiler
optimizations. Pthreads supports only thread parallelism, not fine-grain
parallelism, whereas OpenMP is flexible enough to support both coarse-grain
and fine-grain parallelism.
Pthreads also does not support incremental parallelism
well: given a sequential program, it is difficult for the
user to parallelize it using Pthreads. The user has to worry about
many low-level details, and Pthreads does not naturally support loop-level
data parallelism.
OpenMP is designed to alleviate the shortcomings
discussed above. For wide acceptance of OpenMP, the key is to
develop good compilers and run-time environments.
Shared Memory Programming
Shared-memory
systems typically provide both static and dynamic process creation.
That is, processes can be created at the beginning of program
execution by a directive to the operating system, or they can be
created during the execution of the program. The best-known dynamic
process creation function is fork. A typical implementation will allow
a process to start another, or child, process by a fork. Three
mechanisms typically coordinate processes in shared-memory
programs. The first is process creation and termination: the starting, or parent, process can wait for the
termination of a child process by calling join. The second prevents
processes from improperly accessing shared resources. The third
provides a means for synchronizing the processes.
The shared-memory model is similar to the data-parallel model in that
it has a single address (global naming) space.
It is similar to the message-passing model in that it is
multi-threaded and asynchronous. However,
data reside in a single, shared address space and thus do not have to
be explicitly allocated. The workload
can be either explicitly or implicitly allocated. Communication is
done implicitly through shared reads and writes of variables,
but synchronization is explicit.
Shared-variable programs are multi-threaded and asynchronous; they require
explicit synchronization to maintain the correct execution order among
the processes. Parallel programming based on the shared-memory model
has not progressed as much as message-passing parallel programming.
One indicator is the lack of a widely accepted standard such as
MPI or PVM for message passing. The
current situation is that shared-memory programs are written in a
platform-specific language for multiprocessors (mostly SMPs and PVPs).
Such programs are not portable even among multiprocessors, let alone
multicomputers (MPPs and clusters). Three platform-independent
shared-memory programming models are X3H5, Pthreads, and
OpenMP.
OpenMP is an Application Program Interface (API) that may be used
to explicitly direct multi-threaded, shared-memory parallelism. It
is a specification for a set of compiler directives, library routines
and environment variables that can be used to specify shared-memory
parallelism in Fortran and C/C++ programs. OpenMP is a shared-memory
standard supported by a group of hardware and software
vendors, such as DEC, Intel, IBM, Kuck & Associates, SGI, Portland
Group, Numerical Algorithms Group, the U.S. DOE ASCI program, etc. It
comprises three primary API components:
- Compiler Directives
- Runtime Library Routines
- Environment Variables
OpenMP is portable and the API is specified for C/C++ and Fortran.
It has been implemented on multiple platforms, including most Unix platforms
and Windows NT. It is jointly defined and endorsed by a group of major
computer hardware and software vendors.
The goal is to provide a standard among a
variety of shared-memory
architectures/platforms, and to establish a simple and limited set of
directives for
programming shared-memory machines, so that significant parallelism
can be achieved
using just three or four directives.
Why OpenMP ?
Shared-memory parallel programming directives have never been
standardized in the industry. An earlier standardization effort, ANSI
X3H5, was never formally adopted. So vendors have each provided a
different set of directives, very similar in syntax and semantics, and
each used a unique comment or pragma notation for
"portability". OpenMP consolidates these directive sets into
a single syntax and semantics, and finally delivers the long-awaited
promise of single-source portability for shared-memory parallelism.
OpenMP offers the features necessary to represent coarse-grained
algorithms; some of these features are given below.
-
Provide the capability to incrementally parallelize a serial program,
unlike message-passing libraries, which typically require an all-or-nothing
approach.
-
Provide the capability to implement both coarse-grain and fine-grain
parallelism
-
Supports Fortran (77, 90, and 95), C, and C++ languages
-
Public forum for API and membership
OpenMP Key Features
-
OpenMP incorporates the concept of orphan directives: directives that do not appear in the lexical extent of a parallel construct but lie in its
dynamic (execution) extent. They are not lexically enclosed in the parallel construct of the main program,
but they are in its dynamic execution path.
-
The user parallelizes the main program using one or more parallel directives,
and uses other directives to control execution in the parallel subroutines.
In this way, major portions of the
program can be executed in parallel with small modifications. This concept also facilitates the development
and reuse of modular parallel programs. X3H5 does not support orphan
directives.
-
Besides compiler directives, OpenMP provides a set of run-time library
routines with associated environment variables. They are used to control
and query the parallel execution environment, provide general-purpose lock
functions, set execution modes, etc. For instance, OpenMP allows a throughput mode, in which
the system dynamically sets the number of threads used to execute
parallel regions. This can maximize the throughput performance of the system, probably
at the expense of prolonging the elapsed wall-clock time of one application.
-
X3H5 and OpenMP have similar parallelism directives; only the notations are slightly different. OpenMP includes a new MASTER directive:
only the master thread executes the block of code enclosed by the MASTER directive.
-
OpenMP provides more flexible functionality to control the data environment
than X3H5. For instance, OpenMP supports reduction through a REDUCTION (+ :
sum) clause on a PARALLEL DO directive. A private copy of the reduction variable sum is created for each thread and initialized to 0 according to the
reduction operator +. Each thread computes a private result. At the end of the PARALLEL DO, the reduction variable sum is updated by combining its original value with the
private results using the operator +. Reduction operators other than + can be specified (see the C sketch after this list).
The clause DEFAULT (PRIVATE) directs all variables in a parallel region to be private,
unless overridden by explicit SHARED clauses. There is also a DEFAULT (SHARED)
clause to direct all variables to be shared. Such default scoping makes it unnecessary to
explicitly enumerate all variables, which can save the programmer's time and reduce errors.
-
OpenMP introduces an ATOMIC directive, which allows the compiler to use the most efficient available mechanism to implement an atomic update of a variable.
This can be more efficient than mutual-exclusion constructs such as critical regions and
locks.
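The following is a minimal C sketch (not part of the original text) illustrating the reduction clause and the atomic directive described above; the array size N and its contents are arbitrary choices.
#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    double a[N], sum = 0.0;
    int hits = 0;

    for (int i = 0; i < N; i++)
        a[i] = 0.5 * i;                  /* arbitrary test data */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];                     /* each thread keeps a private partial sum */

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        if (a[i] > 100.0) {
            #pragma omp atomic
            hits++;                      /* atomic update avoids a data race on hits */
        }
    }

    printf("sum = %f, hits = %d\n", sum, hits);
    return 0;
}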
Basic OpenMP Library Calls
The most commonly used OpenMP run-time library routines, in their C and Fortran forms,
are explained below; a short C usage sketch follows the list.
-
Syntax : C
void omp_set_num_threads(int num_threads)
Syntax : Fortran
SUBROUTINE OMP_SET_NUM_THREADS ( scalar_integer_expression )
sets the number of threads to use in a team
This subroutine sets the number of threads that will be used in the
next parallel region. The dynamic threads mechanism modifies the effect
of this routine: if enabled, this call specifies the maximum number of threads
that can be used for any parallel region; if disabled, it specifies the exact
number of threads to use until the next call to this routine. This routine
can only be called from the serial portions of the code. This call has
precedence over the OMP_NUM_THREADS environment variable.
-
Syntax : C
int omp_get_num_threads(void)
Syntax : Fortran
INTEGER FUNCTION OMP_GET_NUM_THREADS()
returns the number of threads in the currently executing parallel region.
This subroutine/function returns the number of threads that are currently
in the team executing the parallel region from which it is called. If
this call is made from a serial portion of the program, or a nested
parallel region that is serialized, it will return 1. The default number
of threads is implementation dependent.
-
Syntax : C
int omp_get_thread_num(void)
Syntax : Fortran
INTEGER FUNCTION OMP_GET_THREAD_NUM()
returns the thread number within the team
This function returns the thread number, within the team, of the thread
making the call. This number
will be between 0 and OMP_GET_NUM_THREADS-1. The master thread of the
team is thread 0. If called from a serialized nested parallel region, or a serial
region, this function will return 0.
-
Syntax : C
int omp_get_num_procs(void)
Syntax : Fortran
INTEGER FUNCTION OMP_GET_NUM_PROCS()
returns the number of processors that are
available to the program.
-
Syntax : C
int omp_in_parallel(void)
Syntax : Fortran
LOGICAL FUNCTION OMP_IN_PARALLEL()
returns .TRUE. for calls within a parallel region, .FALSE. otherwise.
This function/subroutine is called to determine if the section of code
which is executing is parallel or not. For Fortran, this function returns
.TRUE. if it is called from the dynamic extent of a region executing
in parallel, and .FALSE. otherwise. For C/C++, it will return a non-zero
integer if parallel, and zero otherwise.
-
Syntax : C
void omp_set_dynamic(int dynamic_threads)
Syntax : Fortran
SUBROUTINE OMP_SET_DYNAMIC(scalar_logical_expression)
control the dynamic adjustment of the number
of parallel threads.
This subroutine enables or disables dynamic adjustment (by the run
time system) of the number of threads available for execution of parallel
regions. For Fortran, if called with .TRUE. then the number of threads
available for subsequent parallel regions can be adjusted automatically
by the run-time environment. If called with .FALSE., dynamic adjustment
is disabled. For C/C++, if dynamic_threads evaluates to non-zero, then
the mechanism is enabled, otherwise it is disabled. The OMP_SET_DYNAMIC
subroutine has precedence over the OMP_DYNAMIC environment variable.
The default setting is implementation dependent. Must be called from
a serial section of the program.
-
Syntax : C
int omp_get_dynamic(void)
Syntax : Fortran
LOGICAL FUNCTION OMP_GET_DYNAMIC()
returns .TRUE. if dynamic threads is enabled,
.FALSE. otherwise.
This function is used to determine if dynamic thread adjustment is
enabled or not. For Fortran, this function returns .TRUE. if dynamic
thread adjustment is enabled, and .FALSE. otherwise. For C/C++, non-zero will
be returned if dynamic thread adjustment is enabled, and zero
otherwise.
-
Syntax : C
void omp_set_nested(int nested)
Syntax : Fortran
SUBROUTINE OMP_SET_NESTED(scalar_logical_expression)
enable or disable nested parallelism.
This subroutine is used to enable or disable nested parallelism. For
Fortran, calling this function with .FALSE. will disable nested parallelism, and calling
with .TRUE. will enable it. For C/C++, if nested evaluates to non-zero, nested
parallelism is enabled; otherwise it is disabled. The default is for nested parallelism
to be disabled. This call has precedence over the OMP_NESTED environment variable.
-
Syntax : C
int omp_get_nested()
Syntax : Fortran
LOGICAL FUNCTION OMP_GET_NESTED()
returns .TRUE. if nested parallelism is enabled,
.FALSE. otherwise.
This function/subroutine is used to determine if nested parallelism
is enabled or not. For Fortran, this function returns .TRUE. if nested
parallelism is enabled, and .FALSE. otherwise. For C/C++, non-zero will
be returned if nested parallelism is enabled, and zero otherwise.
-
Syntax : C
int omp_test_lock(omp_lock_t *lock)
int omp_test_nest_lock(omp_nest_lock_t *lock)
Syntax : Fortran
LOGICAL FUNCTION OMP_TEST_LOCK(var)
try to acquire the lock, return success or failure
This subroutine attempts to set a lock, but does not block if the lock
is unavailable. For Fortran, .TRUE. is returned if the lock was set
successfully, otherwise .FALSE. is returned. For C/C++, non-zero is
returned if the lock was set successfully, otherwise zero is returned.
It is illegal to call this routine with a lock variable that is not
initialized.
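A minimal C sketch (not part of the original text) showing several of the routines listed above; the requested thread count of 4 is an arbitrary choice.
#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;
    omp_init_lock(&lock);                /* a lock must be initialized before use */

    printf("Processors available : %d\n", omp_get_num_procs());
    printf("In parallel region?  : %d\n", omp_in_parallel());   /* 0 in the serial part */

    omp_set_dynamic(0);                  /* disable dynamic adjustment of the team size */
    omp_set_num_threads(4);              /* request 4 threads for the next region */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();

        if (id == 0)
            printf("Team size            : %d\n", omp_get_num_threads());

        if (omp_test_lock(&lock)) {      /* non-blocking attempt to acquire the lock */
            printf("Thread %d acquired the lock\n", id);
            omp_unset_lock(&lock);
        } else {
            printf("Thread %d did not get the lock\n", id);
        }
    }

    omp_destroy_lock(&lock);
    return 0;
}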
Compilation & Execution of OpenMP Programs
Using command line arguments
The compilation and execution details of an OpenMP program may vary
on different architectures. Depending upon your language preference
(C or Fortran), use the
following command-line options to compile the program :
Using GNU Fortran or C compiler
# gfortran -o <Name of executable> <program name> -fopenmp
# gcc -o <Name of executable> <program name> -fopenmp
Using Intel Fortran or C compiler
# ifort -o <Name of executable> <program name> -openmp
# icc -o <Name of executable> <program name> -openmp
Using PGI Fortran or C compiler
# pgf90 -o <Name of executable> <program name> -mp
# pgcc -o <Name of executable> <program name> -mp
Using PathScale Fortran or C compiler
# pathf95 -o <Name of executable> <program name> -openmp
# pathcc -o <Name of executable> <program name> -openmp
For example, to compile a simple Hello World program, the user can type on the command line
# ifort -o HelloWorld omp-hello-world.f -openmp    (Fortran code)
# icc -o HelloWorld omp-hello-world.c -openmp    (C code)
Compiler Flags
The syntax of the command to compile the OpenMP programs :
# <compiler> -o <executable-name> <program-name> <compiler-flag> -lm
Use the following compiler flag to turn on OpenMP compilation:
Compiler      Compiler flag
GNU           -fopenmp
PGI           -mp
PathScale     -openmp
IBM           -qsmp=omp
Intel         -openmp
Sun           -xopenmp
For more control over the process of compiling and linking OpenMP programs,
you should use a 'Makefile',
particularly for programs
spread over a large number of files. In the Makefile, the user has to specify
the names of the program files and the appropriate compiler options/paths to link the
libraries required for OpenMP; a minimal example Makefile is sketched below.
To compile and link an OpenMP program in C or Fortran, you can use
the command
make
Note: If the Makefile has
an extension, such as Makefile_C,
then the user is required to type
make -f Makefile_C
(instead of simply typing make).
The user can refer to the Makefile
Makefile_Fortran
for compiling Fortran OpenMP programs and the Makefile
Makefile_C
for compiling C OpenMP programs.
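A minimal sketch of such a Makefile for the GNU compiler is given below; the file names hello.c and HelloWorld are hypothetical placeholders, and the compiler and flag should be adapted using the table above.
# Makefile sketch for a single-file OpenMP C program (assumed source file: hello.c)
# Note: recipe lines must begin with a tab character.
CC      = gcc
CFLAGS  = -fopenmp -O2
LDLIBS  = -lm

all: HelloWorld

HelloWorld: hello.c
	$(CC) $(CFLAGS) -o HelloWorld hello.c $(LDLIBS)

clean:
	rm -f HelloWorld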
Execution of program :
To execute an OpenMP program,
the user has to set the OMP_NUM_THREADS environment variable according to
the shell in use:
setenv OMP_NUM_THREADS 3
(if using csh/tcsh)
export OMP_NUM_THREADS=3
(if using bash/ksh)
and then simply type the name of the executable
on the command line:
./<Name of executable>
For example, to execute a simple Hello World program the user can
type
./HelloWorld
The output should look similar to the following using three threads. The
actual number of threads and order of output may vary.
Hello World from thread = 0
Number of threads = 3
Hello World from thread = 2
Hello World from thread = 1
Note : To execute the above programs on an MPI-based cluster of
multi-core processors, the user should submit the job to the scheduler.
To submit the job, use the following command:
bsub -q <queue-name> [options] ./<executable name>
For example :
bsub -q normal -ext"SULRM[nodes=1]" -o omp-hello-world.out
-e omp-hello-world.err ./omp-hello-world
Note: Refer to the man pages of "bsub" for options.
OpenMP tools
Performance is a critical issue on multi-core systems. Performance
evaluation and visualization are important and useful techniques that help the user to
understand the behavior of a parallel program.
There are many problems associated with debugging an OpenMP program:
it is difficult to understand the flow of the program's execution, and
accessing variables is complicated because of the nature of private and
shared variables in an OpenMP program.
There are many tools available for debugging and performance
visualization of OpenMP programs. Some of them are listed below :
KAP/Pro Toolset for OpenMP
The KAP/Pro Toolset consists of
-
Guide : OpenMP compiler,
-
Guide View : A performance analysis tool for
OpenMP
-
Assure : OpenMP Analyzer, a tool for verifying the correctness
of OpenMP applications.
The Guide OpenMP Compiler provides OpenMP directive-based
application development for Fortran, C, and C++. The GuideView Performance
Analyzer presents a window into the performance details of a program's
parallel execution. The Assure OpenMP Analyzer is the industry's first
parallel correctness verifier. These are available at
http://developer.intel.com/software/products/threadtool.htm
With the KAP/Pro tools and OpenMP, the application developer
can easily and quickly develop, debug and tune programs for Windows
NT and Unix. The Toolset provides a major advance in usability over
alternative systems.
Sun's Compilers and Tools for OpenMP
Sun Microsystems
Sun Studio 12, Compiler Collection, Fortran 95, C, and C++
Performance Analyzers : Paraver from
the Paraver project
Paraver is a flexible performance
visualization and analysis tool that can be used to analyze MPI,
OpenMP, MPI+OpenMP and Java programs, hardware-counter profiles, etc.
TotalView
TotalView is the debugger for
complex code. TotalView is far and away the best choice for those
working with parallelism or large amounts of data because it scales
transparently to support the big code and data sets running on
anywhere from one to thousands of processes or processors. It's been
proven in the world's toughest debugging environments.
It is available at
http://www.etnus.com/Products/TotalView
TotalView's support for
OpenMP debugging lets you view the state of your program as if it
were a non-parallel code. With TotalView, you can
-
Debug threaded codes whether OpenMP directives are
present or not.
-
Understand OpenMP code execution.
-
Access private and shared variables as well as
thread private variables.
OpenMP FAQs
The OpenMP FAQ (frequently asked questions) list is available at
http://www.openmp.org
Q1:
What is OpenMP?
Q2:
What does the MP in OpenMP stand for?
Q3:
How does OpenMP compare with ... ?
Q4:
What languages does OpenMP work with?
Q5:
Is OpenMP scalable?
Q1: What is OpenMP?
A1: OpenMP is a specification for a set of
compiler directives, library routines, and environment variables that can
be used to specify shared memory parallelism in Fortran and C/C++
programs.
Q2:
What does the MP in OpenMP stand for?
A2:
The MP in OpenMP stands for Multi Processing.
We provide Open specifications for Multi Processing via collaborative work
with interested parties from the hardware and software industry,
government and academia.
Q3: How does OpenMP compare with ... ?
A3:
MPI? Message passing has
become accepted as a portable style of parallel programming, but has
several significant weaknesses that limit its effectiveness and
scalability. Message-passing in general is difficult to program and
doesn't support incremental parallelization of an existing sequential
program. Message-passing was initially defined for client/server
applications running across a network, and so includes costly semantics
(including message queuing and selection and the assumption of wholly
separate memories) that are often not required by tightly-coded scientific
applications running on modern scalable systems with globally addressable
and cache coherent distributed memories.
HPF? HPF has never really gained wide acceptance among
parallel application developers for all applications. Some applications
written in HPF perform well, but others find that limitations resulting
from the HPF language itself or the compiler implementations lead to
disappointing performance. HPF's focus on data parallelism has also
limited its appeal.
Pthreads? Pthreads have never been targeted toward the
technical/HPC market. This is reflected in the minimal Fortran support,
and its lack of support for data parallelism. Even for C applications,
pthreads requires programming at a level lower than most technical
developers would prefer.
FORALL loops? FORALL loops are not rich or general
enough to use as a complete parallel programming model. Their focus on
loops and the rule that subroutines called by those loops can't have side
effects effectively limit their scalability. FORALL loops are useful for
providing information to automatic parallelizing compilers and
preprocessors.
BSP or LINDA
or...? There are lots of parallel programming languages being researched
or prototyped in the industry. These may be targeted towards a specific
architecture, or focused on exploring one key requirement.
Q4:
What languages does OpenMP work with?
A4:
OpenMP is designed for Fortran, C and C++ to
support the language that the underlying compiler supports. The OpenMP
specification does not introduce any constructs that require specific
Fortran 90 or C++ features. OpenMP cannot be supported by compilers that
do not support one of Fortran 77, Fortran 90, ANSI 89 C or ANSI C++.
Q5:
Is OpenMP scalable?
A5:
OpenMP can deliver scalability for
applications using shared-memory parallel programming. Significant effort
was spent to ensure that OpenMP can be used for scalable applications.
Ultimately, scalability is a property of the application and the
algorithms used. The parallel programming language can only support the
scalability by providing constructs that simplify the specification of
the parallelism and can be implemented with low overhead by compiler
vendors. OpenMP certainly delivers these kinds of constructs.
OpenMP Example Program in Fortran language
The simplest OpenMP program is the "Hello World" program, in which
each thread simply prints the message "Hello World". In this example,
the threads with identifiers 0, 1, 2, ..., p-1 each print the message "Hello
World".
The simple OpenMP program in Fortran in which each thread
prints the "Hello World" message is explained below. We describe
the features of the program and then walk through it in detail.
The first
few lines of the program contain the variable and constant declarations. Following
these declarations, the OpenMP library functions used in the program are declared.
The library call OMP_GET_THREAD_NUM returns ThreadID, the identifier
of each thread, and the library call OMP_GET_NUM_THREADS returns
NoofThreads, the total number of threads that the user has started
for this program.
The following fragments of the program illustrate these features. The description
of the program is as follows:
program HelloWorld
integer NoofThreads, ThreadID, OMP_GET_NUM_THREADS
integer OMP_GET_THREAD_NUM
ThreadID is the identifier of each thread and NoofThreads is the
total number of threads used in the program. ThreadID and NoofThreads
are private to each thread. Each thread obtains its own identifier and then
prints the message "Hello World" in parallel.
Start of the OpenMP PARALLEL directive and PRIVATE clause. The PARALLEL/END PARALLEL
directive pair specifies that the enclosed block of the program, referred to as a
PARALLEL region, be executed in parallel by multiple threads. The PRIVATE
clause is typically used to identify variables that are used as scratch
storage in the program segment within the parallel region. It provides a list
of variables and specifies that each thread have a private copy of those
variables for the duration of the parallel region.
C$OMP PARALLEL PRIVATE(NoofThreads,ThreadID)
Each thread gets its own copies of the variables, obtains its identifier, and prints
it.
ThreadID = OMP_GET_THREAD_NUM()
print *, 'Hello World from thread = ', ThreadID
Only the master thread obtains NoofThreads, the total number of threads used
in the program, and prints it.
if (ThreadID .EQ. 0) then
NoofThreads = OMP_GET_NUM_THREADS()
print *, 'Number of threads = ', NoofThreads
end if
End of the OpenMP PARALLEL directive. All threads join the master thread and disband.
C$OMP END PARALLEL
stop
end
OpenMP Example Program in C language
The simple OpenMP program in C, in which each thread
prints the "Hello World" message, is given below.
#include <stdio.h>
#include <omp.h>

/* Main Program */
int main()
{
    int ThreadID, NoofThreads;

    /* OpenMP Parallel Directive */
    #pragma omp parallel private(ThreadID)
    {
        ThreadID = omp_get_thread_num();
        printf("\nHello World is being printed by the thread id %d\n",
               ThreadID);

        /* Master Thread Has Its ThreadID 0 */
        if (ThreadID == 0)
        {
            printf("\n Master prints Numof threads \n");
            NoofThreads = omp_get_num_threads();
            printf(" Total number of threads are %d\n", NoofThreads);
        }
    }
    return 0;
}
OpenMP 3.0 Overview
OpenMP is a shared-memory model: all OpenMP threads have access to a shared memory.
In addition, each thread is allowed to have its own temporary view of the memory, and each
thread has access to memory that must not be accessed by other threads, called threadprivate memory.
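A minimal C sketch (not from the original text) of the threadprivate directive, which places a file-scope variable in each thread's threadprivate memory:
#include <stdio.h>
#include <omp.h>

int counter = 0;                         /* file-scope variable */
#pragma omp threadprivate(counter)       /* each thread gets its own copy */

int main(void)
{
    #pragma omp parallel
    {
        counter = omp_get_thread_num();  /* updates this thread's private copy only */
        printf("Thread %d sees counter = %d\n", omp_get_thread_num(), counter);
    }
    return 0;
}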
The major changes between the OpenMP API Version 2.5 specification and the OpenMP API Version 3.0
specification are given below. For more details, please visit the below web sites.
OpenMP 3.0 Important Features
An Overview of OpenMP 3.0 Ruud van der Pas, Sun Microsystems
Tasking in OpenMP 3.0 Alejandro Duran, Barcelona Supercomputing Center
Sun Studio OpenMP Compilers and Tools Ruud van der Pas, Sun Microsystems
OpenMP And Performance Ruud van der Pas, Sun Microsystems
-
The concept of tasks has been added to the OpenMP execution model. The
task construct provides a mechanism for creating tasks explicitly,
and the taskwait construct
causes a task to wait for all its child tasks to complete (a minimal sketch is given after this list).
-
The OpenMP memory model now covers atomicity of memory accesses. The description of the behavior
of volatile in terms of flush was removed.
-
OpenMP Version 2.5 supports a single copy of the
nest-var, dyn-var, nthreads-var and run-sched-var
internal control variables (ICVs) for the whole program. In Version 3.0,
there is one copy of these ICVs per task. As a result, the
omp_set_num_threads, omp_set_nested and omp_set_dynamic runtime library routines now have
specified effects when called from inside a parallel region.
-
The definition of an active parallel region has been changed: in Version 3.0 a parallel region is
active if it is executed by a team consisting of more than one thread. The
rules for determining the number of threads used in a parallel region have also
been modified.
-
In Version 3.0, the assignment of iterations to threads in a
loop construct has been modified, and the
collapse clause
and the schedule kind auto have
been added.
-
In Version 3.0, Fortran allocatable arrays may appear in the
private, firstprivate, lastprivate, reduction, copyin and
copyprivate clauses.
-
The thread-limit-var ICV has been added, which controls the maximum number of threads
participating in the OpenMP program.
-
The stacksize-var ICV has been added, which controls the stack size for threads that
the OpenMP implementation creates.
-
The runtime library routines
omp_get_level, omp_get_ancestor_thread_num, omp_get_team_size and omp_get_active_level
have been added to query the state of
nested
parallel regions.
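The following is a minimal C sketch (not part of the original text) of the OpenMP 3.0 task and taskwait constructs; it computes a Fibonacci number recursively, a common illustration of tasking. The input value of 30 is an arbitrary choice, and the recursion is illustrative rather than efficient.
#include <stdio.h>
#include <omp.h>

/* Recursive Fibonacci using explicit tasks. */
long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;

    #pragma omp task shared(x)
    x = fib(n - 1);                      /* child task computes fib(n-1) */

    #pragma omp task shared(y)
    y = fib(n - 2);                      /* child task computes fib(n-2) */

    #pragma omp taskwait                 /* wait for both child tasks to complete */
    return x + y;
}

int main(void)
{
    int n = 30;                          /* arbitrary input */
    long result;

    #pragma omp parallel
    {
        #pragma omp single               /* one thread creates the initial tasks */
        result = fib(n);
    }

    printf("fib(%d) = %ld\n", n, result);
    return 0;
}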
OpenMP 3.0 Example Programs
Example programs on matrix-matrix multiplication algorithms, the Gauss-Seidel method,
the Conjugate Gradient method for the solution of linear systems of equations, producer-consumer
example programs, and the solution of Poisson equations (partial differential equations) are
tested using various OpenMP 3.0 pragmas on multi-core processors.
OpenMP 3.0 Web Sites & Presentations
International Workshop on OpenMP (IWOMP 2009), June 3 - 5, 2009, Dresden University of Technology,
Dresden, Germany
The OpenMP Forum :
http://www.openmp.org/forum
An Overview of OpenMP 3.0, Ruud van der Pas, Sun Microsystems, USA, International Workshop
on OpenMP ( IWOMP 2009), June 3 - 5, 2009, Dresden University of Technology,
Dresden, Germany
Tasking in OpenMP Alejandro Duran, Barcelona Supercomputing Center,
International Workshop on OpenMP ( IWOMP 2009), June 3 - 5, 2009, Dresden University of Technology,
Dresden, Germany
OpenMP Support in Sun Studio
Ruud van der Pas, Senior Staff Engineer, Sun Microsystems, Menlo Park, CA, USA
International Workshop on OpenMP ( IWOMP 2009), June 3 - 5, 2009 Dresden University of Technology,
Dresden, Germany
Getting OpenMP Up to Speed
Ruud van der Pas, Senior Staff Engineer, Sun Microsystems, Menlo Park, CA, USA
International Workshop on OpenMP ( IWOMP 2009), June 3 - 5, 2009, Dresden University of Technology,
Dresden, Germany
The OpenMP News - The OpenMP API Specification for Parallel Programming :
http://www.openmp.org/wp/
OpenMP 3.0 Public Draft Available Thinking Parallel, Michael Suess
OpenMP 3.0 Overview, Mark Bull, EPCC, University of Edinburgh and the OpenMP ARB