Programming on Multi-Core Processors Using Pthreads (POSIX threads) |
Pthreads are defined as a set of C-language programming types and procedure calls,
implemented with a pthread.h header/include file and a thread library.
Solaris threads are easily
understood by someone familiar with POSIX threads, and while Java threads and the
multi-threading in the Win32 and OS/2 APIs are a little different . The subroutines,
which comprise the Pthreads APIs, can be formally grouped into three classes such as
Thread Management, Mutex Variables and Condition Variables. Threaded applications offer
potential performance gains and practical advantages over non-threaded applications
in several other ways as we can observe from the different programs.
Example programs using different APIs. Compilation and execution
of Pthread programs, programs numerical and non-numerical computations
are discussed using
different thread APIs to understand Performance issues on mutli-core processors.
|
Example 3.1
|
Write Pthread code to Find out minimum in an un-sorted integer array.
|
Example 3.2
|
Write Pthread code to explain Producer-Consumer example work queues.
|
Example 3.3
|
Write Pthread code to Find k matches in the list.
|
Example 3.4
|
Write pthread code to illustrate data race condition and also its solution using mutex.
|
Example 3.5
|
Write pthread code to illustrate restructuring Loop-carried dependence to Loop-independent dependence using pthread APIs.
|
(Source - References :
Books
Multi-threading
-[MCMTh-01], [MCMTh-02], [MCMTh-I03], [MCMTh-05], [MCMth-09], [MCMth-11],
[MCMTh-15], [MCMTh-21], [MCBW-44] )
|
Description of Pthread Programs |
Example 3.1 :
Write Pthread code to find out minimum integer in an un-sorted integer array.
( Download source code :
pthread-findminimumvalue.c
)
|
- Objective
To write a Pthreads program to Write Pthread code to
find out minimum integer in an un-sorted integer array.
(In-house Multi-Core Real Dense Matrix Comp Suite)
- Description
The program randomly generated the list of integers.The list of integers is partitioned equally among the threads.
The size of each thread's partition of stored in a variable and the pointer to the start of each thread's partial list
is passed to it as a pointer.The minimum value is protected by the mutex-lock .Threads execute the mutex lock to gain
exclusive access to the minimum value. Once this access is gained, the value is updated as required, and the lock
subsequently released. Since at any time, only one thread can hold the lock, only one thread can update the value.
- Input
Number of Threads, Size of the Integer array in terms of Class where
Class A : 10^7
Class B : 10^8
Class C : 10^9
- Output
Minimum Value of the list and the execution time.
|
Example 3.2 :
Write Pthread code to explain Producer-Consumer example work queues.
( Download source code :
pthread-producer-consumer.c
)
|
- Objective
To write a Pthreads program for Producer-Consumer model based on work queues
- Description
This problem commonly occurs in data flow decomposition in typical distributed computing. Usually, the problem can be decomposed in different
ways on target architecture of the computing system, and different tasks can do different work. The most important issue is how the data flow
between different tasks requires serious attention from performance point of view. The producer-consumer problem falls in this category
in which the program has ability to execute in parallel. In this, the output of one task, the producer, becomes the input to another,
the consumer . The two tasks are performed by different threads, and the second one, consumer , cannot start until the
producer finishes
some portion of its work. This is quite similar to the concept of pipelining in typical Parallel computing Paradigms.
The producer/consumer problem occurs in several typical scenarios. In one scenario, first task may complete many sub-tasks and decides
at some point of time, the work should be given to next task. The next task, which is executed by another thread, cannot start the work until
the previous sub-tasks are completed by first task. It is difficult to identify at what point of time the first task completes the work and the
delay caused by the first task creates a pause for the second task. After certain point of time, both the tasks can execute in parallel.
In another scenario, the first task may read of a file and the results of this become the input to next task, which might be threaded. This
process introduces substantial delay and the next step cannot begin until the first step completes the reading of file or partial reading of
the file is done. It is possible to do new piece of work in the next step and waiting for the completion of first step.
The producer-consumer problem has different benefits when decomposing a problem.
The dependence created between consumer and producer can cause significant delays if this model is implemented correctly.
There are situations
in which consumer threads are idling while waiting for producer threads. A performance sensitive design seeks to understand
the exact nature of
the dependence and diminish the delay it imposes.
In an ideal scenario, the producer and consumer plan carefully on their interaction. Also, if the consumer is finishing up while
the producer is completely done, one thread remains idle while other threads are busy working away.
-
The producer creates tasks and inserts them into a work-queue. The consumer threads pick up tasks from the
task queue and execute them. This concept can be called as producer-consumer work queues paradigm. Simple instances
of this paradigm in which the task queue can hold only one task, which may be short or long but is typically have bounded size.
In this simple model, the producer thread can estimate the time taken for consumer work and insert the
new work in a shared buffer.
-
In producer-consumer work queue paradigm, the complex model application such as multi-media processing, the different
possibilities exist on access to shared buffer. Several possibilities exist in which the producer thread must not overwrite the shared
buffer when the previous task has not been picked by a consumer thread.
Also, the consumer threads must not pick up tasks
until there is something present in the shared data structure and individual consumer threads pick up tasks one at a time.
Producer-consumer problem is classic synchronization problem.
This program makes use of mutex-locks for establishing a producer-consumer relationship between threads.
The producer creates data and inserts them into a shared work -queue .The consumer threads pick up from the
shared work- queue and consume it.
- Input
No. of Producers & No of Consumers and the work-load
- Output
Time taken to complete the producer-consumer model
|
Example 3.3 :
Write Pthread code to Find k matches in the list.
( Download source code :
pthread-finding-k-matches.c
)
|
- Objective
To write a Pthreads program to finding k matches to a query item in a given list.
- Description
This program finding k matches to a query item in a given list.
The list is partitioned equally among
the threads. Assuming that the list has n entries, each of the p threads
is responsible for searching n/p entries of the
list.
- Input
Number of threads.
- Output
Print the k matches of query item in the list.
|
Example 3.4 :
Write pthread code to illustrate data race condition and also its solution using mutex.
( Download source code :
pthread-demo-datarace.c
)
|
- Objective
Write pthread code to illustrate data race condition condition and also its solution using mutex
- Description
This code illustrates data race condition which is the situation which occurs when
more than 1 threads are trying to work with or update global variable.
Here we have taken one global variable named "myglobal" and updated its value
in function named "thread_function_datarace" twenty times. Also in main function
it is updated again 20 times. We have also included some delay in
"thread_function_datarace". At the end of program we expect value of "myglobal"
to be 40. But it gives some different value. This is because during delay which we
introduced , other thread comes and do its work. So we get unexpected result.
To deal with such situations,mutex is used. We have taken another function named
"thread_function_mutex".In this function we have done same thing as "thread_function_datarace" but used
pthread_mutex_lock() and pthread_mutex_unlock() function pair between the calculation of "myglobal".
Because of use of mutex function only one thread at a time can update global variable. Here we get
correct value of variable "myglobal".
- Input
None
- Output
Value of global variable from different functions.
|
Example 3.5 :
pthread program to illustrate restructuring Loop-carried dependence to
Loop-independent dependence using pthread APIs.
( Download source code :
pthread-loop-carried.c
)
|
- Objective
Write pthread code to illustrate restructuring Loop-carried dependence to
Loop-independent dependence using pthread APIs.
- Description
This code illustrates Loop-carried dependence which is the situation where a value of variable,
depends on the previous loop iteration. So we can not parallelize the for-loop. Here we have
taken one Sn variable which was assigned in a array and updated its value in each iteration.
The value of Sn variable depends on previous loop iteration .This case is so common in parallel
programming.
Now, the loop is reorganized in such a way that it can be parallelizable by removing the
loop carried dependency. The solution is divides the iteration into chunks
and calculating value of 'up' using power function at first iteration of loop for each thread
and multiplies the 'up' value into constant value of orignSn to getting corresponding value
of Snp which is assigned in array.
- Input
None
- Output
Value of Sn and Snp variables and execution time for serial and parallel computing.
|
| |
|
|