



Programming on Multi-Core Processors Using Pthreads (POSIX threads)

Pthreads are defined as a set of C-language programming types and procedure calls, implemented with a pthread.h header/include file and a thread library. Solaris threads are easily understood by someone familiar with POSIX threads, while Java threads and the multi-threading in the Win32 and OS/2 APIs are a little different. The subroutines that comprise the Pthreads API can be grouped into three classes: thread management, mutex variables, and condition variables. Threaded applications offer potential performance gains and practical advantages over non-threaded applications in several ways, as the example programs below illustrate.

Example programs using different Pthread APIs are presented below. Compilation and execution of Pthread programs for numerical and non-numerical computations are discussed using different thread APIs to understand performance issues on multi-core processors.
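
A minimal sketch of the thread-management class of calls (pthread_create and pthread_join) is shown below. This is not one of the downloadable programs; the thread count and the per-thread work are illustrative assumptions only.

/*
 * Minimal sketch of the thread-management class of the Pthreads API:
 * pthread_create() starts the threads and pthread_join() waits for them.
 * NUM_THREADS and the per-thread work are illustrative assumptions.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

void *worker(void *arg)
{
    long id = (long) arg;
    printf("Hello from thread %ld\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);

    for (long i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);   /* wait for each thread to finish */

    return 0;
}

Such a program is typically compiled with the compiler's pthread option, for example cc -pthread program.c.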


Example 3.1
Write Pthread code to find the minimum in an unsorted integer array.
Example 3.2
Write Pthread code to illustrate the producer-consumer model using work queues.
Example 3.3
Write Pthread code to find k matches in a list.
Example 3.4
Write Pthread code to illustrate a data race condition and its solution using a mutex.
Example 3.5

Write Pthread code to illustrate restructuring a loop-carried dependence into a loop-independent dependence using Pthread APIs.

(Source / References : Books - Multi-threading : [MCMTh-01], [MCMTh-02], [MCMTh-I03], [MCMTh-05], [MCMth-09], [MCMth-11], [MCMTh-15], [MCMTh-21], [MCBW-44])

Description of Pthread Programs

Example 3.1 : Write Pthread code to find the minimum integer in an unsorted integer array.
( Download source code : pthread-findminimumvalue.c )


  • Objective
  • To write a Pthreads program to find the minimum integer in an unsorted integer array. (In-house Multi-Core Real Dense Matrix Comp Suite)

  • Description
  • The program randomly generates the list of integers. The list is partitioned equally among the threads; the size of each thread's partition is stored in a variable, and a pointer to the start of each thread's partial list is passed to it. The minimum value is protected by a mutex lock. A thread acquires the mutex lock to gain exclusive access to the minimum value; once this access is gained, the value is updated as required and the lock is subsequently released. Since only one thread can hold the lock at any time, only one thread can update the value. A minimal sketch of this structure is given after this list.

  • Input
  • Number of threads, and the size of the integer array in terms of Class, where

    Class A : 10^7
    Class B : 10^8
    Class C : 10^9

  • Output
  • Minimum Value of the list and the execution time.
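
The sketch below follows the structure described in Example 3.1. It is an assumed reconstruction rather than the downloadable pthread-findminimumvalue.c; the thread count, list size, and identifier names are illustrative.

/*
 * Minimal sketch of Example 3.1 (assumed structure): each thread scans its
 * partition of the array and updates a global minimum protected by a mutex.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define NUM_THREADS 4            /* illustrative values */
#define LIST_SIZE   (1 << 20)

static int list[LIST_SIZE];
static int global_min = INT_MAX;
static pthread_mutex_t min_lock = PTHREAD_MUTEX_INITIALIZER;

struct partition { int *start; int size; };

void *find_min(void *arg)
{
    struct partition *p = arg;
    int local_min = INT_MAX;

    for (int i = 0; i < p->size; i++)          /* scan the local partition */
        if (p->start[i] < local_min)
            local_min = p->start[i];

    pthread_mutex_lock(&min_lock);             /* exclusive access to global_min */
    if (local_min < global_min)
        global_min = local_min;
    pthread_mutex_unlock(&min_lock);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    struct partition parts[NUM_THREADS];
    int chunk = LIST_SIZE / NUM_THREADS;

    for (int i = 0; i < LIST_SIZE; i++)        /* randomly generated input list */
        list[i] = rand();

    for (int i = 0; i < NUM_THREADS; i++) {
        parts[i].start = list + i * chunk;     /* pointer to this thread's partial list */
        parts[i].size  = chunk;
        pthread_create(&threads[i], NULL, find_min, &parts[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    printf("Minimum value = %d\n", global_min);
    return 0;
}

Each thread first computes a local minimum over its own partition and acquires the mutex only once, which keeps the time spent holding the lock small. Timing code is omitted for brevity.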

Example 3.2 : Write Pthread code to illustrate the producer-consumer model using work queues.
( Download source code : pthread-producer-consumer.c )


  • Objective
  • To write a Pthreads program for the producer-consumer model based on work queues

  • Description
  • This problem commonly occurs in data-flow decomposition in typical distributed computing. The problem can usually be decomposed in different ways on the target architecture of the computing system, with different tasks doing different work; how the data flows between the tasks requires serious attention from a performance point of view. The producer-consumer problem falls in this category, in which the program has the ability to execute in parallel. The output of one task, the producer, becomes the input of another, the consumer. The two tasks are performed by different threads, and the second one, the consumer, cannot start until the producer finishes some portion of its work. This is quite similar to the concept of pipelining in typical parallel computing paradigms.

    The producer-consumer problem occurs in several typical scenarios. In one scenario, the first task completes many sub-tasks and decides at some point that the work should be handed over to the next task. The next task, which is executed by another thread, cannot start until the previous sub-tasks are completed by the first task. It is difficult to identify at what point the first task completes its work, and the delay it causes stalls the second task. After a certain point, both tasks can execute in parallel.

    In another scenario, the first task may read a file, and the results become the input to the next task, which may be threaded. This introduces a substantial delay, since the next step cannot begin until the first step has completed reading the file, or at least part of it. It is still possible for the next step to start new pieces of work while waiting for the first step to complete.

    The producer-consumer decomposition has its benefits, but the dependence created between consumer and producer can cause significant delays if the model is not implemented carefully. There are situations in which consumer threads sit idle while waiting for producer threads. A performance-sensitive design seeks to understand the exact nature of the dependence and diminish the delay it imposes.

    In an ideal scenario, the producer and consumer plan their interaction carefully. Even so, if the consumer is still finishing up after the producer is completely done, one thread remains idle while the other threads are busy working away.

  • The producer creates tasks and inserts them into a work queue, and the consumer threads pick up tasks from the queue and execute them. This arrangement can be called the producer-consumer work-queue paradigm. In a simple instance of this paradigm the task queue can hold only one task, which may be short or long but typically has a bounded size. In this simple model, the producer thread can estimate the time taken for the consumer's work and insert new work into the shared buffer accordingly.

  • In more complex applications of the producer-consumer work-queue paradigm, such as multi-media processing, different possibilities exist for access to the shared buffer. The producer thread must not overwrite the shared buffer when the previous task has not yet been picked up by a consumer thread; likewise, the consumer threads must not pick up tasks until there is something present in the shared data structure, and individual consumer threads pick up tasks one at a time. The producer-consumer problem is a classic synchronization problem.

  • This program makes use of mutex locks for establishing a producer-consumer relationship between threads. The producer creates data items and inserts them into a shared work queue; the consumer threads pick up items from the shared work queue and consume them. A minimal sketch of this structure is given after this list.

  • Input
  • Number of producers, number of consumers, and the work-load

  • Output
  • Time taken to complete the producer-consumer model
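
The sketch below shows one common way to implement the shared work queue described in Example 3.2, using a mutex together with two condition variables for the queue-full and queue-empty conditions. It is an assumed reconstruction rather than the downloadable pthread-producer-consumer.c; the queue capacity, the work-load, and the single producer/consumer pair are illustrative.

/*
 * Minimal sketch of a producer-consumer work queue (assumed structure).
 * One producer inserts items into a fixed-size queue and one consumer
 * removes them; a mutex and two condition variables coordinate access.
 */
#include <pthread.h>
#include <stdio.h>

#define QUEUE_SIZE 8      /* illustrative bounded-queue capacity */
#define NUM_ITEMS  32     /* illustrative work-load */

static int queue[QUEUE_SIZE];
static int count = 0, head = 0, tail = 0;
static pthread_mutex_t qlock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void *producer(void *arg)
{
    for (int i = 0; i < NUM_ITEMS; i++) {
        pthread_mutex_lock(&qlock);
        while (count == QUEUE_SIZE)              /* wait while the queue is full */
            pthread_cond_wait(&not_full, &qlock);
        queue[tail] = i;                         /* insert a work item */
        tail = (tail + 1) % QUEUE_SIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&qlock);
    }
    return NULL;
}

void *consumer(void *arg)
{
    for (int i = 0; i < NUM_ITEMS; i++) {
        pthread_mutex_lock(&qlock);
        while (count == 0)                       /* wait while the queue is empty */
            pthread_cond_wait(&not_empty, &qlock);
        int item = queue[head];                  /* remove a work item */
        head = (head + 1) % QUEUE_SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&qlock);
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void)
{
    pthread_t prod, cons;
    pthread_create(&prod, NULL, producer, NULL);
    pthread_create(&cons, NULL, consumer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return 0;
}

The condition variables keep the producer from overwriting a full queue and the consumer from reading an empty one, which is exactly the shared-buffer discipline described above.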


Example 3.3 : Write Pthread code to find k matches in a list.
( Download source code : pthread-finding-k-matches.c )


  • Objective
  • To write a Pthreads program for finding k matches to a query item in a given list.

  • Description
  • This program finds k matches to a query item in a given list. The list is partitioned equally among the threads; assuming that the list has n entries, each of the p threads is responsible for searching n/p entries of the list. A minimal sketch of this structure is given after this list.

  • Input
  • Number of threads.

  • Output
  • Print the k matches of the query item found in the list.
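
The sketch below follows the partitioning described in Example 3.3. It is an assumed reconstruction rather than the downloadable pthread-finding-k-matches.c; the thread count, list size, and query value are illustrative.

/*
 * Minimal sketch of Example 3.3 (assumed structure): the list is partitioned
 * equally among the threads and each thread reports matches of the query
 * item found in its own n/p-entry partition.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 4          /* illustrative values */
#define LIST_SIZE   1000000
#define QUERY       7

static int list[LIST_SIZE];

struct partition { int *start; int size; int id; };

void *find_matches(void *arg)
{
    struct partition *p = arg;
    for (int i = 0; i < p->size; i++)
        if (p->start[i] == QUERY)              /* a match in this partition */
            printf("thread %d: match at local index %d\n", p->id, i);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    struct partition parts[NUM_THREADS];
    int chunk = LIST_SIZE / NUM_THREADS;

    for (int i = 0; i < LIST_SIZE; i++)
        list[i] = rand() % 100;                /* randomly generated input list */

    for (int i = 0; i < NUM_THREADS; i++) {
        parts[i] = (struct partition){ list + i * chunk, chunk, i };
        pthread_create(&threads[i], NULL, find_matches, &parts[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}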

Example 3.4 : Write Pthread code to illustrate a data race condition and its solution using a mutex.
( Download source code : pthread-demo-datarace.c )


  • Objective
  • Write Pthread code to illustrate a data race condition and its solution using a mutex

  • Description
  • This code illustrates a data race condition, the situation that occurs when more than one thread tries to work with or update a global variable at the same time. Here one global variable named "myglobal" is updated twenty times in the function "thread_function_datarace", and it is updated another twenty times in the main function. A small delay is also introduced inside "thread_function_datarace". At the end of the program we expect the value of "myglobal" to be 40, but a different value is obtained: during the introduced delay the other thread steps in and does its work, so updates are lost and the result is unexpected.

    To deal with such situations, a mutex is used. Another function, "thread_function_mutex", does the same thing as "thread_function_datarace" but places a pthread_mutex_lock()/pthread_mutex_unlock() pair around the update of "myglobal". Because of the mutex, only one thread at a time can update the global variable, and the correct value of "myglobal" is obtained. A minimal sketch of both versions is given after this list.

  • Input
  • None

  • Output
  • Value of global variable from different functions.
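
The sketch below reproduces the two versions described in Example 3.4: a racy read-delay-write update of myglobal and the same update protected by a pthread_mutex_lock()/pthread_mutex_unlock() pair. It is an assumed reconstruction rather than the downloadable pthread-demo-datarace.c; the delay value is illustrative, while the loop counts follow the description.

/*
 * Minimal sketch of Example 3.4 (assumed structure).  Both main() and the
 * worker increment the shared variable myglobal twenty times.  The racy
 * version reads, sleeps, and writes back, so updates are lost; the mutex
 * version serialises the read-modify-write and gives the expected 40.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int myglobal = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *thread_function_datarace(void *arg)
{
    for (int i = 0; i < 20; i++) {
        int tmp = myglobal;      /* read ... */
        usleep(1000);            /* ... delay lets the other thread interleave ... */
        myglobal = tmp + 1;      /* ... write back a stale value: updates are lost */
    }
    return NULL;
}

void *thread_function_mutex(void *arg)
{
    for (int i = 0; i < 20; i++) {
        pthread_mutex_lock(&lock);    /* only one thread updates at a time */
        myglobal = myglobal + 1;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, thread_function_datarace, NULL);
    for (int i = 0; i < 20; i++) {        /* main() also increments 20 times */
        int tmp = myglobal;
        usleep(1000);
        myglobal = tmp + 1;
    }
    pthread_join(t, NULL);
    printf("racy result  : %d (expected 40)\n", myglobal);

    myglobal = 0;
    pthread_create(&t, NULL, thread_function_mutex, NULL);
    for (int i = 0; i < 20; i++) {
        pthread_mutex_lock(&lock);
        myglobal = myglobal + 1;
        pthread_mutex_unlock(&lock);
    }
    pthread_join(t, NULL);
    printf("mutex result : %d (expected 40)\n", myglobal);
    return 0;
}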

Example 3.5 : Write Pthread code to illustrate restructuring a loop-carried dependence into a loop-independent dependence using Pthread APIs.
( Download source code : pthread-loop-carried.c )


  • Objective
  • Write Pthread code to illustrate restructuring a loop-carried dependence into a loop-independent dependence using Pthread APIs.

  • Description
  • This code illustrates a loop-carried dependence, the situation where the value of a variable depends on the previous loop iteration, so the for-loop cannot be parallelized as written. Here a variable Sn is assigned into an array and its value is updated in each iteration, so the value of Sn depends on the previous iteration; this case is very common in parallel programming. The loop is then reorganized so that it can be parallelized by removing the loop-carried dependence: the iterations are divided into chunks, the value of 'up' raised to the appropriate power is calculated with the power function at the first iteration of each thread's chunk, and this value is multiplied by the constant origSn to obtain the corresponding value of Snp, which is assigned into the array. A minimal sketch of this restructuring is given after this list.

  • Input
  • None

  • Output
  • Values of the Sn and Snp variables and the execution time for serial and parallel execution.
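
The sketch below shows the restructuring described in Example 3.5: the serial loop carries Sn from one iteration to the next, while the parallel version gives each thread an independent starting value computed with pow(), so the chunks no longer depend on one another. It is an assumed reconstruction rather than the downloadable pthread-loop-carried.c; the values of N, up, and origSn are illustrative.

/*
 * Minimal sketch of Example 3.5 (assumed structure).  Serial loop:
 * a[i] = Sn; Sn *= up  -- each iteration depends on the previous one.
 * Restructured loop: each thread computes its own starting value with
 * pow(), so the chunks are independent and can run in parallel.
 */
#include <pthread.h>
#include <stdio.h>
#include <math.h>

#define N           1000000      /* illustrative values */
#define NUM_THREADS 4

static const double origSn = 1.0, up = 1.0000001;
static double a[N], b[N];

void *chunk_worker(void *arg)
{
    long t     = (long) arg;
    long start = t * (N / NUM_THREADS);
    long end   = start + N / NUM_THREADS;

    double Snp = origSn * pow(up, (double) start);   /* independent starting value */
    for (long i = start; i < end; i++) {
        b[i] = Snp;                                  /* loop-independent update */
        Snp *= up;
    }
    return NULL;
}

int main(void)
{
    /* serial version with the loop-carried dependence */
    double Sn = origSn;
    for (long i = 0; i < N; i++) {
        a[i] = Sn;
        Sn *= up;               /* depends on the previous iteration */
    }

    /* restructured parallel version */
    pthread_t threads[NUM_THREADS];
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, chunk_worker, (void *) t);
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    printf("serial   a[N-1] = %f\n", a[N - 1]);      /* last serial value   */
    printf("parallel b[N-1] = %f\n", b[N - 1]);      /* last parallel value */
    return 0;
}

Because the restructured version calls pow(), it is typically compiled with something like cc -pthread file.c -lm. Timing code is omitted for brevity.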

Centre for Development of Advanced Computing