



hyPACK-2013 Mode 1 : Mixed Mode of Prog. Using MPI & Intel TBB

Mixed Mode of Programming Using MPI & TBB

Examples using a mixed (hybrid) mode programming model such as MPI-TBB are discussed in this module. By utilizing a mixed (hybrid) mode programming model (MPI-OpenMP, MPI-TBB), we should be able to take advantage of the benefits of both models. The majority of mixed mode applications follow a hierarchical model, with MPI parallelisation occurring at the top level and OpenMP/TBB parallelisation occurring below.

An Overview of MPI & TBB     Mixed Mode MPI-TBB Prog. Concepts    

Basic MPI Library Calls       Basic TBB Library Templates      

Compilation and Execution of MPI-TBB Programs      





An overview of MPI-TBB


The mixed programming model, which combines the message passing model with a threading model on clusters of multi-core platforms, is becoming popular as a way to enhance application performance. Several factors play an important role in developing an application based on this mixed programming environment: proper use of threading, an understanding of how threads operate, thread safety of the Message Passing Interface (MPI) library calls, the algorithm used in the application, the threading application programming interface (API), the compiler and runtime environment, and the number of cores used by the application.

Intel Threading Building Blocks (TBB) is a C++ library that helps you leverage multi-core processor performance without having to be a threading expert. It offers a rich and complete approach to expressing parallelism in a C++ program. Threading Building Blocks provides higher-level, task-based parallelism that abstracts platform details and threading mechanisms for performance and scalability, and it helps you create applications that reap the benefits of new processors with more and more cores as they become available.

Message passing code written in MPI is obviously portable and should transfer easily to clustered SMP systems. While message passing is required to communicate between boxes, it is not immediately clear that this is the most efficient parallelisation technique within an SMP box. In theory, a shared memory model such as OpenMP should offer a more efficient parallelisation strategy within an SMP box. Hence a combination of shared memory and message passing parallelisation paradigms within the same application (mixed mode programming) may provide a more efficient parallelisation strategy than pure MPI.

While mixed codes may also involve other programming models such as High Performance Fortran (HPF) and POSIX threads, mixed MPI and TBB codes are likely to represent the most widespread use of mixed mode programming on clusters of multi-core processors in the future.

While SMP clusters or clusters of multi-core processors offer the greatest reason for developing mixed mode code, the OpenMP/TBB and MPI paradigms have different advantages and disadvantages, and by developing such a model these characteristics might even be exploited to give the best performance on a single SMP or multi-core system.

Thread Safety in MPI-TBB

Although a large number of MPI implementations are thread-safe, thread safety cannot be guaranteed in mixed mode programming. To ensure the code is portable, all MPI calls should be made within thread-sequential regions of the code. This often creates little problem, as the majority of codes place the OpenMP/TBB parallelisation beneath the MPI parallelisation, and hence the majority of MPI calls occur outside the OpenMP/TBB parallel regions.
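
As a minimal sketch (not part of the original course material; it assumes an MPI-2 implementation that provides MPI_Init_thread and a compiler with C++11 lambda support), the following shows the funneled model: the threading level is requested explicitly and all MPI calls are kept outside the TBB parallel region.

    /* Sketch: request a thread-support level before mixing MPI with TBB.
       Only the main thread makes MPI calls (MPI_THREAD_FUNNELED), and no
       MPI call appears inside the TBB parallel region. */
    #include <mpi.h>
    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <vector>

    int main(int argc, char *argv[])
    {
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        std::vector<double> v(1000, 1.0);
        /* TBB region: purely local computation, no MPI calls inside. */
        tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size()),
            [&](const tbb::blocked_range<size_t> &r) {
                for (size_t i = r.begin(); i != r.end(); ++i)
                    v[i] *= 2.0;
            });

        /* MPI calls are made only by the (sequential) main thread. */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }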

When writing a mixed mode application it is important to consider how each paradigm carries out parallelisation, and whether combining the two mechanisms provides an optimal parallelisation strategy.

Message Passing Interface

The Message Passing Interface (MPI) standard was originally designed for writing applications and libraries for distributed memory environments. The main advantages of establishing a message-passing interface for such environments are portability and ease of use; a standard message-passing interface is a key component in building a concurrent computing environment in which applications, software libraries, and tools can be transparently ported between different machines.

The Message Passing Interface (MPI) is the most widely used of the newer standards. It is not a new programming language; rather, it is a library of subprograms that can be called from C and Fortran programs. It was developed by an open, international forum consisting of representatives from industry, academia, and government laboratories. It has rapidly received widespread acceptance because it has been carefully designed to permit maximum performance on a wide variety of systems, and it is based on message passing, one of the most powerful and widely used paradigms for programming parallel systems (MPI Forum, 1994). MPI is a good example of using a few independent (orthogonal) language features. MPI is based on four main concepts that are orthogonal to one another: data types, communication operations, communicators, and virtual topologies. Any combination of the four is valid, and this orthogonal independence brings a multiplicative effect.

The current version of MPI assumes that processes are statically allocated; i.e., the number of processes is set at the beginning of program execution, and no additional processes are created during execution. Each process is assigned a unique integer rank in the range 0, 1, 2, ..., p-1, where p is the number of processes. This approach to programming MIMD systems is called single-program multiple-data (SPMD). In SPMD programs, the effect of running different programs is obtained by the use of conditional branches within the source code.

A nice feature of the MPI design is that MPI provides a powerful functionality based on four orthogonal concepts. These four concepts in MPI are message data types, communicators, communication operations, and virtual topology.


Mixed Mode MPI-TBB Programming Concepts


By utilizing a mixed mode programming model we should be able to take advantage of the benefits of both models. For example, a mixed mode program may allow us to make use of the explicit control of data placement offered by MPI together with the finer-grain parallelism of OpenMP/TBB. The majority of mixed mode applications involve a hierarchical model, with MPI parallelisation occurring at the top level and OpenMP/TBB parallelisation occurring below. For example, Figure 1 shows a 2D grid which has been divided between four MPI processes.

In the mixed mode programming concept, the MPI implementation should be thread safe. If MPI is not thread safe, a program that mixes non-blocking MPI library calls with OpenMP/TBB threads in certain orderings may give wrong results. Special care is needed when using MPI library calls in mixed mode programming with OpenMP/TBB in order to avoid race conditions and to obtain correct results.

Figure 1: Schematic representation of a hierarchical mixed mode programming model for a 2D array.

These sub-arrays have then been further divided between three OpenMP/TBB threads. This model closely maps to the architecture of an SMP cluster or a cluster of multi-core processors, with MPI parallelisation occurring between the SMP (multi-core) boxes and OpenMP/TBB parallelisation within the boxes. Message passing could be used within a code where it is relatively simple to implement, and shared memory parallelism used where message passing is difficult. Since most manufacturers provide extended versions of their communication libraries for clusters of multiprocessors, existing MPI codes can be used directly with a unified MPI model. The alternative is mixing MPI with a shared memory model such as OpenMP/TBB. In that case, different possibilities exist, which must be compared according to the performance and programming-effort tradeoff.
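
The hierarchical model of Figure 1 can be sketched as follows (an illustrative example, not taken from the course codes; the grid size, the row-block decomposition and the update performed on the array are assumptions made only for the sketch):

    /* Sketch: MPI decomposes a 2D array into one block of rows per process;
       TBB then parallelises the work on each local block. Assumes the number
       of rows divides evenly among the processes. */
    #include <mpi.h>
    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <vector>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int nrows = 1024, ncols = 1024;       /* global grid size   */
        const int local_rows = nrows / size;        /* rows owned by rank */
        std::vector<double> a(local_rows * ncols);

        /* Loop-level TBB parallelism inside each MPI process. */
        tbb::parallel_for(tbb::blocked_range<int>(0, local_rows),
            [&](const tbb::blocked_range<int> &r) {
                for (int i = r.begin(); i != r.end(); ++i)
                    for (int j = 0; j < ncols; ++j)
                        a[i * ncols + j] = (rank * local_rows + i) + j;
            });

        MPI_Finalize();
        return 0;
    }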

Programming efforts for mixing MPI and OpenMP (TBB)

(A) Fine-grain parallelisation

Starting from an existing MPI code, the simplest approach is the incremental one: it consists of TBB parallelisation of the loop nests in the computation part of the MPI code. This approach is also called TBB fine-grain or loop-level parallelisation. Several options can be used according to:

  • The programming effort
  • The choice of the loop nests to parallelise

Several levels of programming effort can be used. The first possibility consists of parallelising loop nests in the computation part of the MPI code without any manual optimization; only the correctness of the parallel version against the semantics of the sequential version is checked. However, the incremental approach can be significantly improved by applying several manual optimizations (loop permutation, loop exchange, use of temporary variables). These optimizations are required:

  • To transform non parallel loop nests into parallel ones.
  • To improve the parallel efficiency by avoiding false sharing or reducing the number of synchronization points (critical sections, barriers).

Another issue is the choice of the loop nests to parallelise. One option is to parallelise all loop nests. This option has two drawbacks:

  • It increases the programming effort.
  • Parallelising loop nests that do not contribute significantly to the global execution time can be counter-productive.

The alternative option consists of selecting, by profiling, the loop nests that contribute significantly to the global execution time.

(B) Coarse-grain parallelisation

Instead of applying a two-level parallelisation (process level and loop level), another currently investigated approach is coarse-grain OpenMP/TBB SPMD parallelisation. In this approach, OpenMP or TBB is still used to take advantage of the shared memory inside the SMP nodes (multi-core processor systems), but an SPMD programming style is used instead of the traditional shared memory multi-thread approach.

In this mode, OpenMP or TBB is used to spawn N threads in the main program, with each thread acting similarly to an MPI process.

The OpenMP PARALLEL directive (or an equivalent TBB construct) is used at the outermost level of the program. The principle is to spawn the threads just after the spawn of the MPI processes (some initializations may separate the two spawns). As with the message passing SPMD approach, the programmer must take care of several issues:

  • Array distribution among threads
  • Work distribution among threads
  • Coordination between threads

Since the array distribution is done assuming a shared memory, distributing the array only means assigning different array regions to the different running threads. For maximum performance, these regions should not overlap for write references. The work distribution is made according to the array distribution.
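
A rough sketch of this coarse-grain SPMD style using TBB is given below (illustrative only; the number of workers, the array and the update are assumptions of the sketch). Each worker is handed its own non-overlapping region of a shared array, much as an MPI process owns its sub-domain.

    /* Sketch: coarse-grain SPMD style with TBB. N workers are spawned once
       and each owns a contiguous, non-overlapping region of the array. */
    #include <tbb/parallel_for.h>
    #include <vector>

    void spmd_update(std::vector<double> &data, int nworkers)
    {
        const size_t chunk = data.size() / nworkers;
        tbb::parallel_for(0, nworkers, 1, [&](int id) {
            /* Array distribution: worker 'id' owns [lo, hi). */
            size_t lo = id * chunk;
            size_t hi = (id == nworkers - 1) ? data.size() : lo + chunk;
            /* Work distribution follows the array distribution; the
               regions do not overlap for write references. */
            for (size_t i = lo; i < hi; ++i)
                data[i] += 1.0;
        });
    }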

Basic MPI 1.X library Calls



Brief Introduction to MPI 1.X Library Calls :

The most commonly used MPI library calls in C and Fortran are explained below.

  • Syntax : C
    MPI_Init(int *argc, char **argv);

  • Syntax : Fortran
    MPI_Init(ierror)
    Integer ierror

    Initializes the MPI execution environment

    This call is required in every MPI program and must be the first MPI call. It establishes the MPI "environment". Only one invocation of MPI_Init can occur in each program execution. It takes the command line arguments as parameters. In a FORTRAN call to MPI_Init the only argument is the error code. Every Fortran MPI subroutine returns an error code in its last argument, which is either MPI_SUCCESS or an implementation-defined error code. It allows the system to do any special setup so that the MPI library can be used.


  • Syntax : C
    MPI_Comm_rank (MPI_Comm comm, int *rank);

  • Syntax : Fortran
    MPI_Comm_rank (comm, rank, ierror)
    integer comm, rank, ierror

    Determines the rank of the calling process in the communicator

    The first argument to the call is a communicator and the rank of the process is returned in the second argument. Essentially, a communicator is a collection of processes that can send messages to each other. The only communicator needed for basic programs is MPI_COMM_WORLD; it is predefined in MPI and consists of the processes running when program execution begins.


  • Syntax : C
    MPI_Comm_size (MPI_Comm comm, int *num_of_processes);

  • Syntax : Fortran
    MPI_Comm_size (comm, size, ierror)
    integer comm, size, ierror

    Determines the size of the group associated with a communicator

    This function determines the number of processes executing the program. Its first argument is the communicator and it returns the number of processes in the communicator in its second argument.


  • Syntax : C
    MPI_Finalize()

  • Syntax : Fortran
    MPI_Finalize (ierror)
    integer ierror

    Terminates MPI execution environment

    This call must be made by every process in an MPI computation. It terminates the MPI "environment"; no MPI calls may be made by a process after its call to MPI_Finalize.
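
    A minimal skeleton pulling together the four calls described so far (a sketch, not one of the course codes) might look as follows:

        /* Minimal MPI skeleton: MPI_Init, MPI_Comm_rank, MPI_Comm_size,
           MPI_Finalize. */
        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);                  /* first MPI call    */

            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* my rank: 0..p-1   */
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total processes p */

            std::printf("Hello from process %d of %d\n", rank, nprocs);

            MPI_Finalize();                          /* last MPI call     */
            return 0;
        }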


  • Syntax : C
    MPI_Send (void *message, int count, MPI_Datatype datatype, int destination, int tag, MPI_Comm comm);

  • Syntax : Fortran
    MPI_Send(buf, count, datatype, dest, tag, comm, ierror)
    <type> buf (*)
    integer count, datatype, dest, tag, comm, ierror

    Basic send (It is a blocking send call)

    The first three arguments describe the message by its address, count, and datatype. The contents of the message are stored in the block of memory referenced by the address. The count specifies the number of elements contained in the message, each of the MPI type given by datatype. The next argument is the destination, an integer specifying the rank of the destination process. The tag argument helps identify messages.


  • Syntax : C
    MPI_Recv (void *message, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status);

  • Syntax : Fortran
    MPI_Recv(buf, count, datatype, source, tag, comm, status, ierror)
    <type> buf (*) 
    integer count, datatype, source, tag, comm, status, ierror 

    Basic receive ( It is a blocking receive call)

    The first three arguments describe the message by its address, count, and datatype. The contents of the message are stored in the block of memory referenced by the address. The count specifies the number of elements contained in the message, each of the MPI type given by datatype. The next argument is the source, which specifies the rank of the sending process. MPI allows the source to be a "wild card": the predefined constant MPI_ANY_SOURCE can be used if a process is ready to receive a message from any sending process rather than a particular one. The tag argument helps identify messages. The last argument returns information on the data that was actually received; it references a record with two fields - one for the source and the other for the tag.
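
    As a small sketch (illustrative only; run with at least two processes), process 0 sends an array to process 1 using the blocking MPI_Send / MPI_Recv pair described above:

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            const int TAG = 0;
            double buf[4] = {1.0, 2.0, 3.0, 4.0};

            if (rank == 0) {
                MPI_Send(buf, 4, MPI_DOUBLE, 1, TAG, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Status status;
                MPI_Recv(buf, 4, MPI_DOUBLE, 0, TAG, MPI_COMM_WORLD, &status);
                std::printf("rank 1 received %f ... %f from rank %d\n",
                            buf[0], buf[3], status.MPI_SOURCE);
            }

            MPI_Finalize();
            return 0;
        }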


  • Syntax : C
    MPI_Sendrecv (void *sendbuf, int sendcount, MPI_Datatype sendtype, int dest, int sendtag, void *recvbuf , int recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm, MPI_Status *status);

  • Syntax : Fortran
    MPI_Sendrecv (sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status, ierror)

    <type> sendbuf (*), recvbuf (*) 
    integer sendcount, sendtype, dest, sendtag, recvcount, recvtype, source, recvtag
    integer comm, status(*), ierror

    Sends and receives a message

    The function MPI_Sendrecv, as its name implies, performs both a send and a receive. The parameter list is basically a concatenation of the parameter lists for MPI_Send and MPI_Recv; the only difference is that the communicator parameter is not repeated. The destination and the source parameters can be the same. The "send" in an MPI_Sendrecv can be matched by an ordinary MPI_Recv, and the "receive" can be matched by an ordinary MPI_Send. The basic difference between a call to this function and MPI_Send followed by MPI_Recv (or vice versa) is that MPI can try to arrange that no deadlock occurs, since it knows that the sends and receives will be paired.
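
    For example (a sketch, with the ring pattern chosen only for illustration), each process can pass its rank to the right-hand neighbour and receive from the left-hand neighbour with a single MPI_Sendrecv, avoiding the deadlock that a badly ordered MPI_Send/MPI_Recv pair could cause:

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int right = (rank + 1) % size;
            int left  = (rank - 1 + size) % size;

            int sendval = rank, recvval = -1;
            MPI_Status status;
            MPI_Sendrecv(&sendval, 1, MPI_INT, right, 0,
                         &recvval, 1, MPI_INT, left,  0,
                         MPI_COMM_WORLD, &status);

            std::printf("rank %d received %d from rank %d\n",
                        rank, recvval, left);
            MPI_Finalize();
            return 0;
        }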


  • Syntax : C
    MPI_Sendrecv_replace (void* buf, int count, MPI_Datatype datatype, int dest, int sendtag, int source, int recvtag, MPI_Comm comm, MPI_Status *status)

  • Syntax : Fortran
    MPI_Sendrecv_replace (buf, count, datatype, dest, sendtag, source, recvtag, comm, status, ierror)
    <type> buf (*)
    integer count, datatype, dest, sendtag, source, recvtag
    integer comm, status(*), ierror

    Sends and receives using a single buffer

    MPI_Sendrecv_replace sends and receives using a single buffer.


  • Syntax : C
    MPI_Bsend (void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

  • Syntax : Fortran
    MPI_Bsend (buf, count, datatype, dest, tag, comm, ierror)
    <type> buf (*)
    integer count, datatype, dest, tag, comm, ierror

    Basic send with user specified buffering

    MPI_Bsend copies the message data into a user-supplied buffer (attached with MPI_Buffer_attach) and can return before the matching receive is posted.


  • Syntax : C
    MPI_Isend (void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)

  • Syntax : Fortran
    MPI_Isend (buf, count, datatype, dest, tag, comm, request, ierror)
    <type> buf (*)
    integer count, datatype, dest, tag, comm, request, ierror

    Begins a nonblocking send

    MPI_Isend begins a nonblocking send. The basic functions in MPI for starting non-blocking communications are MPI_Isend and MPI_Irecv. The "I" stands for "immediate"; i.e., they return (more or less) immediately.


  • Syntax : C
    MPI_Irecv (void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)

  • Syntax : Fortran
    MPI_Irecv (buf, count, datatype, source, tag, comm, request, ierror) 
    <type> buf (*) 
    integer count, datatype, source, tag, comm, request, ierror

    Begins a nonblocking receive

    MPI_Irecv begins a nonblocking receive. The basic functions in MPI for starting non-blocking communications are MPI_Isend and MPI_Irecv. The "I" stands for "immediate"; i.e., they return (more or less) immediately.


  • Syntax : C
    MPI_Wait (MPI_Request *request, MPI_Status *status);

  • Syntax : Fortran
    MPI_Wait (request, status, ierror) 
    integer request, status (*), ierror

    Waits for a MPI send or receive to complete

    MPI_Wait waits for an MPI send or receive to complete. There are a variety of functions that MPI uses to complete nonblocking operations; the simplest of these is MPI_Wait, which can be used to complete any nonblocking operation. The request parameter corresponds to the request parameter returned by MPI_Isend or MPI_Irecv.
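
    A sketch of a neighbour exchange written with MPI_Irecv / MPI_Isend and completed with MPI_Wait is shown below (the buffers and the ring pattern are assumptions of the sketch):

        #include <mpi.h>

        void exchange(int rank, int size, double *sendbuf,
                      double *recvbuf, int n)
        {
            int right = (rank + 1) % size;
            int left  = (rank - 1 + size) % size;

            MPI_Request reqs[2];
            MPI_Status  stats[2];

            /* Post the receive first, then the send; both return at once. */
            MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

            /* ... independent computation could overlap communication here ... */

            MPI_Wait(&reqs[0], &stats[0]);   /* complete the receive */
            MPI_Wait(&reqs[1], &stats[1]);   /* complete the send    */
        }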


  • Syntax : C
    MPI_Ssend (void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) 

  • Syntax : Fortran
    MPI_Ssend (buf, count, datatype, dest, tag, comm, ierror) 
    <type> buf (*) 
    integer count, datatype, dest, tag, comm, ierror

    Blocking synchronous send

    MPI_Ssend is one of the synchronous mode send operations provided by MPI.


  • Syntax : C
    MPI_Bcast (void *message, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

  • Syntax : Fortran
    MPI_Bcast(buffer, count, datatype, root, comm, ierror)
    <type> buffer (*)
    integer count, datatype, root, comm, ierror 

    Broadcast a message from the process with rank "root" to all other processes of the group

    It is a collective communication call in which a single process sends the same data to every process. It sends a copy of the data in message on the process with rank root to each process in the communicator comm. It must be called by all processes in the communicator with the same arguments for root and comm.
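
    For example (a sketch; the value broadcast is arbitrary), rank 0 can distribute a problem size to every process:

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            int n = 0;
            if (rank == 0)
                n = 1000;            /* e.g. read from input on the root */

            /* Every process calls MPI_Bcast with the same root and comm. */
            MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

            std::printf("rank %d sees n = %d\n", rank, n);
            MPI_Finalize();
            return 0;
        }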


  • Syntax : C
    MPI_Scatter (void *send_buffer, int send_count, MPI_Datatype send_type, void *recv_buffer, int recv_count, MPI_Datatype recv_type, int root, MPI_Comm comm);

  • Syntax : Fortran
    MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root , comm, ierror)
    <type> sendbuf (*), recvbuf (*)
    integer sendcount, sendtype, recvcount, recvtype, root , comm, ierror

    Sends data from one process to all other processes in a group

    The process with rank root distributes the contents of send_buffer among the processes. The contents of send_buffer are split into p segments, each consisting of send_count elements. The first segment goes to process 0, the second to process 1, etc. The send arguments are significant only on process root.


  • Syntax : C
    MPI_Scatterv (void* sendbuf, int *sendcounts, int *displs, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

  • Syntax : Fortran
    MPI_Scatterv (sendbuf, sendcounts, displs, sendtype, recvbuf, recvcount, recvtype, root, comm, ierror)
    <type> sendbuf (*), recvbuf (*) 
    integer sendcounts (*), displs (*), sendtype, recvcount, recvtype, root, comm, ierror

    Scatters a buffer in different/same size of parts to all processes in a group

    A simple extension to MPI_Scatter is MPI_Scatterv. MPI_Scatterv allows the size of the data being sent by each process to vary.


  • Syntax : C
    MPI_Gather (void *send_buffer, int send_count, MPI_Datatype send_type, void *recv_buffer, int recv_count, MPI_Datatype recv_type, int root, MPI_Comm comm)

  • Syntax : Fortran
    MPI_Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm, ierror)
    <type> sendbuf (*), recvbuf (*)
    integer sendcount, sendtype, recvcount, recvtype, root, comm, ierror 

    Gathers together values from a group of processes

    Each process in comm sends the contents of send_buffer to the process with rank root. The process root concatenates the received data in process-rank order in recv_buffer. The receive arguments are significant only on the process with rank root. The argument recv_count indicates the number of items received from each process - not the total number received.
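
    The two calls are often used together. In the sketch below (illustrative only; the block size and the doubling operation are assumptions), the root scatters one block of a vector to each process, each process works on its block, and the root gathers the results back:

        #include <mpi.h>
        #include <vector>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int nlocal = 4;                 /* elements per process */
            std::vector<double> full;             /* significant on root  */
            if (rank == 0)
                full.assign(nlocal * size, 1.0);

            std::vector<double> part(nlocal);
            MPI_Scatter(rank == 0 ? full.data() : NULL, nlocal, MPI_DOUBLE,
                        part.data(), nlocal, MPI_DOUBLE, 0, MPI_COMM_WORLD);

            for (int i = 0; i < nlocal; ++i)      /* local work           */
                part[i] *= 2.0;

            MPI_Gather(part.data(), nlocal, MPI_DOUBLE,
                       rank == 0 ? full.data() : NULL, nlocal, MPI_DOUBLE,
                       0, MPI_COMM_WORLD);

            MPI_Finalize();
            return 0;
        }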


  • Syntax : C
    MPI_Gatherv (void* sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int *recvcounts, int *displs, MPI_Datatype recvtype, int root, MPI_Comm comm)

  • Syntax : Fortran
    MPI_Gatherv (sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs, recvtype, root, comm, ierror)
    <type> sendbuf (*), recvbuf (*) 
    integer sendcount, sendtype, recvcounts (*), displs (*), recvtype, root, comm, ierror

    Gathers into specified locations from all processes in a group

    A simple extension to MPI_Gather is MPI_Gatherv. MPI_Gatherv allows the size of the data being sent by each process to vary.


  • Syntax : C
    MPI_Barrier (MPI_Comm comm)

  • Syntax : Fortran
    MPI_Barrier (comm, ierror)
    integer comm, ierror

    Blocks until all processes have reached this routine

    MPI_Barrier blocks the calling process until all processes in comm have entered the function.


  • Syntax : C
    MPI_Reduce (void *operand, void *result, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);

  • Syntax : Fortran
    MPI_Reduce(sendbuf, recvbuf, count, datatype, op, root, comm,ierror)
    <type> sendbuf (*), recvbuf (*)
    integer count, datatype, op, root, comm, ierror

    Reduce values on all processes to a single value

    MPI_Reduce combines the operands stored in *operand using operation op and stores the result in *result on the root. Both operand and result refer to count memory locations with type datatype. MPI_Reduce must be called by all processes in the communicator comm, and count, datatype, and op must be the same on each process.
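
    For example (a sketch; the "partial sum" is a stand-in for a real local computation), each process contributes one value and the root receives the global sum:

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double partial = rank + 1.0;   /* stand-in for local work */
            double total   = 0.0;

            MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM,
                       0, MPI_COMM_WORLD);

            if (rank == 0)
                std::printf("global sum = %f (expected %f)\n",
                            total, size * (size + 1) / 2.0);

            MPI_Finalize();
            return 0;
        }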


  • Syntax : C
    MPI_Allreduce (void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm) 

  • Syntax : Fortran
    MPI_Allreduce (sendbuf, recvbuf, count, datatype, op, comm, ierror)
    <type> sendbuf (*), recvbuf (*)
    integer count, datatype, op, comm, ierror

    Combines values from all processes and distributes the result to all processes

    MPI_Allreduce combines values from all processes and distributes the result back to all processes.


  • Syntax : C
    MPI_Allgather (void *send_buffer, int send_count, MPI_DATATYPE send_type, void *recv_buffer, int recv_count, MPI_Datatype recv_type, MPI_Comm comm)

  • Syntax : Fortran
    MPI_Allgather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierror)
    <type> sendbuf(*), recvbuf(*)
    integer sendcount, sendtype, recvcount, recvtype, comm, ierror

    Gathers data from all processes and distribute it to all

    MPI_Allgather gathers the contents of each send_buffer on every process. Its effect is the same as if there were a sequence of p calls to MPI_Gather, each of which has a different process acting as root.


  • Syntax : C
    MPI_Alltoall (void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)

  • Syntax : Fortran
    MPI_Alltoall (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierror)
    <type> sendbuf (*), recvbuf (*)
    integer sendcount, sendtype, recvcount, recvtype, comm, ierror

    Sends distinct collection of data from all to all processes

    MPI_Alltoall is a collective communication operation in which every process sends a distinct collection of data to every other process. This is an extension of the gather and scatter operations, and is also called total exchange.


  • Syntax : C
    double MPI_Wtime( )

  • Syntax : Fortran
    double precision MPI_Wtime

    Returns the elapsed wall-clock time on the calling process

    MPI provides a simple routine, MPI_Wtime( ), that can be used to time programs or sections of programs. MPI_Wtime( ) returns a double-precision floating point number of seconds elapsed since some arbitrary point of time in the past. A time interval can be measured by calling this routine at the beginning and at the end of a program segment and subtracting the returned values.
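
    A sketch of this timing idiom (the loop being timed is arbitrary):

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);

            double t0 = MPI_Wtime();

            /* ... section of the program to be timed ... */
            double s = 0.0;
            for (long i = 0; i < 10000000L; ++i)
                s += 1.0 / (i + 1.0);

            double t1 = MPI_Wtime();
            std::printf("elapsed time = %f seconds (s = %f)\n", t1 - t0, s);

            MPI_Finalize();
            return 0;
        }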


  • Syntax : C
    MPI_Comm_split ( MPI_Comm old_comm, int split_key, int rank_key, MPI_Comm* new_comm);

  • Syntax : Fortran
    MPI_Comm_split (old_comm, split_key, rank_key, new_comm, ierror)
    integer old_comm, split_key, rank_key, new_comm, ierror

    Creates new communicator based on the colors and keys

    The single call to MPI_Comm_split creates several new communicators, all referred to by the same name, *new_comm: one for each distinct value of split_key. Processes with the same value of split_key form a new group, and the rank within the new group is determined by the value of rank_key. If process A and process B call MPI_Comm_split with the same value of split_key, and the rank_key argument passed by process A is less than that passed by process B, then the rank of A in the group underlying new_comm will be less than the rank of process B. It is a collective call, and it must be called by all the processes in old_comm.
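
    The sketch below splits MPI_COMM_WORLD into "row" communicators of a process grid (q, the row width, is an assumption of the sketch, and the number of processes is assumed to be a multiple of q):

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int q = 2;
            int split_key = rank / q;   /* processes in the same row share
                                           a communicator                  */
            int rank_key  = rank % q;   /* ordering within the new group   */

            MPI_Comm row_comm;
            MPI_Comm_split(MPI_COMM_WORLD, split_key, rank_key, &row_comm);

            int row_rank;
            MPI_Comm_rank(row_comm, &row_rank);
            std::printf("world rank %d -> row %d, row rank %d\n",
                        rank, split_key, row_rank);

            MPI_Comm_free(&row_comm);
            MPI_Finalize();
            return 0;
        }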


  • Syntax : C
    MPI_Comm_group ( MPI_Comm comm, MPI_Group *group);

  • Syntax : Fortran
    MPI_Comm_group (comm, group, ierror);
    integer comm, group, ierror 

    Accesses the group associated with the given communicator

  • Syntax : C
    MPI_Group_incl ( MPI_Group old_group, int new_group_size, int* ranks_in_old_group, MPI_Group* new_group)

  • Syntax : Fortran
    MPI_Group_incl (old_group, new_group_size, ranks_in_old_group , new_group, ierror)
    integer old_group, new_group_size, ranks_in_old_group (*), new_group, ierror

    Produces a group by reordering an existing group and taking only listed members

  • Syntax : C
    MPI_Comm_create(MPI_Comm old_comm, MPI_Group new_group, MPI_Comm * new_comm);

  • Syntax : Fortran
    MPI_Comm_create(old_comm, new_group, new_comm, ierror);
    integer old_comm, new_group, new_comm, ierror

    Creates a new communicator

    Groups and communicators are opaque objects. From a practical standpoint, this means that the details of their internal representation depend on the particular implementation of MPI and, as a consequence, they cannot be directly accessed by the user. Rather, the user accesses a handle that references the opaque object, and the objects are manipulated by special MPI functions such as MPI_Comm_create, MPI_Group_incl, and MPI_Comm_group. Contexts are not explicitly used in any MPI functions. MPI_Comm_group simply returns the group underlying the communicator comm. MPI_Group_incl creates a new group from the list of processes in the existing group old_group; the number of processes in the new group is new_group_size, and the processes to be included are listed in ranks_in_old_group. MPI_Comm_create associates a context with the group new_group and creates the communicator new_comm. All of the processes in new_group must belong to the group underlying old_comm. MPI_Comm_create is a collective operation: all the processes in old_comm must call MPI_Comm_create with the same arguments.


  • Syntax : C
    MPI_Cart_create (MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)

  • Syntax : Fortran
    MPI_Cart_create (comm_old, ndims, dims, periods, reorder, comm_cart, ierror)
    integer comm_old, ndims, dims(*), comm_cart, ierror logical periods(*), reorder

    Makes a new communicator to which topology information has been attached in the form of Cartesian coordinates

    MPI_Cart_create creates a Cartesian decomposition of the processes, with the number of dimensions given by the ndims argument. The user can specify the number of processes in each direction by giving a positive value to the corresponding element of the dims array.


  • Syntax : C
    MPI_Cart_rank (MPI_Comm comm, int *coords, int *rank)

  • Syntax : Fortran
    MPI_Cart_rank (comm, coords, rank, ierror) 
    integer comm, coords (*), rank, ierror

    Determines process rank in communicator given Cartesian location

    MPI_Cart_rank returns, in rank, the rank in the Cartesian communicator comm of the process with the given Cartesian coordinates. The coords argument is an array whose size equals the number of dimensions in the Cartesian topology associated with comm.


  • Syntax : C
    MPI_Cart_coords (MPI_Comm comm, int rank, int maxdims, int *coords)

  • Syntax : Fortran
    MPI_Cart_coords (comm, rank, maxdims, coords, ierror) 
    integer comm, rank, maxdims, coords (*), ierror

    Determines process coordinates in a Cartesian topology given a rank in the communicator

    MPI_Cart_coords takes as input a rank in a communicator and returns the coordinates of the process with that rank. MPI_Cart_coords is the inverse of MPI_Cart_rank: it returns the coordinates of the process with rank rank in the Cartesian communicator comm. Note that both of these functions are local.


  • Syntax : C
    MPI_Cart_get (MPI_Comm comm, int maxdims, int *dims, int *periods, int *coords)

  • Syntax : Fortran
    MPI_Cart_get (comm, maxdims, dims, periods, coords, ierror)
    integer comm, maxdims, dims (*), coords (*), ierror
    logical periods (*)

    Retrieve Cartesian topology information associated with a communicator

    MPI_Cart_get retrieves the dimensions, periodicity, and the coordinates of the calling process in the Cartesian communicator.


  • Syntax : C
    MPI_Cart_shift (MPI_Comm comm, int direction, int disp, int *rank_source, int *rank_dest)

  • Syntax : Fortran
    MPI_Cart_shift (comm, direction, disp, rank_source, rank_dest, ierror)
    integer comm, direction, disp, rank_source, rank_dest, ierror

    Returns the shifted source and destination ranks given a shift direction and amount

    MPI_Cart_shift returns the ranks of the source and destination processes in the arguments rank_source and rank_dest, respectively.
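
    The Cartesian routines above are typically used together. The sketch below (illustrative only) builds a 1D periodic topology, finds the left and right neighbours with MPI_Cart_shift, and exchanges data with them using MPI_Sendrecv:

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char *argv[])
        {
            MPI_Init(&argc, &argv);
            int size;
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int dims[1]    = {size};
            int periods[1] = {1};          /* periodic in the one dimension */
            MPI_Comm cart;
            MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);

            int my_rank, left, right;
            MPI_Comm_rank(cart, &my_rank);
            MPI_Cart_shift(cart, 0, 1, &left, &right);

            int recvval = -1;
            MPI_Status status;
            MPI_Sendrecv(&my_rank, 1, MPI_INT, right, 0,
                         &recvval, 1, MPI_INT, left,  0, cart, &status);

            std::printf("cart rank %d: left=%d right=%d received=%d\n",
                        my_rank, left, right, recvval);

            MPI_Comm_free(&cart);
            MPI_Finalize();
            return 0;
        }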


  • Syntax : C
    MPI_Cart_sub (MPI_Comm comm, int *remain_dims, MPI_Comm *newcomm) 

  • Syntax : Fortran
    MPI_Cart_sub (old_comm, remain_dims, new_comm, ierror) 
    integer old_comm, new_comm, ierror
    logical remain_dims (*)

    Partitions a communicator into subgroups that form lower-dimensional Cartesian subgrids

    MPI_Cart_sub partitions the processes in cart_comm into a collection of disjoint communicators whose union is cart_comm. Both cart_comm and each new_comm have associated Cartesian topologies.


  • Syntax : C
    MPI_Dims_create (int nnodes, int ndims, int *dims)

  • Syntax : Fortran
    MPI_Dims_create (nnodes, ndims, dims, ierror) 
    integer nnodes, ndims, dims(*), ierror

    Creates a division of processes in a Cartesian grid

    MPI_Dims_create creates a division of processes in a Cartesian grid. It is useful to choose dimension sizes for a Cartesian coordinate system.


  • Syntax : C
    MPI_Waitall(int count, MPI_Request *array_of_requests, MPI_Status *array_of_statuses)

  • Syntax : Fortran
    MPI_Waitall(count, array_of_requests, array_of_statuses, ierror)
    integer count, array_of_requests (*), array_of_statuses (MPI_STATUS_SIZE, *), ierror

    Waits for all given communications to complete

    MPI_Waitall waits for all of the given nonblocking communications to complete. Related routines allow a program to wait for or test any or all of a collection of nonblocking operations.




Basic TBB Library Templates



The TBB templates covered here are divided into three parts: basic algorithms, advanced algorithms, and containers.

Basic algorithms:

1.parallel_for:

       parallel_for(blocked_range<T>(begin,end,grainsize),body object)

A parallel_for<Range,Body> represents parallel execution of Body over each value in Range. The template function tbb::parallel_for recursively splits the iteration space into chunks and runs each chunk on a separate thread. blocked_range<T> is a template class provided by the library; it describes a one-dimensional iteration space over type T, where begin and end are the limits of the iteration space and grainsize refers to the size of each chunk. The body object is a loop body object whose operator() processes a chunk.
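
A small sketch of a body class used with parallel_for is shown below (the array, the scaling operation, and the grainsize value are assumptions of the sketch):

       // Sketch: a body class whose operator() processes one chunk of the
       // iteration space, used with parallel_for over a blocked_range.
       #include <tbb/parallel_for.h>
       #include <tbb/blocked_range.h>

       class ApplyScale {
           double *const a;
           const double  factor;
       public:
           ApplyScale(double *a_, double factor_) : a(a_), factor(factor_) {}
           void operator()(const tbb::blocked_range<size_t> &r) const {
               for (size_t i = r.begin(); i != r.end(); ++i)
                   a[i] *= factor;            // work on one chunk
           }
       };

       void scale_array(double *a, size_t n)
       {
           // grainsize of 1000 elements per chunk (illustrative value)
           tbb::parallel_for(tbb::blocked_range<size_t>(0, n, 1000),
                             ApplyScale(a, 2.0));
       }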

2.parallel_reduce:

       parallel_reduce(blocked_range<T>(begin,end,grainsize),body object)

A parallel_reduce<Range,Body> performs a parallel reduction of Body over each value in Range. This template function can parallelize the loop if the iterations are independent. TBB defines parallel_reduce similarly to parallel_for.
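
A sketch of a reduction body is given below (the summation and the grainsize are illustrative): the splitting constructor and the join() method allow TBB to accumulate partial sums on different threads and then combine them.

       #include <tbb/parallel_reduce.h>
       #include <tbb/blocked_range.h>

       class SumBody {
           const double *a;
       public:
           double sum;
           SumBody(const double *a_) : a(a_), sum(0.0) {}
           SumBody(SumBody &other, tbb::split) : a(other.a), sum(0.0) {}
           void operator()(const tbb::blocked_range<size_t> &r) {
               for (size_t i = r.begin(); i != r.end(); ++i)
                   sum += a[i];               // partial sum for this chunk
           }
           void join(const SumBody &rhs) { sum += rhs.sum; }
       };

       double parallel_sum(const double *a, size_t n)
       {
           SumBody body(a);
           tbb::parallel_reduce(tbb::blocked_range<size_t>(0, n, 1000), body);
           return body.sum;
       }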

3.parallel_scan:

       parallel_scan(blocked_range<T>(begin,end,grainsize),body object)

A parallel_scan<Range,Body> computes a parallel prefix (parallel scan). The template function parallel_scan decides whether and when to generate parallel work; it is better suited to future systems with more than two cores.

Advanced algorithms:

1.parallel_while:

       parallel_while<Body>

A parallel_while<Body> performs parallel iteration over items. The processing to be performed on each item is defined by a function object of type Body. The template class tbb::parallel_while can be used if the end of the iteration space is not known in advance, or if the loop body may add more iterations to do before the loop exits.

2.parallel_sort:

       void parallel_sort(RandomAccessIterator begin,RandomAccessIterator end,const Compare& comp );

A call to parallel_sort(i,j,comp) sorts the sequence [i,j) using the third argument comp to determine relative orderings.

       void parallel_sort(RandomAccessIterator begin, RandomAccessIterator end);

A call to parallel_sort(i,j) is equivalent to parallel_sort(i,j,std::less<T>()). parallel_sort provides an unstable sort of the sequence [begin,end). It is a comparison sort with an average time complexity of O(n log n).
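
For example (a sketch; the element type and the descending-order comparison are arbitrary choices):

       #include <tbb/parallel_sort.h>
       #include <vector>
       #include <functional>

       void sort_examples(std::vector<float> &v)
       {
           // Default comparison: same as parallel_sort(v.begin(), v.end(),
           // std::less<float>()).
           tbb::parallel_sort(v.begin(), v.end());

           // Explicit comparison object: sort into descending order.
           tbb::parallel_sort(v.begin(), v.end(), std::greater<float>());
       }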

3.Pipeline:

       class pipeline;

A pipeline represents the pipelined application of a series of filters to a stream of items. Each filter is parallel or serial.

       class filter;

A filter represents one stage of a pipeline. A filter is parallel or serial: a parallel filter can process multiple items in parallel and possibly out of order, while a serial filter processes items one at a time in the original stream order.

Containers:

1.concurrent_queue:

       concurrent_queue<T>

The template class concurrent_queue<T> implements a concurrent queue with values of type T. The queue may be bounded by setting a capacity, and it permits multiple threads to concurrently push and pop items.
Pushing is provided by the push method. Popping is provided by a blocking and a nonblocking method:
pop_if_present - nonblocking; if the queue is empty, it returns anyway.
pop - blocks until it can pop a value.

2.concurrent_vector:

       concurrent_vector<T>

A concurrent_vector<T> is a dynamically growable array of items of type T, for which it is safe to simultaneously access elements while the vector is growing.

3.concurrent_hash_map:

       concurrent_hash_map<Key,T,HashCompare>

A concurrent_hash_map<Key,T,HashCompare> is a hash table that permits concurrent accesses. The table is a map from a key to a type T. The HashCompare traits type defines how to hash a key and how to compare two keys. A concurrent_hash_map maps keys to values in a way that permits multiple threads to concurrently access values.
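
A short sketch of concurrent container use (the filtering predicate is arbitrary, and a compiler with C++11 lambda support is assumed): tasks running under parallel_for append to a concurrent_vector, which is safe while the vector is growing.

       #include <tbb/concurrent_vector.h>
       #include <tbb/parallel_for.h>

       tbb::concurrent_vector<int> results;

       void collect_even_numbers(int n)
       {
           tbb::parallel_for(0, n, 1, [](int i) {
               if (i % 2 == 0)
                   results.push_back(i);   // concurrent growth is allowed
           });
       }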



Compilation and Execution of MPI-TBB programs



Compilation, linking and execution of MPI-TBB programs are covered in the MPI & TBB modules; a typical build and run sequence is sketched below.
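
As a hedged illustration only (compiler wrappers, library paths, and module names vary between installations; consult the local documentation), building and running a mixed MPI-TBB program might look like:

    mpicxx -O2 mpi_tbb_prog.cpp -o mpi_tbb_prog -ltbb
    mpirun -np 4 ./mpi_tbb_prog

Here mpicxx is the MPI C++ compiler wrapper and -ltbb links the Intel TBB runtime library.
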
Centre for Development of Advanced Computing