hyPACK-2013 Mode-1 : Memory Allocators - Multi-threaded Prog. Env.
Many traditional scientific applications and other services class of applications which include
web servers, database managers, news servers require parallel, multi-threaded C / C++ programming
languages. The scalability and performance of these applications on multi-core systems is closely
tied up the with memory allocation. These applications use dynamic memory allocation.
Unfortunately, the memory allocator is often a bottleneck that severely limits program scalability
on multiprocessor systems. Existing serial memory allocators do not scale well for multithreaded applications.
Some memory allocators suffer from problems that include poor performance and scalability, and heap organizations that
introduce false sharing.
An Overview of Memory Allocator :
A memory allocator should perform memory operations (i.e., malloc and free ) about as fast as a state-of-the-art serial memory allocator. A good memory allocator should guarantee performance even
when a multithreaded program executes on a single processor. As the number of processors in the
system grows, the performance of the allocator must scale linearly with the number of processors
to ensure scalable application performance.
Using a single-threaded malloc in a multithreaded application can degrade performance. As memory is being allocated concurrently
in multiple threads, all the threads must wait in a queue while malloc() handles one request at a time. With a few extra threads, this can
slow down performance, causing a problem known as heap contention. In other words, all the threads are competing for access to the same heap.
One indication of heap contention is that the application is making a considerably high number of calls to malloc(). System library
implementers take various approaches to alleviate the bottleneck of a singly threaded malloc().
Attention is required to know the limits of maximum amount of memory required by the application and the maximum amount of memory allocated from
the operating system. Excessive allocation of memory for the application may introduce fragmentation leading to degrade performance by causing
poor data locality.
Scalable Memory Allocators (
Intel Software tools,
The Hoard Memory Allocator and
google-perftools) are considered for multi-threaded
implementation in the Hands-on Session programs.
Lab Session: List of Programs
hyPACK-2013 laboratory session provides following codes using different memory allocators on
Multi-core Processors.
Dense Matrix Computations using traditional malloc (matrix-matrix, matrix-vector, vector-vector multiplication)
Dense Matrix Computations using malloc with Hoard Memory Allocator (matrix-matrix, matrix-vector, vector-vector multiplication)
Dense Matrix Computations using mmap (memory mapping) with Hoard Memory Allocator (matrix-matrix, matrix-vector, vector-vector multiplication)
Dense Matrix Computations using scalable malloc which is provided by Intel Threading Building Blocks (matrix-matrix, matrix-vector, vector-vector multiplication)
|