Venue : CMSD, UoH

hyPACK-2013 Multi-Cores - Memory Allocators

A multi-threaded memory allocator should perform memory operations (i.e., malloc and free) about as fast as a state-of-the-art serial memory allocator, so that performance is preserved even when a multi-threaded program executes on a single processor. As the number of processors in the system grows, the performance of the allocator should scale linearly with the number of processors to ensure scalable application performance. The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocator for shared-memory multiprocessors.

Introduction of Hoard Memory Allocator
Overview of Hoard Memory Allocator
Advantages of Hoard Memory Allocator
Compilation, Linking and Execution Using Hoard Memory Allocator

List of Programs

Programs based on Numerical Computations (Matrix, Vector Computations) : Example programs on vector-vector multiplication using block-striped partitioning, matrix-vector multiplication using a self-scheduling algorithm, and matrix-matrix multiplication using block-striped partitioning. The focus is on using memory allocators and understanding performance issues on multi-core processors.

Introduction of Hoard Memory Allocator

The Hoard memory allocator, or Hoard, is a memory allocator for Linux, Solaris, Microsoft Windows and other operating systems. Hoard is a drop-in replacement for malloc() that can dramatically improve application performance, especially for multi-threaded programs running on multiprocessors. Hoard can improve the performance of multi-threaded applications by providing fast, scalable memory management functions (malloc and free). It reduces contention for the heap (the central data structure used in dynamic memory allocation) caused when multiple threads allocate or free memory, and avoids the false sharing that can be introduced by memory allocators. At the same time, Hoard has strict bounds on fragmentation.

Overview of Hoard Memory Allocator

Using a single-threaded malloc in a multi-threaded application can degrade performance. When memory is allocated concurrently by multiple threads, all the threads must wait in a queue while malloc() handles one request at a time, so even a few extra threads can slow the application down. Multi-threaded applications fail to scale for a number of reasons. Some of them are :

Contention:

Multi-threaded programs often do not scale because the heap is a bottleneck. When multiple threads simultaneously allocate or deallocate memory, the allocator serializes their requests. Programs that make intensive use of the allocator can actually slow down as the number of processors increases.

False Sharing:

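The allocator can cause false sharing in multi-threaded applications. Threads on different CPUs can end up with separately allocated memory that lies in the same cache line (the unit of memory that is moved between caches). Whenever one thread writes to such a line, the copies held by the other CPUs are invalidated, so accessing these falsely-shared cache lines can be hundreds of times slower than accessing unshared cache lines.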

Blow Up:

Multi-threaded programs can also cause the allocator to blow up memory consumption. This effect can multiply the amount of memory needed to run your application by the number of CPUs on your machine: four CPUs could mean that you need four times as much memory.

Hoard is a fast allocator that addresses all of these problems: it reduces contention for the heap when multiple threads allocate or free memory, avoids the false sharing that memory allocators can introduce, and places strict bounds on fragmentation.

Advantages of Hoard Memory Allocator

Speed : As fast as a uniprocessor allocator on one processor.
Scalability : Scales linearly with the number of processors.
Avoids false sharing.
Low Fragmentation.

Compilation and execution using Hoard Memory Allocator

To use the Hoard memory allocator with our application, we do not need to change any source code. Assume that the Hoard memory allocator is available at the following location :

                                  /home/tbbtest/Hoard/

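step 1 :
                On UNIX-based platforms, set the environment variable LD_PRELOAD in the shell from which the program will be run. LD_PRELOAD takes effect when a program is loaded, not when it is compiled, so it must be set before the program is executed in step 3.

                 $ export    LD_PRELOAD="/home/tbbtest/Hoard/libhoard.so"          (sh, bash)
                                                                    or
                 $ setenv    LD_PRELOAD "/home/tbbtest/Hoard/libhoard.so"          (csh, tcsh)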
step 2 :
                To compile and link programs, use the command

                 $ gcc -o <executable name> <name of the source file>

                 For example, to compile a simple 'Hello World' program :

                 $ gcc -o helloworld helloworld.c
step 3 :
                To execute the program, give the name of the executable at the command prompt.

                 $ ./<executable name>

                 For example, to execute the simple 'Hello World' program :

                 $ ./helloworld
step 4 :
                To check whether the Hoard memory allocator will be loaded with our application, use the command ldd. ldd prints the shared libraries required by each program or shared library specified on the command line. Because ldd honours LD_PRELOAD, run it in the same shell in which LD_PRELOAD was set.

                 $ ldd <executable name>

           For example :

                 $ ldd ./helloworld

           The output will look like :

linux-gate.so.1 => (0xffffe000)
/home/tbbtest/Hoard/libhoard.so (0xb7f6a000)
libc.so.6 => /lib/libc.so.6 (0x4d6a1000)
libdl.so.2 => /lib/libdl.so.2 (0x4d7fd000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x4daa6000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x4da98000)
/lib/ld-linux.so.2 (0x4d684000)

Observe the second line of the output, which shows that the Hoard library (libhoard.so) is preloaded into the application, so its malloc and free are used in place of the default ones.

Centre for Development of Advanced Computing