hyPACK-2013 HPC GPU Cluster - Heterogeneous Programming
|
HPC GPU Cluster (AMD Opteron Processors with AMD APP GPU Devices)
Two types of hybrid heterogeneous HPC GPU clusters are used in the laboratory sessions of the workshop.
The first cluster uses Intel Xeon processor nodes as host CPUs with CUDA-enabled NVIDIA GPUs
as accelerator devices. The second cluster uses AMD Opteron processor nodes as
host CPUs with AMD-ATI GPUs (AMD FireStream and AMD-ATI FirePro) as accelerators, together with AMD APUs.
These clusters can address some heterogeneous computing workloads.
The aim of the hybrid computing system is to develop system software and
integrate state-of-the-art components
such as stream accelerators, NVIDIA GPU computing, and the AMD-ATI SDK.
|
The implementation and programming issues of an integrated cluster of multi-core processors with GPU accelerators
will be discussed. The HPC GPU Cluster supports parallel programming models, including shared-memory
programming (POSIX Threads, OpenMP, Intel TBB) and the MPI 2.0 standard on multi-core processors. A Linux programming
environment is provided on the cluster.
|
Type 1 : Configuration of HPC GPU Cluster
|
Peak performance (in double precision) of the HPC GPU Cluster with one node having an OpenCL-enabled
AMD-ATI GPU is 4955 Gflop/s
Host-CPU : AMD Opteron x86 12 Core;
Device GPU :
AMD FireStream 9350 & 9250;
AMD FirePro V5900 & V7900
Host-CPU (AMD)
-
One AMD Opteron x86 24-core multi-core processor system with
two PCI-e 2.0 x16 slots; RAM : 48 GB; Clock Speed : 3.0 GHz; CentOS 5.2;
GCC Version 4.1.2; Dual Socket 12 Core (24 cores)
-
ACML, OpenCL and BLAS libraries;
Peak Performance : CPU : 144 Gflops (1 node - 12 cores); AMD APP with OpenCL programming environment
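The 144 Gflops CPU figure above is consistent with the usual theoretical-peak formula, cores x clock x floating-point operations per cycle. The short Python sketch below reproduces it from the 12 cores and 3.0 GHz listed in the configuration. The value of 4 double-precision operations per cycle per core is an assumption chosen because it matches the quoted number; it is not stated in the configuration, and the function name is illustrative only.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflop/s: cores x clock (GHz) x flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# Values from the configuration above: 12 cores at 3.0 GHz.
# ASSUMPTION: 4 double-precision flops per cycle per core (not stated in the text).
cpu_peak = peak_gflops(cores=12, clock_ghz=3.0, flops_per_cycle=4)
print(f"CPU peak: {cpu_peak:.0f} Gflop/s")
```

Sustained performance of real kernels is normally well below this theoretical value, so it is best read as an upper bound.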
GPUs (AMD-ATI)
-
AMD FireStream 9250 GPU Accelerator :
Double Precision Floating Point : The FireStream 9250 supports double precision floating point
operations in hardware;
High Performance per Watt : Up to 8 GFLOPS per watt of single precision performance potential
Optimized for computation
The AMD FireStream product line provides the industry's first double-precision floating point
capability on a GPU. The AMD FireStream 9250 is AMD's second generation DP-FP product,
with 1 GB of GDDR3 memory on board and single-precision performance of 1 TFLOPS.
-
AMD FireStream 9350 GPU Accelerator :
Technology Need : AMD FireStream Computing Solution
High DPFP performance : 528 GFLOPS double precision
High performance per Watt : 2.4 GFLOPS / Watt
Open standards : OpenCL, Direct Compute
Performance optimization tools : OpenCL SDK
PCIe 2.1 Host Interface : 8 GB/S Host-GPU bandwidth
The FireStream 9370 offers maximum GPU performance with 4 GB of GDDR5 memory
in a 2-slot configuration.
The FireStream 9350 offers maximum performance per slot with 2 GB of GDDR5 memory
in a 1-slot configuration.
-
AMD FirePro V5900 :
The AMD FirePro V5900 features 2 GB of GDDR5 memory and
512 stream processors, and supports three simultaneous monitor
outputs from a single graphics card.
The AMD FirePro V5900 supports OpenCL for parallel processing on its
512 stream processors and is PCI Express 2.1 compliant.
-
AMD FirePro V7900 :
The AMD FirePro V7900 features 2 GB of GDDR5 memory
and 1280 stream processors. It supports OpenCL for parallel processing on its
1280 stream processors and is PCI Express 2.1 compliant.
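The GPU figures above can be combined into a rough estimate of when offloading a computation pays for the host-GPU transfer. The Python sketch below does this for a dense double-precision matrix-matrix multiplication, using the 528 GFLOPS double-precision peak and the 8 GB/s host-GPU bandwidth quoted in the FireStream table. It assumes both rates are reached in full, that A and B are sent once and C is returned once, and that transfer and compute do not overlap. The function names are illustrative and do not come from any vendor library.

```python
def dgemm_offload_times(n, peak_gflops=528.0, link_gb_per_s=8.0):
    """Return (transfer_seconds, compute_seconds) for an n x n double-precision C = A * B."""
    bytes_moved = 3 * n * n * 8          # send A and B, receive C; 8 bytes per double
    flops = 2 * n ** 3                   # one multiply and one add per inner-loop step
    transfer_s = bytes_moved / (link_gb_per_s * 1e9)
    compute_s = flops / (peak_gflops * 1e9)
    return transfer_s, compute_s


def break_even_size(peak_gflops=528.0, link_gb_per_s=8.0):
    """Matrix order n at which transfer time equals compute time at peak rates."""
    # 24 n^2 / link = 2 n^3 / peak  =>  n = 12 * peak / link
    return 12.0 * peak_gflops / link_gb_per_s


for n in (256, 1024, 4096):
    t, c = dgemm_offload_times(n)
    print(f"n={n:5d}  transfer={t * 1e3:8.3f} ms  compute={c * 1e3:8.3f} ms")
print(f"break-even n is about {break_even_size():.0f}")
```

Under these idealised assumptions, small matrices are dominated by the PCIe transfer and large ones by computation. Measured kernels run below peak, which moves the break-even point, so the estimate is only a starting point for the tuning exercises.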
|
List of Programs based on HPC GPU Cluster
-
Demonstration codes using the different memory types of the OpenCL architecture on the AMD APP GPU Cluster
and AMD APUs
-
Incorporation of error checks on HPC GPU Cluster based on OpenCL for matrix computation test
suites
-
Example programs on Heterogeneous Programming - OpenCL on CUDA-enabled
NVIDIA GPUs
-
Tuning & Performance using OpenCL enabled AMD-APP Libraries; Memory Optimization, Data-access
optimization for matrix computations
-
Matrix Computations : Matrix - Vector Multiplication, Matrix-Matrix Multiplication based
on MPI and OpenCL Implementation on HPC GPU Cluster with AMD-ATI GPUs
-
Application Kernels demonstration on HPC GPU Clusters (Heterogeneous Programming & MPI,
Pthreads & Intel TBB)
-
Performance of Matrix Computations using vendor-supplied tuned mathematical libraries
(OpenCL-based BLAS on AMD-ATI GPUs) on HPC GPU Cluster with GPU Accelerators
-
Selected Numerical Computational Kernels on Parallel Processing Systems with GPU
Accelerator devices using MPI & OpenCL-enabled AMD-ATI GPUs on HPC GPU Cluster
-
Numerical Linear Algebra on Multi-Core Processors using Mixed Mode of Programming
(MPI-OpenCL, Pthreads-OpenCL) on HPC GPU Cluster
-
Special Class of Application Kernels and Numerical Linear Algebra on Multi-Core
Processors using Heterogeneous Programming (OpenMP-OpenCL, MPI-OpenCL, Pthreads-OpenCL) on HPC GPU Cluster
-
HPC GPU Cluster (MPI on host-CPU & OpenCL on GPU) - Solution of Partial Differential
Equations
-
HPC GPU Cluster (MPI on host-CPU & OpenCL on GPU) - Image Processing - Edge Detection
algorithms
-
Heterogeneous Programming (MPI on host-CPU & OpenCL on GPU) - String Search algorithms
& Sequence Analysis Applications
-
Develop test suites on HPC GPU Cluster based on MPI programming on the host-CPU to launch
multiple kernels on GPU devices on each node of the HPC GPU Cluster in an MPI-OpenCL programming environment
-
HPC GPU Cluster (MPI on host-CPU & OpenCL on GPU) - Open source software benchmarks - LINPACK
(solution of the linear system Ax=b using OpenCL-based LINPACK solvers)
-
Performance of MAGMA (Numerical Linear Algebra Kernels) on CUDA-enabled GPUs
& HPC GPU Cluster (MPI on host-CPU & OpenCL on GPU) - Image Processing - Edge Detection
algorithms using OpenACC
-
Bio-Informatics : Sequence analysis (Smith-Waterman algorithm) on HPC GPU Cluster - OpenCL-enabled
NVIDIA GPUs
-
Solution of Partial Differential Equations (Poisson Equation in two-dimensional &
three-dimensional regions) by the Finite Element Method (FEM) using OpenCL AMD-APP on HPC GPU Cluster
-
Image Processing - Face Detection and Image Inpainting algorithms on HPC GPU
Cluster - AMD APP
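Several of the programs listed above combine MPI on the host CPUs with OpenCL on the GPUs for matrix-vector multiplication. One common way to organise such a code is a row-block decomposition: each MPI rank owns a contiguous block of matrix rows, multiplies it by the full vector on its own device, and the partial results are gathered in rank order. The sketch below shows only the partitioning logic in plain Python, with a loop standing in for the ranks and a list comprehension standing in for the GPU kernel. It makes no MPI or OpenCL calls, all names are hypothetical, and the workshop codes may use a different decomposition.

```python
def block_rows(n_rows, n_ranks, rank):
    """Return the half-open row range [start, stop) owned by `rank`."""
    base, extra = divmod(n_rows, n_ranks)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_matvec(a_rows, x):
    """Multiply a block of matrix rows by the full vector x (stand-in for a GPU kernel)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a_rows]


def distributed_matvec(a, x, n_ranks):
    """Emulate the gather step: concatenate each rank's partial result in rank order."""
    y = []
    for rank in range(n_ranks):
        start, stop = block_rows(len(a), n_ranks, rank)
        y.extend(local_matvec(a[start:stop], x))
    return y


a = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
print(distributed_matvec(a, x, n_ranks=2))
```

Because the row ranges are contiguous and non-overlapping, the result does not depend on the number of ranks. Ranks beyond the number of rows receive an empty range.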
|