hyPACK-2013 Hands On Session

hyPACK-2013 Hands-on Sessions (HoS) will be conducted on an HPC cluster with coprocessors and accelerators, on an ARM processor cluster, on an ARM processor system with the CUDA-enabled NVIDIA CARMA, and on DSP multi-core processor systems, for the Mode-1, Mode-2, Mode-3, Mode-4 and Mode-5 modules. The approach adopted to heterogeneous programming for application kernels and numerical linear algebra on hybrid computing systems (HPC GPU cluster) is discussed in the Mode-1, Mode-2, Mode-3 and Mode-4 modules of hyPACK-2013. The systems used for Mode-3 are given below.

Mode-3 : Systems with Accelerators - GPUs

Systems: CUDA/OpenCL enabled NVIDIA GPUs :

NVIDIA CUDA GPU : One Kepler K20 (GK110) with 6 GB memory; CUDA 5.5 Toolkit; peak performance 1.31 Tflop/s (double precision); CUDA cores : 2688; with NVIDIA Management Library (NVML). The new features in Kepler GK110 are Dynamic Parallelism, Hyper-Q, Grid Management Unit, and NVIDIA GPUDirect. A full Kepler GK110 implementation includes 15 SMX units and six 64-bit memory controllers. Each of the Kepler GK110 SMX units features 192 single-precision CUDA cores, and each core has fully pipelined floating-point and integer arithmetic logic units.
NVIDIA CUDA GPU : One Tesla C2050 (Fermi) with 3 GB memory; clock speed 1.15 GHz; CUDA 5.x Toolkit. The reported theoretical peak performance of the Fermi (C2050) is 515 Gflop/s in double precision (448 cores x 1.15 GHz x 1 instruction per cycle), and the reported maximum achievable DGEMM performance on Fermi is up to 58% of that peak.
NVIDIA CUDA GPU : GTX280. The theoretical peak of the GTX280 is 936 Gflop/s in single precision (240 cores x 1.30 GHz x 3 instructions per cycle), and the reported maximum achievable DGEMM performance is up to 40% of that peak.
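
The peak figures quoted for the NVIDIA GPUs are a plain product of core count, clock rate and instructions issued per core per cycle. The following is a minimal Python sketch of that arithmetic, using only the numbers listed above. The function names `peak_gflops` and `achievable_gflops` are illustrative and do not belong to any CUDA tool, and the 58% figure is the reported value from the text, not a measurement.

```python
def peak_gflops(cores, clock_ghz, instr_per_cycle):
    """Theoretical peak in Gflop/s: cores x clock (GHz) x instructions per cycle."""
    return cores * clock_ghz * instr_per_cycle


def achievable_gflops(peak, fraction):
    """Scale a theoretical peak by a reported achievable fraction (0..1)."""
    return peak * fraction


# Figures taken from the system list (assumed to be exact for this sketch).
fermi_peak = peak_gflops(448, 1.15, 1)    # Tesla C2050, double precision
gtx280_peak = peak_gflops(240, 1.30, 3)   # GTX280, single precision

# Reported DGEMM ceiling on Fermi: up to 58% of the theoretical peak.
fermi_dgemm = achievable_gflops(fermi_peak, 0.58)

# A full GK110 chip: 15 SMX units x 192 single-precision cores each.
gk110_full_cores = 15 * 192

print(f"C2050 peak        : {fermi_peak:.1f} Gflop/s")
print(f"C2050 DGEMM (58%) : {fermi_dgemm:.1f} Gflop/s")
print(f"GTX280 peak       : {gtx280_peak:.1f} Gflop/s")
print(f"Full GK110 cores  : {gk110_full_cores}")
```

The full GK110 count of 2880 cores is larger than the 2688 cores listed for the installed Kepler board; 2688 corresponds to 14 SMX units of 192 cores each.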

Systems: OpenCL enabled AMD GPUs :

AMD FireStream 9250 GPU Accelerator :
Double Precision Floating Point : The FireStream 9250 supports double-precision floating point operations in hardware.
High Performance per Watt : Up to 8 GFLOPS per watt of single-precision performance potential.

Optimized for computation : The AMD FireStream product line provides the industry's first double-precision floating point capability on a GPU. The AMD FireStream 9250 is AMD's second-generation DP-FP product, with 1 GB of GDDR3 memory on board and single-precision performance of 1 TFLOPS.
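
The performance-per-watt and peak figures for the FireStream 9250 can be combined into a rough power estimate. The Python sketch below is only that arithmetic. The helper name `implied_power_watts` is hypothetical, and because both inputs are "up to" vendor figures, the result is an estimate and not a specified board power.

```python
def implied_power_watts(gflops, gflops_per_watt):
    """Power implied by a peak rate and a performance-per-watt figure."""
    return gflops / gflops_per_watt


# FireStream 9250: 1 TFLOPS single precision at up to 8 GFLOPS per watt.
fs9250_watts = implied_power_watts(1000.0, 8.0)

print(f"FireStream 9250 implied power: {fs9250_watts:.0f} W")
```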
AMD FireStream 9350 GPU Accelerator :
Technology Need : AMD FireStream Computing Solution
High DPFP performance : 528 GFLOPS double precision
High performance per Watt : 2.4 GFLOPS / Watt
Open standards : OpenCL, Direct Compute
Performance optimization tools : OpenCL SDK
PCIe 2.1 Host Interface : 8 GB/s Host-GPU bandwidth

The FireStream 9350 offers maximum GPU performance with 4 GB of GDDR5 memory in a 2-slot configuration, and maximum performance per slot with 2 GB of GDDR5 memory in a 1-slot configuration.
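
The 8 GB/s host-GPU bandwidth and the 528 GFLOPS double-precision figure together give a feel for how data transfer compares with compute time in a matrix multiplication. The Python sketch below is a back-of-the-envelope estimate under stated assumptions: both numbers are treated as sustained rates (they are peak figures in the list above), 1 GB is taken as 10^9 bytes, and the matrix size `n = 4096` and all function names are chosen only for illustration.

```python
def matrix_bytes(n, bytes_per_element=8):
    """Size in bytes of an n x n matrix of double-precision values."""
    return n * n * bytes_per_element


def transfer_seconds(num_bytes, bandwidth_gb_per_s=8.0):
    """Time to move num_bytes over the host interface (1 GB = 1e9 bytes)."""
    return num_bytes / (bandwidth_gb_per_s * 1e9)


def dgemm_seconds(n, gflops=528.0):
    """Time for an n x n x n matrix multiply (2*n^3 flops) at a given rate."""
    return 2 * n**3 / (gflops * 1e9)


n = 4096                         # illustrative matrix order (assumption)
one_matrix = matrix_bytes(n)     # 128 MiB per matrix

# Copy A and B to the GPU and C back to the host: three matrices in total.
copy_time = 3 * transfer_seconds(one_matrix)
compute_time = dgemm_seconds(n)

print(f"bytes per matrix : {one_matrix}")
print(f"copy time (3x)   : {copy_time * 1e3:.1f} ms")
print(f"compute time     : {compute_time * 1e3:.1f} ms")
```

Under these assumptions the compute time is several times larger than the copy time, so the transfer cost matters more for small matrices than for large ones, since compute grows as n^3 and transfer as n^2.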
AMD FirePro V5900 :
The AMD FirePro V5900 features 2 GB of GDDR5 memory, 512 stream processors, and support for three simultaneous monitor outputs from a single graphics card with AMD Eyefinity technology. The AMD FirePro V5900 supports OpenCL, offers the parallel processing capability of its 512 stream processors, and is PCI Express 2.1 compliant.
AMD FirePro V7900 :
The AMD FirePro V7900 features 2 GB of GDDR5 memory and 1280 stream processors. The AMD FirePro V7900 supports OpenCL, offers the parallel processing capability of its 1280 stream processors, and is PCI Express 2.1 compliant.
HP Pavilion AMD A8-4500M (Trinity) APU :
The HP Pavilion dv6-7010 features an AMD A8-4500M APU with four cores, a 1.9 GHz clock frequency and a 2.8 GHz Turbo boost. Graphics are provided by a Radeon HD 7640G chip. Further specifications include 6 GB of memory, a 750 GB hard disk, Gigabit LAN, 802.11b/g/n WiFi and Bluetooth. The 15.6-inch screen has a resolution of 1366x768 pixels.

Centre for Development of Advanced Computing