



hypack-2013 Mode-2 : GPGPU AMD-APP (APU) SDK

AMD Accelerated Parallel Processing (AMD APP) software harnesses the tremendous processing power of GPUs for high-performance, data-parallel computing in a wide range of applications. The AMD Accelerated Parallel Processing system includes a software stack and the AMD GPUs. Please refer to the AMD Accelerated Parallel Processing (AMD APP) OpenCL Programming Guide to understand the relationship of the AMD Accelerated Parallel Processing components.

The AMD APP software stack provides end-users and developers with a complete, flexible suite of tools to leverage the processing power in AMD GPUs. AMD APP is an OpenCL software development platform for x86-based CPUs that also provides a complete heterogeneous OpenCL development platform for both the CPU and GPU. The software includes the OpenCL compiler and runtime; the device driver for the GPU compute device, the AMD Compute Abstraction Layer (CAL); the performance profiling tools AMD APP Profiler and AMD APP KernelAnalyzer; and performance libraries such as the AMD Core Math Library (ACML).


AMD APP SDK : OpenCL     CAL     List of Programs - OpenCL

List of Programs OpenCL - AMD APP

Module 1 : Getting Started : Basics - OpenCL
Module 2 : OpenCL Programs on Matrix Computations
Module 3 : OpenCL Programs using BLAS libraries for Matrix Computations
Module 4 : OpenCL Programs - Application Kernels
Module 5 : OpenCL Memory Optimization Programs - Tuning & Performance


References & Web-Pages : GPGPU & GPU Computing       Web-sites


GPGPU-History

In recent years, general-purpose GPU (GPGPU) processing has gained much attention. The phrase "general purpose" in the context of High Performance Computing (HPC) usually means data-intensive applications in scientific and engineering fields. In GPGPU processing, the graphics performance of specialized software (e.g., scientific software, image manipulation, video decoders/encoders, games) makes GPU performance quite important.

The speed at which data can be sent to the GPGPUs, internally processed, and the results sent back is as important as the processing power of the GPGPUs. Equally important is the performance of video (GFX) rendering, i.e., how efficiently graphics processors can handle rendering. Such operations are used by all graphics software, image manipulation, video decoders/encoders, games, and modern operating systems. Video (GFX) memory is crucial for performance: the bandwidth of the video adapters' (GFX) memory and the bandwidth of the bus drive the performance.

In these programming techniques, programmers can use the GPU's pixel shaders as general-purpose single-precision FPUs. For typical video applications, GPGPU processing is highly parallel, but it relies on the size of off-chip video memory to operate on large data sets. Off-chip memory plays an important role in GPGPU applications in which different threads must interact with each other through off-chip memory. From a graphics point of view, the video memory, normally used for texture maps and so forth in graphics applications, may store any kind of data in GPGPU applications. Video (GFX) memory is crucial for performance: the bandwidth of the video adapters' (GFX) memory and the bandwidth of the bus that connects them to the computer drive the performance.


The NVIDIA CUDA and AMD APP models are highly parallel GPGPU models. The approach is to divide the data set into smaller chunks stored in on-chip memory, then allow multiple thread processors to share each chunk. Storing the data locally reduces the need to access off-chip memory, thereby improving performance.

The GPU is viewed as a compute device capable of executing a very high number of threads in parallel. It operates as a coprocessor to the main CPU, called the host. Data-parallel, compute-intensive portions of applications running on the host are offloaded to the device as a function that is executed on the device by many different threads. Both the host and the device maintain their own DRAM, referred to as host memory and device memory, respectively. One can copy data from one DRAM to the other through optimized API calls that utilize the device's high-performance Direct Memory Access (DMA) engines.


GPGPU : User's View
  • High-performance computing on GPUs has attracted interest in the academic community as well as in industry, so there is growing expertise among programmers and a range of alternative approaches to software development.

  • GPUs have become fully programmable devices; the shaders used to be hard-wired. Mature programming interfaces, tools, and support for double precision are now required, whereas single-precision floating point is sufficient for consumer graphics.

  • Developers of high-performance heterogeneous computing applications currently must choose a nonstandard approach to GPU technology, and NVIDIA CUDA is only one option among many other software programming paradigms.

  • Other multi-core development platforms (e.g., RapidMind) support multiple processor architectures, including NVIDIA's GPUs, ATI's GPUs, IBM's Cell BE, and Intel's and AMD's x86. This flexibility lets developers target the architecture currently delivering the best performance - without rewriting or even recompiling the source code.

  • Developers will think seriously before rewriting code for a particular development platform. They prefer an architecture-independent, industry-standard solution widely supported by tool vendors.

  • Many companies (AMD, IBM, Intel, Microsoft, and others) are working toward standard parallel-processing extensions to C/C++, though their efforts may take more time.

GPGPU Programming Environment


Graphics processing units (GPUs, from AMD and NVIDIA) and the Cell Broadband Engine (Cell BE) processor by Sony, Toshiba, and IBM have demonstrated tremendous performance improvements employing scalable parallel-processor architectures. In the past, the RapidMind Development Platform ( http://www.rapidmind.net ) emerged for programming on multi-core processors. Without changing the application logic, if multi-core processors with accelerators are available, an additional speedup can be achieved using RapidMind. The RapidMind platform automatically manages the movement of data and computation between the accelerator and the host.

RapidMind Programming Environment : RapidMind is a development and runtime platform that enables single-threaded, manageable applications to fully access multi-core processors. The RapidMind Development Platform is a framework for expressing data-parallel computations from within C++ and executing them efficiently on multicore processors. The RapidMind Multi-core Development Platform provides a simple single-source mechanism to develop portable high-performance applications for multicore processors. The computation on multiple cores within existing C++ applications can be carried out without many changes. The RapidMind Platform provides a set of backends; each provides services that support the execution of RapidMind programs on a particular processor. The developer does not have to deal with the details of each processor, and is free to write portable applications that work on a variety of processor targets.

  • The x86 backend executes RapidMind programs on x86 CPUs from Intel and AMD
  • The GPU backend executes RapidMind programs on a variety of Graphics Processing Units (GPUs) from both AMD-ATI and NVIDIA
  • The Cell BE backend executes RapidMind programs on the SPEs of the Cell Broadband Engine
  • The Debug backend executes RapidMind programs on the host processor, compiling programs with a C compiler

The RapidMind implementations of the Fast Fourier Transform (FFT), the Basic Linear Algebra Subroutines (BLAS), and single-precision matrix multiply (SGEMM) on GPU and CPU cores showed good performance in comparison with the same algorithms running on a CPU core. In particular, RapidMind can be used to develop applications that fully exploit the power of the Cell Broadband Engine (Cell/B.E.) processor's unique architecture by writing only one single-threaded C++ program using an existing C++ compiler. Applications such as real-time ray tracing for the automotive industry and real-time medical-imaging reconstruction were demonstrated with strategic partner AMD at the 2008 SIGGRAPH Conference & Exhibition.

GPU - Stanford Brook :


Brook for GPUs is a compiler and runtime implementation of the Brook stream programming language for modern graphics hardware. Brook is an extension of standard ANSI C and is designed to incorporate the ideas of data-parallel computing and arithmetic intensity into a familiar and efficient language. Brook started as an open-source project from Stanford University for demonstrating general-purpose data-parallel computations on graphics processors.

  • Data Parallelism: Allows the programmer to specify how to perform the same operations in parallel on different data.
  • Arithmetic Intensity: Encourages programmers to specify operations on data, which minimize global communication and maximize localized computation.

In data parallelism, each fragment of data is processed independently, which leads to better ALU utilization and helps hide memory latency. In comparison with a CPU, a GPU has a large number of hardware threads, whereas a CPU has only one or two streams of execution. Brook was a set of extensions to the C language - "C with streams" - which exposes the graphics processing unit to general-purpose and parallel computing. Brook can be compiled for Windows, Linux, and Mac OS X, with DirectX 9+ and OpenGL back ends.

The Brook project aimed to demonstrate general-purpose programming on GPUs and to drive research on the stream-language programming model and streaming applications. Brook makes programming GPUs easier and hides complexities such as data management, graphics-based constructs in Cg/HLSL, and the rendering process. It virtualizes the GPU's resources and exposes them as if they were an extension of the CPU.
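A SAXPY computation in Brook's "C with streams" style looks roughly like the following (a sketch after the published Brook examples; the stream sizes and names are illustrative):

```
kernel void saxpy(float a, float4 x<>, float4 y<>, out float4 result<>) {
    result = a * x + y;      // applied to every stream element
}

float a;
float4 X[100], Y[100], Result[100];
float4 x<100>, y<100>, result<100>;   // streams of 100 elements

streamRead(x, X);          // copy input data into the streams
streamRead(y, Y);
saxpy(a, x, y, result);    // run the kernel over the whole stream
streamWrite(result, Result);
```

The `<>` brackets declare streams; the compiler maps the kernel over all elements, so the programmer expresses the operation once and the data parallelism is implicit.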

ATI & AMD GPGPU - Stream Computing

ATI, which was acquired by AMD in 2006, developed some of the best graphics processing technology. ATI graphics processing technology provides the server market with cost-efficient and reliable products. Graphics stability, video quality, bus architectures, and software support all play an important role in a winning combination for the server market. Providing outstanding stability in display environments and maximum flexibility across multiple applications plays a crucial role in graphics computing.

The development and evolution of parallel rendering middleware is necessary for large-scale real-time applications with visualization features. Application-transparent toolkits (Chromium and SGI OpenGL Multipipe) as well as application programming interfaces (APIs) like SGI Multipipe SDK and OpenProducer play an important role from a performance point of view. Expertise in stream-processing middleware such as AMD's Close-to-the-Metal, NVIDIA's CUDA, and Stanford University's Brook is necessary to scale application performance and provide seamless integration with the work-flow of complex applications, coupled with algorithm acceleration. An extensive understanding of graphics hardware and system architectures allows users to leverage the computational power of graphics processors (GPUs) for high-performance computing applications.

Stream computing harnesses the tremendous processing power of the graphics processing unit (GPU) for high-performance, data-intensive computations over a wide range of scientific, business, and consumer applications. In stream computing, the GPU's SIMD architecture and its many cores are available: operations are applied in parallel through the SIMD architecture to a given data set, or stream of data. Stream computing offers a number of benefits to a certain class of applications that are highly parallelizable.

Advanced Micro Devices, Inc. (AMD) Stream Computing is a first step in harnessing the tremendous processing power of the GPU (stream processor) for high-performance, data-parallel computing in a wide range of business, scientific, and consumer applications. AMD's Stream Computing software stack provides a flexible suite of tools to leverage the processing power of AMD Stream Processors (AMD FireStream™ GPUs) for end-users and developers. To take advantage of the GPU's SIMD architecture and the hundreds of parallel compute cores it provides, AMD has developed a full software stack of development tools for both 32-bit and 64-bit Linux and Windows operating systems: the AMD Stream SDK. AMD is also porting many common math library functions from the AMD ACML package to the GPU to support compute-intensive applications.


AMD Stream Accelerators - Software Stack & features

AMD's Stream Computing software stack includes the following components:

  • Performance Libraries: AMD Core Math Library (ACML) and COBRA for optimized domain-specific Algorithms
  • Compilers: Brook+ and RapidMind

    Brook+ is AMD's implementation of the open-source Brook C/C++ compiler, which AMD is enhancing to include new features and a back end that targets FireStream™ GPU processors.

    RapidMind is a complete development environment - C++ compilers and IDEs - to improve the programmability, performance, and portability of third-party applications developed for AMD Stream Computing.

  • Lower Level Driver and Programming Language: AMD Compute Abstraction Layer (CAL)

    CAL provides access to the various parts of the GPU as needed. Developers are thus able to write directly to the GPU without needing to learn graphics specific programming languages. CAL provides direct communication to the device. Intermediate language specification provides low-level access to code, increasing the ability to fine-tune device performance.

  • Performance Profiling Tools: GPU ShaderAnalyzer, AMD CodeAnalyst

    GPU ShaderAnalyzer performs throughput and flow-control analyses on stream processors, generating GUI-based performance data or command-line reports. GPU ShaderAnalyzer (GSA) is a performance profiling tool useful for developing, debugging, and profiling GPU kernels written in high-level GPU programming languages. AMD CodeAnalyst is a software performance-analysis tool, which includes system-wide profiling, as well as timer-based and event-based profiling, and sampling and simulation functionality.


Data-parallel processing maps data elements to parallel processing threads. Many applications that process large data sets can use a data-parallel programming model to speed up their computations. In 3D rendering, large sets of pixels and vertices are mapped to parallel threads. Similarly, image and media processing applications such as post-processing of rendered images, video encoding and decoding, image scaling, stereo vision, and pattern recognition can map image blocks and pixels to parallel processing threads. In fact, many algorithms outside the field of image rendering and processing are accelerated by data-parallel processing, from general signal processing or physics simulation to computational finance or computational biology.

Important characteristics and features of AMD Stream Computing : AMD's latest generation of stream-computing GPUs supports double-precision floating point in hardware (on the AMD FireStream 9170), a critical feature for most high-performance computing applications. Peak power consumption is 100 watts - extremely low power requirements, resulting in over 5 GFLOPs per watt. From a performance point of view, asynchronous DMA transfer is supported, so data can be moved without interrupting the stream processor or CPU. Programming in a C-like environment with a high-level compiler is supported; it is built on the popular open-source Brook with enhancements from, and maintained by, AMD (Brook+). It supports 32/64-bit Linux and 32/64-bit Windows - a wide range of useful operating systems to ease deployment in various HPC environments. The ATI Shader Analyzer, a popular, freely available code-tuning application, further optimizes code.


AMD Accelerated Parallel Processing (AMD APP) Software : OpenCL


The AMD APP software stack provides end-users and developers with a complete, flexible suite of tools to leverage the processing power in AMD GPUs. AMD Accelerated Parallel Processing software embraces open-systems, open-platform standards. The AMD APP SDK open-platform strategy enables AMD technology partners to develop and provide third-party development tools. AMD APP is an OpenCL software development platform for x86-based CPUs that also provides a complete heterogeneous OpenCL development platform for both the CPU and GPU. The software includes the following components:

  • OpenCL compiler and runtime
  • Device Driver for GPU compute device - AMD Compute Abstraction Layer (CAL)
  • Performance Profiling Tools - AMD APP Profiler and AMD APP KernelAnalyzer
  • Performance Libraries - AMD Core Math Library (ACML) for optimized NDRange-specific algorithms

The latest generation of AMD GPUs is programmed using the unified shader programming model. Programmable GPU compute devices execute various user-developed programs, called stream kernels (or simply: kernels). These GPU compute devices can execute non-graphics functions using a data-parallel programming model that maps executions onto compute units. In this programming model, known as AMD Accelerated Parallel Processing, arrays of input data elements stored in memory are accessed by a number of compute units. Each instance of a kernel running on a compute unit is called a work-item. A specified rectangular region of the output buffer to which work-items are mapped is known as the n-dimensional index space, called an NDRange.

The GPU schedules the range of work-items onto a group of stream cores, until all work-items have been processed. Subsequent kernels can then be executed, until the application completes. In the AMD Accelerated Parallel Processing programming model, the mapping of work-items to stream cores is performed as follows. OpenCL maps the total number of work-items to be launched onto an n-dimensional grid (NDRange). The developer can specify how to divide these items into work-groups. AMD GPUs execute on wavefronts (groups of work-items executed in lock-step in a compute unit); there are an integer number of wavefronts in each work-group. Thus, as shown in Figure 1.5, the hardware that schedules work-items for execution in the AMD Accelerated Parallel Processing environment includes the intermediate step of specifying wavefronts within a work-group. This permits achieving maximum performance from AMD GPUs.


Work-Item Processing : All stream cores within a compute unit execute the same instruction in each cycle. A work-item can issue one VLIW instruction per clock cycle. The block of work-items that are executed together is called a wavefront. To hide latencies due to memory accesses and processing-element operations, up to four work-items from the same wavefront are pipelined on the same stream core.

Work-Item Creation : For each work-group, the GPU compute device spawns the required number of wavefronts on a single compute unit. If there are non-active work-items within a wavefront, the stream cores that would have been mapped to those work-items are idle.

Memory Architecture and Access : OpenCL has four memory domains: private, local, global, and constant; the AMD Accelerated Parallel Processing system also recognizes host (CPU) and PCI Express (PCIe) memory.

  • private memory - specific to a work-item; it is not visible to other work-items
  • local memory - specific to a work-group; accessible only by work-items belonging to that work-group
  • global memory - accessible to all work-items executing in a context, as well as to the host (read, write, and map commands).
  • constant memory - read-only region for host-allocated and -initialized objects that are not changed during kernel execution
  • host (CPU) memory - host-accessible region for an application's data structures and program data
  • PCIe memory - part of host (CPU) memory accessible from, and modifiable by, the host program and the GPU compute device. Modifying this memory requires synchronization between the GPU compute device and the CPU.

AMD APP SDK - Compute Abstraction Layer (CAL) Overview

The AMD Compute Abstraction Layer (CAL) is a device driver library that provides a forward-compatible interface to AMD GPU compute devices (see Figure 1.6). CAL lets software developers interact with the GPU compute devices at the lowest-level for optimized performance, while maintaining forward compatibility. CAL provides the following features:

  • Device management
  • Resource management
  • Code generation
  • Kernel loading and execution
  • Multi-device support
  • Interoperability with 3D graphics API

CAL provides a device driver library that allows applications to interact with the stream cores at the lowest level for optimized performance, while maintaining forward compatibility.

The CAL API is ideal for performance-sensitive developers because it minimizes software overhead and provides full control over hardware-specific features that might not be available with higher-level tools. CAL includes a set of C routines and data types that allow higher-level software tools to control hardware memory buffers (device-level streams) and GPU compute device programs (device-level kernels). The CAL runtime accepts kernels written in AMD Intermediate Language (IL) and generates optimized code for the target architecture. It also provides access to device-specific features. A CAL system comprises one or more stream processors connected to one or more CPUs by a high-speed bus. The CPU runs the CAL runtime and controls the stream processor by sending commands through the CAL API. The stream processor runs the kernel specified by the application, while the stream processor's device driver (CAL) runs on the host CPU.

List of Programs - OpenCL - AMD APP SDK

The OpenCL programming model is based on the notion of a host, supported by an application API, and a number of devices connected through a bus; these are programmed using the OpenCL C language. Most OpenCL programs follow the same pattern: given a specific platform, select a device or devices, create a context, allocate memory, create device-specific command queues, and perform data transfers and computations. The compiler tool-chain provides a common framework for both CPUs and GPUs, sharing the front end and some high-level compiler transformations. The back ends are optimized for the device type (CPU or GPU).
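The pattern can be outlined as the canonical host-side call sequence (an outline only; arguments and error checking are omitted):

```
clGetPlatformIDs(...)              // 1. discover platforms
clGetDeviceIDs(...)                // 2. pick a CPU or GPU device
clCreateContext(...)               // 3. create a context for it
clCreateCommandQueue(...)          // 4. a command queue per device
clCreateBuffer(...)                // 5. allocate device memory
clCreateProgramWithSource(...)     // 6. load OpenCL C source
clBuildProgram(...)                //    compile for this device
clCreateKernel(...)                // 7. get a kernel handle
clSetKernelArg(...)                // 8. bind buffers to arguments
clEnqueueWriteBuffer(...)          // 9. host -> device transfer
clEnqueueNDRangeKernel(...)        // 10. launch over the NDRange
clEnqueueReadBuffer(...)           // 11. device -> host results
clFinish(...)                      // 12. wait for completion
```

Every example program in the modules below is a variation on this skeleton; only the kernel source and the buffer setup change.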

  • Example programs based on simple buffer write, the SAXPY operation, a parallel min() function, and prefix operations.

  • Example programs based on Numerical Linear Algebra using OpenCL optimized features

  • Example programs based on Numerical Linear Algebra using BLAS libraries on the host CPU and device GPU, focusing on performance in terms of GFLOPS

  • Open-source software based on Numerical Linear Algebra, demonstrating performance

  • Implementation of Matrix Computations (Iterative Methods to solve Ax=b Matrix System of linear equations) on Multi-GPUs

  • OpenCL - Sparse matrix-vector multiplication kernel

  • AMD-APP (using CAL) OpenCL : Write a simple HelloCAL application program using CAL of AMD Accelerated Parallel Processing Technology

  • AMD-APP : Write Direct Memory Access (DMA) code to move data between system memory and GPU local memory using CALMemCopy of AMD APP

  • AMD-APP : Write code that uses AMD-APP asynchronous operations via the CAL API, for an application that must perform CPU computations in the application thread while also running another kernel on the AMD-APP stream processor

  • AMD-APP : Multiple devices - Write a matrix-vector multiplication based on a self-scheduling algorithm using the AMD-APP CAL API on multiple stream processors (using calDeviceGetCount), ensuring that application-created threads on the CPU manage the communication with the individual stream processors.

  • AMD-APP : Write a matrix-matrix computation on a Hybrid Computing (HC) platform in which the matrix-matrix multiply is performed judiciously on the host CPU using the ACML math library of the AMD Opteron processor and CAL/OpenCL on the AMD-ATI GPU (APP).

  • AMD-APP : Obtain the maximum achievable matrix-matrix computation performance on a Hybrid Computing (HC) platform in which the block matrix-matrix multiply is performed judiciously using the ACML math library of the AMD Opteron processor on the host CPU and a tiled (blocked) matrix-matrix multiplication based on CAL/OpenCL on the AMD-ATI GPU (APP).

  • OpenCL Implementation of the solution of Partial Differential Equations by the finite difference method

  • OpenCL Implementation of Image Processing - Edge Detection Algorithms

  • OpenCL Implementation of String Search Algorithms

References

1. AMD Fusion
2. APU
3. All about AMD FUSION APUs (APU 101)
4. AMD A6 3500 APU Llano
5. AMD A6 3500 APU review
6. AMD APP SDK with OpenCL 1.2 Support
7. AMD-APP-SDKv2.7 (Linux) with OpenCL 1.2 Support
8. AMD Accelerated Parallel Processing Math Libraries (APPML)
9. AMD Accelerated Parallel Processing (AMD APP) Programming Guide OpenCL : May 2012
10. MAGMA OpenCL
11. AMD Accelerated Parallel Processing (APP) SDK (formerly ATI Stream) with AMD APP Math Libraries (APPML); AMD Core Math Library (ACML); AMD Core Math Library for Graphic Processors (ACML-GPU)
12. Getting Started with OpenCL
13. Aparapi - API & Java
14. AMD Developer Central - OpenCL Zone
15. AMD Developer Central - SDKs
16. ATI GPU Services (AGS) Library
17. AMD GPU - Global Memory for Accelerators (GMAC)
18. AMD Developer Central - Programming in OpenCL
19. AMD GPU Task Manager (TM)
20. AMD APP Documentation
21. AMD Developer OpenCL FORUM
22. AMD Developer Central - Programming in OpenCL - Benchmarks performance
23. OpenCL 1.2 (pdf file)
24. OpenCL™ Optimization Case Study: Fast Fourier Transform - Part 1
25. AMD GPU PerfStudio 2
26. Open Source Zone - AMD CodeAnalyst Performance Analyzer for Linux
27. AMD ATI Stream Computing OpenCL - Programming Guide
28. AMD OpenCL Emulator-Debugger
29. GPGPU : http://www.gpgpu.org and Stanford BrookGPU discussion forum http://www.gpgpu.org/forums/
30. Apple : Snowleopard - OpenCL
31. The OpenCL Specification, Version 1.0, Khronos OpenCL Working Group
32. Khronos V1.0 Introduction and Overview, June 2010
33. The OpenCL 1.1 Quick Reference card.
34. OpenCL 1.2 Specification (Document Revision 15), Last Released November 15, 2011
35. The OpenCL 1.2 Specification (Document Revision 15) Last Released November 15, 2011 Editor : Aaftab Munshi Khronos OpenCL Working Group
36. OpenCL1.1 Reference Pages
37. MATLAB
38. OpenCL Toolbox v0.17 for MATLAB
39. NAG
40. AMD Compute Abstraction Layer (CAL) Intermediate Language (IL) Reference Manual. Published by AMD.
41. C++ AMP (C++ Accelerated Massive Parallelism)
42. C++ AMP for the OpenCL Programmer
43. C++ AMP for the OpenCL Programmer
44. MAGMA SC 2011 Handout
45. AMD Accelerated Parallel Processing Math Libraries (APPML) MAGMA
46. The OpenCL 1.2 Specification Khronos OpenCL Working Group
47. The OpenCL 1.2 Quick-reference-card ; Khronos OpenCL Working Group
48. Benedict R. Gaster, Lee Howes, David R. Kaeli, Perhaad Mistry, Dana Schaa, Heterogeneous Computing with OpenCL, Elsevier, Morgan Kaufmann Publishers, 2011
49. Programming Massively Parallel Processors - A Hands-on Approach, David B. Kirk, Wen-mei W. Hwu, NVIDIA Corporation, 2010, Elsevier, Morgan Kaufmann Publishers, 2011
50. OpenCL Programming Guide, Aaftab Munshi, Benedict R. Gaster, Timothy G. Mattson, James Fung, Dan Ginsburg, Addison-Wesley, Pearson Education, 2012
51. AMD gDEBugger
52. The HSA (Heterogeneous System Architecture) Foundation
Centre for Development of Advanced Computing