



hyPACK-2013 HPC Cluster - Intel Xeon Phi Coprocessors



HPC Cluster (Intel Xeon Processors with Intel Xeon Phi Co-processors)

Three types of hybrid HPC clusters are used in the laboratory sessions of the workshop: Intel Xeon processor nodes (host CPUs) with CUDA-enabled NVIDIA GPUs as device accelerators; AMD Opteron processor nodes (host CPUs) with AMD-ATI GPUs (AMD FireStream & AMD-ATI FirePro) as accelerators; and Intel Xeon processor nodes (host CPUs) with Intel Xeon Phi coprocessors. These clusters can address some of the heterogeneous computing workloads found on typical hybrid computing platforms. The aim of the hybrid computing system is to develop system software and integrate state-of-the-art technology components such as Intel Xeon Phi coprocessors, NVIDIA GPU stream accelerators, and AMD APU/GPUs.

The HPC Intel Xeon-Phi message passing cluster supports parallel programming models that include shared memory programming (POSIX Threads, OpenMP, Intel TBB) and the MPI 2.0 standard on multi-core processors. A Linux programming environment is provided on the cluster, and the operating environment can be designed to run large, complex applications that make efficient use of the Intel Xeon-Phi coprocessors attached to the multi-core processors. The Linux programming environment can be configured to match different cluster workloads as application demands dictate and to execute highly scalable customized applications. The PARAM YUVA-II HPC cluster, a message passing cluster with Intel Xeon Phi coprocessors, is used to design, develop and execute codes.

For more information on the Intel Xeon Phi coprocessor, see Intel Xeon-Phi Coprocessors.


Two HPC clusters using Intel Xeon-Phi coprocessors have been used in the hands-on sessions of the hyPACK-2013 workshop. The first is a two-node cluster with Intel Xeon-Phi coprocessors; the other is the C-DAC PARAM YUVA-II MPI-based HPC cluster, with about 0.5 petaflops peak performance and rank 69 in the Top-500 list of supercomputers released in June 2013. The PARAM YUVA-II system has 225 Intel Xeon DP nodes, each node having two Intel Xeon-Phi coprocessors. Its Top-500 LINPACK theoretical peak performance (Rpeak) is 529.38 TF and its measured LINPACK performance (Rmax) is 386.708 TF.
Source : http://s.top500.org/static/lists/2013/06/TOP500_201306.xls
Source : http://www.top500.org/lists/

Peak performance (in double precision) of the HPC cluster (PARAM YUVA-II), in which each node has two Intel Xeon-Phi coprocessors, and other details are given in the subsequent sections.

Type 3 : Configuration of HPC Cluster with Intel Xeon-Phi Coprocessors
Type 4 : Configuration of HPC Cluster (PARAM YUVA-II) with Intel Xeon-Phi Coprocessors

Details of the compute nodes and Intel Xeon-Phi coprocessors are given below.
hyPACK-2013 Lab. : Host-CPU : Intel Xeon Quad Core;
hyPACK-2013 Lab. : Coprocessor : Multiple Intel Xeon Phi Co-processors


hyPACK-2013 Lab. : Host-CPU (Xeon)
  • One Intel Xeon 64-bit Quad-Core X5450 series (Harpertown) processor node with two PCIe 2.0 x16 slots; RAM : 16 GB; Clock speed : 3.0 GHz; CentOS 5.2; GCC version 4.1.2; dual socket, quad core (8 processors or cores)

  • Intel MKL version 10.2, Intel icc 11.1; Peak performance : CPU : 96 Gflops (1 node, 8 cores)
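    For reference, the quoted CPU peak is consistent with the usual peak-rate arithmetic, assuming four double-precision floating-point operations per core per cycle for this processor generation:

    3.0 GHz × 8 cores × 4 flops/cycle = 96 Gflop/s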

hyPACK Lab. : Intel Xeon-Phi Coprocessor
  • 60 Intel Architecture (MIC or Intel Xeon-Phi) cores; I/O bus : PCIe; Memory type : GDDR5; Memory size : 8 GB

PARAM YUVA-II : Host-CPU (Xeon)
  • Intel Xeon E5-2670 64-bit 8-core processors with multiple PCIe 2.0 x16 slots; RAM : 64 GB; Clock frequency : 2.35 GHz; No. of cores per node : 16 (standard Linux, dual-socket 8-core processors)

  • Software : Intel development tools (Intel MPI, public-domain MVAPICH2); MKL, NAG, C-DAC KSHIPRA; Varda Prog. Env.; Reconfigurable Computing - FPGA

PARAM YUVA-II : Intel Xeon-Phi Coprocessor
  • 60 Intel Architecture (MIC or Intel Xeon-Phi) cores connected by a high-performance on-die bidirectional interconnect; I/O bus : PCIe; Memory type : GDDR5 (more than 2x the bandwidth of KNF); Memory size : 8 GB; Peak performance : more than 1 TFLOP/s (DP); single Linux image per chip

  • Intel Xeon Phi coprocessor (60 cores), single-precision peak performance : 1.1091 GHz × 60 cores × 16 lanes × 2 = 2129.47 Gflop/s; peak performance of a single core = 35.49 Gflop/s
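    The double-precision peak quoted above (more than 1 TFLOP/s) follows from the same formula with 8 DP vector lanes in place of 16 SP lanes:

    1.1091 GHz × 60 cores × 8 lanes × 2 = 1064.74 Gflop/s ≈ 1.06 TFlop/s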


Compute Nodes with Multiple Intel Xeon-Phi Co-processors

In a message passing cluster, MPI processes are launched across several cluster nodes or Intel Xeon-Phi co-processors over a suitable interconnect; a minimal MPI sketch is given after the figures below.

Figure 1. Intel Xeon-Phi Offload MPI Model
Source : http://www.intel.com; Intel Xeon-Phi books, conferences, Web sites, Technical Reports


Figure 2. MPI on Host Devices with Offload to Intel Xeon-Phi Co-processors
Source : http://www.intel.com; Intel Xeon-Phi books, conferences, Web sites, Technical Reports
Figure 3. MPI on Host with Offload to Intel Xeon-Phi Co-processors - Symmetric Model
Source : http://www.intel.com; Intel Xeon-Phi books, conferences, Web sites, Technical Reports
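As a small illustration of the models in the figures above, the following minimal MPI sketch lets each rank report where it runs. The hostnames, binary names and rank counts in the comment are placeholders; with Intel MPI the same source would typically be built once for the host and once with -mmic for the coprocessor and launched in symmetric mode.

    #include <stdio.h>
    #include <mpi.h>

    /* Minimal MPI sketch: each rank reports the node or coprocessor it runs on.
     * Example symmetric-mode launch (names are placeholders):
     *   mpirun -n 4 -host node01      ./hello.host : \
     *          -n 8 -host node01-mic0 ./hello.mic
     */
    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        printf("Rank %d of %d running on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }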





Interconnection Issues
  • The hyPACK-2013 laboratory HPC cluster uses a Gigabit interconnect and the MPICH/OpenMPI implementations. PARAM YUVA-II has C-DAC PARAMNet-3, Gigabit and InfiniBand FDR interconnects with Intel development tools and Intel MPI. The system has an HPC scratch area with 10 GB/s write bandwidth over a parallel file system (100 TB). The system also has C-DAC RC-FPGA based PEs as hardware accelerators.


Intel Xeon-Phi Coprocessors

Understanding Intel's MIC architecture, the compiler and vectorization features, and the programming models for the Intel Xeon Phi coprocessor may enable programmers to achieve good performance from their applications. A description of the Intel Xeon Phi coprocessor hardware, together with information about the basic programming models, may assist developers in porting their applications easily. The Intel Xeon-Phi coprocessor can deliver over one teraflop of floating-point performance, and several paths, listed below, can be taken to reach teraflop supercomputing speeds.

  • Offload work from the host processor to the Intel Xeon Phi coprocessor(s) using pragmas to augment existing codes (a minimal sketch of this offload model appears after this list)

  • Use the coprocessor as a separate many-core Linux SMP compute node and recompile the source code to run directly on the coprocessor

  • Access the coprocessor as an accelerator through optimized libraries such as the Intel MKL (Math Kernel Library) and use the MKL thread affinity features

  • Use the OpenMP framework on the coprocessor together with the compiler's vectorization features, expressing sufficient parallelism with the vector capability to achieve floating-point performance in the teraflop range
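As referenced in the first item above, the following is a minimal sketch of the pragma-based offload model, assuming the Intel compiler's offload pragmas and one coprocessor visible as mic:0; the array names and problem size are purely illustrative.

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal offload sketch: a vector sum whose loop body runs on the
     * Intel Xeon Phi card when one is present. */
    int main(void)
    {
        int n = 1000000;
        float *a = malloc(n * sizeof(float));
        float *b = malloc(n * sizeof(float));
        float *c = malloc(n * sizeof(float));
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        /* The in/out clauses copy the arrays across the PCIe bus;
         * the loop below executes on the coprocessor. */
        #pragma offload target(mic:0) in(a : length(n)) in(b : length(n)) out(c : length(n))
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];

        printf("c[0] = %f (expected 3.0)\n", c[0]);
        free(a); free(b); free(c);
        return 0;
    }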

The pragma-based offload model and the use of the Intel Xeon Phi as an SMP processor are among the easiest approaches to writing a program, similar to programming existing x86 systems. The challenge lies in expressing sufficient parallelism and vector capability to achieve high floating-point performance, as the Intel Xeon Phi coprocessor provides more than an order of magnitude increase in core count over current-generation dual-core and quad-core processors.

The Xeon Phi Hardware Model from a Software Perspective

The Intel Xeon Phi KNC processor is a 60-core SMP chip in which each core has a dedicated 512-bit wide SIMD vector unit. All the cores are connected via a 512-bit bidirectional ring interconnect (Figure 4). Currently, the Phi coprocessor is packaged as a separate PCIe device, external to the host processor. Each Phi contains 8 GB of RAM that provides all the memory and file-system storage used by every user process, the Linux operating system, and ancillary daemon processes. The Phi can mount an external host file system, which should be used for all file-based activity to conserve device memory for user applications. Even though Linux on the Intel Xeon Phi provides a conventional SMP virtual-memory environment, the coprocessor cards do not support paging to an external device.

The theoretical maximum bandwidth of the Intel Xeon Phi memory system is 352 GB/s (5.5 GT/s × 16 channels × 4 bytes/transfer), but internal bandwidth limitations inside the KNC chip (specifically the ring interconnect), plus the overhead of ECC memory, limit achievable performance to 200 GB/s or less. Each Intel Xeon Phi core is based on a modified Pentium processor design that supports hyperthreading and some new x86 instructions created for the wide vector unit.

Figure 4. Knights Corner Micro-Architecture
Source : http://www.intel.com; Intel Xeon-Phi books, conferences, Web sites, Technical Reports

The aggregate Intel Xeon Phi coprocessor computational performance is high, but each core is slow and has limited floating-point performance when compared with modern multi-core processors such as the Intel Sandy Bridge processor. Most importantly, high performance can be achieved only when a large number of parallel threads (a minimum of 120 up to a maximum of 240) are utilized. These parallel threads issue instructions to the wide vector units quickly enough to keep the vector pipeline full. The current generation of coprocessor cores supports up to four concurrent threads of execution via hyperthreading.
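As a small illustration (assuming the Intel compiler's offload pragmas and one coprocessor visible as mic:0), the following sketch queries how many OpenMP threads the runtime makes available on the card; on a 60-core part with 4 hardware threads per core this is typically on the order of 240, with a few threads often reserved for the card's system services.

    #include <stdio.h>
    #include <omp.h>

    /* Sketch: report the OpenMP thread count available inside an offloaded
     * region on the coprocessor. */
    int main(void)
    {
        int nthreads = 0;

        #pragma offload target(mic:0)
        {
            nthreads = omp_get_max_threads();  /* runs on the coprocessor */
        }

        printf("OpenMP threads available on the coprocessor: %d\n", nthreads);
        return 0;
    }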

The Intel Xeon Phi compiler technology assists developers in implementing vectorization in data-parallel codes. For data-parallel codes, the compiler recognizes independent chunks of computation and issues the Intel Xeon Phi's special wide vector instructions to the per-core vector units.
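The sketch below shows the kind of loop the compiler can vectorize for the 512-bit vector units: no loop-carried dependences, 64-byte aligned data, and pragmas that assert both facts. The function and array names are illustrative; the code could be built natively for the coprocessor (with -mmic) or placed inside an offload region.

    #include <stdlib.h>
    #include <stdio.h>

    /* Data-parallel loop written to be vectorizable: the pragmas promise
     * independent iterations and 64-byte aligned operands. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n)
    {
        #pragma ivdep            /* iterations are independent      */
        #pragma vector aligned   /* x and y are 64-byte aligned     */
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        int n = 1024;
        float *x, *y;
        /* 64-byte aligned allocations so the "vector aligned" promise holds */
        posix_memalign((void **)&x, 64, n * sizeof(float));
        posix_memalign((void **)&y, 64, n * sizeof(float));
        for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy(y, x, 3.0f, n);
        printf("y[0] = %f (expected 5.0)\n", y[0]);

        free(x); free(y);
        return 0;
    }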

Figure 5. Intel Xeon (host) and Intel Xeon Phi Coprocessor : PCIe and memory bandwidths
Source : http://www.intel.com; Intel Xeon-Phi books, conferences, Web sites, Technical Reports

Currently, the Xeon-Phi coprocessor is packaged as a separate PCIe device, external to the host processor. This PCIe packaging complicates the offload programming model, in which any thread can access any data in a shared memory system, and adds some overheads. Achieving high offload computational performance with external coprocessors requires that developers (1) transfer the data across the PCIe bus to the coprocessor and keep it there, (2) give the coprocessor enough work to do, and (3) focus on data reuse within the coprocessor(s) to avoid memory bandwidth bottlenecks and moving data back and forth to the host processor.
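The following sketch illustrates points (1) and (3), assuming the Intel compiler's offload pragmas with the alloc_if/free_if modifiers: the input array is transferred once, kept resident on the card, reused by two offloaded reductions, and freed at the end. Array names and the problem size are illustrative.

    #include <stdio.h>
    #include <stdlib.h>

    #define ALLOC alloc_if(1) free_if(0)  /* allocate on the card, keep it resident */
    #define REUSE alloc_if(0) free_if(0)  /* reuse the existing card allocation     */
    #define FREE  alloc_if(0) free_if(1)  /* release the card allocation            */

    int main(void)
    {
        int n = 1 << 20;
        float *a = malloc(n * sizeof(float));
        for (int i = 0; i < n; i++) a[i] = 1.0f;

        /* (1) transfer the data once and leave it on the coprocessor */
        #pragma offload_transfer target(mic:0) in(a : length(n) ALLOC)

        /* (3) reuse the resident copy for two different reductions, so only
         * the scalar results cross the PCIe bus */
        double sum = 0.0, sumsq = 0.0;
        #pragma offload target(mic:0) nocopy(a : length(n) REUSE)
        for (int i = 0; i < n; i++) sum += a[i];

        #pragma offload target(mic:0) nocopy(a : length(n) REUSE)
        for (int i = 0; i < n; i++) sumsq += a[i] * a[i];

        /* free the card-side buffer when it is no longer needed */
        #pragma offload_transfer target(mic:0) nocopy(a : length(n) FREE)

        printf("sum = %g, sum of squares = %g\n", sum, sumsq);
        free(a);
        return 0;
    }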

Topics dealing with all practical and experimental aspects of the various compiler and vectorization features covered in hyPACK-2013 are considered on the Intel Xeon Phi coprocessors in order to achieve the best sustained performance of NLA and application kernels.


List of Programs based on the HPC Cluster with Intel Xeon-Phi Co-processors
  • Write your own program for NLA kernel codes using auto-parallelisation features on Xeon-Phi Coprocessors. Analyze the compiler generated optimization reports for various problem sizes for typical matrix-matrix multiplication algorithms and obtain maximum achievable performance

  • Write your own program for NLA kernel codes, with or without use of the Intel MKL libraries, using the Intel compiler (loop optimization pragmas/directives) with Automatic Offload & Compiler-Assisted Offload

  • Write your own software modules for NLA kernels using the compiler auto-parallelization features of the Intel Xeon-Phi and analyze the GAP-generated optimization reports. Summarize the performance and scalability issues for various problem sizes of your code.

  • Write your own Matrix Multiply Code using OpenMP Pragmas based on OpenMP thread affinity on the Intel Xeon Phi Coprocessor (a minimal sketch appears at the end of this list).

  • Write your own Matrix Multiply Code using Intel MKL Thread Affinity on Intel Xeon-Phi Coprocessors

  • Write your own software modules for NLA kernels using various clauses of SIMD directives. Analyze the vectorization reports and summarize performance issues for different problem sizes.

  • Write your own suite of programs for NLA kernels (Vector-Vector Addition, Matrix-Matrix Addition) using the vector-aligned data features of the Intel Xeon-Phi, i.e. __declspec(align(n)). Analyze the vectorization reports & summarize the performance issues for different problem sizes of your code. You can use SIMD and IVDEP directives/pragmas to assist vectorization.

  • Obtain the performance for Vector-Vector Multiplication and Matrix-Matrix Multiplication using the Intel MKL libraries on Intel Xeon-Phi Coprocessors with Automatic Offload & Compiler-Assisted Offload

  • Write your own software modules for NLA kernels using Intel MKL with (a) compiler-assisted offload and (b) reuse of data that already exists in the memory of the coprocessor to reduce data transfer; for example, illustrate how to perform multiple operations on a single set of input matrices

  • Write your own program for NLA kernels with and without array operations using vectorization features

  • Write your own program for Matrix-Matrix Multiplication based on block-partitioning of the input matrices and use the Xeon-Phi programming environment features such as (a) persistent storage allocated on the coprocessor, (b) asynchronous data transfer from the coprocessor to the processor, and (c) double-buffered inputs to an offload

  • Write your own program to perform large scale I/O operations and quantify the overheads.

  • Write your own program to measure copy-memory bandwidth using OpenMP or Pthreads, using 8/16/32 cores of the Intel Xeon-Phi with different workloads, and analyze the performance

  • Obtain the performance of the STREAM OpenMP benchmark on the Intel Xeon-Phi and compare it with the output of the previous example using different programming paradigms.

  • Write your own program to measure latency, bandwidth and quantify overheads using MPI point-to-point and Collective communications on Intel Xeon-Phi Coprocessors in a Message Passing Cluster with different message sizes & analyze the performance

  • Write your own software modules for NLA (SGEMM/DGEMM) kernel codes using OpenMP with memory allocated on the heap aligned to a 64-byte boundary & analyze the performance and scalability issues (use "#pragma vector aligned", "#pragma ivdep" & "posix_memalign" for dynamic memory alignment)

  • Write your own program to analyze the CPU time, Xeon-Phi time, CPU-to-Xeon-Phi Data transfer time and Xeon-Phi-CPU data transfer time and quantify the time taken for different problem sizes with respect to the number of OpenMP threads used and understand data transfers over the PCIe bus from the host to the accelerator and vice versa

  • Write your own codes for NLA kernels & a PDE solver using MPI-OpenMP (with and without Collapse) and loop unrolling (nested loops) with vectorization (ivdep and vector aligned); use the four different kinds of loop scheduling supported by OpenMP.

  • Write your own program for implementation of a PDE solver using the Finite Difference Method (FDM) with OpenMP and MPI. The computations are performed on the host and the coprocessors

  • Write your own program for implementation of a PDE solver using the Finite Element Method (FEM) in two-dimensional regions using MPI-OpenMP, in which the computations are performed on the host and the coprocessor. Use features such as overlap of computation and communication (asynchronous transfer & double buffering)

  • Write your own program for NLA kernels and an implementation of a PDE solver by FDM in 2D regions using MPI-OpenMP, in which the computations are performed using MIC_KMP_AFFINITY=verbose with granularity=fine and the scatter, compact, and balanced affinity types

  • Write your own program for NLA kernels and an implementation of a PDE solver by FDM in 2D regions, tuning the performance of the OpenMP codes on the Xeon-Phi by modifying the stack size.

  • Write your own program for implementation of a PDE solver using the Finite Difference Method (FDM) with MPI, OpenMP, and a combination of MPI-OpenMP. The software module should use larger 2 MB pages; larger pages are important for this floating-point-dominated FDM application because it performs array operations, with the computations carried out on the host and the coprocessor
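As a companion to the OpenMP matrix-multiply exercise above, the following is a minimal sketch assuming the Intel compiler's offload pragmas: a naive triple loop offloaded to the coprocessor and parallelized with OpenMP. The matrix order and the absence of blocking or MKL are purely illustrative; thread placement would be controlled at run time, e.g. MIC_KMP_AFFINITY=granularity=fine,compact.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Naive offloaded matrix multiply: C += A * B, run on the coprocessor. */
    int main(void)
    {
        int n = 1024;
        float *A = malloc((size_t)n * n * sizeof(float));
        float *B = malloc((size_t)n * n * sizeof(float));
        float *C = malloc((size_t)n * n * sizeof(float));
        for (int i = 0; i < n * n; i++) { A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f; }

        /* The matrices cross the PCIe bus once; the OpenMP loop nest runs
         * on the Intel Xeon Phi card. */
        #pragma offload target(mic:0) in(A : length(n*n)) in(B : length(n*n)) inout(C : length(n*n))
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    C[i*n + j] += A[i*n + k] * B[k*n + j];

        printf("C[0] = %f (expected %f)\n", C[0], 2.0f * n);
        free(A); free(B); free(C);
        return 0;
    }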

Centre for Development of Advanced Computing