



hyPACK-2013 Hands On Session

hyPACK-2013 Hands-on Sessions (HoS) will be conducted on an HPC cluster with coprocessors and accelerators, on an ARM processor cluster, and on ARM processor systems with CUDA-enabled NVIDIA CARMA and DSP multi-core processor systems, covering the Mode-1, Mode-2, Mode-3, Mode-4, and Mode-5 modules. The approach adopted for heterogeneous programming of application kernels and numerical linear algebra on hybrid computing systems (HPC GPU cluster) is discussed in the Mode-1, Mode-2, Mode-3, and Mode-4 modules of hyPACK-2013, as given below.

Mode-3 : Systems with Coprocessors
  • System 1 : Intel Xeon Phi Coprocessor : The pragma-based offload model and the use of the Intel Xeon Phi as a standalone SMP processor are among the easiest approaches to writing programs similar to those for existing x86 systems. The Intel Xeon Phi (Knights Corner) is a 61-core SMP chip in which each core has a dedicated 512-bit wide SIMD vector unit. All the cores are connected via a 512-bit bidirectional ring interconnect. Currently, the Phi coprocessor is packaged as a separate PCIe device, external to the host processor. Each Phi contains 8 GB of RAM, which provides all the memory and file-system storage used by every user process, the Linux operating system, and ancillary daemon processes. The theoretical maximum bandwidth of the Intel Xeon Phi memory system is 352 GB/s (5.5 GTransfers/s * 16 channels * 4 B/Transfer). Each Intel Xeon Phi core is based on a modified Pentium processor design that supports hyperthreading and adds new x86 instructions for the wide vector unit. Hyperthreading allows the parallel threads to issue instructions to the wide vector unit quickly enough to keep the vector pipeline full; the current generation of coprocessor cores supports up to four concurrent threads of execution per core.

    For the laboratory sessions, the coprocessor is integrated with an Intel x86 Xeon (Sandy Bridge) host system.

  • System 2 : PARAM Yuva-II : a hybrid computing platform organized as a message-passing cluster; the configuration of a compute node with coprocessors is given below. Compute Node : two-socket, eight-core-per-socket system (16 CPU cores, Intel(R) Xeon(R) CPU E5-2670 @ 2.60 GHz, Sandy Bridge architecture); RAM : 64 GB; cache : 20 MB; GCC 4.4.6; interconnects : PARAMNet-II and InfiniBand. Each node has two Intel Xeon Phi coprocessors.

    Intel Xeon Phi Coprocessor : 60 cores; 8 GB GDDR5 RAM; 32 KB L1 cache per core; 512 KB L2 cache per core

Centre for Development of Advanced Computing