



hyPACK-2013 Hands On Session

hyPACK-2013 Hands-on Sessions (HoS) will be conducted on an HPC cluster with coprocessors and accelerators, on an ARM processor cluster, and on ARM processor systems with CUDA-enabled NVIDIA CARMA and DSP multi-core processor systems, covering the Mode-1, Mode-2, Mode-3, Mode-4, and Mode-5 modules. The approach adopted for heterogeneous programming of application kernels and numerical linear algebra on hybrid computing systems (HPC GPU cluster) is discussed in the Mode-1, Mode-2, Mode-3, and Mode-4 modules of hyPACK-2013, as given below.

Mode-3 : Systems with Coprocessors
  • System 1 : Intel Xeon Phi Coprocessor : The pragma-based offload model and the use of the Intel Xeon Phi as a standalone SMP processor are among the easiest approaches to writing programs similar to those for existing x86 systems. The Intel Xeon Phi (Knights Corner) is a 61-core SMP chip in which each core has a dedicated 512-bit wide SIMD vector unit. All the cores are connected via a 512-bit bidirectional ring interconnect. Currently, the Phi coprocessor is packaged as a separate PCIe device, external to the host processor. Each Phi contains 8 GB of RAM, which provides all the memory and file-system storage used by every user process, the Linux operating system, and ancillary daemon processes. The theoretical maximum bandwidth of the Intel Xeon Phi memory system is 352 GB/s (5.5 GTransfers/s * 16 channels * 4 B/Transfer). Each Intel Xeon Phi core is based on a modified Pentium processor design that supports hyperthreading and adds new x86 instructions for the wide vector unit. Hyperthreading allows the parallel threads to issue instructions to the wide vector unit quickly enough to keep the vector pipeline full; the current generation of coprocessor cores supports up to four concurrent threads of execution per core.

    For the laboratory sessions, the coprocessor is integrated with an Intel x86 Xeon (Sandy Bridge) host system.

  • System 2 : PARAM Yuva-II : a hybrid computing platform organized as a message-passing cluster; the configuration of a compute node with coprocessors is given below. Compute Node : two-socket, eight-core-per-socket system (16 CPU cores, Intel(R) Xeon(R) CPU E5-2670 @ 2.60 GHz, Sandy Bridge architecture); RAM : 64 GB; cache : 20 MB; GCC 4.4.6; interconnects : PARAMNet-II and InfiniBand. Each node has two Intel Xeon Phi coprocessors.

    Intel Xeon Phi Coprocessor : 60 cores; 8 GB GDDR5 RAM; 32 KB L1 cache per core; 512 KB L2 cache per core

Centre for Development of Advanced Computing