



hyPACK-2013 HPC GPU Cluster - Heterogeneous Programming



HPC Message Passing Cluster

The message-passing programming paradigm is one of the most widely used approaches for programming parallel computers. The standard Message Passing Interface (MPI) library is commonly used for applications written in numerous programming languages. Two key attributes characterize the message-passing programming paradigm: it assumes a partitioned address space, and it supports only explicit parallelism. The logical view of a machine supporting the message-passing paradigm consists of p processes, each with its own exclusive address space. Instances of such a view are clusters of workstations and non-shared-address-space multicomputers. Two important implications of a partitioned address space play an important role in understanding message passing. First, each data element must belong to one of the partitions of the space; hence data must be explicitly partitioned and placed. This adds to the complexity of programming, but because data is local to each process, it is possible to achieve high performance. Second, any access to non-local data requires the cooperation of two processes: the process that owns the data and the process that wants it, so all interactions must be expressed as explicit communication.

In a message-passing cluster, MPI processes are launched across several cluster nodes connected by a suitable interconnect. The assignment of processes to host nodes depends on the MPI implementation and the launch configuration (hostfile), so a rank cannot by itself reliably select a unique GPU on its node; the mapping has to be established by the application or the launch scripts. Several MPI implementations exist, and some of these are given below.
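The partitioned address space and explicit communication described above can be illustrated with a minimal MPI program. The sketch below is a hypothetical two-process example (not one of the hyPACK cluster codes) in which rank 0 sends a single integer to rank 1; it should compile with any MPI-1 compliant library.

/* Minimal sketch of the message-passing model: each rank has its own
 * address space, and data moves only through explicit MPI calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's identity           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes (p)     */

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "Run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        value = 42;                                          /* data owned by rank 0   */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* explicit communication */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}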

Message Passing Interface (MPI) :     MPICH-2     OpenMPI     FT-MPI     LAM-MPI     MVAPICH

Accelerators :   CPU-Intel MIC (Xeon Phi)     C-DAC FPGA     Xilinx     Altera

References : Multi-threading     MPI   Benchmarks  



Stand-alone multi-core processor systems with multiple GPUs are interconnected with an appropriate high-speed network, and their combined computational power can be applied to solve a variety of computationally intensive applications. System area networks move switched, low-latency, high-speed networks away from the backplanes and cabinets of massively parallel processors into the traditional territory of local area networks. The Linux programming environment is provided on the cluster, and the operating environment can be designed to run large, complex applications.
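As noted earlier, the MPI launch by itself does not decide which GPU a rank should use on its node. A common convention, sketched below assuming the CUDA runtime API is available, is to map each rank to a local device by its rank number; the round-robin mapping rank % deviceCount is only an illustrative choice (a production code would typically derive a node-local rank first).

/* Hedged sketch: bind each MPI rank to a local GPU by round-robin.
 * Assumes MPI + the CUDA runtime; the modulo mapping is one convention. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, deviceCount = 0, device = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaGetDeviceCount(&deviceCount);          /* GPUs visible on this node       */
    if (deviceCount > 0) {
        device = rank % deviceCount;           /* simple round-robin assignment   */
        cudaSetDevice(device);                 /* all later CUDA calls use it     */
    }

    printf("Rank %d mapped to GPU %d (of %d visible)\n", rank, device, deviceCount);

    MPI_Finalize();
    return 0;
}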



MPICH-2

MPICH2 is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard (both MPI-1 and MPI-2). It is a freely available, portable implementation of MPI, the standard for message passing in distributed-memory applications used in parallel computing. The original implementation, MPICH1, implements the MPI-1.1 standard; the latest implementation, MPICH2, implements the MPI-2.2 standard. One of the goals of MPICH2 is to provide an MPI implementation that efficiently supports different computation and communication platforms, including commodity clusters (desktop systems, shared-memory systems, multi-core architectures), high-speed networks (10 Gigabit Ethernet, InfiniBand, Myrinet, Quadrics, C-DAC PARAMNet) and massively parallel processing systems.
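Because process placement depends on the hostfile and the launch command, it is often useful to have each rank report the node it landed on and the MPI standard level implemented by the library (2.2 for MPICH2). The short sketch below uses only standard MPI calls and is not specific to MPICH2.

/* Sketch: each rank reports its host node and the MPI standard version
 * implemented by the library it was linked against. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, major, minor, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &namelen);   /* node this rank was placed on      */
    MPI_Get_version(&major, &minor);          /* MPI standard level of the library */

    printf("Rank %d on %s (MPI %d.%d)\n", rank, host, major, minor);

    MPI_Finalize();
    return 0;
}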


OpenMPI

Open MPI is becoming popular, and the stated driving motivation behind Open MPI is to bring the best ideas and technologies from the individual projects together and create one world-class open-source MPI implementation that excels in all areas. One of the Open MPI goals is to create a free, open-source, peer-reviewed, production-quality, complete MPI-2 implementation. Open MPI represents the merger of three well-known MPI implementations: FT-MPI (University of Tennessee), LA-MPI (Los Alamos National Laboratory) and LAM/MPI (Indiana University).



FT-MPI

FT-MPI is from the University of Tennessee, Knoxville (UTK), and its aim is to build a fault tolerant MPI implementation that can survive failures, while offering the application developer a range of recovery options other than just returning to some previous checkpoint. FT-MPI is built on the HARNESS project. HARNESS (Heterogeneous Adaptable Reconfigurable Networked SyStems) provides a fault-tolerant, dynamic run-time environment, which is used by FT-MPI for process management and failure notification. UTK's FT-MPI implementation is available for free download at http://icl.cs.utk.edu/ftmpi/ .
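FT-MPI's own recovery modes (rebuilding or shrinking communicators after a failure) go beyond what a short example can show. The sketch below illustrates only the standard MPI starting point for application-level recovery: switching the error handler from MPI_ERRORS_ARE_FATAL to MPI_ERRORS_RETURN so that the application sees error codes instead of being aborted. It is standard MPI, not FT-MPI's extended API.

/* Sketch of standard MPI error handling: by default MPI aborts on error;
 * with MPI_ERRORS_RETURN the application receives error codes and can
 * decide how to react (retry, checkpoint, shut down cleanly). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, err, msglen;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Ask MPI to return error codes instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Deliberately invalid send (rank "size" does not exist) to show the path. */
    err = MPI_Send(&rank, 1, MPI_INT, size, 0, MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &msglen);
        fprintf(stderr, "Rank %d caught MPI error: %s\n", rank, msg);
        /* an application could attempt recovery here instead of exiting */
    }

    MPI_Finalize();
    return 0;
}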


LAM-MPI

LAM/MPI is an implementation of the Message Passing Interface (MPI) motivated by a growing need for fault tolerance at the software level in large high-performance computing (HPC) systems. The vast number of components in modern HPC systems, particularly clusters, requires fault-tolerance checks at each component: the individual components (processors, memory modules, network interface cards (NICs), etc.) may fail in different ways when applications run for many hours or even days to completion. LAM/MPI is a high-quality open-source implementation of the Message Passing Interface specification, including all of MPI-1.2 and much of MPI-2. Intended for production as well as research use, LAM/MPI includes a rich set of features for system administrators, parallel programmers, application users, and parallel computing researchers. The LAM run-time environment is a user-level, daemon-based run-time environment that provides many of the services required by MPI programs. Both major components of the LAM/MPI package are designed as component frameworks, extensible with small modules that are selectable (and configurable) at run time. LAM/MPI is from Indiana University, and its developers are now working on Open MPI.


MVAPICH

MVAPICH is MPI over InfiniBand and 10GigE network technologies. The software is developed at the Network-Based Computing Laboratory (NBCL) of The Ohio State University, led by Prof. D. K. Panda. MVAPICH/MVAPICH2 delivers high performance, scalability and fault tolerance for high-end computing systems and servers using InfiniBand, 10GigE/iWARP and RoCE networking technologies. InfiniBand is emerging as a high-performance interconnect delivering low latency and high bandwidth, and it is gaining widespread acceptance due to its open standard. MVAPICH is open-source MPI software that exploits the novel features and mechanisms of InfiniBand and other RDMA-enabled interconnects to deliver performance and scalability to MPI applications.
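The low latency and high bandwidth of InfiniBand are usually demonstrated with a ping-pong microbenchmark, the pattern followed by the OSU micro-benchmarks from the same group. The simplified sketch below times round trips of a small fixed-size message between ranks 0 and 1; a real benchmark would sweep message sizes and take many more samples.

/* Simplified ping-pong sketch for point-to-point latency between two
 * ranks; interconnect benchmarks follow this pattern with more rigor. */
#include <mpi.h>
#include <stdio.h>

#define REPS     1000
#define MSG_SIZE 8              /* small message => latency-dominated */

int main(int argc, char **argv)
{
    int rank;
    char buf[MSG_SIZE];
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("Average one-way latency: %.2f us\n",
               (t1 - t0) * 1e6 / (2.0 * REPS));

    MPI_Finalize();
    return 0;
}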



Intel MIC

The Intel Many Integrated Core (Intel MIC) architecture in Intel's upcoming Knights Corner product is aimed at high-performance computing applications. Intel "Knights Corner" compute accelerator cards for highly parallel workloads can be integrated with clusters to enhance performance. Intel's Knights Corner accelerator has over 50 cores and delivers more than 1 TFLOPS of double-precision floating-point performance on the general matrix-matrix multiplication (DGEMM) benchmark. The MIC architecture provides higher compute density than current multi-core processors by packing a larger number of smaller cores, equipped with hardware threads and wider vector units, into a single MIC coprocessor, resulting in more than one teraflop of double-precision performance.

Intel MIC products are compatible with the existing x86 programming model and tools. One of the benefits of the Intel MIC architecture is the ability to run existing applications without porting the code to a new programming environment. The x86 compatibility allows the huge repertoire of existing tools, libraries and applications to run on it with little or no modification. Intel's MIC coprocessors build on the x86 architecture that has dominated high-performance computing, so developers can program these cores using standard C, C++ and Fortran source code. Scientists thus have the opportunity to use both CPU and coprocessor performance simultaneously with existing x86-based applications, saving time, cost and resources; in other words, there is no need to invest valuable time in rewriting applications. The Intel MIC architecture combines many Intel CPU cores onto a single chip.
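The offload usage model can be sketched with the Intel compiler's offload pragma (Language Extensions for Offload). The example below assumes the Intel compiler and an attached Xeon Phi coprocessor; the loop body is ordinary C with OpenMP, illustrating the x86 compatibility described above, and it runs unchanged on the host if the pragma is not recognized.

/* Hedged sketch of the Intel MIC offload model: assumes the Intel
 * compiler's "#pragma offload" extension.  Data movement between host
 * and coprocessor is described by the in/out clauses. */
#include <stdio.h>
#include <stdlib.h>

#define N 4096

int main(void)
{
    float *a = (float *) malloc(N * sizeof(float));
    float *b = (float *) malloc(N * sizeof(float));
    float *c = (float *) malloc(N * sizeof(float));

    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Offload the vector addition to the coprocessor; the same loop
     * runs on the host CPU when the pragma is ignored. */
    #pragma offload target(mic) in(a, b : length(N)) out(c : length(N))
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    printf("c[0] = %f\n", c[0]);
    free(a); free(b); free(c);
    return 0;
}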

Centre for Development of Advanced Computing