



hyPACK-2013 HPC GPU Cluster - Heterogeneous Programming



HPC Message Passing Cluster

The message-passing programming paradigm is one of the most widely used approaches for programming parallel computers. The standard Message Passing Interface (MPI) library is commonly used for applications written in numerous programming languages. Two key attributes characterize the message-passing programming paradigm: it assumes a partitioned address space, and it supports only explicit parallelism. The logical view of a machine supporting the message-passing paradigm consists of p processes, each with its own exclusive address space. Instances of such a view are clusters of workstations and non-shared-address-space multicomputers. Two important implications of a partitioned address space play an important role in understanding message passing. First, each data element must belong to one of the partitions of the space; hence data must be explicitly partitioned and placed. This adds to the complexity of programming, but because data is local to each process, it is possible to achieve high performance. Second, any access to non-local data requires the cooperation of two processes: the process that owns the data and the process that wants it, so all interactions must be expressed as explicit communication.

In a message-passing cluster, MPI processes are launched across several cluster nodes connected by a suitable interconnect. The assignment of processes to host nodes depends on the MPI implementation and the launch configuration (hostfile), so a rank cannot by itself reliably select a unique GPU on its node; the mapping has to be established by the application or the launch scripts. Several MPI implementations exist, and some of these are given below.
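The partitioned address space and explicit communication described above can be illustrated with a minimal MPI program. The sketch below is a hypothetical two-process example (not one of the hyPACK cluster codes) in which rank 0 sends a single integer to rank 1; it should compile with any MPI-1 compliant library.

/* Minimal sketch of the message-passing model: each rank has its own
 * address space, and data moves only through explicit MPI calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's identity           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes (p)     */

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "Run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        value = 42;                                          /* data owned by rank 0   */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* explicit communication */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}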

Message Passing Interface (MPI) :     MPICH-2     OpenMPI     FT-MPI     LAM-MPI     MVAPICH

Accelerators :   CPU-Intel MIC (Xeon Phi)     C-DAC FPGA     Xilinx     Altera

References : Multi-threading     MPI   Benchmarks  



Stand-alone multi-core processor systems with multiple GPUs are interconnected with an appropriate high-speed network, and their combined computational power can be applied to solve a variety of computationally intensive applications. System area networks move switched, low-latency, high-speed networks away from the backplanes and cabinets of massively parallel processors into the traditional territory of local area networks. The Linux programming environment is provided on the cluster, and the operating environment can be designed to run large, complex applications.
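As noted earlier, the MPI launch by itself does not decide which GPU a rank should use on its node. A common convention, sketched below assuming the CUDA runtime API is available, is to map each rank to a local device by its rank number; the round-robin mapping rank % deviceCount is only an illustrative choice (a production code would typically derive a node-local rank first).

/* Hedged sketch: bind each MPI rank to a local GPU by round-robin.
 * Assumes MPI + the CUDA runtime; the modulo mapping is one convention. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, deviceCount = 0, device = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaGetDeviceCount(&deviceCount);          /* GPUs visible on this node       */
    if (deviceCount > 0) {
        device = rank % deviceCount;           /* simple round-robin assignment   */
        cudaSetDevice(device);                 /* all later CUDA calls use it     */
    }

    printf("Rank %d mapped to GPU %d (of %d visible)\n", rank, device, deviceCount);

    MPI_Finalize();
    return 0;
}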



MPICH-2

MPICH2 is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard (both MPI-1 and MPI-2). It is a freely available, portable implementation of MPI, the standard for message passing in distributed-memory applications used in parallel computing. The original implementation, MPICH1, implements the MPI-1.1 standard; the latest implementation, MPICH2, implements the MPI-2.2 standard. One of the goals of MPICH2 is to provide an MPI implementation that efficiently supports different computation and communication platforms, including commodity clusters (desktop systems, shared-memory systems, multi-core architectures), high-speed networks (10 Gigabit Ethernet, InfiniBand, Myrinet, Quadrics, C-DAC PARAMNet) and massively parallel processing systems.
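Because process placement depends on the hostfile and the launch command, it is often useful to have each rank report the node it landed on and the MPI standard level implemented by the library (2.2 for MPICH2). The short sketch below uses only standard MPI calls and is not specific to MPICH2.

/* Sketch: each rank reports its host node and the MPI standard version
 * implemented by the library it was linked against. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, major, minor, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &namelen);   /* node this rank was placed on      */
    MPI_Get_version(&major, &minor);          /* MPI standard level of the library */

    printf("Rank %d on %s (MPI %d.%d)\n", rank, host, major, minor);

    MPI_Finalize();
    return 0;
}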


OpenMPI

Open MPI is becoming popular, and the stated driving motivation behind Open MPI is to bring the best ideas and technologies from the individual projects together and create one world-class open-source MPI implementation that excels in all areas. One of the Open MPI goals is to create a free, open-source, peer-reviewed, production-quality, complete MPI-2 implementation. Open MPI represents the merger of three well-known MPI implementations: FT-MPI (University of Tennessee), LA-MPI (Los Alamos National Laboratory) and LAM/MPI (Indiana University).



FT-MPI

FT-MPI is from the University of Tennessee, Knoxville (UTK), and its aim is to build a fault tolerant MPI implementation that can survive failures, while offering the application developer a range of recovery options other than just returning to some previous checkpoint. FT-MPI is built on the HARNESS project. HARNESS (Heterogeneous Adaptable Reconfigurable Networked SyStems) provides a fault-tolerant, dynamic run-time environment, which is used by FT-MPI for process management and failure notification. UTK's FT-MPI implementation is available for free download at http://icl.cs.utk.edu/ftmpi/ .
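FT-MPI's own recovery modes (rebuilding or shrinking communicators after a failure) go beyond what a short example can show. The sketch below illustrates only the standard MPI starting point for application-level recovery: switching the error handler from MPI_ERRORS_ARE_FATAL to MPI_ERRORS_RETURN so that the application sees error codes instead of being aborted. It is standard MPI, not FT-MPI's extended API.

/* Sketch of standard MPI error handling: by default MPI aborts on error;
 * with MPI_ERRORS_RETURN the application receives error codes and can
 * decide how to react (retry, checkpoint, shut down cleanly). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, err, msglen;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Ask MPI to return error codes instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Deliberately invalid send (rank "size" does not exist) to show the path. */
    err = MPI_Send(&rank, 1, MPI_INT, size, 0, MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &msglen);
        fprintf(stderr, "Rank %d caught MPI error: %s\n", rank, msg);
        /* an application could attempt recovery here instead of exiting */
    }

    MPI_Finalize();
    return 0;
}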


LAM-MPI

LAM/MPI is an implementation of the Message Passing Interface (MPI) motivated by a growing need for fault tolerance at the software level in large high-performance computing (HPC) systems. The vast number of components in modern HPC systems, particularly clusters, requires fault-tolerance checks at each component: the individual components (processors, memory modules, network interface cards (NICs), etc.) may fail in different ways when applications run for many hours or even days to completion. LAM/MPI is a high-quality open-source implementation of the Message Passing Interface specification, including all of MPI-1.2 and much of MPI-2. Intended for production as well as research use, LAM/MPI includes a rich set of features for system administrators, parallel programmers, application users, and parallel computing researchers. The LAM run-time environment is a user-level, daemon-based run-time environment that provides many of the services required by MPI programs. Both major components of the LAM/MPI package are designed as component frameworks, extensible with small modules that are selectable (and configurable) at run time. LAM/MPI is from Indiana University, and its developers are now working on Open MPI.


MVAPICH

MVAPICH is MPI over InfiniBand and 10GigE network technologies. The software is developed at the Network-Based Computing Laboratory (NBCL) of The Ohio State University, led by Prof. D. K. Panda. MVAPICH/MVAPICH2 delivers high performance, scalability and fault tolerance for high-end computing systems and servers using InfiniBand, 10GigE/iWARP and RoCE networking technologies. InfiniBand is emerging as a high-performance interconnect delivering low latency and high bandwidth, and it is gaining widespread acceptance due to its open standard. MVAPICH is open-source MPI software that exploits the novel features and mechanisms of InfiniBand and other RDMA-enabled interconnects to deliver performance and scalability to MPI applications.
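The low latency and high bandwidth of InfiniBand are usually demonstrated with a ping-pong microbenchmark, the pattern followed by the OSU micro-benchmarks from the same group. The simplified sketch below times round trips of a small fixed-size message between ranks 0 and 1; a real benchmark would sweep message sizes and take many more samples.

/* Simplified ping-pong sketch for point-to-point latency between two
 * ranks; interconnect benchmarks follow this pattern with more rigor. */
#include <mpi.h>
#include <stdio.h>

#define REPS     1000
#define MSG_SIZE 8              /* small message => latency-dominated */

int main(int argc, char **argv)
{
    int rank;
    char buf[MSG_SIZE];
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("Average one-way latency: %.2f us\n",
               (t1 - t0) * 1e6 / (2.0 * REPS));

    MPI_Finalize();
    return 0;
}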



Intel MIC

The Intel Many Integrated Core (Intel MIC) architecture in Intel's upcoming Knights Corner product is aimed at high-performance computing applications. Intel "Knights Corner" compute accelerator cards for highly parallel workloads can be integrated with clusters to enhance performance. Intel's Knights Corner accelerator has over 50 cores and delivers more than 1 TFLOPS of double-precision floating-point performance on the general matrix-matrix multiplication (DGEMM) benchmark. The MIC architecture provides higher compute density than current multi-core processors by packing a larger number of smaller cores, equipped with hardware threads and wider vector units, into a single MIC coprocessor, resulting in more than one teraflop of double-precision performance.

Intel MIC products are compatible with the existing x86 programming model and tools. One of the benefits of the Intel MIC architecture is the ability to run existing applications without porting the code to a new programming environment. The x86 compatibility allows the huge repertoire of existing tools, libraries and applications to run on it with little or no modification. Intel's MIC coprocessors build on the x86 architecture that has dominated high-performance computing, so developers can program these cores using standard C, C++ and Fortran source code. Scientists thus have the opportunity to use both CPU and coprocessor performance simultaneously with existing x86-based applications, saving time, cost and resources; in other words, there is no need to invest valuable time in rewriting applications. The Intel MIC architecture combines many Intel CPU cores onto a single chip.
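The offload usage model can be sketched with the Intel compiler's offload pragma (Language Extensions for Offload). The example below assumes the Intel compiler and an attached Xeon Phi coprocessor; the loop body is ordinary C with OpenMP, illustrating the x86 compatibility described above, and it runs unchanged on the host if the pragma is not recognized.

/* Hedged sketch of the Intel MIC offload model: assumes the Intel
 * compiler's "#pragma offload" extension.  Data movement between host
 * and coprocessor is described by the in/out clauses. */
#include <stdio.h>
#include <stdlib.h>

#define N 4096

int main(void)
{
    float *a = (float *) malloc(N * sizeof(float));
    float *b = (float *) malloc(N * sizeof(float));
    float *c = (float *) malloc(N * sizeof(float));

    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Offload the vector addition to the coprocessor; the same loop
     * runs on the host CPU when the pragma is ignored. */
    #pragma offload target(mic) in(a, b : length(N)) out(c : length(N))
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    printf("c[0] = %f\n", c[0]);
    free(a); free(b); free(c);
    return 0;
}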

Centre for Development of Advanced Computing