



hyPACK-2013 HPC GPU Cluster - Heterogeneous Programming



HPC Message Passing Cluster

The message-passing programming paradigm is one of the most widely used approaches for programming parallel computers. The standard Message Passing Interface (MPI) library is commonly used for applications written in numerous programming languages. In a message-passing cluster, MPI processes are launched across several cluster nodes connected by a suitable interconnect. Two key attributes characterize the message-passing programming paradigm: it assumes a partitioned address space, and it supports only explicit parallelism. The logical view of a machine supporting the message-passing paradigm consists of p processes, each with its own exclusive address space. Hybrid heterogeneous HPC clusters, in which accelerators based on CPUs, GPUs and FPGAs are used alongside conventional processors, are becoming popular for solving complex heterogeneous workloads. The goal of this mixed environment is total workflow optimization: applications that do not parallelize well on scalar processors can be optimized with the appropriate computation model.
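As a simple illustration of this paradigm, the following minimal C program (a sketch only, assuming it is launched as at least two MPI processes) shows each process identifying itself within its own address space and exchanging data only through explicit send/receive calls:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                  /* start the MPI run-time       */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* my id among the p processes  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes, p */

        if (rank == 0) {
            int token = 42;                      /* data lives only in rank 0    */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* explicit send */
            printf("process 0 of %d sent token %d to process 1\n", size, token);
        } else if (rank == 1) {
            int token;
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);         /* explicit receive             */
            printf("process 1 of %d received token %d\n", size, token);
        }

        MPI_Finalize();
        return 0;
    }

The program can be compiled with an MPI wrapper compiler (for example mpicc) and launched with mpiexec or mpirun across the cluster nodes.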

Message Passing Interface (MPI) :     MPICH-2     OpenMPI     FT-MPI     LAM-MPI




References : Multi-threading       MPI       Benchmarks


MPICH-2

MPICH2 is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard (both MPI-1 and MPI-2). MPICH2 is a freely available, portable implementation of MPI, a standard for message passing in distributed-memory applications used in parallel computing. The original implementation, MPICH1, implements the MPI-1.1 standard; the latest implementation, MPICH2, implements the MPI-2.2 standard. One of the goals of MPICH2 is to provide an MPI implementation that efficiently supports different computation and communication platforms, including commodity clusters (desktop systems, shared-memory systems, multi-core architectures), high-speed networks (10 Gigabit Ethernet, InfiniBand, Myrinet, Quadrics, C-DAC PARAMNet) and massively parallel processing systems.
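The short example below is a sketch that uses only standard MPI calls (so it is not specific to MPICH2): each process sums a disjoint subset of the range 1..N, and a collective reduction combines the partial results on rank 0.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* every process sums a disjoint subset of 1..N */
        const long N = 1000000;
        long long local = 0, global = 0;
        for (long i = rank + 1; i <= N; i += size)
            local += i;

        /* combine the partial sums on rank 0 */
        MPI_Reduce(&local, &global, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum(1..%ld) = %lld\n", N, global);

        MPI_Finalize();
        return 0;
    }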



OpenMPI

Open MPI is becoming popular, and the stated driving motivation behind it is to bring the best ideas and technologies from the individual projects together and create one world-class open-source MPI implementation that excels in all areas. One of the Open MPI goals is to create a free, open-source, peer-reviewed, production-quality, complete MPI-2 implementation. Open MPI represents the merger of three well-known MPI implementations: FT-MPI (University of Tennessee), LA-MPI (Los Alamos National Laboratory) and LAM/MPI (Indiana University).




FT-MPI

FT-MPI is from the University of Tennessee, Knoxville (UTK), and its aim is to build a fault-tolerant MPI implementation that can survive failures while offering the application developer a range of recovery options other than simply returning to a previous checkpoint. FT-MPI is built on the HARNESS project. HARNESS (Heterogeneous Adaptable Reconfigurable Networked Systems) provides a fault-tolerant, dynamic run-time environment, which FT-MPI uses for process management and failure notification. UTK's FT-MPI implementation is available for free download at http://icl.cs.utk.edu/ftmpi/ .
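FT-MPI extends MPI with its own recovery semantics; the sketch below shows only the standard-MPI part of the idea, replacing the default abort-on-error behaviour with MPI_ERRORS_RETURN so that the application can inspect a failed communication call instead of terminating (FT-MPI's actual recovery modes, such as rebuilding or shrinking communicators, go beyond what is shown here):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        /* ask MPI to return error codes instead of aborting the whole job */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int data = rank;
        int rc = MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: broadcast failed: %s\n", rank, msg);
            /* a fault-tolerant application would attempt recovery here */
        }

        MPI_Finalize();
        return 0;
    }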



LAM-MPI

LAM/MPI is an implementation of the Message Passing Interface (MPI) motivated by a growing need for fault tolerance at the software level in large high-performance computing (HPC) systems. The vast number of components in modern HPC systems, particularly clusters, requires fault-tolerance checks at each component: the individual components (processors, memory modules, network interface cards (NICs), etc.) may fail in different ways when applications run for many hours or even days. LAM/MPI is a high-quality open-source implementation of the Message Passing Interface specification, including all of MPI-1.2 and much of MPI-2. Intended for production as well as research use, LAM/MPI includes a rich set of features for system administrators, parallel programmers, application users, and parallel computing researchers. The LAM run-time environment is a user-level, daemon-based run-time environment that provides many of the services required by MPI programs. Both major components of the LAM/MPI package are designed as component frameworks, extensible with small modules that are selectable (and configurable) at run time. LAM/MPI is from Indiana University, and its developers now work on Open MPI.



MVAPICH

MVAPICH is MPI over InfiniBand and 10GigE network technologies. The software is developed at the Network-Based Computing Laboratory (NBCL) of the Ohio State University. The MVAPICH/MVAPICH2 software delivers high performance, scalability and fault tolerance for high-end computing systems and servers using InfiniBand, 10GigE/iWARP and RoCE networking technologies. InfiniBand is emerging as a high-performance interconnect delivering low latency and high bandwidth, and it is gaining widespread acceptance due to its open standard. MVAPICH is open-source MPI software that exploits the novel features and mechanisms of InfiniBand and other RDMA-enabled interconnects to deliver performance and scalability to MPI applications.
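Latency and bandwidth figures of the kind quoted for InfiniBand are usually measured with a ping-pong micro-benchmark; the following sketch (assuming exactly two MPI processes, ideally one per node) times repeated 1 MiB message exchanges and reports the average round-trip time and the resulting bandwidth:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBYTES (1 << 20)   /* 1 MiB message */
    #define REPS   100

    int main(int argc, char *argv[])
    {
        int rank;
        char *buf = malloc(NBYTES);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0) {
            double rtt = (t1 - t0) / REPS;              /* average round-trip time */
            printf("avg round trip: %.3f us, bandwidth: %.2f MB/s\n",
                   rtt * 1e6, 2.0 * NBYTES / rtt / 1e6);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }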




Intel MIC

Intel has announced the Intel Xeon Phi product family as the new brand name for all products based on the Intel Many Integrated Core (Intel MIC) architecture, targeted at HPC, enterprise, datacenters and workstations. The Intel MIC architecture in Intel's upcoming Knights Corner product is aimed at high-performance computing applications. Intel "Knights Corner" compute-accelerator cards for highly parallel workloads can be integrated with clusters to enhance performance. Intel's Knights Corner accelerator has over 50 cores and delivers more than 1 TFLOPS of double-precision floating-point performance on the general matrix-matrix multiplication (DGEMM) benchmark. The MIC architecture provides higher compute density than current multi-core processors by packing a larger number of smaller cores, each equipped with hardware threads and wider vector units, into a single MIC co-processor, resulting in more than one teraflop of double-precision performance.

Intel MIC products are compatible with the existing x86 programming model and tools. One of the benefits of the Intel MIC architecture is the ability to run existing applications without porting the code to a new programming environment. The x86 compatibility allows the huge repertoire of existing tools, libraries and applications to run with little or no modification. Because Intel MIC builds on the x86 architecture that has dominated high-performance computing, developers can program its cores using standard C, C++ and Fortran source code. Scientists thus have the opportunity to use both CPU and co-processor performance simultaneously with existing x86-based applications, saving time, cost and resources; in other words, there is no need to invest valuable time in rewriting applications. The Intel MIC (Many Integrated Core) architecture combines many Intel CPU cores onto a single chip; the co-processor is attached to the CPU host via the PCIe bus and is designed for highly parallel, vectorizable codes. For more details on Intel Many Integrated Core (MIC) Knights Ferry (KNF) platforms, visit
http://en.wikipedia.org/wiki/Intel_MIC and TACC-Intel Highly Parallel Computing Symposium
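As an illustration of this offload style, the sketch below uses the Intel compiler's offload pragma for the Xeon Phi co-processor; the pragma and its in/out clauses are Intel-specific extensions, and on a host without a co-processor (or with a non-Intel compiler) the pragma is ignored and the loop simply runs on the CPU:

    #include <stdio.h>

    #define N 1024

    int main(void)
    {
        float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* copy a and b to the co-processor, run the loop there, copy c back */
        #pragma offload target(mic) in(a, b : length(N)) out(c : length(N))
        {
            for (int i = 0; i < N; i++)
                c[i] = a[i] + b[i];
        }

        printf("c[%d] = %f\n", N - 1, c[N - 1]);
        return 0;
    }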



Centre for Development of Advanced Computing