



hyPACK-2013 : Mode-1 Partitioned Global Address Space Prog. Model (PGAS)


Parallel, multithreaded programs are becoming increasingly prevalent. Parallel computing is moving into a high-profile, mainstream role, and the delivery of effective parallel programming models is a high priority. Shared memory and distributed memory models are the two well-known classes of parallel hardware architecture. A parallel programming model provides the means to exploit the resources of multiple cores; using the resources of all the cores, and thereby reducing execution time, is the main aim of a multi-core programming model. Some of the desirable features of a parallel programming model are: i) ease of use, so users are productive; ii) expressiveness, so programmers can code a wide range of algorithms; iii) high performance, so parallel codes efficiently utilize the capabilities of the parallel system of choice; iv) performance portability, so programmers can write their code once and achieve good performance on the widest possible range of parallel architectures; and v) preservation of the source-code investment, so the same code can easily target multi-core CPUs, the latest GPUs, and clusters of multi-core processors with GPUs. Four important parallel programming models are in use :

  • Distributed Parallel Programming Model
  • Shared Parallel Programming Model
  • Distributed Shared Memory Programming Model (DSM), also called PGAS
  • Hybrid Parallel Programming Model

Moving into the future, it is expected that each node in a cluster system will be a many-core system. Developers need parallel programming paradigms that provide a high level of abstraction and efficient parallel coding techniques. A brief summary of the most popular shared and distributed memory models is given below.

  • Distributed Memory Model
  • Shared Memory Model ( OpenMP, Intel TBB, POSIX Threads )
  • Hybrid Memory Model
  • Distributed Shared Memory Model (DSM), also called PGAS
  • PGAS ( UPC, CAF, X10, Titanium, Global Arrays, Chapel, other programming libraries )
  • PGAS Hands-on Lab Overview


Courtesy : PGAS 2011, IBM X10, Chapel, UPC Consortium, UPC Language Specification, Titanium, Co-Array Fortran (CAF), Global Arrays Toolkit, IBM, SGI SHMEM

References : UPC, CAF, Titanium, Global Arrays, X10 (IBM), Chapel (Cray), PGAS libraries



Distributed Memory Model

In the distributed memory model, each processing unit has its own local memory, and units communicate through a high-speed network. Here, a cluster of nodes, each with multiple cores, communicates through explicit message passing mechanisms. To facilitate this, several libraries are available, the most common being implementations of MPI (Message Passing Interface). This is an SPMD (Single Program Multiple Data) model, where the work is distributed among a number of processes running on the cluster of nodes. The main drawbacks of this model are the large overheads incurred when large amounts of data must be sent and the fact that every process is limited to its private memory. The MPI programming model is commonly used on HPC clusters, and it has proven highly scalable and highly successful for scientific calculations for over a decade. It is used on clusters of computers, MPP supercomputers, and even on desktops with multiple CPUs. Independent images of an application run as separate processes on each CPU.

The MPI-1 standard provides for two-sided messaging where communicating pairs of processes call Send and Recv to transmit a message, as well as a variety of powerful and efficient collective operations. The MPI-2 standard added support for one-sided messaging that allows processes to perform remote access to exposed regions of memory. In this model, processes collectively expose windows of memory for remote access. The window may then be accessed in either active or passive target mode. Under the active target mode, the host explicitly synchronizes its window between accesses; under the passive target mode, remote processes may access the window without any explicit interaction from the host.
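A minimal sketch in C of the active target mode, assuming an MPI-2 capable implementation (the ring-style neighbour exchange is purely illustrative): every process exposes one integer as a window, and matching MPI_Win_fence calls bracket an MPI_Put to the next rank.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, value;
        int window_buf = -1;                 /* memory exposed for remote access */
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Collectively expose one integer per process as a window */
        MPI_Win_create(&window_buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        value = rank;

        /* Active target mode: the window is synchronized by collective fences */
        MPI_Win_fence(0, win);
        MPI_Put(&value, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);

        printf("Rank %d received %d from rank %d\n",
               rank, window_buf, (rank - 1 + size) % size);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }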

  • Clusters used for High-Performance Computing (HPC) are composed of nodes connected over a high-bandwidth network. HPC is a distributed and complex programming environment that requires sophisticated programming paradigms and tools. In the past, each node in an HPC cluster contained a single-core or multi-core processor.

Shared Memory Model

Shared memory systems are usually called tightly-coupled multiprocessor machines, in which all the cores are connected to a single global memory. Symmetric multiprocessing (SMP) designs have long been implemented using discrete CPUs, so the issues involved in implementing the architecture and supporting it in software are well known. In addition to operating system (OS) support, adjustments to existing software are required to maximize utilization of the computing resources provided by multi-core processors. Also, the ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.

The most important challenge in shared-address-space programming is the decomposition of a single application into several dependent and interacting tasks. The efficiency of the parallel program is highly dependent on this decomposition step: it determines the synchronization and communication overhead. Other challenges are synchronization and communication between parallel threads. Synchronizing parallel threads is a tedious task: synchronizing too often leads to inefficient program execution, while too little synchronization can lead to incorrect results due to data races or race conditions. Faulty synchronization can also lead to deadlocks in shared-address-space programming. The shared memory model is well suited to Pthreads programming, and users should understand the concepts of synchronization, critical sections, and deadlock conditions. Synchronization is an enforcing mechanism used to impose constraints on the order of execution of threads.

The main features of the shared memory model are:

  • All threads have access to the same global, shared memory
  • Threads also have their own private data
  • Programmers are responsible for synchronizing access to (protecting) globally shared data; a minimal mutex sketch is given after this list.
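As a hedged illustration of the last point (the shared counter and thread count are purely illustrative), the following C sketch uses a POSIX mutex to protect a globally shared variable that several threads update:

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4

    static long shared_counter = 0;                   /* globally shared data      */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);                /* protect the shared update */
            shared_counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NUM_THREADS];

        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        printf("Final counter value: %ld\n", shared_counter);
        return 0;
    }

Without the mutex the increments would race and the final value would be unpredictable; with it, every update to the shared data is serialized.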

Shared-memory systems typically provide both static and dynamic process creation. That is, processes can be created at the beginning of program execution by a directive to the operating system, or they can be created during the execution of the program. The best-known dynamic process creation function is fork. A typical implementation allows a process to start another, or child, process with a fork. Three mechanisms typically coordinate processes in shared memory programs. The first is process creation and termination: the starting, or parent, process can wait for the termination of a child process by calling join (or wait). The second prevents processes from improperly accessing shared resources. The third provides a means for synchronizing the processes.
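A minimal C sketch of dynamic process creation on a POSIX system (the messages are illustrative): the parent forks a child and then joins it by waiting for its termination.

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();                  /* create a child process            */

        if (pid == 0) {
            printf("Child  %d: doing its share of the work\n", (int)getpid());
            return 0;                        /* child terminates                  */
        } else if (pid > 0) {
            int status;
            waitpid(pid, &status, 0);        /* parent "joins" the child          */
            printf("Parent %d: child has terminated\n", (int)getpid());
        } else {
            perror("fork");
            return 1;
        }
        return 0;
    }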

The shared-memory model is similar to the data-parallel model in that it has a single address (global naming) space. It is similar to the message-passing model in that it is multithreaded and asynchronous. However, data reside in a single, shared address space and thus do not have to be explicitly allocated. Workload can be either explicitly or implicitly allocated. Communication is done implicitly through shared reads and writes of variables. However, synchronization is explicit.

Shared-variable programs are multi-threaded and asynchronous and require explicit synchronization to maintain the correct execution order among the processes. Parallel programming based on the shared memory model has not progressed as much as message-passing parallel programming. An indicator is the lack of a widely accepted standard such as MPI or PVM for message passing. The current situation is that shared-memory programs are written in a platform-specific language for multiprocessors (mostly SMPs and PVPs). Such programs are not portable even among multiprocessors, not to mention multicomputers (MPPs and clusters). Platform-independent shared-memory programming models include X3H5, Pthreads, Intel TBB, and OpenMP.


Shared Memory Model-OpenMP

OpenMP is a shared memory programming model in which a program starts executing serially; at certain points in the program (e.g., loops), the main thread creates the required number of threads, distributes the work among them, and joins them after the work is complete. OpenMP is an Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism. It is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs. OpenMP is a shared memory standard supported by a group of hardware and software vendors. It comprises three primary API components:

  • Compiler Directives 
  • Runtime Library Routines
  • Environment Variables
OpenMP is portable, and the API is specified for C/C++ and Fortran. It has been implemented on multiple platforms, including most Unix platforms and Windows NT. It is jointly defined and endorsed by a group of major computer hardware and software vendors. The goal is to provide a standard among a variety of shared memory architectures/platforms and to establish a simple and limited set of directives for programming shared memory machines, so that significant parallelism can be achieved by using just three or four directives.
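A minimal C sketch showing all three components together (the array size is illustrative; an OpenMP-capable compiler and its OpenMP flag are assumed): a compiler directive parallelizes the loop, a runtime library routine reports the thread count, and the OMP_NUM_THREADS environment variable can be used to control it.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        double sum = 0.0;

        /* Compiler directive: loop iterations are divided among the threads */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            b[i] = 2.0 * i;
            c[i] = a[i] + b[i];
            sum += c[i];
        }

        /* Runtime library routine: query how many threads are available */
        printf("sum = %e (max threads = %d)\n", sum, omp_get_max_threads());
        return 0;
    }

Setting, for example, OMP_NUM_THREADS=4 before running changes the number of threads without recompiling.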

Shared Memory Model-Intel TBB

Intel Threading Building Blocks (Intel TBB) is a C++ template library based on a task model. Threading Building Blocks represents a higher-level, task-based parallelism that abstracts platform details and threading mechanisms for performance and scalability, and it helps you create applications that reap the benefits of new processors with more and more cores as they become available. Intel Threading Building Blocks uses templates for common parallel iteration patterns, enabling programmers to attain increased speed from multiple processor cores without having to be experts in synchronization, load balancing, and cache optimization. TBB is a library that helps you leverage multi-core processor performance without having to be a threading expert. It has two basic components: a task scheduler and a memory allocator. TBB provides its own memory allocator that allocates memory directly from the heap for fast, scalable allocation. Work in TBB is expressed as tasks, and these tasks are scheduled by the task scheduler, which implements "work stealing": the work is initially distributed among the cores, and if a core completes its work before the others, some work is transferred to it from another core, thus keeping all the cores busy.

Shared Memory Model-POSIX Threads

POSIX threads, or Pthreads, is a portable threading library which provides a consistent programming interface across multiple operating systems. Pthreads has emerged as the standard threading interface for Linux and Unix platforms. Pthreads is defined as a set of C language programming types and procedure calls, implemented with a pthread.h header/include file and a thread library, though this library may be part of another library such as libc. A number of vendors also provide vendor-specific thread APIs. The core Pthreads functions cover thread creation and destruction, synchronization, and other thread features. In shared memory multiprocessor architectures, such as SMPs, threads can be used to implement parallelism. Pthreads specifies the API to handle most of the actions required by threads; it is a library of standardized functions for using threads across different platforms. In general, for a program to take advantage of Pthreads, it must be organized into discrete, independent tasks which can execute concurrently; a minimal creation/join sketch is given after the list below. The concepts used in "Pthreads on Multi-Cores" are largely independent of the API and can be used for programming with other thread APIs (Windows NT threads, Solaris threads, Java threads, etc.). Programs having the following characteristics may be well suited for Pthreads:

  • Work that can be executed, or data that can be operated on, by multiple tasks simultaneously
  • Block for potentially long I/O waits
  • Use many CPU cycles in some places but not others
  • Must respond to asynchronous events
  • Some work is more important than other work (priority interrupts)
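As a hedged sketch of the Pthreads creation/join pattern mentioned above (the array size and thread count are illustrative), each thread initializes its own contiguous slice of a shared array as a discrete, independent task:

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define N           1000

    static double data[N];

    /* Each thread fills its own contiguous slice of the shared array */
    static void *init_slice(void *arg)
    {
        long id    = (long)arg;
        long chunk = N / NUM_THREADS;
        long start = id * chunk;
        long end   = (id == NUM_THREADS - 1) ? N : start + chunk;

        for (long i = start; i < end; i++)
            data[i] = 2.0 * i;
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NUM_THREADS];

        for (long t = 0; t < NUM_THREADS; t++)
            pthread_create(&threads[t], NULL, init_slice, (void *)t);
        for (long t = 0; t < NUM_THREADS; t++)
            pthread_join(threads[t], NULL);

        printf("data[N-1] = %f\n", data[N - 1]);
        return 0;
    }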

Hybrid Memory Model

Hybrid memory models, sometimes called mixed-mode programming, combine two different models in the same program. This approach is typically used on message-passing clusters of shared memory machines, where a shared memory model is used within a node and a message passing model is used to exchange data between nodes. Examples of hybrid models include OpenMP with MPI, Pthreads with MPI, Intel TBB with MPI, and HPF with MPI. For a large number of cores on a single node, if the application has plenty of data parallelism, the model offers performance and scalability. Using a shared memory model on a node avoids the overhead and data replication of explicit message passing, but at the cost of greater complexity. Mixed-mode programming does not guarantee the benefits of the two models being used; in some cases, a mixed-mode code can run slower than a single-mode code. For example, multiple threads within an OpenMP or Pthreads process may have to synchronize on MPI calls, eliminating the opportunity to overlap computation and communication. It may also be non-trivial to break the single-level parallelism in MPI codes into the multi-level parallelism needed for a hybrid approach. For certain classes of applications, issues such as multi-threaded I/O with MPI I/O require special care.
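A minimal MPI + OpenMP sketch of the hybrid approach (assuming an MPI implementation that supports MPI_THREAD_FUNNELED and an OpenMP-capable compiler; the summation is illustrative): MPI distributes the work across processes on different nodes, while OpenMP threads share the work inside each process.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank, size, provided;
        double local_sum = 0.0, global_sum = 0.0;

        /* Request thread support; only the master thread makes MPI calls */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each MPI process handles a strided share; OpenMP threads split it */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = rank; i < N; i += size)
            local_sum += 1.0 / (i + 1.0);

        /* Combine per-process results with a collective operation */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f (threads per process = %d)\n",
                   global_sum, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }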

Distributed Shared Memory Model-PGAS

PGAS stands for the "Partitioned Global Address Space" programming model, also called the Distributed Shared Memory model (DSM). PGAS resembles the shared memory model, with the addition of being able to make use of data locality: it introduces the concept of affinity.

In the PGAS model, memory is partitioned, each partition has affinity with a thread, and primitives are provided for accessing remote elements. A defining feature of PGAS languages is the ability for the programmer to identify local (private) data and global (shared, possibly remote) data. Careful management of local and global data access is typically required to obtain good performance. PGAS models allow explicit or implicit one-sided data transfers (i.e., put and get operations) to take place through reading and writing global variables. This provision reduces the complexity of data management from a programmer's perspective by eliminating the need to match send and receive operations, and it can facilitate the development of applications exhibiting fine-grained parallelism or using algorithms that are complex to program under a message-passing environment. However, the PGAS abstraction requires programmers to handle new challenges (e.g., locating race conditions) and forces them to give up some control over the interaction between processing nodes. This loss of control increases the likelihood of a mismatch between the actual execution pattern and the one intended by the programmer, which can lead to an underperforming application or, worse, an application that does not work as intended. The performance impact of such a mismatch is most apparent in cluster environments, where inter-node operations are several orders of magnitude more expensive than local operations. For this reason, it is even more critical for PGAS programmers to have access to effective performance analysis tools to reap the benefits these models provide.

Like MPI, applications using the PGAS programming model assume multiple independent CPUs with a communication mechanism between them. However, PGAS makes heavy use of one-sided communication mechanisms that usually have significantly lower overhead than two-sided MPI communication. One-sided communication mechanisms allow an initiator to access a remote CPU's memory without the explicit involvement of the application running on the remote CPU. Upon receipt of an incoming message, the protocol processing engine (e.g., the OS kernel or an intelligent NIC) is provided sufficient information so that the receiving application need not be involved in completing the communication. The implementation can be done in a number of ways, depending on the available OS and hardware features.

Unfortunately, tool support for PGAS models has been limited. Existing tools that support MPI are not equipped to handle several operation types provided by PGAS models, such as implicit one-sided communications and work-sharing constructs. In addition, the variety of PGAS compiler implementation techniques complicates the performance data collection process, making it difficult for existing tools to extend support to PGAS models. For these reasons, there exists a need for a new performance system capable of handling the challenges associated with PGAS models. PGAS is one of the most preferred parallel programming models these days. Several implementations of the PGAS model exist, including:

  • UPC (Unified Parallel C)
  • X10 (IBM)
  • Co-Array Fortran (CAF)
  • Titanium (Java-based)
  • Global Arrays
  • Chapel (Cray)

The current implementations of the PGAS programming model can be divided into library-based and language-based. Library-based implementations typically consist of a set of callable function routines from C, C++, or Fortran. These libraries usually do not require any unique support from compilers, operating systems, or parallel runtime systems. On the other hand, language-based implementations are extensions of standard programming languages, and depend on compiler or translator support and may require support from a parallel runtime system.

PGAS - Library-based Implementations
  • Global Arrays : The Global Arrays (GA) toolkit provides a shared memory view of distributed data structures for a MIMD parallel program. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage.

    It allows each process to access logical blocks of physically distributed multidimensional arrays as if they were located in shared memory. The GA programming model is primarily a memory model rather than an interprocess communication model, as it provides interfaces for data movement between shared and local memory. GA is built upon a low-level message passing system called the Aggregate Remote Memory Copy Interface (ARMCI) and was developed at Pacific Northwest National Laboratory (PNNL). In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran.

    In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF and Data Parallel C. However, the Global Arrays programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving parallel efficiency. It also supports a combination of task- and data-parallelism and is available as an extension of the message passing (MPI) model. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. Virtually all scalable architectures possess non-uniform memory access characteristics that reflect their multi-level memory hierarchies. These hierarchies typically comprise processor registers, multiple levels of cache, local memory, and remote memory. Over time, both the number of levels and the cost (in processor cycles) of accessing deeper levels have been increasing. It is important for any scalable programming model to address the memory hierarchy, since it is critical to the efficient execution of scalable applications.

    GA allows the programmer to control data distribution and makes the locality information readily available to be exploited for performance optimization. For example, global arrays can be created by 1) allowing the library to determine the array distribution, 2) specifying the decomposition for only one array dimension and allowing the library to determine the others, 3) specifying the distribution block size for all dimensions, or 4) specifying an irregular distribution as a Cartesian product of irregular distributions for each axis. The distribution and locality information is always available through interfaces to query 1) which data portion is held by a given process, 2) which process owns a particular array element, and 3) a list of processes and the blocks of data owned by each process corresponding to a given section of an array. The primary mechanisms provided by GA for accessing data are block copy operations that transfer data between layers of memory hierarchy, namely global memory (distributed array) and local memory.

  • The SHMEM library : SHMEM is a one-sided programming model developed by Cray. It is a set of communication primitives based on simple underlying get/put functionality, with some additional shared memory-like primitives, such as remote fetch-and-increment and remote atomic swap operations. The SHMEM implementation on the Cray T3 leveraged the hardware and runtime system support for globally addressable memory. The SHMEM library on the Cray T3 had both C and Fortran interfaces and was used as the underlying data movement layer for implementations of UPC and Co-Array Fortran. A portable implementation of a subset of the SHMEM API, called Generalized Portable SHMEM (GPSHMEM), is also available and runs on a wide variety of machines and clusters.

  • The MPI 2.1 standard : It defines an interface and semantics for one-sided communication operations. MPI allows a process to expose a portion of its address space as a "window" that other processes can manipulate. One-sided operations in MPI are more heavyweight than typical one-sided interfaces, mostly due to the need to allow for portability and to support heterogeneous computing. A minimal passive target sketch is given after this list.
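As a hedged sketch of the passive target mode mentioned in the last item (the value and target rank are illustrative), rank 0 locks rank 1's window, puts data into it, and unlocks it without any matching call on the target:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        int win_buf = 0;                     /* memory exposed as a "window"      */
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Win_create(&win_buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 0 && size > 1) {
            int value = 42;
            /* Passive target: rank 1 takes no explicit part in this access */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
            MPI_Win_unlock(1, win);
        }

        MPI_Barrier(MPI_COMM_WORLD);         /* order the access before reading   */

        if (rank == 1) {
            /* Lock the local window to obtain a consistent view of the update */
            MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
            printf("Rank 1 window buffer now holds %d\n", win_buf);
            MPI_Win_unlock(1, win);
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }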


PGAS - Language-based Implementations

UPC : Unified Parallel C (UPC) is an extension of C for programming multiprocessors with a shared address space. UPC is designed to use message passing techniques to simulate a shared memory multiprocess environment. UPC is also designed to support the Distributed Shared Memory Model, which is in between the Distributed Memory Model and the Shared Memory Model. It has numerous features designed to make parallelisation as simple as possible, whilst also attempting to preserve the efficiency and overall structure of C. The hope is to achieve the balance between the ease of use (of the shared memory model) and the ability to exploit data locality (of the message passing model). It provides a Partitioned Global Address Space for the transfer of data between processes, as well as numerous synchronization and collective functions that enable the control of program flow between parallel threads.

UPC is supported on both shared- and distributed-memory systems and presents the programmer with a logically partitioned global address space that is physically distributed across available memory domains. Under UPC, every shared data element has affinity to a distinct processor, and UPC exposes this data locality information to the programmer so that it can be leveraged to enhance performance. Shared data can be accessed through built-in language-level support as well as through explicit one-sided communication operations. Regardless of location, all data in the global address space can be accessed without explicit help from the data's owner.

UPC provides a common syntax and semantics for explicit parallel programming in C, and it directly maps language features to the underlying architecture. UPC is an example of the partitioned shared memory programming model in which shared memory is partitioned among all UPC threads (processes). This partition is formally represented in the programming language. Each thread can access any location in shared memory using the same syntax but the locations in each thread's own partition of shared memory are accessed more quickly. The idea behind UPC is that users should be able to view the underlying machine model as a collection of threads operating in a common global address space. Specifically, each thread can access data resident in:

  • the local (private) part of the address space

  • the shared part of the address space with affinity to that thread

  • the shared part of the address space with affinity to other threads

There is no explicit message passing, as UPC features operate at a significantly higher level than MPI. A minimal UPC sketch is given below.
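As a hedged UPC sketch (the array size is illustrative), a shared array is distributed across the threads and upc_forall assigns each iteration to the thread that has affinity to the corresponding element:

    #include <upc.h>
    #include <stdio.h>

    #define N 100

    shared int a[N * THREADS];   /* N elements per thread, default cyclic layout */

    int main(void)
    {
        int i;

        /* Each iteration runs on the thread that owns a[i] (affinity &a[i]) */
        upc_forall (i = 0; i < N * THREADS; i++; &a[i])
            a[i] = MYTHREAD;

        upc_barrier;             /* synchronize all threads before remote reads  */

        if (MYTHREAD == 0)
            printf("last element written by thread %d of %d\n",
                   a[N * THREADS - 1], THREADS);
        return 0;
    }

Note that the last element is a remote read for thread 0 when more than one thread is used, yet no explicit message passing appears anywhere in the code.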

UPC has been gaining interest from academia, industry, and government labs. A UPC consortium (www.upc.gwu.edu) has been formed to foster and coordinate UPC development and research activities. Several organizations, including academia, vendors, government, and supercomputing centers all over the world, are actively involved in UPC research and development. Most of the work is focused on UPC compiler implementations, such as the GCC-based compiler.

Co-Array Fortran (CAF) : Co-Array Fortran is a simple syntactic extension to Fortran 95 that converts it into a robust, efficient parallel language. It has the look-and-feel of Fortran and requires Fortran programmers to learn only a few new rules. These new rules are related to two fundamental issues that any parallel programming model must resolve: work distribution and data distribution.

CAF supports the SPMD model of computation in which a collection of process images execute asynchronously and share data using one-sided communication through an explicit syntax. Process images interact by reading and writing data objects that are marked as co-arrays. CAF is based on Fortran 95 and inherits multi-dimensional arrays. The programmer is responsible for synchronization among images using memory barriers and flexible lightweight synchronization primitives that support pair-wise, group, and barrier synchronization. CAF does not require a single (physical) global address space. (see http://www.co-array.org/ for more details). CAF differs from the other PGAS languages in that programmers explicitly specify the target image of a remote co-array access. CAF has several synchronization primitives.

There are two main CAF compiler implementations: one is available on Cray architectures, and the other is a multi-platform CAF compiler developed at Rice University. Co-Array Fortran supports SPMD parallel programming through a small set of language extensions to Fortran 95. An executing CAF program consists of a static collection of asynchronous process images. Similar to MPI, CAF programs explicitly distribute data and computation. However, CAF belongs to the family of global address space programming languages and provides the abstraction of globally accessible memory for both distributed and shared memory architectures. CAF provides co-arrays to efficiently access remote array and scalar data.

Titanium (Java-based) : Titanium is an explicitly parallel SPMD language based on Java. It is an object-oriented, strongly-typed language and provides a Partitioned Global Address Space (PGAS) memory model. The focus of Titanium language design and implementation is on providing sequential memory consistency without sacrificing performance. Titanium supports the creation of complicated data structures and abstractions using the object-oriented class mechanism of Java, augmented with a global address space to allow for the creation of large, distributed shared structures. As Titanium is essentially a superset of Java, it inherits all the expressiveness, usability, and safety properties of that language. In addition, Titanium adds a number of features to standard Java that are designed to support high-performance computing. These features are described in detail in the Titanium language reference and include: (1) flexible and efficient multi-dimensional arrays, (2) built-in support for multi-dimensional domain calculus, (3) locality and sharing reference qualifiers, (4) explicitly unordered loop iteration, (5) user-defined immutable classes, (6) operator overloading, and (7) cross-language support. It has garbage-collected memory as well as zone-based managed memory for performance. Remote memory is accessed using global pointers; the local type qualifier is used to indicate that a pointer points to an object residing in the local demesne (memory), so compiler optimizations are possible for local references.


X10 (IBM) : X10 is a modern object-oriented programming language in the PGAS family. The fundamental goal of X10 is to enable scalable, high-performance, high-productivity transformational programming for high-end computers - for traditional numerical computation workloads (such as weather simulation, molecular dynamics, and particle transport problems) as well as commercial server workloads. X10 is based on state-of-the-art object-oriented programming ideas, primarily to take advantage of their proven flexibility and ease of use for a wide spectrum of programming problems. X10 takes advantage of several years of research (e.g., in the context of Java) on how to adapt such languages to the context of high-performance numerical computing.

X10 introduces a flexible treatment of concurrency, distribution, and locality within an integrated type system. X10 extends the PGAS model to the globally asynchronous, locally synchronous (GALS) model originally developed in hardware and embedded software research. X10 introduces places as an abstraction for a computational context with a locally synchronous view of shared memory. An X10 computation runs over a large collection of places: a computation is divided among a set of places, each of which holds some data and hosts one or more activities that operate on those data. X10 supports a constrained type system for object-oriented programming, user-defined primitive struct types, globally distributed arrays, and structured and unstructured parallelism. X10 uses the concept of parent and child relationships for activities to prevent the lock stalemate (deadlock) that can occur when two or more processes wait for each other to finish before they can complete.

X10 may be thought of as (generic) Java less concurrency, arrays and built-in types, plus places, activities, clocks, (distributed, multi-dimensional) arrays and value types. All these changes are motivated by the desire to use the new language for high-end, high-performance, high-productivity computing. The central new concept in X10 is that of a place. A place may be thought of conceptually as a "virtual shared-memory multi-processor": a computational unit with a finite, though perhaps dynamically varying, number of hardware threads and a bounded amount of shared memory uniformly accessible by all threads. An X10 program is intended to run on a wide range of computers, from uniprocessors to large clusters of parallel processors supporting millions of concurrent operations. An X10 computation acts on data objects through the execution of lightweight threads called activities. X10 has a unified or global address space.

Chapel : Chapel is an emerging parallel programming language whose design and development is being led by Cray Inc. Chapel serves as a portable parallel programming model that can be used on commodity clusters or desktop multicore systems. Like MPI, Chapel is a parallel programming model, but it supports global-view programming, which makes it far easier to write a particular, but common, style of program on distributed-memory systems. The central abstraction supporting global-view parallelism is the concept of a global array: an entity that can be treated as a whole even though its elements are partitioned across a system's locales. Chapel supports programmer control of locality by allowing the programmer to explicitly control the affinity of both tasks and data to locales. Chapel supports a multithreaded execution model via high-level abstractions for data parallelism, task parallelism, concurrency, and nested parallelism. Chapel's locale type enables users to specify and reason about the placement of data and tasks on a target architecture in order to tune for locality. Chapel supports global-view data aggregates with user-defined implementations, permitting operations on distributed data structures to be expressed in a natural manner.


Mode-1 : Multi-Core (PGAS)

Mode 1: Tuning & Performance of programs on Multi-Core Processors & Distributed Shared Address Space (PGAS) memory Models
  • Example Programs based on Thread Programming constructs will be made available.

  • Example Programs based on Thread/OpenMP/MPI/TBB focusing on numerical computations (Numerical Linear Algebra Dense / Sparse Matrix Computations) will be discussed.

  • Demonstrate the compiler capabilities for Numerical Kernels & Benchmarks based on PGAS Memory Models ( UPC, X10, Titanium, Chapel & CAF )

  • Walk-through parallel programs using Pthreads, OpenMP, TBB, Fortran90, MPI 2.0, MPI-Pthreads, MPI-OpenMP on Multi-Core Processors and Clusters of Multi-Core Systems

  • Demonstrate the use of performance tools (Intel Thread Checker, Intel VTune Analyzer, and open source software tools) on Multi-Core Processor Systems.

  • Demonstrate the performance of Numerical Kernels & Benchmarks based on PGAS Memory Models such as UPC, X10 (IBM), Titanium, Chapel (Cray) & CAF

  • Demonstrate the performance of Numerical Kernels based on MPI & PGAS Memory Models such as UPC.

  • Example Programs based on Thread/OpenMP/MPI/TBB focusing on non-numerical computations (Graph Coloring, Sorting algorithms) will be discussed.

Centre for Development of Advanced Computing