



Prog. on Intel Xeon Phi Coprocessors (OpenMP 4.0)

Example programs using compiler pragmas, directives, function calls, and environment variables; compilation and execution of OpenMP 4.0 programs; and programs for numerical and non-numerical computations are discussed.

OpenMP 4.X on Intel Xeon Phi             Document : OpenMP Version 4.0 July 2013 (pdf)

Compilation :
    (Sequential) : Vectorize & No-Vectorize    

OpenMP Compilation :     OpenMP

OpenMP Execution :     Set Up Run time Prog. Env.     Execution     Script    

Offload Information : Compiler Offload Pragma & Report     Compiler Offload Clauses

Tuning & Performance : KMP Thread Affinity     Memory Alignment     KMP-Script

Summary

Matrix Computation Codes

Example 1 : Reduction Operation for one dimensional integer array

Example 2 : Reduction Operation for two dimensional integer array

Example 3 : Vector-Vector Multiply

Example 4 : Matrix-Matrix Addition

Example 5 : Matrix-Matrix Multiply

Example 6 : Performance of Vector-Vector Addition

Example 7 : Infinity Norm of a Square Matrix

Example 8 : Reduction Operation for one dimensional integer array using the OpenMP distribute clause

References :     Xeon Phi Coprocessor


The key specifications of the Intel Xeon Phi Coprocessor

Feature                 Specification
Clock Frequency         1.091 GHz
No. of Cores            61
Memory Size / Type      8 GB / GDDR5
Memory Speed            5.5 GT/sec
Peak DP/SP              1.065 / 2.130 teraFLOP/s
Peak Memory Bandwidth   352 GB/s
An Overview of OpenMP 4.0

One non-shared memory model is supported on the Intel Xeon host and the Intel Xeon Phi coprocessor using offload pragmas in the application. The compiler handles the application data at runtime and sends data buffers over to the coprocessor by logically ordering the data parameters as provided by the application. The other model is the virtual shared memory model, in which a system-level runtime support library is used to maintain coherency between the host and coprocessor shared memory address spaces. The non-shared memory model is supported by OpenMP 4.0. The Intel compiler supports the non-shared programming model, and Intel's OpenMP 4.0 implementation sticks to the OpenMP 4.0 syntax; previously, the Intel compiler had proprietary language extensions that were implemented to support the non-shared programming model.

In the non-shared programming model, the main program starts on the host, and data and computation can be sent to the Intel Xeon Phi through offload pragmas when the data exchanged between the host and the coprocessor is bit-wise copyable. Bit-wise copyable data includes scalars, arrays, and structures without indirections.

At runtime, the compiler takes care of copying data back and forth between the host and the coprocessor around the offload block indicated by the pragmas, since the coprocessor memory space is separate from the host memory space.

The data selected for offload may be implicitly copied if used inside the offload code block, provided the variables are in the lexical scope of the code block performing the offload, or explicitly listed as part of the OpenMP 4.0 pragmas.

In OpenMP 4.0, the same code can run on the host processor with or without a coprocessor; the compiler runtime determines whether a coprocessor is present on the system. If the coprocessor is not available or is inactive during the offload call, the code fragment inside the offload code block executes on the host instead.

To understand the new language extensions of OpenMP 4.0 for non-shared memory programming, please refer to the OpenMP 4.0 Technical Report (TR), i.e. www.openmp.org/mp-documents/TR1_167.pdf . In order to understand the language extensions defined in OpenMP 4.0 to deal with the coprocessor, we need to understand some terminology associated with the specification; these terms are explained below.

Terminologies & Notations

  • Device: The coprocessor with its own memory. A device could have one or more coprocessors or a host. A host device is the device executing the main thread. A target device executes the offloaded code segment.
  • Offload:  The process of sending computation from host to target. 
  • Data Environment:  The variables associated with a given execution environment. 
  • Device data environment :  A data environment associated with a target data or target construct. 
  • Mapping Variable : Mapping of a variable in a data environment to a variable in a device data environment. The original and corresponding variable may share storage.
  • Mappable type:  A valid ‘data type’ for a mapped variable. 

Offload Function and Data Declaration Constructs

Target Data Declarations:  Declares a device data environment for the scope of the target code block. This allows creation of versions of the specified function or data that can be used inside a target region executing on the coprocessor.
 
C/C++  
#pragma omp declare target new-line  

        [function-definition-or-declaration] 

#pragma omp end declare target new-line  
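
A minimal sketch (the function name and body are illustrative, not from the source) of marking a routine so that it is compiled for, and callable on, the coprocessor:

#pragma omp declare target
/* Compiled for both the host and the coprocessor */
int square(int x)
{
    return x * x;
}
#pragma omp end declare target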



Function Offload and Execution Constructs:

Target Construct :  In order to execute on a coprocessor, a “target” device must be available on the host system running the code. For the non-shared memory model, the host application executes the initial program, which spawns the master execution thread called the “initial device thread”. The OpenMP pragma “target” provides the capability to offload computations to coprocessor(s).

  • A target region begins as a single thread of execution and executes sequentially, as if enclosed in an implicit task region, called the initial device task region.

  • When a target construct is encountered, the target region is executed by the implicit device task.

  • The task that encounters the target construct waits at the end of the construct until execution of the region completes. If a coprocessor does not exist, or is not supported by the implementation, or cannot execute the target construct, then the target region is executed by the host device.

  • The data environment is created at the time the construct is encountered, if needed. Whether a construct creates a data environment is defined in the description of the construct.

C/C++  
#pragma omp target [clause[[,] clause],...] new-line  

        structured-block  

Where Clauses are :

    device(scalar-integer-expression) 
The integer expression must be a positive number to differentiate the various coprocessors available on a host. If no device is specified, the default device is determined by the internal control variable (ICV) named device-num-var.

    devicemap(alloc | to | from | tofrom : list), scratch(list) : 
These are data motion clauses that allow copying and mapping of variables or common blocks between host scope and target device scope. (In the final OpenMP 4.0 specification this clause is named map.)

    if(scalar-expr) : 
If the scalar expression evaluates to false, the new device data environment is not created and the region executes on the host.

    num_threads(list) : 
Sets the nthreads-var ICV in the device data environment to the value list.
 

The “target” directive creates a device data environment and executes the code block on the target device. The target region binds to the enclosing parallel or task region. It provides a superset of the “target data” construct described later, and describes the data as well as the code block to be executed on the target device. The master task waits for the coprocessor to complete the target region at the end of the construct. When an “if” clause is present and the logical expression inside it evaluates to false, the target region is not executed by the device. When a num_threads clause is present, the nthreads-var in the coprocessor environment is assigned the value list.
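
A minimal sketch (the array name and size are illustrative, not from the source) of offloading a parallel loop with the target construct; the mapped array is copied to the device before the region and back after it:

#define SIZE 4096

void scale(float *a)
{
    /* Offload the loop; a[0:SIZE] is mapped tofrom the device */
    #pragma omp target map(tofrom: a[0:SIZE])
    #pragma omp parallel for
    for (int i = 0; i < SIZE; i++)
        a[i] *= 2.0f;
}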

  • If a target, target update, or target data construct appears within a target region, then the construct is ignored.
  • At most one device clause may appear on the directive. The device expression must evaluate to a positive integer value.
  • At most one “if” clause can appear on the directive.
  • The result of an omp_set_default_device, omp_get_default_device, or omp_get_num_devices routine called within a target region is unspecified. The effect of an access to a threadprivate variable in a target region is unspecified.
  • A variable referenced in a target construct that is not declared in the construct is implicitly treated as if it had appeared in a map clause with a map-type of tofrom.
  • A variable referenced in a target region, but not in the target construct, that is not declared in the target region must appear in a declare target directive.
  • C/C++ specific: A throw executed inside a target region must cause execution to resume within the same target region, and the same thread that threw the exception must catch it.
The extension also provides routines to set and get runtime environment settings (referred to as internal control variables, ICVs, in the OpenMP spec). These are:

ICV Name         Routines
device-num-var   omp_set_device_num(), omp_get_device_num()


Target Data Construct

Target Data Declarations : Creates the device data environment for the scope of the target code block. If there is no “device” clause, the default device is determined by device-num-var.

Syntax
  C/C++ 

#pragma omp target data [clause[[,] clause],...] new-line  

        structured-block  

Where Clauses are :

    device(scalar-integer-expression) 

The integer expression must be a positive number to differentiate the various coprocessors available on a host. If no device is specified, the default device is determined by the internal control variable (ICV) named device-num-var.

    devicemap(alloc | to | from | tofrom : list), scratch(list) : 
These are data motion clauses that allow copying and mapping of variables or common blocks between host scope and target device scope. (In the final OpenMP 4.0 specification this clause is named map.)

    if(scalar-expr) : 
If the scalar expression evaluates to false, the new device data environment is not created.

    structured-block :    No branching into or out of the block of code is allowed.
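
A minimal sketch (array names and sizes are illustrative, not from the source) of using target data to keep arrays resident on the device across two target regions, avoiding a second host-device copy:

#define N 2048

void pipeline(float *a, float *b)
{
    /* One device data environment spanning both offloaded loops */
    #pragma omp target data map(to: a[0:N]) map(from: b[0:N])
    {
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            b[i] = a[i] * 2.0f;

        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            b[i] += 1.0f;
    }
}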
 

Target Update Constructs

Target Update Declarations : This construct synchronizes the list items in the device data environment, making them consistent with their corresponding original list items.

Syntax
  C/C++ 

#pragma omp target update [clause[[,] clause],...] new-line  

 

Where Clauses are :

    device(scalar-integer-expression) 

The integer expression must be a positive number to differentiate the various coprocessors available on a host. If no device is specified, the default device is determined by the internal control variable (ICV) named device-num-var.

    to(list), from(list) : 
These are data motion clauses: to(list) copies the listed variables from the host to the target device, and from(list) copies them from the device back to the host.

    if(scalar-expr) : 
If the scalar expression evaluates to false, the update is not performed.

Note that target update is a stand-alone directive; it does not enclose a structured block.
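
A minimal sketch (names and sizes are illustrative, not from the source) of using target update inside a target data region to refresh the host and device copies mid-region:

#define N 1024

void refresh(float *v)
{
    #pragma omp target data map(tofrom: v[0:N])
    {
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            v[i] *= 2.0f;

        /* Bring the device copy back to the host for inspection */
        #pragma omp target update from(v[0:N])

        v[0] = 0.0f;   /* host-side change */

        /* Push the host copy back to the device */
        #pragma omp target update to(v[0:N])
    }
}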


Compilation of Sequential Programs : Compiler to Vectorize / No-Vectorize

Using command line arguments (Vectorization & No Vectorization )

The compilation and execution of a program for an Intel Many Integrated Core (MIC) architecture coprocessor (-mmic), also known as the Intel Xeon Phi Coprocessor, are given below.

Compilation :

To compile the program : Using Intel C Compiler with Vectorization

# icc   -mmic  -vec-report=3   -O3  <program name>  -o  <name of executable>  

For example, to compile a simple seq-matrix-matrix-multiply.c program, the user can type on the command line

# icc   -mmic  -vec-report=3   -O3  seq-matrix-matrix-multiply.c  -o  seq-matrix-matrix-multiply  

To compile the program : Using Intel C Compiler without Vectorization

The user can ask the compiler not to vectorize the code with the  -no-vec  option and execute the code; lower performance is likely.

# icc   -mmic  -no-vec  -vec-report=3   -O3  seq-matrix-matrix-multiply.c  -o  seq-matrix-matrix-multiply

To compile the program using the Makefile utility with the Intel C compiler and vectorization :

make

Note: If the Makefile has some extension like Makefile_C then user is required to type

make -f Makefile_C (instead of simply typing make)

make -f Makefile.OFFLOAD (Compile using OFFLOAD mode)

make -f Makefile.NATIVE (Compile using NATIVE mode)

make -f Makefile.OFFLOAD clean (Clean the Object files & Binaries )
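
A minimal sketch (file names and flags are illustrative, not taken from the course materials) of what such a Makefile.NATIVE might contain:

# Makefile.NATIVE : build an OpenMP program for native execution on the coprocessor
CC     = icc
CFLAGS = -mmic -openmp -vec-report=3 -O3

all: openmp-matrix-matrix-multiply

openmp-matrix-matrix-multiply: openmp-matrix-matrix-multiply.c
	$(CC) $(CFLAGS) $< -o $@

clean:
	rm -f openmp-matrix-matrix-multiply *.o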


The details of the syntax of the command to compile the program on Intel Xeon Phi are given in the following table.

Compiler flag                   Purpose
-mmic                           Requests code generation for an Intel MIC architecture processor, also known as the Intel Xeon Phi Coprocessor
-vec-report=3                   Generates a vectorization report
-no-vec                         Disables vectorization of the code
-O3                             Standard optimisation level
-o seq-matrix-matrix-multiply   Writes the resulting executable file to seq-matrix-matrix-multiply
-o run                          Writes the resulting executable file to run


Compilation of Programs : OpenMP - Compiler to Vectorize


Using command line arguments

The compilation and execution of a program for an Intel Many Integrated Core (MIC) architecture coprocessor (-mmic) with OpenMP framework is given below.

# icc   -openmp   -mmic  -vec-report=3   -O3  <program name>  -o  <name of executable>  

For example, to compile a simple openmp-matrix-matrix-multiply.c program, the user can type on the command line

# icc   -openmp   -mmic  -vec-report=3   -O3  openmp-matrix-matrix-multiply.c  -o  openmp-matrix-matrix-multiply  

Using command line arguments

The compilation of a program for the MIC architecture coprocessor using offload compilation is given below. (Note that the -mmic flag is used for native compilation, not offload; an offload executable runs on the host and offloads the marked regions.)

# icc   -openmp   -vec-report=2   -O3  <program name> \
                      -o  <name of executable>
 


OpenMP Set Up Run time Prog. Env.

Setting Up the Prog. Environment : OpenMP programs to scale up to 60 cores

The user can set the number of threads and the affinity from within the program using API calls

omp_set_num_threads(32);
kmp_set_defaults("KMP_AFFINITY=compact");


or, equivalently, using environment variables in the coprocessor's Linux operating environment. That is

export OMP_NUM_THREADS=32
export KMP_AFFINITY=compact

The environment variables given below should be set before execution of the program, as per application requirements.

# set environment variables
export MKL_NUM_THREADS=32
export KMP_AFFINITY="granularity=fine,compact"
export MIC_ENV_PREFIX=PHI
export PHI_MKL_NUM_THREADS=236
export PHI_KMP_AFFINITY=granularity=fine,compact
export OFFLOAD_REPORT=2
export MKL_MIC_ENABLE=0

Un-setting Up the Prog. Environment :

#unset env variables
unset MKL_NUM_THREADS
unset KMP_AFFINITY
unset MIC_ENV_PREFIX
unset PHI_MKL_NUM_THREADS
unset PHI_KMP_AFFINITY
unset MKL_MIC_ENABLE


Execution of Programs : Sequential & OpenMP

To execute the application on the coprocessor : Log-in to the Xeon Phi Coprocessor

To execute the program on the coprocessor, the user logs in to the coprocessor, then simply types the name of the executable on the command line. 

./< Name of executable>

For example, to execute a simple seq-matrix-matrix-multiply.c application, the user types the command  

./seq-matrix-matrix-multiply 

For example, to execute a simple openmp-matrix-matrix-multiply.c OpenMP application, the user types 

./openmp-matrix-matrix-multiply  

The expected output:

Initializing the Vectors
Computation started
gigaFLOPs = ****
Time = ****
gigaFLOPs per Sec = *****


Execution - Script

Script to run on Xeon Phi in Native Mode :

  export MKL_MIC_ENABLE=0
  export KMP_AFFINITY="granularity=thread,balanced"
  export LD_LIBRARY_PATH=/tmp
  nThreads=240
  i=200
  while [ $i -le 1000 ]
  do
    echo -n "mic "
    ./openmp_matrix_matrix_multiply $i $nThreads 8
    let i+=100
  done


Compiler Offload Pragma & Report


Details of Code : Intel compiler's offload pragmas :

On the Xeon host, the code to transfer the data to the Xeon Phi coprocessor is automatically created by the Intel compiler. When offload pragmas are added to C/C++ or Fortran code and the Intel compiler encounters an offload pragma, it generates code for both the coprocessor and the host. The programmer's responsibility is to include appropriate offload pragmas with suitable data clauses. Details can be found under "Offload Using a Pragma" in the Intel compiler documentation as given in the references.

  • Using #pragma offload target(mic) : In this example, how to offload the matrix computation to the Intel Xeon Phi coprocessor using #pragma offload target(mic) is shown.

  • Choose the target MIC out of Multiple Coprocessors : The user could also specify the Intel Xeon-Phi Coprocessor Number_Id in a system with multiple coprocessors (Ex. PARAM YUVA Compute Nodes ) by using #pragma offload target(mic:Number_Id).

Other Information about Intel compiler's offload :
  • Use -no-offload : Offloading is enabled by default for the Intel compiler. Use -no-offload to disable the generation of offload code.

  • Vec Report : Using the compiler option -vec-report2, one can see which loops have been vectorized on the host and on the MIC coprocessor.

  • Printing Data transfer (OFFLOAD_REPORT) : By setting the environment variable OFFLOAD_REPORT one can obtain information about performance and data transfers at runtime:

    hypack-01: ~offload_c> export OFFLOAD_REPORT=2

Intel Xeon Phi Coprocessor Compiler Offload Clauses
  • Using #pragma offload target(mic) : In all the examples given below, the important information related to offloading the matrix computations to the Intel Xeon Phi coprocessor using #pragma offload target(mic) is discussed.

The Intel Xeon-Phi coprocessor programming environment provides an "offload pragma" that supplies additional annotation so the compiler can correctly move data to and from the external Xeon Phi card. Note that single or multiple OpenMP loops can be contained within the scope of the offload directive. The clauses are interpreted as follows:

Offload: The offload pragma keyword specifies different clauses that contain information relevant to offloading to the target device.

target(mic:MIC_DEV) is the target clause that tells the compiler to generate code for both the host processor and the specified offload device, i.e., the Xeon Phi coprocessor. The constant parameter MIC_DEV is an integer associated with the Xeon-Phi device. Note that the offload performs different operations as per requirement.

  • The offload runtime will schedule offload work within a single application in a round-robin fashion, which can be useful to share the workload amongst multiple devices.

  • The offload runtime will utilize the host processor when no coprocessors are present and no device number is specified (for example, target(mic)).

  • programmers can use _Offload_to to specify a device in their code.

  • It is the responsibility of the programmer to ensure that any persistent data resides on all the devices. During round-robin scheduling, keeping persistent data resident on all the devices is important from a performance point of view and to avoid PCIe bottlenecks. In general, only use persistent data when the device number is specified.

  • Choose the target MIC out of Multiple Coprocessors : The user could also specify the Intel Xeon-Phi Coprocessor MIC_DEV in a system with multiple coprocessors (Ex. PARAM YUVA Compute Nodes ) by using #pragma offload target(mic:MIC_DEV).

  • Using #pragma offload target(mic) : To offload the matrix computation to the Intel Xeon Phi coprocessor using #pragma offload target(mic), the following clauses are required.

in(Matrix_A:length(size*size)): The in(var-list [modifiers]) clause explicitly copies data from the host to the coprocessor. Note that:

  • The length(element-count-expr) specifies the number of elements to be transferred. The compiler will perform the conversion to bytes based on the type of the elements.

  • By default, memory will be allocated on the device and deallocated on exiting the scope of the directive.

  • The alloc_if(condition) and free_if(condition) modifiers can change the default behavior.

out(Matrix_A:length(size*size)): The out(var-list [modifiers]) clause explicitly copies data from the coprocessor to the host. Note that:

  • The length(element-count-expr) specifies the number of elements to be transferred. The compiler will perform the conversion to bytes based on the type of the elements. By default, memory will be deallocated on exiting the scope of the directive.

  • The free_if(condition) modifier can change the default behavior.
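
A minimal sketch (the loop body is illustrative, not from the source) combining the clauses above into a single offload of a matrix multiply:

#define MIC_DEV 0

void offload_multiply(float *Matrix_A, float *Matrix_B,
                      float *Matrix_C, int size)
{
    /* A and B are copied in, C is copied out; device memory is
       allocated and freed around the offload block by default */
    #pragma offload target(mic:MIC_DEV) \
            in(Matrix_A, Matrix_B : length(size*size)) \
            out(Matrix_C : length(size*size))
    {
        #pragma omp parallel for
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++) {
                float sum = 0.0f;
                for (int k = 0; k < size; k++)
                    sum += Matrix_A[i*size + k] * Matrix_B[k*size + j];
                Matrix_C[i*size + j] = sum;
            }
    }
}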

Tuning & Performance : KMP Thread Affinity
  • On a single core, we can employ four threads. To ensure maximum performance, the core needs to execute FMA calculations on every clock cycle. To do this, the core must be running more than one thread, i.e., two, three, or four threads. The code is re-written to execute on multiple threads, and eventually multiple cores.

  • OpenMP, a widely used standard supported by compilers for Intel Xeon Phi coprocessors, provides a standard API and programming model for shared-memory multiprocessing in HPC. The OpenMP parallel for directive is used to enable the desired threading: the loop iterations are divided among the available threads and run in parallel.

  • In the OpenMP implementation, each thread works on a separate set of row elements of the matrix, and offset variables are included in the code. To achieve performance close to the theoretical peak, two OpenMP calls are used :

    omp_set_num_threads(32);
    kmp_set_defaults("KMP_AFFINITY=compact");

    The first OpenMP call sets the number of threads to use while running the code; the second, setting the affinity to compact, controls how the thirty-two requested OpenMP threads are pinned to cores.

  • A subset of main memory is available to an application, and main memory is organised into pages. The two-dimensional arrays Matrix_A and Matrix_B are accessed along the rows. On a system that supports virtual memory, the memory addresses for different applications are virtualised: they are given logical addresses, assigned into virtual pages. A typical page size is 4 or 8 Kbytes, but larger ones exist. In fact, the physical pages that are available to a program may be spread out in memory, so the virtual pages must be mapped to physical ones. The page addresses are stored in the page table, and the translation-lookaside buffer (TLB), a special cache, stores the recently accessed entries in the page table. The TLB is critical to performance; in this example, accessing the arrays along rows helps to avoid cache reloads and a large number of TLB misses.

  • Loop optimisation can give significant improvement in performance, and loop unrolling can help to improve cache-line utilisation by improving data reuse. Inner-loop unrolling with appropriate OpenMP pragmas may further improve the performance. Loop unrolling can also help to increase the instruction-level parallelism, or ILP. Compiler switches can help to perform loop unrolling.

  • Use of Pointers and Contiguous Memory in the C Language : The pointer aliasing problem exists in many application codes and prevents a compiler from performing many program optimizations, since it cannot determine that they are safe; it must assume that all pointers may reference any memory address. Better performance can be obtained if pointers are guaranteed to point to portions of non-overlapping memory. The restrict keyword is a feature of the C99 Standard that is supported by the compiler and may improve the performance (see the restrict sketch after this list).

  • The number of threads in an OpenMP environment and the mapping of threads to cores on the Intel Xeon-Phi coprocessor play an important role in achieving maximum performance of developer code. The KMP_AFFINITY environment variable specifies the thread-to-core affinity. There are three preset schemes: compact, scatter, and balanced, and the user can explicitly define the affinity that works best for their application. The choice of affinity scheme depends upon the memory access, data sharing, and workload of each thread affined to a core. The default runtime thread affinity can also be used, but it may change between software releases; for consistent application performance across software releases, do not rely on the default affinity scheme.

  • Compact tries to use the minimum number of cores by pinning four threads to a core before filling the next core.

  • Scatter tries to evenly distribute threads across all cores.

  • Balanced tries to equally scatter threads across all cores such that adjacent threads (sequential thread numbers) are pinned to the same core. One caveat: "all cores" here refers to the total number of cores minus one, because one core is reserved for the operating system during an offload. Interested readers can find more about the affinitization schemes in the Intel compiler documentation.
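
A minimal sketch (function and array names are illustrative, not from the source) of the restrict keyword mentioned in the pointer-aliasing bullet above; compile with -std=c99:

/* restrict promises the compiler that a, b and c never overlap,
   so the loop can be vectorized without runtime aliasing checks */
void vector_add(int n, float * restrict a,
                const float * restrict b, const float * restrict c)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}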


Script : Using KMP Thread Affinity

Script to run on Xeon Phi in Native Mode :

  export MKL_MIC_ENABLE=0
  export KMP_AFFINITY="granularity=thread,balanced"
  export LD_LIBRARY_PATH=/tmp
  nThreads=240
  i=1
  while [ $i -lt 100 ]
  do
    echo -n "mic "
    ./openmp_matrix_matrix_multiply 1000 $i 10
    let i++
  done


Tuning & Performance : Memory Alignment for Vectorization
  • Memory Alignment for Vectorization : The matrices are dynamically allocated using posix_memalign(), and their sizes must be specified via the length() clause. Using in, out and inout, one can specify which data has to be copied to the Intel Xeon-Phi coprocessor from the host (see the alignment sketch after this list).

  • Data alignment - 64-Byte : It is recommended that data for Intel Xeon Phi is 64-byte (512-bit) aligned, as required by the MIC architecture.

  • Alignment using #pragma vector aligned : For proper alignment of data to get performance using Intel compiler vectorization, #pragma vector aligned is used. This tells the compiler that all array data accessed in the loop is properly aligned.

  • In addition, the -std=c99 command-line option tells the compiler to allow use of the restrict keyword and C99 VLAs. Note that the C99 restrict keyword specifies that the vectors do not overlap. (Compiling with -std=c99 is required for efficient vectorization.)

  • Data should be aligned to 16 Bytes (128 bits) for SSE, 32 Bytes (256 bits) for AVX, and 64 Bytes (512 bits) for the MIC architecture.
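
A minimal sketch (sizes and names are illustrative, not from the source) of 64-byte aligned allocation with posix_memalign() combined with the #pragma vector aligned hint:

#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

#define N 1024

int main(void)
{
    float *a, *b;
    /* 64-byte (512-bit) alignment as recommended for the MIC architecture */
    posix_memalign((void **)&a, 64, N * sizeof(float));
    posix_memalign((void **)&b, 64, N * sizeof(float));
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Tell the compiler that every array access in this loop is aligned */
    #pragma vector aligned
    for (int i = 0; i < N; i++)
        a[i] += b[i];

    free(a);
    free(b);
    return 0;
}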


Tuning & Performance : Summary on Xeon Phi

Compiler Optimisation & Vectorization
  • Performance : The Intel Xeon-Phi coprocessor delivers peak performance greater than 1 teraFLOP/s for double-precision (DP) floating-point calculations and greater than 2 teraFLOP/s for single-precision calculations.

  • The effective use of compiler pragmas with vectorization and scaling to achieve the best performance in terms of gigaFLOP/s for a given code is important. That is, the code needs to use the appropriate compiler flags, vectorization, and scaling to achieve the best performance. For more information on this, visit
    hypack13-mode03-coprocessor-compiler-vect.html

  • To parallelize more loops in an application on the coprocessor than on an Intel Xeon based host, the new MIC instructions such as fused multiply-add, masked vector instructions, and gather/scatter can be used to improve performance, thanks to the large 64-byte SIMD vectorization width that the MIC architecture supports.

  • The coprocessor supports a key performance-enhancing capability: executing both a multiplication and an addition, known as fused multiply-add (FMA), in a single instruction without precision loss, enabling two floating-point operations in one instruction. Use FMA in Numerical Linear Algebra (NLA) computations.

  • The options used for running the code give information about the performance of the code. The details of the program, i.e., the inner loop of the matrix-matrix multiply, and the various compiler options do not indicate the number of threads and the number of cores used.

  • To achieve the maximum performance of a single core, out of the peak performance of a single core of approximately 35 gigaFLOP/s, it is necessary to take full advantage of the coprocessor's optimisation features and programming paradigms such as OpenMP and the Intel MKL libraries. Only a single thread is used in this example, and the maximum number of threads the Intel Xeon Phi Coprocessor can directly manage per core is four.

  • The option -vec-report=3 gives the user information about the compiler's choices for vectorizing portions of the code. (Note that this option may also be written -vec-report3.) For vectorization, loops and array processing are required.

  • In the matrix-matrix multiply examples, the compiler was able to arrange chunks of arrays to be loaded into machine registers or directly from memory via caches. Thus all 16 single-precision floating-point lanes are used for simultaneous calculation.

  • The inner and outer loops are effectively vectorized for the code.


Example Programs on Matrix Computations

Example. 1 : Reduction Operation for one dimensional integer array
(Download source code : openmp4x-reduce-1darray.c,
                                  env-setup.sh ,
                                  Makefile_openmp4x )

A simple example of how the constructs discussed above for OpenMP 4.0 can be used to offload computation to the Intel Xeon Phi coprocessor. This example shows a sum reduction operation being run on a Xeon Phi coprocessor. The offload computation “reduce” is first declared with the “#pragma omp declare target” pragma. This causes the routine or its components to be executable on the coprocessor. The offload pragma,


#pragma omp target map(inarray[0:SIZE]) map(sum)

that causes the specific code block to be sent to the coprocessor for computing. In this case it computes the reduction of an array of numbers and returns the computed value through the “sum” variable to the host. The “inarray” and “sum” variables are copied in and out of the coprocessor before and after the computation.
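
A condensed sketch of what such a program might look like (the downloadable openmp4x-reduce-1darray.c is authoritative; the details here are illustrative):

#include <stdio.h>

#define SIZE 1000

#pragma omp declare target
int reduce(int *inarray, int size)
{
    int sum = 0;
    for (int i = 0; i < size; i++)
        sum += inarray[i];
    return sum;
}
#pragma omp end declare target

int main(void)
{
    int inarray[SIZE], sum = 0;
    for (int i = 0; i < SIZE; i++)
        inarray[i] = i;

    /* inarray is copied in; sum is copied in and back out */
    #pragma omp target map(to: inarray[0:SIZE]) map(tofrom: sum)
    sum = reduce(inarray, SIZE);

    printf("sum = %d\n", sum);
    return 0;
}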

Example. 2 : Reduction Operation for two dimensional integer array
(Download source code : openmp4x-reduce-2darray.c)

Example. 3 : Vector Vector Multiplication
(Download source code : openmp4x-vect-vect-multiply.c)

Example. 4 : Matrix Matrix Addition
(Download source code : openmp4x-mat-mat-addition.c)

Example. 5 : Matrix Matrix Multiplication
(Download source code : openmp4x-mat-mat-multiply.c)

Example. 6 : Vector Vector Addition - Scaling to more cores & Performance
(Download source code : openmp4x-vect-vect-addition-perf.c)

Example. 7 : Infinity Norm of a Square Matrix
(Download source code : openmp4x-infinity-norm-matix.c)

Example. 8 : Reduction Operation for one dimensional integer array using openmp distribute clause
(Download source code : openmp4-0-distribute-1Darray-sum.c)


Centre for Development of Advanced Computing