



hyPACK-2013 Mode 1 : Software Threading : I/O Perf. Issues



I/O on multi-core processors and clusters of multiprocessors plays an important role in understanding and enhancing the performance of your application. Large-scale scientific and engineering computations often require large amounts of memory; in fact, many parallel applications are memory-bound rather than CPU-bound or I/O-bound. This is especially true on multicomputer systems with distributed memory, where the local memory attached to each node is relatively small and each node can therefore handle only a small part of the problem. On typical large-scale parallel and cluster platforms, a system bus provides slots to connect I/O devices such as disk drives, tape drives, and network interface cards. Efforts have been made in both software development and hardware design to improve the I/O bandwidth between computational units and storage systems. Most of this work, however, considers only exclusive file access among concurrent I/O requests, whereas many scientific applications require data partitioning with overlap among the requesting processes.




List of MPI I/O & I/O Threaded Programs
Example 1.1

Write a simple MPI program performing sequential I/O from a parallel program into a common file

Example 1.2

Write a Pthread I/O program in which each thread writes to a separate file (Assignment)

Example 1.3

Write an MPI program performing MPI I/O to separate files (Using MPI 2.0)

Example 1.4

Write an MPI program performing MPI I/O to single file (Using MPI 2.0)

Example 1.5

Write a Pthread I/O program that reads a file with a different number of threads (Assignment)

Example 1.6

Write an MPI I/O program that reads the file with a different number of processes (Using MPI 2.0)



Introduction to I/O : Threads

In a multi-core programming environment, multiple processes may be active within the same time interval, competing for memory, I/O, and CPU resources. Some applications are CPU-bound (computation intensive) and some are I/O-bound (input-output intensive). Executing these different kinds of programs on a multi-core system while balancing bandwidth among the various functional units is a challenging task. Program interleaving is intended to promote better resource utilization by overlapping I/O and CPU operations on multiple cores. For example, whenever a thread P1 is tied up with I/O operations, the OS scheduler can switch the CPU to thread P2; when P2 is done, the CPU can be switched to P3. This allows several programs to execute simultaneously in the system. With I/O and CPU operations overlapped in this way, CPU wait time is greatly reduced. Overall system performance can be limited by either compute-bound or I/O-bound jobs, and various techniques are used to manage I/O data transfer. Achieving maximum concurrency of I/O and CPU processing is therefore an important goal in the multi-core programming environment.

In a multi-core environment, too many threads can seriously degrade program performance; caches, virtual memory, thread locks, and time slicing all contribute to the degradation. In some situations, all the threads waiting for a lock must wait for the holding thread to wake up and release it. To address these problems, the best approach is to limit the number of runnable threads to the number of hardware threads, and possibly to the number of outer-level caches in typical multi-socket multi-core systems.

When a thread is blocked waiting for an external event, such as a disk I/O request, the OS takes it off the round-robin schedule. A blocked thread does not cause time-slicing overhead, so a program may have more software threads than hardware threads and still run efficiently if most of those threads are blocked. Distinguishing between compute threads and I/O threads can help reduce the overheads. Special care is needed to ensure that the number of compute threads matches the processor resources. Compute threads should be runnable most of the time; ideally, they never block on external events but instead feed from task queues that provide work. I/O threads are threads that wait on external events most of the time, and thus do not contribute to having too many runnable threads. Too many threads degrade program performance in two ways. First, partitioning a fixed amount of work among too many threads gives each thread too little work, so the overhead of starting and terminating threads swamps the useful work. Second, having too many concurrent software threads incurs overhead from sharing fixed hardware resources. A minimal sketch of this compute-thread/I/O-thread organization is given below.
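
The sketch below illustrates one way this organization can be set up with POSIX threads: one I/O thread that blocks on a file descriptor, plus a pool of compute threads matched to the number of hardware threads reported by the OS. The task queue and the processing steps mentioned in the comments are hypothetical placeholders, not part of any standard library.

    /* Sketch only: one blocking I/O thread plus a compute-thread pool sized
       to the hardware concurrency.  The task queue referred to in the
       comments is a hypothetical placeholder. */
    #include <pthread.h>
    #include <unistd.h>     /* sysconf(), read() */
    #include <stdlib.h>

    void *compute_worker(void *arg)
    {
        /* Runnable most of the time: repeatedly take work from a task queue
           (not shown) and process it without blocking on external events. */
        (void)arg;
        return NULL;
    }

    void *io_worker(void *arg)
    {
        /* Blocked most of the time: the OS removes this thread from the
           run queue while read() is pending, so it adds no time-slicing
           overhead.  Data read here would be handed to the compute pool. */
        int fd = *(int *)arg;
        char buf[4096];
        while (read(fd, buf, sizeof(buf)) > 0)
            ;   /* e.g. enqueue(buf) for the compute threads */
        return NULL;
    }

    int main(void)
    {
        long ncores = sysconf(_SC_NPROCESSORS_ONLN);   /* hardware threads */
        pthread_t io_thread, *workers;
        int fd = 0;          /* illustrative: read from standard input */
        long i;

        if (ncores < 1)
            ncores = 1;
        workers = malloc(ncores * sizeof(pthread_t));

        pthread_create(&io_thread, NULL, io_worker, &fd);
        for (i = 0; i < ncores; i++)    /* match compute threads to cores */
            pthread_create(&workers[i], NULL, compute_worker, NULL);

        for (i = 0; i < ncores; i++)
            pthread_join(workers[i], NULL);
        pthread_join(io_thread, NULL);
        free(workers);
        return 0;
    }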

Much attention has been paid to the implementation issues of concurrent overlapping I/O operations that abide by the atomicity semantics of the Message Passing Interface (MPI) standard. Atomicity here refers to a synchronization property: a process often needs to perform a sequence of operations as a single atomic operation. An atomic operation is indivisible (once it starts, it cannot be interrupted in the middle, so other processes cannot see an intermediate state) and finite (once it starts, it finishes in a finite amount of time). A synchronization operation causes processes to wait for one another, or allows waiting processes to resume execution. On multi-cores, the Portable Operating System Interface (POSIX) addresses concurrent overlapping I/O at the granularity of individual calls: the POSIX definition considers atomicity at the level of read()/write() calls, in which only a contiguous file region can be specified in a single I/O request.



Multi-Cores : Concurrent Overlapping - POSIX Threads

Concurrent overlapping I/O occurs when I/O requests from multiple processes are issued simultaneously to the file system and overlaps exist among the file regions accessed by these requests. If all requests are read requests, the file system can use the disk cache to duplicate the overlapped data for the requesting processes, and no conflict arises in obtaining the file data. However, when one or more of the requests are write requests, the outcome in the overlapped regions, either in the file or in a process's memory, can vary depending on the implementation of the file system. This problem is commonly referred to as I/O atomicity. POSIX defines atomicity such that all the bytes from a single file I/O request that start out together end up together, without interleaving from other I/O requests.

The POSIX definition can be interpreted simply as: either all or none of the data written by a process is visible to other processes. Non-visibility of data to other processes can occur when the written data is cached in a system buffer and has not yet been flushed to disk, or when the data is flushed but then overwritten by other processes. Hence, when POSIX semantics are applied to concurrent I/O operations, the resulting data in the overlapped regions on disk consists of data from only one of the write requests. POSIX read and write calls share a common characteristic: one I/O request can only access a contiguous file region, specified by a file pointer and the amount of data starting from that pointer. Therefore, the overlapped data written by two or more POSIX I/O calls can only be a contiguous region in the file. Many POSIX file systems implement atomic I/O by serializing the processing of requests so that an overlapped region can be accessed by only one process at any moment. The sketch below illustrates this call-level granularity.
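
As a minimal illustration (the file name, thread count, and block size below are arbitrary choices, not taken from the text), the following sketch has each thread issue a single pwrite() covering one contiguous region of a shared file at its own offset, so that every I/O request maps to exactly one contiguous file region:

    /* Sketch: one contiguous region per pwrite() call, offsets chosen so
       that the regions written by different threads do not overlap. */
    #include <pthread.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define NTHREADS 4
    #define BLOCK    4096

    static int fd;             /* file descriptor shared by all threads */

    void *writer(void *arg)
    {
        long rank = (long)arg;
        char buf[BLOCK];

        memset(buf, 'A' + (int)rank, BLOCK);
        /* a single request on one contiguous region of the file */
        pwrite(fd, buf, BLOCK, (off_t)rank * BLOCK);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        long i;

        fd = open("testfile", O_CREAT | O_WRONLY, 0644);
        for (i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, writer, (void *)i);
        for (i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        close(fd);
        return 0;
    }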

Programs having the following characteristics may be well suited for pthreads:

  • Work that can be executed, or data that can be operated on, by multiple tasks simultaneously
  • Block for potentially long I/O waits
  • Use many CPU cycles in some places but not others
  • Must respond to asynchronous events
  • Some work is more important than other work (priority interrupts)
  • Pthreads can also be used for serial applications, to emulate parallel execution. A perfect example is the typical web browser, which, for most people, runs on a single-CPU desktop or laptop machine.

    In many applications, a data structure is read frequently but written infrequently. For such scenarios, it is useful to note that multiple reads can proceed without any coherence problems, whereas writes must be serialized. This points to an alternative structure called a read-write lock. A thread reading a shared data item acquires a read lock on the variable. A read lock is granted when other threads may already hold read locks; if there is a write lock on the data (or if there are queued write locks), the thread performs a condition wait. Using this principle, one can design the functions mylib_rwlock_rlock for read locks, mylib_rwlock_wlock for write locks, and mylib_rwlock_unlock for unlocking, as sketched below.
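
    A sketch of such a read-write lock, built from a pthread mutex and condition variables along the lines described above, is shown below; the struct layout and the init function are illustrative rather than a standard API.

    /* Sketch of the read-write lock described in the text (illustrative). */
    #include <pthread.h>

    typedef struct {
        int readers;                       /* number of active readers      */
        int writer;                        /* 1 if a writer holds the lock  */
        int pending_writers;               /* writers queued and waiting    */
        pthread_cond_t readers_proceed;
        pthread_cond_t writer_proceed;
        pthread_mutex_t read_write_lock;
    } mylib_rwlock_t;

    void mylib_rwlock_init(mylib_rwlock_t *l)
    {
        l->readers = l->writer = l->pending_writers = 0;
        pthread_mutex_init(&l->read_write_lock, NULL);
        pthread_cond_init(&l->readers_proceed, NULL);
        pthread_cond_init(&l->writer_proceed, NULL);
    }

    void mylib_rwlock_rlock(mylib_rwlock_t *l)
    {
        /* wait while a writer holds the lock or writers are queued */
        pthread_mutex_lock(&l->read_write_lock);
        while (l->pending_writers > 0 || l->writer > 0)
            pthread_cond_wait(&l->readers_proceed, &l->read_write_lock);
        l->readers++;
        pthread_mutex_unlock(&l->read_write_lock);
    }

    void mylib_rwlock_wlock(mylib_rwlock_t *l)
    {
        /* wait while readers are active or another writer holds the lock */
        pthread_mutex_lock(&l->read_write_lock);
        l->pending_writers++;
        while (l->writer > 0 || l->readers > 0)
            pthread_cond_wait(&l->writer_proceed, &l->read_write_lock);
        l->pending_writers--;
        l->writer = 1;
        pthread_mutex_unlock(&l->read_write_lock);
    }

    void mylib_rwlock_unlock(mylib_rwlock_t *l)
    {
        /* release a read or write lock and wake up waiting threads */
        pthread_mutex_lock(&l->read_write_lock);
        if (l->writer > 0)
            l->writer = 0;
        else if (l->readers > 0)
            l->readers--;
        if (l->readers == 0 && l->pending_writers > 0)
            pthread_cond_signal(&l->writer_proceed);
        else
            pthread_cond_broadcast(&l->readers_proceed);
        pthread_mutex_unlock(&l->read_write_lock);
    }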



    Multi Cores - MPI 1.X : Non Parallel I/O




    MPI 1.X : Non Parallel I/O from an MPI Program

    In the past, MPI applications performed I/O by relying on the features provided by the underlying operating system, typically Unix. MPI-1 does not have any explicit support for parallel I/O. A simple way of doing I/O in an MPI program is to let one process do all of it. Figure 1.1 illustrates this typical situation of sequential I/O from a parallel program. A simple example, Code 1.1 below, shows a set of processes with a distributed array of integers to be written to a file.

    Figure 1.1 Sequential I/O from a parallel program

    For simplicity, we assume that each process has 100 integers of the array, whose total length thus depends on how many processes there are. In the figure, the circles represent processes, the upper rectangles represent the block of 100 integers in each process's memory, and the lower rectangle represents the file to be written. The program begins with each process initializing its portion of the array. All processes except process 0 send their section to process 0. Process 0 first writes its own section and then receives the contributions from the other processes in turn (the rank is specified in MPI_Recv) and writes them to the file. This is often the first way I/O is done in a parallel program that has been converted from a sequential program, since no changes are made to the I/O part of the program. There are a number of other reasons why I/O in a parallel program may be done this way:



    • The parallel machine on which the program is running may support I/O only from one process.
    • One can use sophisticated I/O libraries, perhaps written as part of a high-level data-management layer, that do not have parallel I/O capability.
    • The resulting single file is convenient for handling outside the program (by mv, cp, or ftp, for example).
    • Performance may be enhanced because the process doing the I/O may be able to assemble large blocks of data.

    The lack of parallelism limits performance and scalability, particularly if the underlying file system permits parallel physical I/O.

    /* Example of sequential Unix write into a common file */
    #include "mpi.h"
    #include <stdio.h>
    #define BUFSIZE 100

    int main(int argc, char *argv[])
    {
        int i, myrank, numprocs, buf[BUFSIZE];
        MPI_Status status;
        FILE *myfile;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        /* each process fills its own block of the distributed array */
        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;

        if (myrank != 0) {
            /* non-zero ranks send their block to process 0 */
            MPI_Send(buf, BUFSIZE, MPI_INT, 0, 99, MPI_COMM_WORLD);
        } else {
            /* process 0 writes its own block, then the others in rank order */
            myfile = fopen("testfile", "w");
            fwrite(buf, sizeof(int), BUFSIZE, myfile);
            for (i = 1; i < numprocs; i++) {
                MPI_Recv(buf, BUFSIZE, MPI_INT, i, 99, MPI_COMM_WORLD, &status);
                fwrite(buf, sizeof(int), BUFSIZE, myfile);
            }
            fclose(myfile);
        }
        MPI_Finalize();
        return 0;
    }

    Example 1.1 Code for Sequential I/O from a parallel program




    MPI 1.x : Non-MPI Parallel I/O

    Example 1.1 does not exploit any parallelism in I/O. The next step in migrating a sequential program to a parallel one is to have each process write to a separate file, thus enabling parallel data transfer, as shown in Figure 1.2. Here each process functions completely independently of the others with respect to I/O: each program is sequential with respect to I/O and can use language I/O. Each process opens its own file, writes to it, and closes it. The files are kept separate by appending each process's rank to the name of its output file.

    Figure 1.2 Non-MPI I/O to multiple files in a parallel program

    The advantage of this approach is that the I/O operations can now take place in parallel while still using sequential I/O libraries if that is desirable. The program is shown in Example 1.2. The primary disadvantage is that the result of running the program is a set of files instead of a single file, which has several drawbacks:



    • The files may have to be joined together before being used as input to another application.
    • The application that reads these files may itself have to be a parallel program started with exactly the same number of processes.
    • It may be difficult to keep track of this set of files as a group, for moving them, copying them, or sending them across a network.

    /* Example of parallel Unix writes into separate files */
    #include "mpi.h"
    #include <stdio.h>
    #define BUFSIZE 100

    int main(int argc, char *argv[])
    {
        int i, myrank, numprocs, buf[BUFSIZE];
        char filename[128];
        FILE *myfile;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;

        /* each process writes its block to its own file, named by rank */
        sprintf(filename, "testfile.%d", myrank);
        myfile = fopen(filename, "w");
        fwrite(buf, sizeof(int), BUFSIZE, myfile);
        fclose(myfile);

        MPI_Finalize();
        return 0;
    }

    Example 1.2 Code for Non-MPI Parallel I/O to multiple files


    The performance may also suffer because individual processes may find their data to be in small contiguous chunks, causing many I/O operations with smaller data items. This may hurt performance more than can be compensated for by the parallelism.



    MPI 2.X : MPI I/O to Separate files

    The MPI 2.0 standard defines new parallel I/O features, and Examples 1.1 and 1.2 can be rewritten using MPI 2.0 library calls, so that all I/O operations are done through MPI. A detailed study is required to understand the overheads incurred when using these calls. The following changes are required from the programming point of view, and they have advantages and disadvantages from the performance point of view.


    First, the declaration FILE has been replaced by MPI_File as the type of myfile.

    Note that myfile is now a variable of type MPI_File, rather than a pointer to an object of type FILE.

    Second, the MPI function corresponding to fopen is called MPI_File_open:
    MPI_File_open(MPI_COMM_SELF, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &myfile);

    The first argument is a communicator. Files in MPI are opened by a collection of processes identified by an MPI communicator. This ensures that those processes operating on a file together know which other processes are also operating on the file and can communicate with one another. Here, since each process is opening its own file for its own exclusive use, it uses the communicator MPI_COMM_SELF.
    The second argument is a string representing the name of the file, as in fopen. The third argument is the mode in which the file is opened. Here it is being both created (or overwritten if it exists) and will only be written to by this program. The constants MPI_MODE_CREATE and MPI_MODE_WRONLY represent bit flags that are or'd together in C, much as they are in the Unix system call open. The fourth argument, MPI_INFO_NULL here, is a predefined constant representing a dummy value for the info argument to MPI_File_open. As the last argument, we pass the address of the MPI_File variable, which MPI_File_open will fill in for us. As with all MPI functions in C, MPI_File_open returns a return code, which we hope is MPI_SUCCESS; for simplicity, error checking is omitted here.




    Third, the function that actually does the I/O in this program is MPI_File_write(myfile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);

    Here the way of describing a buffer to be written (or read) gives the same advantages as it does in message passing: it allows arbitrary distributions of noncontiguous data in memory to be written with a single call, and it expresses the datatype, rather than just the length, of the data to be written, so that meaningful transformations can be done on it as it is read or written in heterogeneous environments. Here we just have a contiguous buffer of BUFSIZE integers, starting at address buf. The final argument to MPI_File_write is a "status" argument of the same type as returned by MPI_Recv. MPI-2 specifies that the special value MPI_STATUS_IGNORE can be passed to any MPI function in place of a status argument, to tell the MPI implementation not to bother filling in the status information because the user intends to ignore it. This technique can slightly improve performance when status information is not needed. The sketch below shows how a derived datatype lets a single MPI_File_write call transfer a noncontiguous pattern from memory.
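
    The following sketch (the file name "stridedfile", the counts, and the stride are illustrative choices, not taken from the text) builds a derived datatype with MPI_Type_vector so that one MPI_File_write call transfers every other integer of a buffer, i.e., a noncontiguous pattern in memory, into a contiguous region of the file.

    /* Sketch: writing a noncontiguous memory pattern with a single call */
    #include "mpi.h"
    #define BUFSIZE 100

    int main(int argc, char *argv[])
    {
        int i, myrank, buf[BUFSIZE];
        MPI_File fh;
        MPI_Datatype strided;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;

        /* 50 blocks of 1 int, separated by a stride of 2 ints in memory */
        MPI_Type_vector(BUFSIZE / 2, 1, 2, MPI_INT, &strided);
        MPI_Type_commit(&strided);

        MPI_File_open(MPI_COMM_SELF, "stridedfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        /* one call writes the whole noncontiguous pattern */
        MPI_File_write(fh, buf, 1, strided, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Type_free(&strided);
        MPI_Finalize();
        return 0;
    }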


    Fourth, the MPI function corresponding to close is called MPI_File_close(&myfile);

    The function MPI_File_close(&myfile) closes the file. The address of myfile is passed rather than the variable itself because the MPI implementation will replace its value with the constant MPI_FILE_NULL, so that the user can subsequently detect invalid file objects.



    /* Example of parallel MPI writes into separate files */
    #include "mpi.h"
    #include <stdio.h>
    #define BUFSIZE 100

    int main(int argc, char *argv[])
    {
        int i, myrank, numprocs, buf[BUFSIZE];
        char filename[128];
        MPI_File myfile;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;

        /* each process opens its own file with MPI_COMM_SELF */
        sprintf(filename, "testfile.%d", myrank);
        MPI_File_open(MPI_COMM_SELF, filename,
                      MPI_MODE_WRONLY | MPI_MODE_CREATE,
                      MPI_INFO_NULL, &myfile);
        MPI_File_write(myfile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&myfile);

        MPI_Finalize();
        return 0;
    }


    Example 1.3 Code for MPI I/O to separate files in a parallel program


    MPI 2.X : Parallel MPI I/O to a Single File

    In this version, the processes share a single file instead of writing to separate files, thus eliminating the disadvantages of having multiple files while retaining the performance advantages of parallelism. We are still not doing anything that absolutely cannot be done through language or library I/O on most file systems. The new version of the program is shown below. The first difference from the program of Example 1.3 is in the first argument of the MPI_File_open call: here we specify MPI_COMM_WORLD instead of MPI_COMM_SELF, to indicate that all the processes are opening a single file together. This is a collective operation on the communicator, so all participating processes must make the MPI_File_open call, although only a single file is being opened. Our plan is to give each process access to a part of this file. The part of the file that is seen by a single process is called the file view, and it is set for each process by a call to MPI_File_set_view. The call looks like



    MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int), MPI_INT, MPI_INT, "native", MPI_INFO_NULL);

    The first argument identifies the file. The second argument is the displacement (in bytes) into the file where the process's view of the file is to start. Here we multiply the size of the data to be written (BUFSIZE * sizeof (int)) by the rank of the process, so that each process's view starts at the appropriate place in the file. This argument is of a new type MPI_Offset, which on systems that support large files can be expected to be a 64-bit integer.

    The next argument is called the etype of the view; it specifies the unit of data in the file. Here it is MPI_INT, since we will always be writing some number of MPI_INTs to this file. The next argument, called the filetype, is a very flexible way of describing noncontiguous views in the file. In our simple case here, where there are no noncontiguous units to be written, we can just use the etype, MPI_INT. The next argument is a character string denoting the data representation to be used in the file; the "native" representation specifies that data is to be represented in the file exactly as it is in memory. The final argument is an info object, as in MPI_File_open. Here again it is to be ignored, as indicated by specifying MPI_INFO_NULL for this argument. Now that each process has its own view, the actual write operation

    MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);

    is exactly the same as in our previous version of this program. But because the MPI_File_open specified MPI_COMM_WORLD in its communicator argument, and the MPI_File_set_view gave each process a different view of the file, the write operations proceed in parallel and all go into the same file in the appropriate places.


    /* Example of parallel MPI writes into a single common file */
    #include "mpi.h"
    #include <stdio.h>
    #define BUFSIZE 100

    int main(int argc, char *argv[])
    {
        int i, myrank, numprocs, buf[BUFSIZE];
        MPI_File thefile;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        for (i = 0; i < BUFSIZE; i++)
            buf[i] = myrank * BUFSIZE + i;

        /* all processes open the same file collectively */
        MPI_File_open(MPI_COMM_WORLD, "testfile",
                      MPI_MODE_WRONLY | MPI_MODE_CREATE,
                      MPI_INFO_NULL, &thefile);

        /* each process's view starts at its own displacement in the file */
        MPI_File_set_view(thefile, (MPI_Offset)myrank * BUFSIZE * sizeof(int),
                          MPI_INT, MPI_INT, "native", MPI_INFO_NULL);

        MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&thefile);

        MPI_Finalize();
        return 0;
    }

    Example 1.4 Code for MPI I/O to single file in a parallel program
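
    Looking ahead to Example 1.6, the sketch below is one possible outline (not a complete solution): it opens the file written by Example 1.4 with an arbitrary number of processes, uses MPI_File_get_size to derive each process's share, and reads that share through the same file-view mechanism. Handling of any remainder when the file size does not divide evenly among the processes is omitted.

    /* Sketch: reading the common file back with any number of processes */
    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int myrank, numprocs, count;
        int *buf;
        MPI_Offset filesize;
        MPI_File thefile;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        MPI_File_open(MPI_COMM_WORLD, "testfile", MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &thefile);
        MPI_File_get_size(thefile, &filesize);  /* size in bytes */
        count = (int)(filesize / (MPI_Offset)sizeof(int) / numprocs);
        buf = (int *) malloc(count * sizeof(int));

        MPI_File_set_view(thefile, (MPI_Offset)myrank * count * sizeof(int),
                          MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
        MPI_File_read(thefile, buf, count, MPI_INT, MPI_STATUS_IGNORE);
        printf("process %d read %d integers\n", myrank, count);

        MPI_File_close(&thefile);
        free(buf);
        MPI_Finalize();
        return 0;
    }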


    Compilation and Execution of I/O Threaded & MPI-2 I/O Programs

    You should include pthread.h in the program and use the linker flag -lpthread when linking.

    The specific files to be used will differ with implementation on various platforms.

    (A) Using command line arguments:

    The compilation and execution details of Pthreads programs vary from one system to another, but the essential steps are common to all systems.

    # cc <program name>.c -o <executable> -lpthread

    For example, to compile a simple Hello World program the user can type

    # cc Pthreads_HelloWorld.c -o Pthreads_HelloWorld -lpthread

    (B) Using a Makefile:

    For more control over the process of compiling and linking programs for Pthreads, you should use a 'Makefile'. You may also use additional commands in the Makefile, particularly for programs spread over a large number of files. The user has to specify the names of the program and the appropriate paths to the libraries required for Pthreads programs in the Makefile.


    To compile and link a Pthreads program, you can use the command,

    make

    (C) Executing a Program:

    To execute a Pthreads program, type the name of the executable at the command prompt:

    # <Name of the Executable>

    For example, to execute a simple Hello World program, the user must type:

    # HelloWorld

    The output must look similar to the following:

    Hello World!
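
    For the MPI-2 I/O examples, the usual practice is to compile with the MPI compiler wrapper and launch with the MPI job launcher. The exact names and options (mpicc, mpiexec or mpirun, the -n flag) vary with the MPI implementation installed on the system, so the lines below are only indicative; the program name is a placeholder.

    # mpicc MPI_IO_Program.c -o MPI_IO_Program
    # mpiexec -n 4 ./MPI_IO_Program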
