hyPACK-2013 Hands On Session
|
hyPACK-2013 Hands-on Sessions (HoS) will be conducted on an HPC cluster with coprocessors and accelerators, on an ARM processor cluster, on an ARM processor
system with the CUDA-enabled NVIDIA CARMA kit, and on DSP multi-core processor systems, covering the
Mode-1, Mode-2, Mode-3, Mode-4 and Mode-5 modules.
The approach adopted for heterogeneous programming of application
kernels and numerical linear algebra on hybrid computing systems
(HPC GPU cluster) is discussed
in the Mode-1, Mode-2, Mode-3 and Mode-4 modules of
hyPACK-2013. The systems used are described below.
|
Mode-2 : ARM Processor Systems
|
-
System 1 : ARM development platforms featuring NVIDIA Tegra processors
are being used in HPC. ARM platforms with the CUDA parallel programming
toolkit provide the foundation for developers to build out the ARM HPC application ecosystem.
The CARMA DevKit features the NVIDIA Tegra 3 quad-core ARM Cortex-A9 CPU and the NVIDIA Quadro 1000M GPU
with 96 CUDA cores. It offers HPC developers a simple way to create CUDA applications for
GPU-accelerated systems with ARM processors.
Topics such as tuning and performance issues, power consumption of application kernels,
measurement of power consumption using an external power meter, and programming on ARM
multi-core processor systems will be discussed. Tuning and performance of programs
on the ARM multi-core processor system, performance of application kernels, and measurement
of power consumption for various NLA kernels using different programming paradigms are
included.
-
System 2 :
Boston combined
hardware, software and system design expertise to build a
server that is far more energy efficient than a general-purpose, x86-based server. Called the
Boston Viridis, the
server uses an array of RISC architecture processors to run
an application at the same level as a traditional x86 server
at a fraction of the power.
The Boston Viridis ARM processor system is used for development of codes to
measure the power consumption and performance of application kernels.
The Boston Viridis uses ARM-based Calxeda EnergyCore SoCs (Server on Chip) to create a
rack-mountable 2U server cluster comprising 192 processing cores, leading the way towards
energy-efficient hyperscale computing.
The EnergyCore is an ARM Cortex-A9 SoC. The Cortex-A9 architecture in the Viridis
platform is aimed at the best
performance per watt, not the best overall performance, for application kernels.
For scalability of applications with low power consumption, the Boston Viridis system, i.e., a 48-node ultra-low-power ARM cluster with
integral high-speed interconnect and storage within a standard single 2U rack-mount enclosure, can
be used to execute hyPACK-2013 codes.
-
System 3 : The Texas Instruments C66x multi-core DSP is used for the hyPACK-2013
laboratory sessions. Its capabilities include:
Floating-point capabilities : 16 flops/cycle (single precision); SIMD support up to 128 bits;
eight functional units; two general-purpose register files with 32-bit registers.
Memory capabilities : 32 KB of L1P and L1D cache; 4096 KB of L2 memory (512 KB per core), usable as SRAM, cache or mixed; 4096 KB of shared MSM (Multicore Shared Memory), usable as SRAM or L2;
64-bit DDR3 external memory interface at 1600 MHz with ECC.
Programmability and multi-thread support : native C/C++ code is supported by the TI C/C++ compiler; intrinsics and vector datatypes to improve performance; Code Composer Studio as IDE;
multi-thread support through OpenMP 3.0.
The capabilities of multi-core DSPs for numerical linear
algebra and application kernels
are covered in the hands-on session.