New programming languages and models

CAPS HMPP

CAPS HMPP is a complete software solution for parallelizing legacy codes, targeting the efficient usage of multicore and GPGPU resources. Much like OpenMP, HMPP takes the form of a set of directives that can be used to speed up the execution of a code while preserving its portability across infrastructures. HMPP can also be applied to codes that are already parallelized with MPI or OpenMP.

HMPP directives added to the application source code do not change the semantics of the original code. They address the remote execution of functions or regions of code on GPUs and many-core accelerators, as well as the transfer of data to and from the target device memory.

When adding HMPP directives on top of an existing code there are three steps to consider (illustrated by the sketch after the list):

  * declaration of kernels (codelets) that contain a critical amount of computation
  * data management to and from the target device (i.e. GPGPU) memory
  * optimization of kernel performance and data synchronization
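
The sketch below illustrates these steps on a simple SAXPY kernel. It is a minimal sketch assuming the label/codelet/callsite directive style of HMPP; the exact clause syntax may differ between HMPP versions, and the function and variable names are ours.

  /* Step 1: declare a compute-intensive function as a codelet targeting CUDA. */
  #pragma hmpp saxpy codelet, target=CUDA, args[y].io=inout
  void saxpy(int n, float a, const float x[n], float y[n])
  {
      for (int i = 0; i < n; ++i)          /* body offloaded to the accelerator */
          y[i] = a * x[i] + y[i];
  }

  int main(void)
  {
      enum { N = 1 << 20 };
      static float x[N], y[N];
      for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

      /* Steps 2-3: the callsite triggers the transfer of x and y to the device,
         remote execution of the codelet, and the transfer of y back. */
      #pragma hmpp saxpy callsite
      saxpy(N, 2.0f, x, y);

      return 0;
  }

If the directives are ignored (e.g. when compiling with a plain C compiler), the code still runs correctly on the host, preserving the semantics of the original code.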

In practice HMPP compiles the application for the native host separately from the GPU-accelerated codelet functions, which are attached to the native application as software plugins. In effect this means that the resulting application can be executed on hosts with or without an accelerator resource (i.e. a GPGPU): the codelets are executed on such hardware only when it is available on the host.

Note that the codelets are translated into the NVIDIA CUDA and OpenCL languages by the HMPP backend and are therefore compiled with the existing tools for these extensions in the software stack.

OpenACC

OpenACC consists of a set of standardized, high-level pragmas that enable C/C++ and Fortran programmers to extend their code to massively parallel processors with much of the convenience of OpenMP. Thus, the OpenACC standard preserves the familiarity of OpenMP code annotation while extending the execution model to encompass devices that reside in separate memory spaces. To support coprocessors, OpenACC pragmas annotate data placement and transfer (in addition to what OpenMP offers) as well as loop and block parallelism.
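
As a minimal illustration (the function and array names are ours, not taken from the standard), the following C fragment annotates a vector addition: the copyin/copyout clauses describe data placement and transfer, while "parallel loop" asks the compiler to map the loop onto the device. With an OpenACC-capable compiler (e.g. pgcc -acc) the loop is offloaded; otherwise the pragma is ignored and the loop runs on the host.

  void vadd(int n, const float *a, const float *b, float *c)
  {
      /* copyin: host-to-device transfer on entry; copyout: device-to-host on exit. */
      #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
      for (int i = 0; i < n; ++i)
          c[i] = a[i] + b[i];
  }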

Note that, similarly to OpenMP, OpenACC provides portability across operating systems, host CPUs and accelerator resources.

The details of data management and parallelism are implicit in the programming model and are managed by OpenACC API-enabled compilers and runtimes. The programming model thus allows the programmer to express data management and to guide the mapping of loops onto an accelerator, as well as similar performance-related details, in a clear way.
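
For instance, data management can be guided explicitly with a data region. In the illustrative sketch below the arrays are kept resident on the accelerator for the whole time-step loop, so no host-device transfers happen between the two inner kernels (the stencil itself is only a placeholder).

  void smooth(int n, int steps, float *u, float *tmp)
  {
      /* copy: transfer u in on entry and out on exit; create: device-only scratch. */
      #pragma acc data copy(u[0:n]) create(tmp[0:n])
      for (int s = 0; s < steps; ++s) {
          #pragma acc parallel loop
          for (int i = 1; i < n - 1; ++i)
              tmp[i] = 0.5f * (u[i - 1] + u[i + 1]);

          #pragma acc parallel loop
          for (int i = 1; i < n - 1; ++i)
              u[i] = tmp[i];
      }
  }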

OpenACC directives can currently be used via the CAPS HMPP product and via the latest versions of the PGI Compiler Suite.

MPI-ACC

Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data movement frameworks, thus providing applications with no direct mechanism to perform end-to-end data movement. MPI-ACC is an integrated and extensible framework that allows end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. MPI-ACC's runtime system enables several key optimizations, including pipelining of data transfers and balancing of communication based on accelerator and node architecture. MPI-ACC can use both the CUDA and OpenCL accelerator programming interfaces.
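
The sketch below shows, conceptually, what end-to-end data movement means in practice: a buffer that lives in GPU memory is handed to MPI point-to-point calls without being staged manually through host memory. It is only an illustration of the idea; the actual MPI-ACC interface described in the paper conveys the buffer's memory space through MPI datatype attributes, so the exact call sequence differs from this sketch.

  #include <mpi.h>
  #include <cuda_runtime.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      const int n = 1 << 20;
      float *dev_buf;                                   /* buffer in GPU memory */
      cudaMalloc((void **)&dev_buf, n * sizeof(float));

      if (rank == 0) {
          /* ... fill dev_buf with a GPU kernel ... */
          MPI_Send(dev_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(dev_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          /* the received data now resides in GPU memory on rank 1 */
      }

      cudaFree(dev_buf);
      MPI_Finalize();
      return 0;
  }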

Reference: Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-Chun Feng, Keith R. Bisset, and Rajeev Thakur, "MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems," in Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications (HPCC), Liverpool, UK, 2012.

Link: http://synergy.cs.vt.edu/pubs/papers/aji-hpcc12-mpiacc.pdf
