CUDA

From HP-SEE Wiki

  • Web site: http://www.nvidia.com/object/cuda_home_new.html
  • Described version: xx.xx
  • Licensing: Proprietary, freeware, non-exclusive, non-transferable, non-sublicensable: http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/docs/cudaprof_eula.pdf
  • User documentation: link
  • Download: link
  • Source code: link

Authors/Maintainers

  • NVIDIA Corporation (CUDA is developed and maintained by NVIDIA)

Summary

The CUDA architecture is implemented in hardware in NVIDIA graphics processors in order to make their massively parallel processing power available for general-purpose computing. With speedups of one to two orders of magnitude with respect to CPUs, NVIDIA GPUs based on the CUDA architecture have seen great success in real-time data processing, simulation and modeling of wave propagation, medical imaging, satellite image processing and pattern recognition, exploration for natural resources, and other fields.

Besides the OpenCL API, the CUDA software environment supports other APIs, such as CUDA Fortran and DirectCompute (on Microsoft Windows Vista and 7), as well as the programming of applications in C/C++, Fortran, Java, Python and the MS .NET framework.

The CUDA programming model assumes that the HPC system is composed of a host (a standard CPU) and at least one massively parallel processor (GPU). In GPGPU programming the developer writes a single program that contains code for both the host (CPU) and the graphics processor (GPU); the GPU-based functionality of the software is extracted and translated into the graphics hardware's internal format. The model further assumes that each GPU contains a large number of arithmetic execution units, so the arithmetic operations in the parallelizable parts of the program can be performed simultaneously.

GPU kernels are defined in the CUDA C language as C functions whose instances are executed in parallel by different CUDA threads. The threads are grouped into blocks, which correspond to coarse-grained sub-problems and can be processed independently of one another. The number of threads per block is limited by the memory resources that the threads of a block share, and should be chosen carefully for good performance. The total number of threads is the product of the number of threads per block and the number of blocks, and in practice can reach values on the order of thousands. This is why CUDA GPGPUs are ideal for high-throughput acceleration.
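To make the model concrete, the following is a minimal illustrative sketch (ours, not part of the original page): a CUDA C vector addition in which each thread adds one pair of elements, and the launch configuration shows the total thread count as the product of threads per block and number of blocks.

  #include <stdio.h>
  #include <stdlib.h>
  #include <cuda_runtime.h>

  /* Kernel: each CUDA thread processes one array element. */
  __global__ void vecAdd(const float *a, const float *b, float *c, int n)
  {
      /* Global thread index: block offset plus thread offset within the block. */
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)              /* guard: the launch may create a few extra threads */
          c[i] = a[i] + b[i];
  }

  int main(void)
  {
      const int n = 1 << 20;
      size_t bytes = n * sizeof(float);

      /* Allocate and initialize host arrays. */
      float *ha = (float *)malloc(bytes);
      float *hb = (float *)malloc(bytes);
      float *hc = (float *)malloc(bytes);
      for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

      /* Allocate device arrays and copy the inputs to the GPU. */
      float *da, *db, *dc;
      cudaMalloc((void **)&da, bytes);
      cudaMalloc((void **)&db, bytes);
      cudaMalloc((void **)&dc, bytes);
      cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
      cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

      /* Launch: total threads = threadsPerBlock * numBlocks >= n. */
      int threadsPerBlock = 256;
      int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
      vecAdd<<<numBlocks, threadsPerBlock>>>(da, db, dc, n);

      /* Copy the result back and check one element. */
      cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
      printf("c[0] = %f\n", hc[0]);   /* expected: 3.0 */

      cudaFree(da); cudaFree(db); cudaFree(dc);
      free(ha); free(hb); free(hc);
      return 0;
  }

The guard i < n is needed because the number of launched threads is rounded up to a whole number of blocks and may slightly exceed the problem size.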

Features

  • Listed features

Architectural/Functional Overview

  • high level design info, how it works, performance - may be a link, or several links

Usage Overview

  • If possible with small example - may be a link (an illustrative example follows below)
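For instance (an illustrative sketch, not from the original page), a CUDA source file is compiled with NVIDIA's nvcc compiler, e.g. nvcc -o devquery devquery.cu, and run as an ordinary executable. The short program below uses the CUDA runtime API to list the GPUs available on a node before any kernel work is attempted:

  #include <stdio.h>
  #include <cuda_runtime.h>

  int main(void)
  {
      int count = 0;
      cudaError_t err = cudaGetDeviceCount(&count);
      if (err != cudaSuccess) {
          fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
          return 1;
      }
      /* Print the basic properties of each visible device. */
      for (int dev = 0; dev < count; ++dev) {
          cudaDeviceProp prop;
          cudaGetDeviceProperties(&prop, dev);
          printf("Device %d: %s, %d multiprocessors, %zu MB of global memory\n",
                 dev, prop.name, prop.multiProcessorCount,
                 prop.totalGlobalMem / (1024 * 1024));
      }
      return 0;
  }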

Dependencies

  • list of all relevant dependencies on other libraries

HP-SEE Applications

  • HAG (High energy physics Algorithms on GPU)
  • FAMAD (Fractal Algorithms for MAss Distribution)

Resource Centers

  • HPCG, BG
  • ISS_GPU, RO

Usage by Other Projects and Communities

  • If any

Recommendations for Configuration and Usage

Please describe here any common settings, configurations or conventions that would make the usage of this resource (library or tool) more interoperable or scalable across the HP-SEE resources. Recommendations should include anything related to the resource that is agreed upon by administrators and users, or across sites and applications, and should emerge from questions or discussions opened by site administrators or application developers at any stage: installation, development, usage, or adaptation for another HPC centre.

The descriptions should cover general or site-specific aspects of resource installation, configuration and usage, or present guidelines or conventions for deploying or using the resource within the local (user/site) or temporary (job) environment. Examples are:

  • Common configuration settings of execution environment
  • Filesystem path or local access string
  • Environment variables to be set or used by applications (see the sketch after this list)
  • Options (e.g. additional modules) that are needed or required by applications and should be present
  • Minimum quantitative values (e.g. quotas) offered by the site
  • Location and format of some configuration or usage hint instructing applications on proper use of the resource or site specific policy
  • Key installation or configuration settings that should be set to a common value, or locally tweaked by local site admins
  • Conventions for application or job bound installation and usage of the resource
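
As one concrete illustration of the environment-variable items above (ours, not an agreed HP-SEE convention), the CUDA runtime honours the CUDA_VISIBLE_DEVICES variable, which batch systems commonly set to restrict the GPUs a job may see; an application can then select among the visible devices explicitly:

  #include <stdio.h>
  #include <stdlib.h>
  #include <cuda_runtime.h>

  int main(void)
  {
      /* The batch system or site policy typically exports
         CUDA_VISIBLE_DEVICES (e.g. "0" or "0,1") before the job starts;
         the runtime renumbers the visible GPUs starting from 0. */
      const char *visible = getenv("CUDA_VISIBLE_DEVICES");
      printf("CUDA_VISIBLE_DEVICES = %s\n", visible ? visible : "(not set)");

      /* Select the first visible device explicitly. */
      cudaError_t err = cudaSetDevice(0);
      if (err != cudaSuccess) {
          fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
          return 1;
      }
      return 0;
  }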