CUDA

From HP-SEE Wiki

  • Web site: http://www.nvidia.com/object/cuda_home_new.html
  • Described version: xx.xx
  • Licensing: Proprietary, freeware, non-exclusive, non-transferable, non-sublicensable: http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/docs/cudaprof_eula.pdf
  • User documentation: link
  • Download: link
  • Source code: link

Authors/Maintainers

  • NVIDIA Corporation (CUDA is developed and maintained by NVIDIA)

Summary

The CUDA architecture is implemented in hardware in NVIDIA graphics processors in order to make their massively parallel processing power available for general-purpose computing. With speedups of one to two orders of magnitude with respect to CPUs, NVIDIA GPUs based on the CUDA architecture have seen great success in real-time data processing, simulation and modeling of wave propagation, medical imaging, satellite image processing and pattern recognition, exploration for natural resources, and other fields.

Besides the OpenCL API, the CUDA software environment supports other APIs, such as CUDA Fortran and DirectCompute (on Microsoft Windows Vista and 7), as well as the programming of applications in C/C++, Fortran, Java, Python and the MS .NET framework.

The CUDA programming model assumes that the HPC system is composed of a host (a standard CPU) and at least one massively parallel processor (GPU). In GPGPU programming the developer writes a single program that contains code for both the host (CPU) and the graphics processor (GPU); the GPU-based functionality of the software is extracted and translated into the graphics hardware's internal format. The model further assumes that each GPU contains a large number of arithmetic execution units, so the arithmetic operations in the parallelizable parts of the program can be performed simultaneously.

GPU kernels are defined in the CUDA C language as C functions whose instances are executed in parallel by different CUDA threads. The threads are grouped into blocks, which correspond to coarse-grained sub-problems and can be processed independently of one another. The number of threads per block is limited by the memory resources that the threads of a block share, and should be chosen carefully for good performance. The total number of threads is the product of the number of threads per block and the number of blocks, and in practice can reach values on the order of thousands. This is why CUDA GPGPUs are ideal for high-throughput acceleration.
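To make the model concrete, the following is a minimal illustrative sketch (ours, not part of the original page): a CUDA C vector addition in which each thread adds one pair of elements, and the launch configuration shows the total thread count as the product of threads per block and number of blocks.

  #include <stdio.h>
  #include <stdlib.h>
  #include <cuda_runtime.h>

  /* Kernel: each CUDA thread processes one array element. */
  __global__ void vecAdd(const float *a, const float *b, float *c, int n)
  {
      /* Global thread index: block offset plus thread offset within the block. */
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)              /* guard: the launch may create a few extra threads */
          c[i] = a[i] + b[i];
  }

  int main(void)
  {
      const int n = 1 << 20;
      size_t bytes = n * sizeof(float);

      /* Allocate and initialize host arrays. */
      float *ha = (float *)malloc(bytes);
      float *hb = (float *)malloc(bytes);
      float *hc = (float *)malloc(bytes);
      for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

      /* Allocate device arrays and copy the inputs to the GPU. */
      float *da, *db, *dc;
      cudaMalloc((void **)&da, bytes);
      cudaMalloc((void **)&db, bytes);
      cudaMalloc((void **)&dc, bytes);
      cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
      cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

      /* Launch: total threads = threadsPerBlock * numBlocks >= n. */
      int threadsPerBlock = 256;
      int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
      vecAdd<<<numBlocks, threadsPerBlock>>>(da, db, dc, n);

      /* Copy the result back and check one element. */
      cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
      printf("c[0] = %f\n", hc[0]);   /* expected: 3.0 */

      cudaFree(da); cudaFree(db); cudaFree(dc);
      free(ha); free(hb); free(hc);
      return 0;
  }

The guard i < n is needed because the number of launched threads is rounded up to a whole number of blocks and may slightly exceed the problem size.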

Features

  • Listed features

Architectural/Functional Overview

  • high level design info, how it works, performance - may be a link, or several links

Usage Overview

  • If possible with small example - may be a link (an illustrative example follows below)
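For instance (an illustrative sketch, not from the original page), a CUDA source file is compiled with NVIDIA's nvcc compiler, e.g. nvcc -o devquery devquery.cu, and run as an ordinary executable. The short program below uses the CUDA runtime API to list the GPUs available on a node before any kernel work is attempted:

  #include <stdio.h>
  #include <cuda_runtime.h>

  int main(void)
  {
      int count = 0;
      cudaError_t err = cudaGetDeviceCount(&count);
      if (err != cudaSuccess) {
          fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
          return 1;
      }
      /* Print the basic properties of each visible device. */
      for (int dev = 0; dev < count; ++dev) {
          cudaDeviceProp prop;
          cudaGetDeviceProperties(&prop, dev);
          printf("Device %d: %s, %d multiprocessors, %zu MB of global memory\n",
                 dev, prop.name, prop.multiProcessorCount,
                 prop.totalGlobalMem / (1024 * 1024));
      }
      return 0;
  }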

Dependencies

  • list of all relevant dependencies on other libraries

HP-SEE Applications

  • HAG (High energy physics Algorithms on GPU)
  • FAMAD (Fractal Algorithms for MAss Distribution)

Resource Centers

  • HPCG, BG
  • ISS_GPU, RO

Usage by Other Projects and Communities

  • If any

Recommendations for Configuration and Usage

Please describe here any common settings, configurations or conventions that would make the usage of this resource (library or tool) more interoperable or scalable across the HP-SEE resources. Recommendations should include anything related to the resource that is agreed upon by administrators and users, or across sites and applications, and should emerge from questions or discussions opened by site administrators or application developers at any stage: installation, development, usage, or adaptation for another HPC centre.

The descriptions should cover general or site-specific aspects of resource installation, configuration and usage, or present guidelines or conventions for deploying or using the resource within the local (user/site) or temporary (job) environment. Examples are:

  • Common configuration settings of execution environment
  • Filesystem path or local access string
  • Environment variables to be set or used by applications (see the sketch after this list)
  • Options (e.g. additional modules) that are needed or required by applications and should be present
  • Minimum quantitative values (e.g. quotas) offered by the site
  • Location and format of some configuration or usage hint instructing applications on proper use of the resource or site specific policy
  • Key installation or configuration settings that should be set to a common value, or locally tweaked by local site admins
  • Conventions for application or job bound installation and usage of the resource
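
As one concrete illustration of the environment-variable items above (ours, not an agreed HP-SEE convention), the CUDA runtime honours the CUDA_VISIBLE_DEVICES variable, which batch systems commonly set to restrict the GPUs a job may see; an application can then select among the visible devices explicitly:

  #include <stdio.h>
  #include <stdlib.h>
  #include <cuda_runtime.h>

  int main(void)
  {
      /* The batch system or site policy typically exports
         CUDA_VISIBLE_DEVICES (e.g. "0" or "0,1") before the job starts;
         the runtime renumbers the visible GPUs starting from 0. */
      const char *visible = getenv("CUDA_VISIBLE_DEVICES");
      printf("CUDA_VISIBLE_DEVICES = %s\n", visible ? visible : "(not set)");

      /* Select the first visible device explicitly. */
      cudaError_t err = cudaSetDevice(0);
      if (err != cudaSuccess) {
          fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
          return 1;
      }
      return 0;
  }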