CUDA
From HP-SEE Wiki
- Web site: http://www.nvidia.com/object/cuda_home_new.html
- Described version: xx.xx
- Licensing: Proprietary, freeware, non-exclusive, non-transferable, non-sublicensable: http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/docs/cudaprof_eula.pdf
- User documentation: link
- Download: link
- Source code: link
Authors/Maintainers
- Also origin, if the software comes from a specific project.
Summary
The CUDA architecture is built into NVIDIA graphics processors in order to make their massively parallel processing power available for general-purpose computing. With speedups of one to two orders of magnitude with respect to CPUs, NVIDIA GPUs endowed with the CUDA architecture have seen great success in real-time data processing, simulation and modeling of wave propagation, medical imaging, processing of satellite images and pattern recognition, the exploration of natural resources, etc. Besides the OpenCL API, the CUDA software environment supports other APIs, such as CUDA Fortran and DirectCompute (on Microsoft Windows Vista and 7), and the programming of applications in C/C++, Fortran, Java, Python and the MS .NET framework.
The CUDA programming model assumes that the HPC system is composed of a host (standard CPU) and at least one massively parallel processor (GPU). In GPGPU programming the developer creates a single program that contains code for both the main processor (CPU) and the graphics processor (GPU). The GPU-based functionality of the software has to be extracted and transformed into the graphics hardware's internal format. Another assumption of the programming model is that each GPU includes a large number of arithmetic execution units, so the arithmetic operations in the parallelizable parts of the program can be performed simultaneously with the help of CUDA. GPU kernels are defined in the CUDA C language as C functions whose instances can be executed in parallel by different CUDA threads. Moreover, the threads are grouped into blocks, which can be processed independently by a single multiprocessor and correspond to coarse-grained sub-problems. The number of threads per block is limited by the memory resources that the multiprocessor must share among them, and should be chosen carefully for optimal performance.
The total number of threads is the product of the number of threads per block and the number of blocks, and can in practice reach values of the order of thousands. This is why CUDA GPGPUs are ideal for high-throughput acceleration.
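As a minimal sketch of this model (the kernel and variable names are illustrative, not from the original page), a CUDA C kernel that adds two vectors can be launched over a grid of blocks so that every array element is handled by its own thread:

```cuda
#include <cuda_runtime.h>

// Kernel: each CUDA thread computes one element of the sum.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Global thread index = block index * threads per block + thread index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)            // guard threads that fall past the end of the array
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                  // about one million elements
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;                       // device (GPU) pointers
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);

    /* ... copy input data into a and b with cudaMemcpy ... */

    // Total threads = threadsPerBlock * numBlocks, as described above;
    // the grid is rounded up so every element gets a thread.
    int threadsPerBlock = 256;
    int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<numBlocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The file is compiled with the NVIDIA compiler driver, e.g. `nvcc vecadd.cu -o vecadd`; the `<<<numBlocks, threadsPerBlock>>>` launch syntax is the CUDA C extension to standard C mentioned under Features.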
Features
- extensions to standard programming languages, such as C
- brings parallel computing to the masses
- a Toolkit and an SDK
- scalable programming model
- general-purpose parallel computing
- heterogeneous (CPU + GPU) computing
- accessible
Architectural/Functional Overview
- high level design info, how it works, performance - may be a link, or several links
Usage Overview
http://developer.nvidia.com/what-cuda
With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for CUDA, including image and video processing, computational biology and chemistry, fluid dynamics simulation, CT image reconstruction, seismic analysis, ray tracing, and much more.
CUDA in Research & Apps http://developer.nvidia.com/cuda-action-research-apps
CUDA in Education & Training http://developer.nvidia.com/cuda-education-training
Dependencies
- NVIDIA graphics card (CUDA-enabled GPU)
- NVIDIA video driver
- supported OS (Windows, Linux, Mac OS X)
HP-SEE Applications
- HAG (High energy physics Algorithms on GPU)
- FAMAD (Fractal Algorithms for MAss Distribution)
Resource Centers
- HPCG, BG
- ISS_GPU, RO
Usage by Other Projects and Communities
- If any
Recommendations for Configuration and Usage
Please describe here any common settings, configurations or conventions that would make the usage of this resource (library or tool) more interoperable or scalable across the HP-SEE resources. These recommendations should include anything that is related to the resource and is agreed upon by administrators and users, or across sites and applications. These recommendations should emerge from questions or discussions opened by site administrators or application developers, at any stage, including installation, development, usage, or adaptation for another HPC centre.
The descriptions provided may cover general or site-specific aspects of resource installation, configuration and usage, or give guidelines or conventions for deploying or using the resource within the local (user/site) or temporary (job) environment. Examples are:
- Common configuration settings of execution environment
- Filesystem path or local access string
- Environment variables to be set or used by applications
- Options (e.g. additional modules) that are needed or required by applications and should be present
- Minimum quantitative values (e.g. quotas) offered by the site
- Location and format of some configuration or usage hint instructing applications on proper use of the resource or site specific policy
- Key installation or configuration settings that should be set to a common value, or locally tweaked by local site admins
- Conventions for application or job bound installation and usage of the resource
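As an illustration of the environment-variable and module conventions listed above, a hypothetical job-script fragment might prepare a CUDA build environment as follows (the module name, version and installation path are site-specific assumptions, not HP-SEE-wide values):

```shell
# Hypothetical job-script fragment; module names and paths vary per site.
module load cuda/4.0                # make nvcc and the CUDA libraries available
export CUDA_HOME=/opt/cuda          # assumed root of the local CUDA installation
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

nvcc -O2 myapp.cu -o myapp          # compile against the local toolkit
./myapp
```

Agreeing on a common variable such as `CUDA_HOME` (or a common module naming scheme) across sites would let the same job script run unmodified on different HP-SEE resource centres.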