CUDA

From HP-SEE Wiki

Revision as of 10:27, 4 August 2011 by Roczei (Talk | contribs)


Authors/Maintainers

  • Also origin, if the software comes from a specific project.

Summary

The CUDA architecture is built into NVIDIA graphics processors to make their massively parallel processing power available for general-purpose computing. With reported speedups of one to two orders of magnitude with respect to CPUs, NVIDIA GPUs with the CUDA architecture have been very successful in real-time data processing, simulation and modeling of wave propagation, medical imaging, processing of satellite images, pattern recognition, exploration of natural resources, etc. Besides the OpenCL API, the CUDA software environment supports other APIs, such as CUDA Fortran and DirectCompute (on Microsoft Windows Vista and 7), and the programming of applications in C/C++, Fortran, Java, Python and the MS .NET framework.

The CUDA programming model assumes that the HPC system is composed of a host (a standard CPU) and at least one massively parallel processor (GPU). In GPGPU programming the developer writes a complete program that contains code for both the host (CPU) and the graphics processor (GPU). The GPU part of the software has to be separated out and translated into the graphics hardware's internal format. Another assumption of the programming model is that each GPU includes a large number of arithmetic execution units, so that the arithmetic operations in the parallelizable parts of the program can be performed simultaneously.

GPU kernels are defined in the CUDA C language as C functions whose instances are executed in parallel by different CUDA threads. The threads are grouped into blocks, which correspond to coarse sub-problems; each block can be processed independently by a single core (a streaming multiprocessor, in NVIDIA's terminology). The number of threads per block is limited by the memory resources that the threads of a block must share on that core, and should be chosen carefully for performance. The total number of threads is the product of the number of threads per block and the number of blocks, and in practice can reach values of the order of thousands. This is why CUDA GPGPUs are well suited to high-throughput acceleration.
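As a minimal sketch of the kernel, thread and block concepts described above, the CUDA C/C++ fragment below adds two vectors on the GPU. The names vecAdd, addOnGpu and check, and the block size of 256 threads, are illustrative assumptions and are not taken from any HP-SEE application. Error handling is reduced to aborting on the first failed runtime call.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call failed (hypothetical helper).
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::abort();
    }
}

// Kernel: each CUDA thread computes one element of c = a + b.
__global__ void vecAdd(const float* a, const float* b, float* c, int n)
{
    // Global thread index = block index * threads per block + index in block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // the last block may contain surplus threads
        c[i] = a[i] + b[i];
}

// Host code: copy the inputs to the GPU, launch the kernel, copy the result back.
std::vector<float> addOnGpu(const std::vector<float>& a,
                            const std::vector<float>& b)
{
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;

    const size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    check(cudaMalloc(&dA, bytes));
    check(cudaMalloc(&dB, bytes));
    check(cudaMalloc(&dC, bytes));
    check(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice));

    // Total threads launched = blocks * threadsPerBlock >= n.
    const int threadsPerBlock = 256;   // assumed value; tune per device
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
    check(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(dA));
    check(cudaFree(dB));
    check(cudaFree(dC));
    return c;
}
```

Calling addOnGpu from a main function and compiling the file with nvcc should be enough to try it on a machine with a CUDA-enabled GPU. The bounds test inside the kernel is needed because the number of launched threads is rounded up to a whole number of blocks.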

Features

  • extensions to standard programming languages, like C
  • parallel computing for the masses
  • a Toolkit and an SDK
  • scalable programming model
  • general purpose parallel computing
  • heterogeneous
  • accessible

Architectural/Functional Overview

  • high level design info, how it works, performance - may be a link, or several links

Usage Overview

http://developer.nvidia.com/what-cuda

With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for CUDA, including image and video processing, computational biology and chemistry, fluid dynamics simulation, CT image reconstruction, seismic analysis, ray tracing, and much more.

CUDA in Research & Apps http://developer.nvidia.com/cuda-action-research-apps

CUDA in Education & Training http://developer.nvidia.com/cuda-education-training

Dependencies

  • NVIDIA graphics card (CUDA-enabled GPU)
  • NVIDIA video driver
  • supported OS (Windows, Linux, Mac OS X)
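
One way to check that these dependencies are in place is to query the CUDA runtime. The sketch below assumes the CUDA Toolkit is installed so that it can be compiled with nvcc; the function names countCudaDevices and printCudaEnvironment are hypothetical, while the runtime calls themselves are part of the CUDA Runtime API.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Number of CUDA-enabled GPUs visible to the runtime.
// Returns 0 if there is no usable driver or no such GPU.
int countCudaDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return 0;
    return count;
}

// Print driver and runtime versions and a few properties of each GPU.
void printCudaEnvironment()
{
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // stays 0 if no driver is installed
    cudaRuntimeGetVersion(&runtime);
    // Versions are encoded as 1000 * major + 10 * minor.
    std::printf("driver %d.%d, runtime %d.%d\n",
                driver / 1000, (driver % 100) / 10,
                runtime / 1000, (runtime % 100) / 10);

    const int count = countCudaDevices();
    std::printf("CUDA-enabled GPUs found: %d\n", count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        std::printf("  [%d] %s, compute capability %d.%d, "
                    "%d multiprocessors, max %d threads per block\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount, prop.maxThreadsPerBlock);
    }
}
```

A device count of 0 usually points to a missing or outdated video driver, or to a graphics card that is not CUDA-enabled.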

HP-SEE Applications

  • HAG (High energy physics Algorithms on GPU)
  • FAMAD (Fractal Algorithms for MAss Distribution)

Resource Centers

  • HPCG, BG
  • ISS_GPU, RO

Usage by Other Projects and Communities

  • If any

Recommendations for Configuration and Usage

CUDA is easy to install. See http://developer.nvidia.com/getting-started-parallel-computing

The configuration of CUDA threads and blocks depends heavily on the parallel algorithm to be run. The CUDA SDK eases application development: it provides examples with source code, utilities and white papers to help novice programmers get started writing GPGPU software. The NVIDIA CUDA Toolkit is required to compile and run the SDK code samples.
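
To illustrate how a thread and block configuration follows from the problem shape, the sketch below derives a 2D grid for an image-like array. The helper names blocksFor and gridFor2D, the kernel scaleImage and the 16 x 16 block size mentioned afterwards are assumptions chosen for illustration; a suitable configuration for a real application has to be found by measurement.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Number of blocks needed so that blocks * threadsPerBlock >= n
// (integer ceiling division).
inline int blocksFor(int n, int threadsPerBlock)
{
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Grid dimensions that cover a width x height problem with the given block shape.
inline dim3 gridFor2D(int width, int height, dim3 block)
{
    return dim3(static_cast<unsigned>(blocksFor(width,  static_cast<int>(block.x))),
                static_cast<unsigned>(blocksFor(height, static_cast<int>(block.y))));
}

// Kernel: one thread per pixel, multiplying it by a constant factor.
__global__ void scaleImage(float* img, int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)     // skip threads outside the image
        img[y * width + x] *= factor;
}
```

With a block of dim3 block(16, 16), i.e. 256 threads, a 640 x 480 image stored in device memory at dImg would be processed by scaleImage<<<gridFor2D(640, 480, block), block>>>(dImg, 640, 480, 0.5f), which launches a grid of 40 x 30 blocks. A 1D algorithm would use blocksFor directly.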