Direct Hit! |
Applies To: Advanced C/C++ developers USP: Learn to use GPUs for high-performance computing Primary Link: www.nvidia.com/cuda Keywords: CUDA |
Before explaining the implementation of CUDA, let's refresh the difference
between a CPU and a GPU. CPUs are designed to carry out serial tasks, whereas
GPUs can process in parallel. CUDA is a software platform for using GPUs for
parallel, high-performance processing. Just look at the amazing YouTube video
that compares the working of a CPU and a GPU, made by the popular Discovery
serial 'MythBusters'. CUDA is a small set of extensions to the C language that
enables the implementation of parallel algorithms; GPUs have hundreds of cores
with shared resources, FPUs and registers that can run threads in parallel.
CUDA includes C/C++ software development tools, function libraries, and a
hardware abstraction mechanism that hides the GPU hardware from developers.
CUDA works along with conventional C/C++ compilers, making it possible to mix
GPU code with general-purpose CPU code.
Installation
In this section we will help you install CUDA on your machine. You first need
a CUDA-enabled graphics card. In our implementation we are using a 'GeForce
9600 GT' graphics card. More information on CUDA-enabled products and
development with CUDA can be obtained from the web page www.nvidia.com/cuda.
The next step is to download the CUDA software. On the same link, click on
'DOWNLOAD CUDA' and select your operating system (Windows XP in our case). We
will be using CUDA 2.0; there are three things to be downloaded: the CUDA
drivers, the CUDA toolkit and the CUDA SDK.
CUDA 2.0 requires version 177.35 or later of the NVIDIA ForceWare graphics
drivers for Windows XP. To check the version of the drivers on your machine,
go to 'NVIDIA Control Panel' and click on 'Help>System Information'. If the
version of the drivers is lower than 177.35, then download and install the
CUDA drivers. The next thing to be downloaded and installed is the CUDA
toolkit. This contains the tools needed to compile and build CUDA applications.
Finally, one can download and install the CUDA SDK for sample projects. To
verify the installation, run the 'bandwidthTest' program present at
'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release.' If
everything is properly installed, the output window will show 'Test PASSED' in
the second-last line and the name of the graphics card in the first line. To
check the version of the CUDA compiler, open a command prompt and type
'nvcc -V'. Besides the CUDA software, one can also use Microsoft's Visual
Studio 2005 for developing C/C++ applications.
The CUDA platform for parallel processing on NVIDIA GPUs. Here one can see how the GPU hardware is abstracted from application developers. |
Programming with CUDA
We now throw some light on how programming is done for CUDA. It extends C by
allowing programmers to define C functions known as 'kernels'. When a kernel
is called, it executes n times (in parallel) in n different threads. Here is a
code snippet that defines a kernel:
__global__ void matAdd(float A[N][N], float B[N][N],
                       float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Kernel invocation
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(A, B, C);
}
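The snippet above leaves out the host-side setup that every CUDA program needs: allocating device memory, copying data to the GPU and copying results back. Here is a minimal self-contained sketch of that workflow; the kernel name 'vecScale', the size N and the use of a flat 1-D array are our own illustrative assumptions, not part of the example above.

```cuda
// Sketch of typical host-side memory management (assumed example).
#include <cuda_runtime.h>
#include <stdio.h>

#define N 64

__global__ void vecScale(float *d, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        d[i] *= s;          // each thread scales one element
}

int main()
{
    float h[N];             // host buffer
    for (int i = 0; i < N; ++i) h[i] = (float)i;

    float *d;               // device buffer
    cudaMalloc((void **)&d, N * sizeof(float));
    cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);

    vecScale<<<(N + 255) / 256, 256>>>(d, 2.0f);

    cudaMemcpy(h, d, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    printf("h[10] = %f\n", h[10]);
    return 0;
}
```

Compile such a file with 'nvcc' from the CUDA toolkit; nvcc splits the source into device code for the GPU and host code for your regular C/C++ compiler.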
To check the version of drivers on your machine, go to NVIDIA Control Panel. Click on 'Help>System Information' and check for 'ForceWare Version.' |
Here the kernel is defined using '__global__', and the number of threads is
defined inside the new <<<...>>> syntax. Each thread that executes a kernel is
given a unique thread ID that is accessible within the kernel through the
built-in variable 'threadIdx'. 'threadIdx' is a 3-component vector, so threads
can be identified using a one-, two- or three-dimensional index, forming one-,
two- or three-dimensional thread blocks.
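To make the indexing concrete, here is a small sketch of how the built-in variables combine into a globally unique thread ID for a two-dimensional grid of two-dimensional blocks; the kernel name 'whoAmI' and the 'ids' buffer are illustrative assumptions.

```cuda
// Sketch: deriving a unique global thread ID from built-in variables.
__global__ void whoAmI(int *ids)
{
    // Linear index of this thread within its 2-D block
    int inBlock = threadIdx.y * blockDim.x + threadIdx.x;
    // Linear index of the block within the 2-D grid
    int block   = blockIdx.y * gridDim.x + blockIdx.x;
    // Globally unique thread ID
    int gid     = block * (blockDim.x * blockDim.y) + inBlock;
    ids[gid] = gid;
}
```

The same pattern appears in the matAdd kernel above, where blockIdx, blockDim and threadIdx together decide which matrix element a thread works on.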
While executing, threads can access memory from three different places: the
private memory of the thread, the shared memory of the block (visible to all
threads in that block) and global memory. A lot of examples are present in
'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects.' Compile these
examples and run them; one can also customize these projects. Before writing
code, programmers should analyse their problem so that they can create small
chunks of data that can be distributed among threads. Also keep in mind that
you should create a sufficient number of threads to utilize the GPU's power
optimally.
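The three memory spaces can be seen side by side in a per-block reduction, a common CUDA idiom. The following is a sketch under our own assumptions (the names 'blockSum', 'TILE' and the buffers are illustrative): per-thread variables live in registers, '__shared__' arrays are visible to the whole block, and the input and output arrays live in global memory.

```cuda
// Sketch: the three memory spaces used together in a block-wise sum.
#define TILE 256

__global__ void blockSum(const float *in, float *out)
{
    __shared__ float tile[TILE];              // block (shared) memory
    int tid = threadIdx.x;                    // private, per-thread
    tile[tid] = in[blockIdx.x * TILE + tid];  // load from global memory
    __syncthreads();                          // wait for all loads

    // Tree reduction inside the block, entirely in shared memory
    for (int stride = TILE / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = tile[0];            // one result per block
}
```

Shared memory is much faster than global memory, which is why chunking data into block-sized tiles, as advised above, pays off.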
After installing CUDA, run the 'bandwidthTest' program in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release.' It should show 'Test PASSED.' |
NVIDIA is not the only vendor to provide a programming interface to harness
the parallel processing power of a GPU. ATI has also joined in with the
release of 'ATI Stream Technology', which runs on ATI graphics cards. We shall
be providing more information on this in the near future, so watch this space
in the coming issues!