
Harnessing the Power Locked in GPUs

PCQ Bureau



Direct Hit!

Applies To: Advanced C/C++ developers
USP: Learn to use GPUs for high-performance computing
Primary Link: www.nvidia.com/cuda
Keywords: CUDA


Before explaining the implementation of CUDA, let's refresh the difference between a CPU and a GPU. CPUs are designed to carry out serial tasks, whereas GPUs can process in parallel. CUDA is a software platform for using GPUs for parallel, high-performance processing. Just look at the amazing video on YouTube in which the popular Discovery Channel show 'MythBusters' compares the working of a CPU and a GPU. CUDA is a small set of extensions to the C language that enables the implementation of parallel algorithms; GPUs have hundreds of cores with shared resources, FPUs, and registers that can run threads in parallel. CUDA includes C/C++ software development tools, function libraries, and a hardware abstraction mechanism that hides the GPU hardware from developers. CUDA works along with conventional C/C++ compilers, making it possible to mix GPU code with general-purpose CPU code.

Installation



In this section we will help you install CUDA on your machine. You first need a CUDA-enabled graphics card. In our implementation we are using a 'GeForce 9600 GT' graphics card. More information on CUDA-enabled products and development with CUDA can be obtained from the web page: www.nvidia.com/cuda. The next step is to download the CUDA software. On the same link, click on 'DOWNLOAD CUDA' and select your operating system (Windows XP in our case). We will be using CUDA 2.0; there are three things to be downloaded: the CUDA drivers, the CUDA toolkit and the CUDA SDK.


CUDA 2.0 requires NVIDIA ForceWare graphics driver version 177.35 or later on Windows XP. To check the version of the drivers on your machine, go to the 'NVIDIA Control Panel' and click on 'Help>System Information'. If the version of the drivers is lower than 177.35, download and install the CUDA drivers. The next thing to be downloaded and installed is the CUDA toolkit. This contains the tools needed to compile and build CUDA applications.

Finally, one can download and install the CUDA SDK for sample projects. To verify the installation, run the 'bandwidthTest' program present at 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release.' If everything is properly installed, the output window will show 'Test PASSED' in the second-last line and the name of the graphics card in the first line. To check the version of the CUDA compiler, open a command prompt and type 'nvcc -V'. Besides the CUDA software, one can also use Microsoft's Visual Studio 2005 for developing C/C++ applications.
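
For a quick check, both verification steps can also be run from a Windows command prompt; the prompt shown below is only illustrative, and the exact output will depend on your graphics card and driver version:

C:\> "C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release\bandwidthTest.exe"
C:\> nvcc -V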

The CUDA platform for parallel processing on NVIDIA GPUs. Here one can see how the GPU hardware is abstracted from application developers.

Programming with CUDA



Now let's throw some light on how programming is done with CUDA. CUDA extends C by allowing programmers to define C functions known as 'kernels'. When a kernel is called, it executes n times (in parallel) in n different threads. Here is a code snippet that defines a kernel for adding two NxN matrices:

__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    // Each thread computes one element of the result matrix
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Kernel invocation: a grid of 16x16-thread blocks covering the NxN matrices
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(A, B, C);
}
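
The snippet above omits how the matrices A, B and C are allocated and filled on the GPU. Below is a rough, self-contained sketch of the host-side steps using cudaMalloc and cudaMemcpy; the value of N and the fill values are illustrative choices of ours, not from the original example:

// Rough sketch: device memory management around the matAdd kernel.
#include <cstdlib>
#include <cuda_runtime.h>
#define N 64

__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    size_t bytes = N * N * sizeof(float);

    // Host-side matrices, filled with illustrative values
    float (*hA)[N] = (float (*)[N])malloc(bytes);
    float (*hB)[N] = (float (*)[N])malloc(bytes);
    float (*hC)[N] = (float (*)[N])malloc(bytes);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) { hA[i][j] = 1.0f; hB[i][j] = 2.0f; }

    // Device-side matrices in GPU global memory
    float (*A)[N], (*B)[N], (*C)[N];
    cudaMalloc((void **)&A, bytes);
    cudaMalloc((void **)&B, bytes);
    cudaMalloc((void **)&C, bytes);
    cudaMemcpy(A, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(B, hB, bytes, cudaMemcpyHostToDevice);

    // Same launch configuration as in the snippet above
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(A, B, C);

    // Copy the result back to the host and clean up
    cudaMemcpy(hC, C, bytes, cudaMemcpyDeviceToHost);
    cudaFree(A); cudaFree(B); cudaFree(C);
    free(hA); free(hB); free(hC);
    return 0;
}

A file like this is compiled directly with nvcc (for example 'nvcc matadd.cu -o matadd', where the file name is our own); nvcc compiles the kernel itself and hands the CPU portions to the regular C/C++ compiler.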

To check the version of drivers on your machine, go to the NVIDIA Control Panel. Click on 'Help>System Information' and check for 'ForceWare Version.'

Here the kernel is defined using '__global__', and the number of threads is defined inside a new syntax, <<<...>>>. Each thread that executes a kernel is given a unique thread ID that is accessible within the kernel through the built-in variable 'threadIdx'. 'threadIdx' is a 3-component vector, so threads can be identified using a one-, two- or three-dimensional index, forming one-, two- or three-dimensional thread blocks.
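
Because 'threadIdx' has x, y and z components, a kernel can also be launched with a three-dimensional block. The following is a small sketch of our own (the kernel name and block size are arbitrary) in which each thread derives a unique linear index from its 3D position within the block:

#include <cuda_runtime.h>

// Illustrative kernel: each thread computes a unique linear index
// from its three-dimensional threadIdx within the block.
__global__ void tagThreads(int *out)
{
    int idx = threadIdx.z * blockDim.y * blockDim.x
            + threadIdx.y * blockDim.x
            + threadIdx.x;
    out[idx] = idx;
}

int main()
{
    dim3 threadsPerBlock(4, 4, 4);      // 64 threads arranged as a 4x4x4 block
    int *out;
    cudaMalloc((void **)&out, 64 * sizeof(int));
    tagThreads<<<1, threadsPerBlock>>>(out);
    cudaFree(out);
    return 0;
}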

While executing, threads can access memory from three different places: each thread's own private memory, a block's shared memory visible to all threads in that block, and global memory. A lot of examples are present in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects.' Compile these examples and run them; one can also customize these projects. Before writing code, programmers should analyse their algorithm so that the data can be broken into small chunks that are distributed across threads. Also keep in mind to create a sufficient number of threads to optimally utilize the GPU's power.
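
To make these three memory spaces concrete, here is a rough sketch of our own (names and sizes are illustrative): each thread keeps private local variables, all threads of a block cooperate through '__shared__' memory, and the input and results live in global memory:

#include <cuda_runtime.h>
#define BLOCK 256

// Illustrative kernel: each block sums BLOCK elements of the input array.
__global__ void blockSum(const float *in, float *out)    // in/out live in global memory
{
    __shared__ float buf[BLOCK];          // shared memory: visible to all threads of this block

    int tid = threadIdx.x;                // tid and i are private to each thread
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = in[i];
    __syncthreads();                      // wait until every thread has written its element

    // Tree reduction within the block, entirely in shared memory
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = buf[0];         // one partial sum per block, written to global memory
}

int main()
{
    const int n = 4 * BLOCK;              // illustrative problem size
    float *in, *out;
    cudaMalloc((void **)&in, n * sizeof(float));
    cudaMalloc((void **)&out, (n / BLOCK) * sizeof(float));
    // ... fill 'in' with input data via cudaMemcpy ...
    blockSum<<<n / BLOCK, BLOCK>>>(in, out);
    // ... copy the partial sums back from 'out' ...
    cudaFree(in); cudaFree(out);
    return 0;
}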

After installing CUDA, run the 'bandwidthTest' program in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release.' It should show 'Test PASSED.'

NVIDIA is not the only vendor to provide a programming interface to harness the parallel processing power of a GPU. ATI has also joined in with the release of 'ATI Stream Technology,' which runs on ATI graphics cards. We shall be providing more information on this in the near future, so watch this space in the coming issues!
