
Using GPUs for Parallel Processing

PCQ Bureau

Thought graphics cards could do only graphics? Think again! If you have followed the action in the x86 graphics processing arena over the past six months or so, you would have noticed the vendors talking about something called 'Stream Computing.' To be truthful, stream computing by itself is nothing new. It means taking a process with multiple threads that can be executed independently of each other, and executing them in one go using multiple processors in parallel. How is this related to graphics processors? We will get into the details shortly, but consider this: a modern graphics processor (GPU) consists of up to 128 parallel processing units called 'pixel shaders.' These pixel shaders cannot do fantastic things individually. But take a bunch of them and run parallel threads through them, and you will get much better computing performance.


This has value in applications that have a large pool of data where the same instruction needs to be applied to all of it. Some applications that require this sort of computing are used for protein sequence analysis and geographical data mapping from satellites. So how does stream computing on graphics processors affect you? Until now, stream computing involved special stream computing machines (much like supercomputers) and was restricted to high-budget research institutions. But now, as graphics cards have this spare computing capacity within them, they can be harnessed for the same effect at a much lower cost. Vendors of such cards (like AMD/ATi and NVIDIA) have come up with mechanisms to let software developers produce stream computing applications that can run on an off-the-shelf workstation with nothing more powerful than a modern graphics card. So maybe your next payroll processing engine will need the latest high-end gaming card, instead of the latest in server clusters, to do its job faster.

Direct Hit!
Applies To: IT managers
USP: Learn about GPUs and how they are harnessed for other computing applications
Primary Link: www.gpgpu.org
Google Keywords: gpgpu, stream computing

Inside a modern GPU

Inside the GPU on any modern graphics card you pick up, there are specialized processors (let's call them 'cores' instead) called 'vertex' and 'fragment' shaders. A vertex shader operates on a vertex of any polygon being considered for rendering. For instance, such a processor would determine the position and color of the vertex. A fragment shader works on sets of pixels and generates lighting effects (for example) on these pixels. The different shader models that have evolved in step with successive DirectX specifications have increased the capabilities of GPUs. For instance, higher floating-point precision (such as 16-bit floating-point values per color channel, or 64 bits per pixel, instead of the traditional 8-bit integer channels) has enabled a lighting paradigm called HDR (High Dynamic Range) that allows for more realistic shadows and effects.


Consider a GPU with 8 layers of 8 fragment shaders. This makes up 64 cores, and the group can render 64 fragments simultaneously. This sort of processing is called SIMD (Single Instruction Multiple Data), which we discussed in an earlier part of this series. To sum up, SIMD is an operation where you apply a single instruction (or transformation) to multiple data elements. Increasing the light on a set of rendered pixels, for example, is a SIMD operation. Let's take SIMD a step further.
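
The brightness example can be sketched in a few lines of Python. This is only a CPU-side illustration of the SIMD idea, not GPU code: the names `brighten` and `apply_to_all` are hypothetical, and the loop runs sequentially where a GPU would hand each fragment to a separate shader core.

```python
# CPU-only illustration of the SIMD idea. A real GPU would run the
# per-pixel step on many shader cores at once; here it is a plain loop.

def brighten(pixel, amount):
    """The single 'instruction': raise every channel, clamped to 255."""
    return tuple(min(255, channel + amount) for channel in pixel)

def apply_to_all(pixels, amount):
    """Apply the same instruction to every data element independently."""
    return [brighten(p, amount) for p in pixels]

# 8 x 8 = 64 fragments, matching the 64-core example in the text.
fragments = [(100, 150, 250)] * 64
result = apply_to_all(fragments, 20)
print(len(result), result[0])  # prints: 64 (120, 170, 255)
```

Because no call to `brighten` depends on the result of any other call, the 64 calls could run in any order or all at once, which is what makes the work suitable for parallel cores.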

A GPU can be used to perform non-graphics tasks by sending those threads of a multi-threaded application that can be executed independently of each other to pixel-shader cores in the GPU, using a special instruction helper framework.


Stream processing

Because these cores operate independently, they cannot share data with each other. Instead, there are two types of memory locations associated with them: one is called 'Texture' memory, which can only be read from, while the other is called the 'Framebuffer', which can only be written to. Of course, data in the Framebuffer can be routed back to the Texture memory to serve as input for another stage of processing before it is sent to the screen. If all the data in the texture memory needs to have the same instructions applied to it, modern parlance calls that data a 'Stream'. The application of the shader logic to the stream is called 'Stream Processing' and the logic itself is called a 'Kernel'. Related terms are DPP (Data Parallel Processing) and GPGPU (General Purpose computing on GPU).
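
The read-only input, write-only output and kernel described above can be modeled with a minimal Python sketch. It assumes nothing about real GPU APIs: `run_pass`, `scale` and `offset` are hypothetical names, a tuple stands in for texture memory and a new list stands in for the framebuffer.

```python
# Toy model of stream processing. The kernel reads only from the input
# ('texture') and its results go only to a fresh output ('framebuffer').

def run_pass(kernel, texture):
    """Run one pass of a kernel over a stream."""
    read_only = tuple(texture)                    # texture: read, never written
    framebuffer = [kernel(x) for x in read_only]  # each element is independent
    return framebuffer

def scale(x):
    """First kernel: double each element."""
    return x * 2.0

def offset(x):
    """Second kernel: add one to each element."""
    return x + 1.0

stream = [1.0, 2.0, 3.0, 4.0]
first = run_pass(scale, stream)    # pass 1 writes to a framebuffer
second = run_pass(offset, first)   # framebuffer routed back as the next input
print(second)  # prints: [3.0, 5.0, 7.0, 9.0]
```

Chaining passes this way mirrors the routing of framebuffer data back into texture memory for another stage of processing.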

Regular GPU-based stream computing platforms use identical hardware (that is, graphics cards with identical capabilities). But this need not be the rule: dissimilar hardware can be used too, which AMD calls 'Asymmetric processing'. ATi's cards can be used this way to handle different tasks depending on the capabilities of each card plugged in. For instance, one card can do physics while the other is used for rendering. Of course, in such a system it is not easy to scale up performance by simply adding more cards, as this only adds to the system's asymmetry. Another problem is distributing the workload to optimize performance. Still, asymmetric processing can be used even by a single application, provided its tasks are independent and need no direct communication amongst them.
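
The physics-plus-rendering split can be illustrated with a small sketch, under loud assumptions: no graphics card is involved, two worker threads from Python's standard `concurrent.futures` module merely stand in for the two cards, and `physics_step` and `shade` are hypothetical tasks invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def physics_step(positions, velocities, dt):
    """Hypothetical physics task: advance each position by velocity * dt."""
    return [p + v * dt for p, v in zip(positions, velocities)]

def shade(pixels, gain):
    """Hypothetical rendering task: scale brightness, clamped to 255."""
    return [min(255, int(p * gain)) for p in pixels]

# Each worker thread stands in for one card. The two tasks share no data
# and never talk to each other, so they can be dispatched separately.
with ThreadPoolExecutor(max_workers=2) as pool:
    physics_future = pool.submit(physics_step, [0.0, 1.0], [2.0, 4.0], 0.5)
    render_future = pool.submit(shade, [100, 200], 1.5)
    new_positions = physics_future.result()
    shaded = render_future.result()

print(new_positions, shaded)  # prints: [1.0, 3.0] [150, 255]
```

The sketch also hints at the scaling problem mentioned above: a third worker would sit idle here, because there are only two independent tasks to hand out.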

The GPU contains sets of Pixel Shader cores which can execute parallel threads very quickly.

Developer support

Both ATi and NVIDIA have released SDKs (software development kits) to let developers build applications that make use of GPUs. The problem was that traditional compilers cannot easily optimize code for graphics-related processing. NVIDIA's solution is called CUDA (Compute Unified Device Architecture), while that of ATi is called CTM (Close To the Metal). CUDA provides a C-like programming language and a compiler for GPU applications. Its engine provides a thread controller architecture that takes care of thread management for applications that stream through the GPU. CTM, on the other hand, is more of a hardware driver that exposes an API that GPU applications can call to achieve similar end results.

Applications that want to harness the power of stream computing through the GPU must use CTM or CUDA to route their code through the GPU. CTM or CUDA then handles the split between the code that runs on the GPU and the code that stays on the regular CPU.

Sadly, there is no vendor-agnostic way to let applications run seamlessly regardless of which of the two vendors supplied your graphics card. For now, early adopters will have to be content with building single-vendor stream computing infrastructure, which may not be the best-of-breed or the most cost-effective solution around. Clearly, much work needs to be done, and it is too early to put this down as a success or write it off as too complicated.



Applications of Stream Computing

The technology behind stream computing can be used in a variety of applications where you have a large pool of data and each item requires the same set of instructions, independently of the others. In developer parlance, stream computing is best suited for massively multi-threaded applications with parallel threads. Application areas include: protein folding analysis in biotechnology; seismic analysis for oil and gas exploration; signal processing for defense services; various simulations and forecasting model processing in the financial industry; face recognition and speech recognition. Folding@Home, a distributed computing project from Stanford University, has been around since 2000, using the idle time on traditional CPUs to crunch its way through the complicated problem of analyzing protein folding patterns and effects. In September 2006, the project began using stream computing cores in ATi GPUs to become faster. At present, the project is helping the medical world to understand Alzheimer's Disease, Cancer, Huntington's Disease, Osteogenesis Imperfecta, Parkinson's Disease and the relationships between various ribosomes and antibiotics.
