
GPUs Gaining on Power

PCQ Bureau

In this age of supercomputers and parallel computing, many software developers and researchers who do computationally intensive work struggle to exploit the full potential of their hardware. But with several advances in technology, things are changing at a rapid pace. As the paradigm shifts towards parallel computing, the focus is on solving difficult scientific and engineering problems by dividing them into smaller ones that can be worked on simultaneously. Parallel computing has long been considered the high end of computing, and the HPC market has recently seen a surge in interest with an emphasis on modern GPUs. Enterprises are developing new techniques that offer tremendous potential for performance and efficiency, so developers must adapt to the evolving capabilities of GPUs, which are steadily becoming more flexible, general-purpose processors. With the advent of advanced GPUs, developers have started to explore applications beyond rendering in 3D apps. In this article we look at how GPUs have evolved and how NVIDIA and ATI are taking GPU computing to the next level.


Supercomputing capabilities



Refreshing our memories: the Central Processing Unit has long been known as the brain of the PC. It is the part where all the logical work is done; its job is to execute a sequence of instructions, one after another. Earlier, the CPU performed all the basic functions and handled the complex computations, and it also did all the traditional graphics processing before GPUs were included in display adapters. CPUs are designed with several goals in mind: high performance and throughput for one or a few threads, low power consumption, and low cost for a given performance level. A CPU is composed of only a few cores, each designed to execute a wide variety of instructions. GPUs were originally hardware blocks optimized purely for graphics, but with advances in technology they have become more flexible and more programmable. The graphics processing unit, also known as the visual processing unit (VPU), now goes well beyond a basic graphics controller: it is a programmable, high-end computational device that can be harnessed much more broadly, crunching through difficult tasks at a far greater pace.

Caching, architectural differences



A cache is a small, faster memory that stores copies of the data a CPU uses repeatedly. Cache memories increase CPU performance by reducing memory access latencies: with the help of large caches, CPUs hide the cost of reaching out to main memory. GPUs take a different approach. Their caches are used mainly to increase effective memory bandwidth, while latency is hidden by executing numerous threads simultaneously; whenever one thread stalls on a memory access, another is ready to run. This ability lets a GPU accelerate a suitable application many times over a CPU. A look inside a GPU also reveals special processors (cores) called 'vertex' and 'fragment' shaders. A vertex shader applies special effects to objects in a 3D environment at the vertex level; a vertex is location-dependent, defined by x, y and z coordinates. A fragment shader computes lighting effects on the pixels. A GPU receives a set of polygons, performs all the necessary processing on them and outputs pixels. Because pixels can be processed in parallel, a GPU uses a large number of execution units, unlike a CPU, which is optimized for high performance on a single instruction thread processing integer and floating-point numbers. Memory access is also very different: GPUs contain several memory controllers of their own, while only some CPUs have built-in controllers, and the higher bandwidth that parallel computations need is readily available to graphics cards. For multi-threaded operations, CPUs rely on SIMD vector units, while GPUs use SIMT for scalar thread processing. A fine example where GPU computing is a perfect fit is molecular modeling, which requires enormous processing power.
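The data-parallel style described above can be sketched in plain Python (a conceptual illustration only, using ordinary CPU threads rather than GPU hardware; the function names and worker count are invented for the example). Each worker handles its own slice of the data independently, much as GPU threads each process their own elements:

```python
from concurrent.futures import ThreadPoolExecutor

def scale(chunk, factor):
    # Each worker applies the same operation to different data --
    # the essence of the SIMD/SIMT style described above.
    return [x * factor for x in chunk]

def parallel_scale(data, factor, workers=4):
    # Divide the big problem into smaller, identical sub-problems.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: scale(c, factor), chunks)
    # Reassemble the partial results in order.
    return [x for part in results for x in part]

print(parallel_scale(list(range(8)), 10))  # [0, 10, 20, 30, 40, 50, 60, 70]
```

On a real GPU the "chunks" would be thousands of threads scheduled by the hardware, but the decomposition idea is the same.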

NVIDIA's CUDA



CUDA, developed by NVIDIA, is a parallel computing architecture that increases computing performance. This computing engine in NVIDIA GPUs enables software developers, researchers and scientists to perform complex computational tasks. Developers program the CUDA architecture using the C programming language. GPUs with the CUDA architecture consist of cores that enable hundreds of computing threads to run together. CUDA has been widely used in scientific research and programming; some of the key areas are fluid dynamics simulation, computational biology and chemistry, ray tracing, etc. NVIDIA supports CUDA on the following GPUs: GeForce, Ion, Tesla and Quadro.


CUDA supports heterogeneous computation, wherein the serial portions of an application run on the CPU and the parallel portions on the GPU, so that the capabilities of both are utilized. The configuration is designed on the assumption that each has its own memory space; the CPU and GPU are thus treated as separate devices, which allows simultaneous computation on both.
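A rough sketch of that division of labour, in plain Python: the names `host_to_device`, `launch_kernel` and `device_to_host` are invented for illustration and merely mimic the copy-in, compute, copy-out pattern of separate memory spaces; no real GPU API is used.

```python
# Host and "device" keep separate memory spaces, so data is copied
# explicitly between them -- mirroring the heterogeneous CUDA workflow.

device_memory = {}  # stands in for the GPU's own memory space

def host_to_device(name, data):
    device_memory[name] = list(data)        # explicit copy, not a reference

def launch_kernel(src, dst):
    # "Parallel portion": the same operation applied to every element.
    device_memory[dst] = [x * x for x in device_memory[src]]

def device_to_host(name):
    return list(device_memory[name])        # copy the result back

# Serial portion runs on the CPU; the data-parallel step on the "device".
host_data = [1, 2, 3, 4]
host_to_device("in", host_data)
launch_kernel("in", "out")
result = device_to_host("out")
print(result)  # [1, 4, 9, 16]
```

Because the copies are explicit, the CPU is free to do other serial work while the device computes, which is exactly what makes simultaneous computation on both possible.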

Fermi: GPU computing architecture



Fermi, incorporating billions of transistors and featuring up to 512 CUDA cores, is the latest buzz. Fermi adds new features, including support for error-correcting code (ECC) memory. Apart from ECC, its 512 cores also raise floating-point performance, and it supports GDDR5 memory with an addressable memory space of up to one terabyte. Fermi is available with a Visual Studio development environment and targets a number of areas, among them ray tracing, physics, sorting and search algorithms, finite element analysis and more. The innovations Fermi features include NVIDIA's Parallel data cache and Gigathread engine, the 512 cores and ECC support. Each Fermi multiprocessor consists of 32 cores (stream processors), arranged in two groups of sixteen. Each core executes a single thread's instructions in sequence, but the cores as a group run in a Single Instruction, Multiple Thread (SIMT) fashion. Attached to each multiprocessor, and shared among its cores, is a small software-managed data cache that NVIDIA has termed 'shared memory'. This indexable memory runs at register speed and offers low latency and high bandwidth. The 64 KB of shared memory on Fermi is flexible and can be configured in two ways: as a 48 KB software-managed data cache with 16 KB of hardware cache, or as a 16 KB software-managed data cache with 48 KB of hardware cache.
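The idea of a software-managed data cache can be sketched conceptually in Python (an illustration only; the names and the toy tile size are invented, and real shared memory is on-chip SRAM managed explicitly by the kernel code): data is staged from slow "global" memory into a small fast buffer one tile at a time, processed there, and written back.

```python
TILE = 4  # toy stand-in for the small on-chip buffer size

def process_with_tiles(global_data):
    out = []
    for i in range(0, len(global_data), TILE):
        tile = global_data[i:i + TILE]   # stage a tile into "shared memory"
        tile = [x + 1 for x in tile]     # operate on the fast local copy
        out.extend(tile)                 # write results back to "global memory"
    return out

print(process_with_tiles(list(range(10))))  # [1, 2, 3, ..., 10]
```

The pay-off on real hardware is that every element staged into the fast buffer can be reused by many threads without another trip to slow global memory.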

NVIDIA's Gigathread Engine: This is a new technology designed by NVIDIA that allows multiple threads to execute in parallel. The Gigathread engine also provides a bi-directional data transfer engine and supports fast kernel execution.


NVIDIA's Parallel data cache: Fermi supports a cache hierarchy with L1 and L2 caches. The L1 cache in NVIDIA's Parallel data cache improves bandwidth and reduces latency for the GPU, while the L2 cache improves coherent data sharing.

ATI's Stream Technology



ATI FireStream (or AMD Stream Processor) technology harnesses the power of the AMD graphics processor, working in tandem with the system's central processor, to speed up many applications beyond graphics. Stream allows the many parallel stream cores inside AMD graphics processors to accelerate general-purpose applications, so Stream-enabled programs can be optimized or gain extra features. ATI Stream uses a parallel computing architecture that takes advantage of the graphics card's stream processors to execute tasks that can be broken down into identical operations and run in parallel. The main advantage is that Stream uses SIMD (Single Instruction, Multiple Data), whereas a CPU uses a modified SISD (Single Instruction, Single Data stream).
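The SIMD-versus-SISD contrast can be illustrated with a small Python sketch (conceptual only; the four-wide "vector" width is an invented example): SISD applies one instruction to one data item at a time, while SIMD applies one instruction to a whole vector of data items at once.

```python
def sisd_add(a, b):
    # SISD: one instruction, one data item at a time.
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out

def simd_add(a, b, width=4):
    # SIMD (emulated): one "instruction" handles `width` items per step.
    out = []
    for i in range(0, len(a), width):
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(sisd_add(a, b) == simd_add(a, b))  # True: same result, coarser grain
```

Both produce identical results; the difference is that the SIMD version needs a quarter as many "instruction" steps, which is where the stream processors' speed-up comes from.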



