
High Performance Computing

PCQ Bureau

High Performance Computing is usually associated with rocket science. We tend to think that this kind of technology is used only in fields such as weather forecasting, genome mapping and simulating chemical reactions. But that is no longer the whole picture. Just pick up any of our last six or seven issues and you will find that we have covered around half a dozen ways to use HPC in an enterprise. Today, HPC is no longer only for research labs and universities; it has become a key enabler for your business applications.


Still not convinced? Okay, let's talk figures. According to the current report from top500.org, the site that ranks the top 500 supercomputers across the globe, 57% of those machines are used in industry, while academia and research together account for just 40.2%. In other words, as hardware and software prices come within reach and more technologies emerge to make setup, management and applications easier, HPC is entering every segment of industry.

HPC is giving enterprises the edge to do more than their competitors in less time. Quite a few technologies, some recent and some not so recent, are responsible for this, and in this article we look at some of them.

To begin with, let's talk about the technologies involved in the architecture of an HPC system. The most common architectures are clusters and MPP.

Massively Parallel Processing



Massively Parallel Processing (MPP) is similar to Symmetric Multiprocessing (SMP), which we see today in a normal multiprocessor server; even hyperthreaded processors are a form of SMP. In both cases the processing units are tightly coupled, which means the interconnect is more sophisticated and in most cases internal. The difference between the two is that in SMP systems all CPUs share the same memory, while in MPP systems each CPU has its own memory. This implies that the application must be divided in such a way that all executing segments can communicate with each other, so MPP systems are more difficult to program. On the other hand, because of this architecture, MPP systems don't suffer from the bottleneck that SMP systems face when all CPUs attempt to access the same memory at the same time.
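
To make the distinction concrete, here is a minimal C sketch of the SMP model (a hypothetical illustration, not taken from any particular HPC product): several threads update one counter held in shared memory, and the lock around that counter is exactly the kind of contention point described above. On an MPP system there is no such shared location; each CPU would keep its own partial count and the results would be exchanged as explicit messages over the interconnect.

/* SMP-style sketch: all threads see the same memory.
 * Hypothetical example; compile with: gcc -std=gnu99 -pthread smp.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    1000000

static long counter = 0;                        /* shared memory, visible to every CPU */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);              /* every CPU contends here: the SMP bottleneck */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);         /* expect NTHREADS * ITERS */
    return 0;
}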



Today, MPP sits at the core of a majority of high-end supercomputers. But there's a catch: you can't build an MPP-based HPC out of commodity PCs. For that you need to go to vendors such as IBM or Cray, and for the same reason the cost involved is quite high. The benefits are fewer bottlenecks in the interconnect and an architecture that is less complex to manage.


Clustering



The other technology is clustering, or rather high-performance clustering. The major difference between MPP and clustering is that in a cluster the processing units, referred to as nodes, are loosely coupled, and the interconnect is mostly external, such as a standard high-speed LAN, Myrinet or InfiniBand (we discuss these interconnect technologies later). The good thing about such an HPC is that it can be built from commodity hardware and networking equipment with very little investment, and plenty of software, applications and middleware are available for building one. Clustered HPCs are divided into SSI-based and PVM-based clusters. We have discussed them quite a few times in our previous articles; here's a recap.

P-Grid

Imagine a heterogeneous network with 10 computers per branch across 10 branches. Some of these computers run Windows, while others run Linux or Solaris; some are P4s and some are 64-bit AMDs. Now suppose you have to render multiple huge 3D files, a job that obviously needs an enormous amount of processing power. With all these machines connected over the Internet, you effectively have a cluster with around 100 to 150 GFLOPS of speed.

Remember SETI@Home, the free screensaver distributed over the Internet? Anyone downloading the screensaver becomes part of a grid and shares the huge workload of processing data captured by the SETI radio telescopes. The concept of P-Grid is not too different from that.

P-Grid is essentially a cluster that runs over P2P connections; both data transfer and CPU-cycle migration happen over P2P. Currently, the framework in use runs on the Gnutella network. Although no full-fledged application is available yet to leverage the concept, you can try an application called GPU (downloadable from http://gpu.sf.net). This application is still in alpha and can only run some test applications such as image rendering, net crawling, etc. To install it, download the EXE file (if you are a Windows user) and run it. Once it is installed, launch GPU from the Start menu. To use it, you can either connect to any public P-Grid or create one on your own network (the default setting connects it to the Internet). To do so, click on Connect. To stop it from connecting to the Internet, go to the Edit menu, select Setting and scroll to 'Local Cluster Management'. In the window that opens, select 'Connect to the local subnet only'. You can now run the applications provided with the software by going to its main window and selecting them from the top right corner. You can also check the status of the cluster through the '3D network mapper' option, which shows a 3D map of the network and lists the machines connected to it along with their type. The cluster's total processing power and RAM are shown at the center of the window.


SSI based clusters



Single System Image (SSI) is a clustering technology that takes a number of nodes (say 'n') on a network and makes them work like a single virtual machine with 'n' processors. SSI doesn't require any modification to your applications to run them on this virtual machine. But because of this, there are some drawbacks too. SSI works very well when you run many tasks simultaneously on the virtual machine, for instance, converting hundreds of media files from one format to another. In such a case, the SSI cluster will migrate the tasks evenly across all machines available in the cluster and complete the job significantly faster than a single machine would. However, if you deploy a single job or thread that requires a large amount of number crunching, the SSI cluster will not give you any performance improvement, because it cannot divide a single task into multiple threads and spread them across the nodes of the cluster. An example of SSI clustering middleware is openMosix.
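
As a rough illustration of why SSI favours many independent tasks, here is a minimal C sketch (hypothetical, not tied to any particular openMosix setup): it forks one ordinary process per input file, and each process runs an unmodified, off-the-shelf tool. An SSI layer such as openMosix can migrate these processes to other nodes transparently, whereas a single monolithic process doing the same work would stay on one machine. ImageMagick's convert is used here only as a stand-in for any batch-conversion command.

/* SSI-friendly workload sketch: one ordinary process per file.
 * Hypothetical example; no cluster API is needed in the code itself. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {            /* one child per input file */
        pid_t pid = fork();
        if (pid == 0) {
            char out[1024];
            snprintf(out, sizeof out, "%s.png", argv[i]);
            /* each child is a plain process that an SSI layer can migrate */
            execlp("convert", "convert", argv[i], out, (char *)NULL);
            perror("execlp");                   /* reached only if exec fails */
            _exit(1);
        }
    }
    while (wait(NULL) > 0)                      /* wait for all children to finish */
        ;
    return 0;
}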

With SSI-based clusters, you can deploy any standard (Linux) application on the cluster without modifying it. So for enterprises that want to migrate an existing application (mainly batch-processing applications) onto a cluster, but don't want to invest in rewriting it against the PVM/MPI libraries, or that use a third-party application whose code they don't have access to, SSI-based clusters are the best solution.

PVM clusters



Parallel Virtual Machine (PVM) is the other type of clustering technology. It differs from SSI in that the application you want to run on the cluster must be recompiled or built with PVM/MPI support. This means you cannot run an ordinary existing application as-is on such a cluster. A commonly used clustering middleware here is OSCAR.


If you run a single application that needs huge number-crunching capability on a PVM cluster, the application itself takes care of thread management and job migration between the nodes. Scientific applications are therefore best suited to PVM clusters; genome mapping, data modeling and forecasting jobs, for instance, run best on PVM.
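
To show what 'building the application with PVM/MPI support' looks like in practice, here is a minimal MPI sketch in C (a hedged illustration of the MPI flavour mentioned above, not production code): each node computes a partial sum over its share of the data and the pieces are combined with an explicit collective call. The work distribution and the final reduction are written into the program, which is why such code has to be compiled against the MPI (or PVM) library.

/* Minimal MPI sketch: each rank sums its slice of 0..N-1 and the partial
 * sums are combined on rank 0. Illustration only; build with an MPI
 * compiler wrapper such as mpicc and launch with mpirun or mpiexec. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* this node's id within the job */
    MPI_Comm_size(MPI_COMM_WORLD, &size);       /* number of nodes in the job */

    long long local = 0, total = 0;
    for (long long i = rank; i < N; i += size)  /* each rank takes every size-th element */
        local += i;

    /* explicit message passing: combine the partial sums on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of 0..%d = %lld\n", N - 1, total);

    MPI_Finalize();
    return 0;
}

A typical run on a four-node cluster would be along the lines of 'mpicc sum.c -o sum' followed by 'mpirun -np 4 ./sum'; the exact launcher depends on the MPI implementation your middleware installs.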

MS Compute Cluster Server



One of the biggest pieces of news this year has been software giant Microsoft finally getting into the HPC business. As its first initiative, it released the first beta of MS CCS in March 2006, and the first general public release followed in August this year.

The MS Compute Cluster Pack provides support for MPI2 libraries and also contains an integrated job scheduler and cluster resource-management tools. MPI is a standard API and specification for message passing, designed specifically for high-performance computing scenarios running on large computer systems or on clusters of commodity computers.


CCS uses MS MPI, Microsoft's version of the Argonne National Labs open-source MPI2 implementation that is widely used in existing HPC clusters. MS MPI is compatible with the MPICH2 reference implementation and other MPI implementations, and supports a full-featured API with more than 160 function calls. MS Visual Studio 2005 also includes support for developing HPC applications, such as parallel compiling. This is sure to become a plus point for Microsoft's Compute Cluster initiative, as developers now get a familiar interface for developing HPC programs, which they can then deploy and run in familiar environments.
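
Because MS MPI follows the same MPI2 API as MPICH2, a plain message-passing program such as the short C sketch below (an illustrative assumption, not Microsoft sample code) can be built against either implementation, for instance from Visual Studio 2005, and then started across the cluster with mpiexec.

/* Point-to-point message passing: rank 0 sends a value, rank 1 receives it.
 * Illustration only; should build against any MPI2 implementation
 * (MPICH2, MS MPI and so on). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Something like 'mpiexec -n 2 pingpong.exe' would run it on two processors; on CCS the same binary can also be handed to the integrated job scheduler.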

High Speed Commodity Network



A commodity technology used in more than 50% of the world's top 500 HPC systems is Gigabit Ethernet. Yes, that's right: a normal Gbps LAN, which is nowadays also used as a cluster interconnect. The devices required for such a topology are standard gigabit switches/routers and CAT5 enhanced (CAT5e) UTP cables.

Being a technology that works with commodity products, it has in the recent past become the most common interconnect for small and mid-sized HPC systems, and it is being used in large deployments as well. Deploying such an interconnect is easy on the pocket, and it can work on your existing infrastructure with minimal or no modification. But there are some drawbacks too, including relatively high latency and the absence of built-in QoS or HA in the hardware.
