High performance computing (HPC) is invariably associated with rocket science.
Normally, we think that this technology is used only in fields such as weather
forecasting, genome mapping and simulating chemical reactions. But that's
not the complete picture. Without stretching the argument too far, just pick up
our last six or seven issues, where we have talked about half a dozen different
ways to use HPC in an enterprise. Today, HPC is no longer meant only for research
labs and universities; it has become a key enabler for business
applications.
As per top500.org's latest report, 51% of the top 500 supercomputers worldwide are being used across different industry verticals
Still not convinced? Let's do some number crunching. The website
www.top500.org ranks the top 500
supercomputers across the globe and updates its list every six months. As per
its latest report, 51% of the top 500 supercomputers are
used across different industry verticals, while the combined share of those used
in academics and research comes to around 41%. The point we are trying to make
here is that with declining hardware and software costs, and with a choice of
more than one technology, the setup, management and application of HPC have become
easier. It is gradually entering each and every vertical. Be it banking or
finance, image processing/rendering or gaming and entertainment, automobiles or
medical sciences, HPC is providing the much needed competitive edge to
enterprises to do more in less time. In this article we take you through some
of the latest technologies in this field.
HPC architectures
To begin with, let's talk about the HPC technology architecture. The most
common architectures used today are MPP and Clusters.
MPP: MPP, or massively parallel processing, is quite similar to
SMP, or symmetric multiprocessing, that we see today in a normal multiprocessor
server. Even hyperthreading processors appear to the operating system as a form
of SMP. In both cases
we have tightly bound processing units. This means the interconnect is more
sophisticated and in most cases it's internal. The difference between the two
is that in SMP systems all CPUs share the same memory, while in MPP systems each
CPU has its own memory. This implies that the application must be divided in
such a way that all executing segments can communicate with each other. Hence,
MPP systems are difficult to program. But because of this architecture, MPP
systems don't have the bottleneck that is present in SMP systems, where
all the CPUs attempt to access the same memory at the same time.
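To make the "each CPU has its own memory" idea concrete, here is a minimal sketch in Python. It is a toy, single-process model and not a real MPP program: the function names (partition, node_work, mpp_sum) and the message layout are our own assumptions for illustration. Each "node" sees only its private slice of the data, and results leave a node only as an explicit message.

```python
# Toy, single-process model of the MPP (distributed-memory) idea.
# All names here are hypothetical and chosen for illustration only.

def partition(data, n_nodes):
    """Split data into n_nodes contiguous, nearly equal private chunks."""
    base, extra = divmod(len(data), n_nodes)
    chunks, start = [], 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)  # spread the remainder
        chunks.append(data[start:start + size])
        start += size
    return chunks


def node_work(rank, local_data):
    """Work done on one node, using only that node's own memory."""
    return {"from": rank, "partial_sum": sum(local_data)}


def mpp_sum(data, n_nodes):
    """Sum data by combining the messages sent back by every node."""
    messages = [node_work(rank, chunk)
                for rank, chunk in enumerate(partition(data, n_nodes))]
    return sum(msg["partial_sum"] for msg in messages)


if __name__ == "__main__":
    values = list(range(1, 101))
    print(mpp_sum(values, n_nodes=4))  # prints 5050
```

The programmer has to decide how the data is split and what gets sent back. That is the extra effort the text refers to when it calls MPP systems difficult to program.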
Today, MPP is used at the core of a majority of high-end supercomputers. But
there's a catch. You can't build an MPP-based HPC system from commodity PCs. For
that, you need to go to vendors such as IBM or Cray, and because of this
the cost involved is pretty high. The benefits are a reduction in
bottlenecks caused by the interconnect and an architecture that is less complex
to manage.
HPC solution providers
Cray: www.cray.com
IBM: www.ibm.com/servers/deepcomputing
Intel: www.intel.com/go/hpc
IS: www.interactivesupercomputing.com
NEC: www.hpce.nec.com
SGI: www.sgi.com/products/servers
SUN: www.sun.com/servers/hpc/index.jsp
Clustering: The other technology is clustering, or rather high
performance clustering. The major difference between MPP and clustering is that
in clustering we have loosely bound processing units, which are referred to as
nodes, and the interconnect is mostly external, such as a standard high speed
LAN, Myrinet or InfiniBand. We will discuss these interconnect technologies
later. A good thing about such an HPC is that it can be built on commodity
hardware and networking equipment, which brings down the cost. There is plenty
of software and middleware available to build such an HPC.
Clustered HPCs are divided into SSI and PVM based clusters. We have discussed
them quite extensively in our previous articles. Here's a quick recap.
1. SSI based clusters: Single System Image (SSI) is a clustering
technology that can make the nodes on a network work like a single virtual
machine with multiple processors. The best thing about SSI is that your
application doesn't require any modification to run on the new virtual
machine. However, this convenience comes with certain drawbacks as well. SSI
works very well when you run many tasks simultaneously on the virtual machine,
for instance, converting hundreds of media files from one format to another. In
such a situation, the SSI cluster will spread the tasks evenly across all machines
available in the cluster and complete the job significantly faster than a
single machine would.
On the other hand, if you deploy a single job or thread which requires a
large amount of number crunching, the SSI cluster will not give you any
performance improvement. This is because it cannot divide a single task into
multiple threads and spread them across the nodes of the cluster. One example of
clustering middleware for SSI is OpenMosix. The charm of SSI based clusters is
that you can deploy any standard (Linux) application on the cluster, without any
modification to the application. So, for enterprises that want to migrate an
existing application (mainly batch processing applications) on to a cluster, but
don't want to invest in re-creating it with
PVM/MPI support, or are using a third-party application where they don't have
access to the code, SSI based clusters are the best solution.
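The difference between many small tasks and one big task can be sketched with a simple placement model. This is only an illustration under our own assumptions: the function name completion_time is hypothetical, the task durations are made up, and real SSI middleware migrates processes dynamically rather than planning ahead like this. The model places whole tasks on the least-loaded node and never splits a task.

```python
import heapq


def completion_time(task_minutes, n_nodes):
    """Estimate wall-clock time when whole tasks go to the least-loaded node.

    Tasks are never split, which mirrors the SSI limitation described above.
    """
    loads = [0.0] * n_nodes          # minutes of work queued on each node
    heapq.heapify(loads)
    for minutes in sorted(task_minutes, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + minutes)
    return max(loads)


if __name__ == "__main__":
    many_small = [2.0] * 75   # assumed: 75 conversions of 2 minutes each
    one_big = [150.0]         # assumed: the same total work as a single job

    print(completion_time(many_small, 1))    # 150.0 on a single machine
    print(completion_time(many_small, 18))   # 10.0 on an 18-node cluster
    print(completion_time(one_big, 18))      # 150.0, no gain from the cluster
```

With these assumed numbers, the batch of small tasks finishes far sooner on 18 nodes, while the single large job takes just as long as it would on one machine.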
2. PVM clusters: Parallel Virtual Machine (PVM) is the other
clustering technology. It differs from SSI in that you need to build or
recompile the application you want to run on the cluster with PVM/MPI
support. This means that you cannot run an existing application on this
cluster without modification.
A commonly used clustering middleware here is OSCAR. A major benefit of using
such a cluster is that if you are running a single application which needs huge
number crunching capability on a PVM cluster, then the application itself will
take care of thread management and job migration between nodes.
What would you use a PVM cluster for? Scientific applications, for one, are
best suited for PVM clusters. If you want to build a cluster which can do genome
mapping, for example, then PVM is the best choice. Similarly, data modeling and
forecasting jobs are also best run on such a cluster.
HPC at work
We tested an SSI framework based cluster. For this, we built an 18-node OpenMosix cluster and compared it against a standard dual Xeon 2.4 GHz processor-based server with 1 GB RAM. The cost of this server was nearly equal to the cost of our cluster. We compared both using two different tests. The results we got were really exciting.
Here, the server is fully loaded, while the cluster is under only 10% load (simultaneously converting 75 WAV files to OGG). The cluster took about 50% less time.
The cluster and server are both fully loaded with the same load (zipping and tarring 55 MB files in batches of 100, 150 and so on, up to 300). The cluster gave six times better performance.
Interconnects
After architecture, the next most important thing in an HPC is the interconnect.
Generally, if you choose an MPP based architecture then you don't need to
bother about the interconnect, as it's already there in the system. But if you're
going for a cluster based approach then you have to decide on the right
interconnect to use. In the following portion, we discuss some of the key
interconnect technologies for loosely coupled clusters.
Myrinet: Myrinet is a high-speed LAN system designed by Myricom. It is
meant to be used as an interconnect between multiple machines to form
computer clusters. One of the benefits of using Myrinet is that it has much less
protocol overhead than standard interconnects such as Ethernet. As a result, it
provides better throughput, less interference and lower latency. It is also one of
the most popular interconnect technologies for clusters.
A standard Myrinet consists of two fiber optic cables (one for upstream and the
other for downstream) per node, switches and a router with low overhead. A
fourth generation Myrinet can give a speed of 10 Gbps. But this is not the only
reason for its popularity. The other benefit is its very low
latency when compared to a normal LAN. This low latency is achieved by a
technique in which the application running on the cluster is aware of
the NIC's firmware and can bypass the OS by sending messages directly to the
network. Some other key features of Myrinet are heartbeat, flow control and
error control on each link.
InfiniBand: InfiniBand is a point-to-point bi-directional serial link
used for connecting processors with high speed peripherals such as disks. It
supports several signaling rates. Initially, InfiniBand technology was used for
connecting servers with remote storage and networking devices, and with other
servers. But later it also came to be used for inter-processor
communication (IPC) in parallel clusters. The serial connection's signaling rate
is 2.5 Gbit/s in each direction per connection. InfiniBand also supports double and
quad data rates, 5 and 10 Gbit/s respectively.
Links can also be aggregated in units of 4 or 12, called 4x or 12x. A
quad-rate 12x link can carry 120 Gbit/s raw or 96 Gbit/s of useful data. Other
benefits include greater performance, lower latency, easier and faster sharing
of data, built-in security and quality of service, and improved usability (the new
form factor will be far easier to add, remove or upgrade than today's shared-bus I/O
cards). But again, this is not a commodity product, and to deploy such a setup you
need to hire specialists.
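The link speeds quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The function and constant names are our own. The 8/10 efficiency factor is an assumption on our part: it reflects the 8b/10b line coding usually cited for these signaling rates, and it matches the 96 out of 120 Gbit/s figure given in the text.

```python
BASE_RATE_GBPS = 2.5  # single-rate signaling per lane, per direction (from the text)
RATE_MULTIPLIER = {"single": 1, "double": 2, "quad": 4}


def link_bandwidth(rate, width):
    """Return (raw, useful) bandwidth in Gbit/s for one direction of a link.

    rate  -- "single", "double" or "quad"
    width -- number of aggregated lanes, e.g. 1, 4 or 12
    """
    raw = BASE_RATE_GBPS * RATE_MULTIPLIER[rate] * width
    useful = raw * 8 / 10  # assumption: 8 data bits carried per 10 line bits
    return raw, useful


if __name__ == "__main__":
    for rate in ("single", "double", "quad"):
        for width in (1, 4, 12):
            raw, useful = link_bandwidth(rate, width)
            print(f"{rate:>6} {width:>2}x: {raw:6.1f} raw, {useful:6.1f} useful")
```

For a quad-rate 12x link this gives 2.5 x 4 x 12 = 120 Gbit/s raw and 96 Gbit/s of useful data, the figures quoted above.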
Gigabit LAN: Now, this technology is known to everyone. Yes, it is
the standard Gigabit Ethernet connection which is used in standard LANs. It is
also used as a cluster interconnect. The devices required for this
kind of setup are standard Gigabit switches/routers and Cat 5e UTP
cables.
Being a technology that works with commodity products, it is one of the
most common interconnects for small or mid-sized HPC systems. The cost of
deploying such an interconnect is very low and it can actually work on your
existing infrastructure with minimal or no modification.
But compared to the other interconnects, it has drawbacks such as relatively
high latency and a lack of QoS or HA built into the hardware.
Final verdict
Broadly speaking, there are two options to weigh before going for an HPC
deployment. It can either be a specialized deployment or be made up of commodity
hardware, software and interconnects. The decision is completely yours, and
it depends on the type of work you want to do.
If you want to run common applications on top of a cluster, an SSI based
commodity cluster will be fine for you. If you have a substantial amount of
unutilized processing power on your network, a commodity cluster will also
do.
But if you need to run some specially designed apps (most likely a single job
which requires a huge amount of processing power) with hardware-level failsafe
and rapid scalability, and you don't have the in-house expertise, then
you should approach a vendor to do the deployment for you.
Setting up a commodity cluster
How much does it cost to set up a high performance cluster? The answer depends on the number of nodes you want to deploy. Here is what it cost us to deploy a 20 node cluster:
Item | Configuration | Number | Unit Cost (Rs) | Total (Rs)
Nodes | P4, 2.4 GHz, 40 GB HDD, 256 MB RAM and CD Drive | 20 | 12,000 | 240,000
Switch | 24 port, gigabit | 1 | 25,000 | 25,000
Monitors | 14” color | 1 | 4,500 | 4,500
Keyboard | 101 Standard | 2 | 200 | 400
Sub Total | | | | 269,900
Option I: Low Cost
Angel Rack | | 2 | 2,500 | 5,000
Power strips - 15 amp | | 5 | 150 | 750
Ethernet cabling - 50 m | | 1 | 500 | 500
Sub Total | | | | 6,250
Option II: High Cost
Server Racks Installed | | 2 | 30,000 | 60,000
The cost does not include cooling and power solutions. Also, depending on the make of the rack used, your costs could go up by another half a lakh or so for this setup. One monitor is always connected to the cluster manager machine while the other one is used for troubleshooting. To keep costs down, we did not use a KVM switch. Instead, we used rdesktop and SSH on Linux (rdesktop for Linux to Windows and SSH for Linux to Linux) for desktop sharing. We used the Remote Desktop client on Windows for Windows to Windows and PuTTY for Windows to Linux management. Doing away with the KVM switch, however, meant a few trips to the cluster to physically connect the monitor and keyboard for troubleshooting.
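If you want to adapt the costing above to a different number of nodes, a small script can redo the sums. The sketch below simply re-enters the quantities and unit costs from the table. The variable and function names are our own, and the grand totals it prints are derived by adding the table's sub totals, not figures we quoted separately.

```python
# (item, quantity, unit cost in Rs), as listed in the table above
BASE_ITEMS = [
    ("Nodes", 20, 12_000),
    ("Switch", 1, 25_000),
    ("Monitors", 1, 4_500),
    ("Keyboard", 2, 200),
]
LOW_COST_EXTRAS = [
    ("Angel Rack", 2, 2_500),
    ("Power strips - 15 amp", 5, 150),
    ("Ethernet cabling - 50 m", 1, 500),
]
HIGH_COST_EXTRAS = [
    ("Server Racks Installed", 2, 30_000),
]


def subtotal(items):
    """Add up quantity x unit cost for every line item."""
    return sum(quantity * unit_cost for _, quantity, unit_cost in items)


if __name__ == "__main__":
    base = subtotal(BASE_ITEMS)
    print("Base hardware:", base)                                # 269900
    print("With Option I:", base + subtotal(LOW_COST_EXTRAS))    # 276150
    print("With Option II:", base + subtotal(HIGH_COST_EXTRAS))  # 329900
```

Changing the node count in BASE_ITEMS is enough to get a first estimate for a larger or smaller cluster, keeping in mind that cooling and power are still excluded.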