Computation has changed drastically since the days of the first computer. In
the 60s and 70s, mainframes handled all processing and computation for
government, scientific and organizational needs. Thereafter, we saw the advent
of desktops or 'microcomputers.' Almost in parallel, the concepts of networking
started to develop, and it didn't take long before clusters and grids
were implemented. In this article, we look at the extremes of computation
achieved by taking clusters a step further. Yes, we are talking about the
Computational Grid, still in its infancy yet very promising. Read on to find
out what it is, how it works, and most importantly, which way it is heading.
What is a Grid?
Its name and concept are derived from the electric power grid. Put briefly, a
grid is a way to share computational power and data storage over the
Internet. Just as with the electric grid, you don't have to worry about where
the power is coming from. The computational grid brings all the resources
under it together into one entity. This collection of resources can then be
used for high-end computation and, with the storage of the participating
systems combined, provides a virtually unlimited yet cheap storage option.
While some might define it as a 'collection of clusters' or in other ways, we
would like to stick to the definition we have just given, without tying it to
any specific structural example.
Now let us get down to a more elaborate definition. Grid computing can best
be defined as a form of distributed computing that works by sharing computing,
application, data, storage, or network resources across dynamic and
geographically dispersed organizations or computers. This is the reason we say
that a collection of clusters is not an appropriate definition. Clusters don't
work by bringing together systems or computers that are geographically apart.
We will look at the differences between grids and clusters in detail a little
later.
Grid technologies promise to change the way organizations tackle complex
computational problems. However, the vision of large-scale resource sharing is
not yet a reality in many areas. Grid computing is an evolving field in which
the standards and technology needed to enable it are still being developed.
Need for a grid
Science has advanced by leaps and bounds and has grown more dependent on
computational power for research and analysis. A decade ago, one powerful
machine was enough to analyze whatever data, say, a pharma researcher had.
Things have changed a lot since, specifically in areas such as medical
research, nuclear physics and molecular studies. For example, the amount of
data that scientists download from satellites monitoring activity in the outer
layers of the atmosphere goes up to approximately 200 GB daily. Now you can
imagine the kind of processing power you would need to take in the data
recorded over, say, a week and perform computations on it. This is one of
the reasons scientists demanded a system with enough power and near-infinite
storage to perform computation on the kind of data they accumulate. It is
scenarios like this that led to the need for the Computational Grid. The rest,
as they say, is history.
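To put the 200 GB figure in perspective, here is a rough back-of-the-envelope
calculation in Python. The per-machine processing rate (5 MB per second) and
the node count (100) are purely illustrative assumptions, not figures from any
real project, and the estimate ignores scheduling and network overhead.

```python
# Back-of-the-envelope estimate. Only the 200 GB/day figure comes from the
# article; the throughput and node count below are assumed values.
DAILY_DATA_GB = 200
DAYS = 7
MB_PER_GB = 1024

ASSUMED_MB_PER_SEC_PER_NODE = 5   # hypothetical processing rate of one machine
ASSUMED_NODES = 100               # hypothetical number of grid nodes


def processing_hours(total_gb: float, nodes: int, mb_per_sec: float) -> float:
    """Hours needed to process total_gb when work is split evenly across nodes."""
    total_mb = total_gb * MB_PER_GB
    seconds = total_mb / (mb_per_sec * nodes)
    return seconds / 3600


weekly_gb = DAILY_DATA_GB * DAYS
single = processing_hours(weekly_gb, 1, ASSUMED_MB_PER_SEC_PER_NODE)
grid = processing_hours(weekly_gb, ASSUMED_NODES, ASSUMED_MB_PER_SEC_PER_NODE)

print(f"Data per week: {weekly_gb} GB")
print(f"One machine:   {single:.1f} hours")
print(f"{ASSUMED_NODES} nodes:     {grid:.1f} hours")
```

Under these assumptions, a week of data (1,400 GB) keeps a single machine busy
for roughly 80 hours, while the same work spread evenly over 100 nodes finishes
in under an hour.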
Grid architecture
Much like the electric grid from which the idea of the Computational Grid came,
the architecture is a layered one. Grid applications form the topmost layer;
these might be scientific, engineering or commercial applications, or even web
portals. The next layer is that of the grid programming environments and tools.
This layer provides the libraries, runtime interfaces, compilers and, most
importantly, parallelization tools. Next comes the layer that is usually a
vendor-specific implementation, the Grid Middleware. This layer is in charge of
resource management, scheduling services, job submission, storage access and
information services across the entire grid. Some treat the middleware as one
layer with two sub-layers, others as two separate layers. The User-level
Middleware takes care of the first two of the tasks we mentioned, resource
management and scheduling. The Core Grid Middleware handles the rest: job
submission, storage access and information services. Since the grid uses the
Internet as its communication, computation and in fact storage infrastructure,
and connects to clusters and grids across geographies, a Security Layer becomes
indispensable. Also referred to as the security infrastructure, this layer
provides authentication and secure communication. The bottommost layer is the
'Grid Fabric', which is nothing but the existing 'network of networks' and its
components: clusters running various operating systems, storage devices,
databases and even specific devices such as sensors.
Grid Architecture (layers from top to bottom)
Layer | Components
Grid applications | Science, engineering, commercial applications, Web portals
Grid programming environments and tools | Languages, interfaces, libraries, compilers, parallelization tools
User-level middleware (resource aggregators) | Resource management and scheduling services
Core grid middleware | Job submission, storage access, info services, trading, accounting
Security infrastructure | Single sign-on, authentication, secure communication
Grid fabric | PCs, workstations, clusters, networks, software, databases, devices
How it works
At the heart of the Grid is what we call the broker. At a rather abstract
level, the Grid works as follows. Once a job is submitted to the Grid, the
broker discovers the resources that the user can access through 'Grid
Information Servers.' It then negotiates with grid-enabled resources or their
'Agents' using middleware services, maps the job to the resources (known as
scheduling in the Grid context) and then stages the data to be processed or
the application to be run. This last step is referred to as 'Deployment' in
the Grid context. The broker also monitors the application's execution
progress, copes with changes in the Grid structure and resource failures, and
finally collects the results.
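The broker's cycle of discovery, scheduling, deployment, result collection and
failure handling can be sketched in a few lines of Python. This is a toy model
under stated assumptions, not the API of any real grid middleware: the class
names, the "most free CPUs" scheduling rule and the will_fail flag used to
simulate a fault are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A grid-enabled resource, as an information server might list it."""
    name: str
    free_cpus: int
    alive: bool = True
    will_fail: bool = False   # hypothetical flag to simulate a fault at run time


@dataclass
class Job:
    name: str
    cpus_needed: int


@dataclass
class Broker:
    """Hypothetical broker: discover, schedule, deploy, collect."""
    directory: list[Resource]                  # stands in for a Grid Information Server
    results: dict[str, str] = field(default_factory=dict)

    def discover(self, job: Job) -> list[Resource]:
        # Resource discovery: keep live resources with enough capacity.
        return [r for r in self.directory
                if r.alive and r.free_cpus >= job.cpus_needed]

    def schedule(self, job: Job) -> Resource:
        # Scheduling: map the job to the candidate with the most free CPUs.
        candidates = self.discover(job)
        if not candidates:
            raise RuntimeError(f"no resource can run {job.name}")
        return max(candidates, key=lambda r: r.free_cpus)

    def run(self, job: Job) -> str:
        # Deploy, monitor and collect; reschedule when a resource fails.
        while True:
            target = self.schedule(job)           # raises when nothing is left
            target.free_cpus -= job.cpus_needed   # deployment: claim capacity
            succeeded = not target.will_fail      # simulated execution
            target.free_cpus += job.cpus_needed   # release capacity
            if succeeded:
                self.results[job.name] = target.name   # collect the result
                return target.name
            target.alive = False                  # resource failure: drop it, retry


broker = Broker([
    Resource("cluster-a", free_cpus=64, will_fail=True),
    Resource("cluster-b", free_cpus=32),
    Resource("desktop-c", free_cpus=2),
])
print(broker.run(Job("render", cpus_needed=16)))
```

Here the job is first mapped to cluster-a, which has the most free CPUs. When
that simulated resource fails, the broker marks it as unavailable and
reschedules the job on cluster-b.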
In a grid environment, we have a loosely coupled architecture of systems
connected mostly over a Wide Area Network or the Internet. The job is more or
less the same as that of a computational cluster, which is to harness the
resources of multiple idle machines. But a Grid need not only leverage the
processing power of the machines. You can instead create a Data Grid, which
creates and manages distributed data storage and is also called a Grid.
The other key feature that differentiates a Grid from a Cluster is its
decentralized model: you generally don't have a controller in place, and every
node works independently. The nodes can also be heterogeneous in terms of
operating system and hardware architecture.
One example of grid computing is the famous SETI@home project, which searches
for extraterrestrial intelligence. A central radio telescope captures radio
signals from space, and the captured data is then sent in small packets to
several million computers connected to the Internet. The nodes process these
packets in their idle time and return the results to a data center. This way,
high processing power is obtained by utilizing the idle time of computers
spread across the globe.
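The packet-based approach described above can be illustrated with a small
Python sketch. It is not SETI@home's actual client or protocol: the sample
values are made up, threads stand in for volunteer machines, and the 'analysis'
is simply finding the peak value in each packet.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(samples: list[float], unit_size: int) -> list[list[float]]:
    """Cut the captured data into small, independent packets."""
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]


def analyze(unit: list[float]) -> float:
    """Stand-in for the analysis a volunteer node performs: the peak value."""
    return max(unit)


# Made-up signal samples standing in for captured radio data.
signal = [0.1, 0.3, 0.2, 0.9, 0.4, 0.1, 0.7, 0.2, 0.5, 0.3]
units = split_into_work_units(signal, unit_size=4)

# Threads stand in for independent volunteer machines working in idle time.
with ThreadPoolExecutor(max_workers=3) as nodes:
    peaks = list(nodes.map(analyze, units))

strongest = max(peaks)   # the data center combines the returned results
print(units)
print(peaks, strongest)
```

Because each packet can be analyzed without reference to the others, the nodes
never need to talk to one another, which is what makes this kind of workload
suitable for loosely coupled machines on the Internet.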
In this example you can clearly see that the processing is decentralized and
loosely coupled. It is also highly heterogeneous, because over the Internet one
can't control which OS or architecture a node will be using.
Clusters, on the other hand, use a single server or controller to manage and
distribute/aggregate the processes, and one or more client nodes connected in a
tightly coupled environment such as a high-speed LAN or a specialized
high-speed interconnect such as Myrinet. Unlike grid computing, where each
client computer can run its own OS, a cluster is controlled and managed by a
single OS running across its computers, making it highly homogeneous in nature.
The server provides various files to the clients for execution. Applications
are run on the clients using parallel processing algorithms. The clients are
headless nodes, in most cases with no display or input devices connected to
them. The server is the single interface for the entire system, where all input
and output takes place. To the user, the entire setup appears as a single
system. Clusters formed this way are commonly known as SSI or Single System
Image clusters.
Beowulf clusters, which are built from commodity 'off the shelf' computer
parts running free OSes like Linux, are an example of this kind of cluster.
They provide very cost-effective parallel processing.
Enterprises have their own complex applications and huge repositories of data,
which also require high, if not mammoth (as is the case with scientific data),
computational power to analyze. Not surprisingly, vendors such as Sun
Microsystems, Oracle, Fujitsu and Informatica have started implementing
grid-based solutions to tackle diverse issues. For example, Sun and Informatica
provide grid-computing-based solutions for the data-centric needs of
organizations. They also provide data integration using a grid. Using a grid
for data-centric needs brings major advantages such as high availability,
automatic recovery, adaptive load balancing, in which load is balanced
according to the situation at hand, and sessions on the grid. Similarly,
Oracle's grid implementations cover a wide range of services for the
enterprise.
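To make 'adaptive load balancing' a little more concrete, here is one simple
way it could work, sketched in Python: each incoming session goes to whichever
node carries the least load at that moment. This is an illustrative assumption
about the general technique, not a description of how Sun's, Informatica's or
Oracle's products do it; the node names and session costs are made up.

```python
import heapq


def assign_sessions(session_costs: list[int], node_names: list[str]) -> dict[str, list[int]]:
    """Send each session to whichever node currently carries the least load."""
    heap = [(0, name) for name in node_names]   # (current load, node name)
    heapq.heapify(heap)
    placement: dict[str, list[int]] = {name: [] for name in node_names}
    for cost in session_costs:
        load, name = heapq.heappop(heap)        # least-loaded node right now
        placement[name].append(cost)
        heapq.heappush(heap, (load + cost, name))
    return placement


plan = assign_sessions([8, 3, 5, 2, 7], ["node-1", "node-2"])
print(plan)
```

The decision for each session depends on the load the nodes carry at that
point, rather than on a fixed rotation, which is what makes the balancing
adaptive.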
The most interesting of these is the grid solution for SOA runtime governance
and SOA infrastructure monitoring. This is interesting because, as we have
gone on record saying, SOA implementations more often than not bring together
a variety of systems, components and applications under one roof. Implementing
grid control for SOA runtime governance would make runtime recording of
service requests, monitoring of complex process flows and similar tasks easier
and more manageable, thanks to the high-grade computational power that a grid
provides. Beyond this, Oracle's grid solution also supports identity
management and the otherwise cumbersome task of application server cluster
deployment.
With the grid making steady progress into enterprises, primarily for smoothing
out the management or deployment of very large implementations, these
technologies can surely address many more pain areas if carefully matured over
time. After all, who would not want their processes, analytics or even data
needs to be free of limits on computational power or storage?
Let's now consider some of the latest trends in this sphere.
P2P Grid: One of the newest technologies in grid computing is the P2P
grid. We talked about it in detail in the December issue of PCQuest and also
showed how to implement it. A P2P grid is essentially a grid that runs over P2P
connections. Both the data transfer and the CPU cycle migration are done over
P2P. Currently, the framework being used runs on the Gnutella network.
Between Nov 97 and Feb 06, PrimeNet Grid has handled 11,579,649,914 P90 machine-hours. Its throughput rate can be characterized by a fitted exponential trend line.
Although there is no full-fledged application available that can leverage
such a concept, you can use an application called GPU (downloadable from
http://gpu.sf.net). This application is still in alpha and can only run some
test applications such as image rendering and net crawling.
But imagine what will happen when this technology matures. Anyone with a
machine and an Internet connection will be able to become part of a public Grid
and share processing power the same way we share MP3 and other music files
today. In that case, we will truly be able to achieve Internet computing, or
rather Internet supercomputing.
Grid management: You must have heard about many types of grids and
clusters and read about them in PCQuest, such as the heterogeneous grid
platforms Condor and Globus, simple SSI-based clustering middleware such as
OpenMosix, and MPI-based setups such as Oscar and Flash Mob. If you search the
Net, you will find quite a few different kinds of grid products available.
Some have a graphical front end to monitor the nodes and some don't have one at
all. Let's take a classic example, OpenMosix.
In a matter of 5 minutes, we were able to connect to a P2P grid with 10 GB of RAM and 10 GFlops of processing power using GPU.
OpenMosix has a graphical monitoring application called OpenMosixView, but
have you ever noticed how difficult monitoring becomes once the number of nodes
grows to around a hundred? Plus, it only shows you the current RAM and CPU
utilization of the nodes. What about disk usage? And what if you want to see
what the CPU utilization was over the last hour or day?
These things are very difficult to monitor in large grids or clusters. To make
things worse, let's say you have multiple grids, one based on Condor and
another on Globus, plus a cluster using Oscar or ROCKS with MPI support, and
you want to monitor all of them from one place. What will you do then?
Take the case of a cluster or a grid with hundreds or thousands of nodes
spread over a wide geographical area. Managing them all from one place can be
really difficult. So, this is one area that is picking up on the Grid
technology front. The most common and popular tool for this purpose is
Ganglia. We talked about it in detail in our June 2006 issue. It is the one
used by most of the big names using Grid technologies, such as NASA, Cray, Sun,
Boeing, the US Air Force and Microsoft.
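The monitoring needs raised above (disk usage as well as CPU and RAM, history
over the last hour, and several grids viewed from one place) come down to
keeping a short rolling history of every metric for every node. The Python
sketch below shows that idea only. It is not Ganglia's implementation or API;
the class, method and node names are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean


class MetricHistory:
    """Keeps the last `window` samples of each metric for each node."""

    def __init__(self, window: int = 60):
        # One bounded queue per (grid, node, metric); old samples drop off.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, grid: str, node: str, metric: str, value: float) -> None:
        self.samples[(grid, node, metric)].append(value)

    def average(self, grid: str, node: str, metric: str) -> float:
        """Average over the retained window, e.g. an hour at one sample a minute."""
        return mean(self.samples[(grid, node, metric)])

    def grid_average(self, grid: str, metric: str) -> float:
        """Latest value of a metric, averaged over every node of one grid."""
        latest = [s[-1] for (g, _, m), s in self.samples.items()
                  if g == grid and m == metric and s]
        return mean(latest)


history = MetricHistory(window=3)
for cpu in (20, 40, 60, 80):          # the oldest sample (20) falls out of the window
    history.record("condor-pool", "n1", "cpu", cpu)
history.record("condor-pool", "n2", "cpu", 40)
history.record("globus-site", "g1", "disk", 71)

print(history.average("condor-pool", "n1", "cpu"))   # mean of 40, 60, 80
print(history.grid_average("condor-pool", "cpu"))    # mean of 80 and 40
```

Because every sample is keyed by grid as well as node, one collector of this
kind can hold data from a Condor pool, a Globus site and an MPI cluster side by
side.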
Glossary
Cluster Interconnect: A very high speed connection allowing computers in a cluster to interconnect.
Enterprise Grid Alliance: A vendor-neutral, open and independent organization that works as a consortium for focusing on obstacles enterprises face in grid implementations, and promoting open and interoperable solutions for problems.
Enterprise N1: Sun's architecture for next-generation data-center that makes
Utility Computing: 'PAY-AS-YOU-GO' model of computing analogous
Utility Data-Center: An infrastructure solution proposed by HP