April 14, 2006

Through the years, computing has evolved at breakneck
speed. At the forefront of that evolution have been the processors. They have
been the driving force behind all innovation. As clock speeds rise, all other
components have to be revamped to keep up with the processor. This has
generally led to a far better overall computing experience.

The rising speeds have obviously meant that we have reached the
limits of single-core processors. As Netburst has no doubt taught everyone,
clocking a core too high will give you some pretty serious burns, and if
you are in the server market, that’s never a good thing.

However, going by the current trend as showcased at events
like IDF and CeBIT, it is clear that the next bet from these companies is not on
dual cores but on multi-cores. Everything that we have seen (from roadmaps to
demonstrations) has got us rubbing our hands with glee in anticipation.

While speeds are important, an interesting shift that we
are witnessing is that finally, manufacturers (some early, some like Intel later
on) have realized that clocking the core higher is not the only way.

Future development is, thus, focused more and more on
overall system performance. From removing bottlenecks in bus speeds to
supporting faster RAM, it’s all coming together to give you performance like
never before.

Even Intel has come down from its high horse of high clock
speeds and is getting serious about performance per watt. Indeed, in the past
couple of IDF sessions, performance/watt has been its mantra. Other companies
realized this quite some time back and the rival Opterons and UltraSPARCs have
been running cooler and more efficiently already.

Number crunching
With your business growing, you need to cram ever more processing
power into as small an area as possible. The space constraint obviously
gets in the way of heat management, but today it’s possible to get some heavy
duty power in relatively small packages. Currently, we know them as dual-core
processors, but that is really the first stage, a stepping stone, as we make a
true transition to multi-core multi-processor systems which can give you four
times or more the current performance levels with an equal, if not smaller,
footprint.

Cell Processor

This is perhaps the
most promising of technologies, and it comes in a grossly under-rated
package. The Cell Processor from IBM finds itself rather under-utilised in
the yet-to-be-released PlayStation 3. The Cell is frankly a work of sheer
genius. It is made up of eight cores called SPEs, or Synergistic
Processing Elements, which are all connected to a controlling core called
the PPE, or PowerPC Processing Element. The PPE decides the task for each SPE
and doles it out accordingly. The claimed processing power is around 2

Considering that the
PlayStation 3 will be no larger than your average DVD player, that’s a
very powerful and space-efficient solution right there! The amazing part
about the Cell is that since it is inherently a system-on-chip design,
scaling up the processor itself is a simple matter of adding more SPEs. In
fact, we can connect entire Cell processors (PPEs, SPEs et al) to multiple
others and get tremendous computing power. Of course, we haven’t seen a
demo of this so far, but we are sure that the potential for a killer
server-grade processor is definitely there.

Imagine the
possibilities where you can take your entire server cluster and replace
it with just a Cell processor-based server, perhaps the size of just 1U!

Definitely, Cell
Processor-based events can’t come out fast enough!

Intel’s NGMA (Next Gen Micro Architecture)
Intel has mostly been quiet about its processors recently, the reason clearly
being the thrashing it has received at the hands of AMD. Having said that,
Intel is definitely very strong on dual and multi-core processors to give you
the maximum performance per square inch!

Intel has made a dramatic shift from pure clock speeds
towards being power conscious, their processor being a case in point. Granted,
the processor is merely dual core, but other features like the 16 MB cache and
the finer 65 nm fabrication process boost performance tremendously. The
Blackford chipset, which the processor runs on, offers a total memory
bandwidth of 17 GB/s and a total RAM capacity of 64 GB!

The other upcoming processors from Intel are Woodcrest,
featuring a 4 MB L2 cache and its successor Whitefield, which will be a quad
core with 16 MB L2 cache! All these multi-cores coupled with multiple processors
will lead to an unprecedented jump in calculation power that each server will
provide you with.

The chip has been rechristened UltraSPARC T1 from its earlier code name and is
a very good example of Sun’s engineering brilliance. This is the first time
they have implemented their throughput philosophy in silicon. It is also a
perfect example of how CPUs will be in future, which is why it finds mention
here.


The T1 runs at 1.2 GHz and each core has a 32 KB L1 data cache and 16 KB L1
instruction cache. These numbers really aren’t too impressive considering the
UltraSPARC 4+ (Sun’s first dual core) is faster (clock to clock) by around 20
percent and has 2 MB L2 cache vis-à-vis 3 MB for the T1. So for single-core
performance, the UltraSPARC 4+ beats the T1 hands down. If you look at the
overall performance, however, the eight-core T1, with help from its
revolutionary crossbar memory controller, beats all its previous brethren by a
huge margin. The focus is on TLP (Thread Level Parallelism) and a maximum of 32
different program threads can be processed simultaneously! This is exactly the
kind of performance we are looking for to fit our objective of maximum
computing per square inch!
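
To make the thread-level parallelism idea concrete, here is a minimal sketch of how server software might keep 32 hardware threads busy with independent requests. The figures of eight cores and 32 threads come from the article; the four threads per core is simply derived from them, and `handle_request` and `serve` are hypothetical names invented for illustration. Note that standard CPython runs only one thread of Python bytecode at a time, so this shows the structure of a throughput-oriented workload rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

CORES = 8                 # from the article: eight cores
HARDWARE_THREADS = 32     # from the article: 32 threads processed at once
THREADS_PER_CORE = HARDWARE_THREADS // CORES  # derived from the two figures


def handle_request(request_id: int) -> int:
    """Hypothetical stand-in for one independent unit of server work."""
    return sum(i * i for i in range(request_id * 100))


def serve(request_ids):
    """Run many independent requests, one software thread per hardware thread."""
    with ThreadPoolExecutor(max_workers=HARDWARE_THREADS) as pool:
        # map() keeps results in the same order as the incoming requests.
        return list(pool.map(handle_request, request_ids))


results = serve(range(64))
print(f"{len(results)} requests served on {HARDWARE_THREADS} worker threads")
```

The point of the design is that no single request finishes faster; the chip wins by having many modest threads in flight at once, which suits workloads made of many small, unrelated requests.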

Cluster computing
Cluster computing could both gain and lose as individual servers
get so powerful while costing the same for all practical purposes. Those with
basic requirements could perhaps do away with their clusters altogether
(depending of course on the type of apps you run on your servers) and just buy a
couple of thin servers, making data-center management seem like a vacation!

Those with much higher performance requirements would
obviously make a cluster of these ever more powerful systems and get much more
performance out of them.

Power consumption and heat

In our quest for maximum computing per square inch, we obviously cannot neglect
power consumption. Electricity bills are perhaps the bane of every data

The costs increase basically on two counts: running the
servers and then keeping them cool. If we have processors with a lower TDP and
thus less heat generation, both these costs can be cut quite significantly.
The processor manufacturers are not oblivious to these needs and are pulling out
all the stops to give you maximum power at the lowest energy bills.

Apart from the usual L1 and L2 caches, the Itanium houses a massive L3 cache

Intel’s NGMA
Up until now Intel’s philosophy has
been, ‘the highest clock speed will win you the world’. But not anymore.
With the last IDF (Fall 2005), Intel has undergone a paradigm shift in its
concept of performance. It is now focusing on performance/watt. This means more
efficient overall system performance, something that AMD has been touting for
General Purpose GPUs

This is an interesting
concept that is being pursued by graphics-card vendors. ATi in particular
has been backing it quite strongly. They say that GPUs today (theirs
anyway) have enough processing power to offload some tasks from the
CPU. The first step they have taken towards this is in decoding multimedia

In fact, both NVIDIA
and ATi have successfully demonstrated that with their graphics cards in
a system, the load on the processor (as far as decoding multimedia content
goes) is significantly reduced. This improves overall system efficiency,
as your CPU is free to perform other tasks that you might want to do while
multimedia processing goes on in the background.

Thanks to the sheer
optimization that GPUs have undergone to handle the growing use of vertex
shaders in games, they are more like super-tuned computing
machines. The graphics-card companies want to exploit this fact. The
result could very well be high levels of parallel processing and PCs
optimized for multi-tasking.

ATi is taking this
concept further. They claim that their X1900XTX has a total computing
power of 500+ GFLOPS compared to a mere 80 GFLOPS (maximum) in the dual-core
Pentium 4s. By this claim, calculation-intensive
tasks can be handed to the GPU instead of loading up the CPU. Thus,
ATi says that they will not require a dedicated engine that solves physics
problems to give lifelike realism. Their claim is that due to the gigaflops
of available power, their cards can do rendering, physics calculations and
Intel has also moved to a unified micro architecture across
platforms. So whether it is Yonah for notebooks, the desktop chip or Woodcrest
for servers, the basic architecture remains the same. The base architecture is
actually a Banias derivative which, being an architecture for notebooks, makes
it extremely frugal on electricity.

Intel just launched their low voltage Xeon processor, which
is the first of its kind (from Intel anyway) to feature dual cores as well as
low power consumption. The processor has a TDP of just 31 W! TDP (thermal
design power), of course, is the power the chip requires to run conventional
software flat out. This is usually around 90 percent of the maximum
power the chip is ever expected to require.

If you compare this 31 W to the 110 W for the single-core (Irwindale)
dual-processor servers, you begin to appreciate the drop in power consumption.
It has been cut down to less than 30%. So even though the initial investment
might be a bit high, the running costs will make it up in no time.
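
The arithmetic behind that comparison can be sketched as follows. The two TDP figures are the ones quoted above; the electricity tariff and the round-the-clock duty cycle are assumptions made purely for illustration, and TDP is only a rough proxy for what a chip actually draws.

```python
LV_XEON_TDP_W = 31.0      # from the article
IRWINDALE_TDP_W = 110.0   # from the article

HOURS_PER_YEAR = 24 * 365  # ASSUMPTION: the server runs all year
PRICE_PER_KWH = 0.10       # ASSUMPTION: illustrative tariff, in any currency


def annual_energy_cost(watts: float, price_per_kwh: float = PRICE_PER_KWH) -> float:
    """Cost of drawing `watts` continuously for a year at the assumed tariff."""
    kwh = watts * HOURS_PER_YEAR / 1000.0
    return kwh * price_per_kwh


ratio = LV_XEON_TDP_W / IRWINDALE_TDP_W
saving = annual_energy_cost(IRWINDALE_TDP_W) - annual_energy_cost(LV_XEON_TDP_W)

print(f"LV Xeon TDP is {ratio:.0%} of the Irwindale figure")
print(f"Assumed saving per processor per year: {saving:.2f}")
```

The ratio works out to roughly 28 percent, which matches the "less than 30%" claim. The saving shown excludes cooling, so the real difference in a data center would likely be larger.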

While the eight cores of the T1 are pretty impressive themselves, the other
incredible thing about it is its sheer frugality with power. While PowerPCs
can consume around 100 W of power per core and Intel’s Paxville and single-core
processors use between 110 and 135 W of power, the T1 uses a maximum of 72 W!
So if you have a large data center, the saving in terms of power would perhaps
be worth the switch! The amazingly low power consumption also means that you
can pack your datacenter a lot more closely, as these systems will run far
cooler than your current rack.

Compare this with the LV Xeon: the reason this 72 W is
impressive is that the T1 has eight cores while the LV Xeon is merely dual
core. The obvious limitation here is that the T1 supports only Solaris 10,
while the Xeon will let you load pretty much any OS on it.
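
A quick way to see why the core count matters is to divide each chip's quoted power figure by its number of cores. The sketch below uses only the numbers given in the text; the `Chip` class is a hypothetical helper, and the comparison assumes the two vendors' power figures are measured in comparable ways, which may not hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    max_power_w: float  # power figure as quoted in the article

    @property
    def watts_per_core(self) -> float:
        return self.max_power_w / self.cores


T1 = Chip("UltraSPARC T1", cores=8, max_power_w=72.0)
LV_XEON = Chip("LV Xeon", cores=2, max_power_w=31.0)

for chip in (T1, LV_XEON):
    print(f"{chip.name}: {chip.watts_per_core:.1f} W per core")
```

Watts per core says nothing about how fast each core is, so this is a measure of density rather than of performance.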

AMD Opterons
AMD has actually been having quite a
good run with its recent Opteron series. No longer are AMD’s chips plagued by
the heating issues they had earlier. The current Opterons run at a TDP of
85-90 W, which is significantly cooler than Intel’s Xeons or even Itaniums.

So clearly, all the manufacturers have shifted
from making the fastest processors to making processors that are more
economical, run cooler and work more efficiently. The focus is shifting from
the fastest clock speed to increasing the overall system performance, and it
couldn’t have come sooner.

Virtualization technology
After years of having virtualization at the top-end server level, it has
finally trickled down, and even desktops are coming with VT (virtualization
technology) at the processor level! All major processor vendors, from Intel to
IBM to AMD to Sun, implement VT in one form or another.

This greatly increases the ease with which you can run
multiple operating environments/systems without resorting to any sort of
software optimization. You will be able to run completely independent
environments, which may or may not have the same OS, virtualized right down
to the hardware level. Dual and multi-cores will facilitate this even more
by providing multiple processors (or cores) at the hardware level. Each
environment will then probably get mapped to a core. The more cores you have,
the better your VT performance!
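
The idea of mapping each environment to a core can be sketched in a few lines. This is only an illustration of the principle: `map_environments_to_cores` is a hypothetical helper, the environment names are examples, and real virtualization software typically schedules environments across cores dynamically rather than pinning them once.

```python
from itertools import cycle


def map_environments_to_cores(environments, core_count):
    """Assign each virtual environment to a core, round-robin."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    cores = cycle(range(core_count))
    # With more environments than cores, cores are reused in turn.
    return {env: next(cores) for env in environments}


environments = ["mail", "web", "print"]

roomy = map_environments_to_cores(environments, core_count=8)
cramped = map_environments_to_cores(environments, core_count=2)

print("8 cores:", roomy)
print("2 cores:", cramped)
```

With eight cores every environment gets a core to itself; with two, the third environment has to share, which is the sense in which more cores mean better VT performance.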

The Opteron has a hefty L2 cache which saves CPU clocks while computing

Note that we are still talking of cores per processor
(Sun’s T1 gives you up to 8!). Once you combine these multi-cores into
multi-processor systems, the true impact on performance begins to emerge.
Virtualization promises to make administration tremendously simpler. No longer
will you have to go from PC to PC doing routine maintenance tasks. You will also
not have to rely on individuals to maintain their machines. All you need to do
is create two different virtual machines: one for the user and the other
for you to use (remotely at that!) for running your maintenance tasks. You can
in fact do both these things simultaneously with almost no downtime!

If the processing power grows enough (and it will, by the
looks of it) you might even be able to club your servers doing different tasks
into a single machine! Say you have a mail server, a web server and a print
server. Instead of having multiple machines, hardware level VT will let you
combine all these into a single server with various environments virtualized.
This saves you space, money and running costs.

The biggest benefit of hardware level VT that processors
today carry is that each environment runs completely independent of others. So
even if one of them crashes, or gets infected by a virus, the others run
unaffected. Moreover, you can simply reset the crashed environment without
having to restart the whole machine!

All major vendors today provide VT at the processor level.
Intel calls it Intel Virtualization Technology and Sun calls it Logical
Domains, while AMD has its own name for the feature.

Is the software optimized?
So far, we’ve seen technologies and innovations that are being
incorporated in the CPUs of today and tomorrow. But performance is a mix of
software and hardware and having the fastest hardware will get you nowhere if
you use software that can’t take advantage of it.

That might be one of the bottlenecks initially. Hardware
normally leaps ahead and then the software catches up. So even though you
might invest in a super-fast system right now, the benefits might not show up
till manufacturers release proper software which exploits its capabilities.
Still, the time lag might be a small price to pay for the leap in performance
that we can expect in the coming quarters.

Licensing issue
With all these myriad cores changing the very definition of a processor, it
is natural that the legal eagles have their work cut out for them. Earlier,
processors used to be simple devices and licensing was done per processor, or
indeed per box. We then migrated to dual or multi-processor
architectures, which complicated licensing. Should firms charge per server
box? Or should they charge per processor? The question matters because
multi-processor systems might replace several installed servers,
reducing the total deployment. The company would then need
fewer licenses, which translates into a loss of revenue for software vendors.

Things get even more complex when we enter the realm of
multi-core multi-processor systems. The tremendous jump in performance means
even fewer licenses are needed, so software companies could start
charging per core!
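
To see why vendors might be tempted by per-core pricing, the three licensing models can be compared on a before-and-after deployment. Everything here is hypothetical: the `Server` class, the `license_units` function and both server fleets are invented for illustration and do not describe any vendor's actual terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    sockets: int            # physical processors in the box
    cores_per_socket: int


def license_units(servers, model: str) -> int:
    """Count the licenses needed under a given (hypothetical) model."""
    if model == "per_box":
        return len(servers)
    if model == "per_processor":
        return sum(s.sockets for s in servers)
    if model == "per_core":
        return sum(s.sockets * s.cores_per_socket for s in servers)
    raise ValueError(f"unknown licensing model: {model}")


# ASSUMPTION: four old single-core boxes are replaced by one
# dual-processor, dual-core server.
old_fleet = [Server(sockets=1, cores_per_socket=1)] * 4
new_fleet = [Server(sockets=2, cores_per_socket=2)]

for model in ("per_box", "per_processor", "per_core"):
    before = license_units(old_fleet, model)
    after = license_units(new_fleet, model)
    print(f"{model}: {before} licenses before, {after} after")
```

In this example, consolidation cuts the license count under per-box and per-processor pricing, while per-core pricing leaves it unchanged, which is exactly the outcome a software vendor would want to preserve.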

All this can translate to some pretty high costs while migrating to multi-core
and/or multi-processor systems, and is something you
should enquire about and ascertain before spending your money.

There is no doubt that processing power will go through the
roof in the immediate future. But unlike earlier times, processors will also
run cooler, be more energy efficient and give you
many important features like virtualization, dual/multi-core etc.

The only hitch, as we mentioned, could be the perennial lag
while software catches up to hardware, along with licensing. Let’s hope that the
software companies show maturity and come up with customer-friendly licensing
agreements which don’t penalize you for going in for a higher-end server. All
said and done though, the future of CPUs is definitely an exciting one!
