In 1965, Gordon Moore predicted that the number of transistors on a chip
would double about every two years. More than four decades later, we still
see Moore's Law being followed by all the leading chip manufacturers. This
one prediction has revolutionized the way processor manufacturers look at
the future of processor technology.
Today, breaking one technological barrier after another, we have processors
built on 45 nm technology, a huge leap from the early 800 nm processors.
Over the last few years, we have also seen processor frequency reach a
saturation point beyond which further increases don't seem feasible, mainly
because of excessive power consumption and heat generation. Hence the only
practical alternative was to add more cores to a single die, which improved
both power efficiency and performance. Single-core processors ruled the
market for a long time, but the jump from single to dual and now to quad
core has been rather fast. To cater to heavy demand from all quarters of the
industry, vendors have been forced to think out of the box and dish out
solutions that meet every need. Does adding more cores help? Will shrinking
the processor die and increasing the number of transistors on a chip be of
any use? Will 45 nm suffice, or will we shrink the die further? And how much
further can we shrink it? Which applications will be able to utilize so much
processing power? These are the questions that plague most readers, and in
this story we'll make things clearer.
The shrinking die
One trend that has gained rapid momentum over the last few years is
shrinking the die and cramming more and more transistors into it. Looking
back, 90 nm was all we had a couple of years ago, but the transition from
90 nm to half of that has been remarkably fast, and we are already moving to
32 nm in the near future. Does shrinking the die and adding more transistors
make sense? Yes it does, and for good reason too. A smaller die means more
processing power in less area. It also means more efficient power
consumption and less heat dissipation. Today's data centers house hundreds
of servers powered by thousands of processor cores, so even an incremental
increase in power per core translates into a large rise in overall power
consumption. Enough has been said about shrinking processor dies and the
move to multiple cores. But how does it affect the processors from various
vendors? Let's find out.
The server champs
The server domain is not new to multi-core CPUs. In fact, vendors like Sun
and IBM have had multi-core processors for a long time. It's only recently that
the x86 CPU giants Intel and AMD have introduced their multi-core offerings.
Therefore, we'll focus on the latest developments in server processors by all
the key players.
Sun's Niagara
In 2005, Sun's UltraSPARC T1 processor (codenamed Niagara) was launched with
eight cores, each supporting four threads. This year Sun released a sequel,
the UltraSPARC T2 processor (codenamed Niagara 2), which also has eight
SPARC cores, all connected to 4 MB of shared L2 cache. Each core is capable
of eight-way simultaneous multithreading, enabling a total of 64
simultaneous threads of execution.
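The rationale behind so many hardware threads per core can be sketched with a toy utilization model: while one thread stalls on a memory access, the core switches to another. The cycle counts below are invented for illustration, not Sun's figures.

```python
# Illustrative model: each thread computes for `compute` cycles, then stalls
# on memory for `stall` cycles. With more hardware threads to interleave,
# the core spends a larger fraction of its cycles doing useful work.
def core_utilization(threads, compute, stall):
    """Fraction of cycles the core does useful work (capped at 100%)."""
    return min(1.0, threads * compute / (compute + stall))

# Hypothetical numbers: 20 compute cycles, then a 140-cycle memory stall.
for n in (1, 4, 8):
    print(n, core_utilization(n, 20, 140))
# 1 thread  -> 0.125 (core idle 87.5% of the time)
# 4 threads -> 0.5
# 8 threads -> 1.0  (stalls fully hidden)
```

With eight threads per core, the memory stalls are fully hidden in this model, which matches the throughput-oriented design philosophy described above.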
All this thread-processing power enables twice the throughput and twice the
performance per watt of Niagara 1. The UltraSPARC T2 is among the first
'systems on a chip', with the most cores and threads. Niagara 2 is
fabricated on a 65 nm process and has 503 million transistors, though we
won't be surprised if Sun also launches a 45 nm processor, i.e. Niagara 3,
sometime soon. What is expected from Niagara 3 is more processing power with
an emphasis on power consumption and better memory bandwidth; the design
aims to hide delays in accessing memory. For the time being, Sun is planning
to provide the OpenSPARC T2 RTL (register transfer level) processor design
to the open source community under the GPL license.
IBM's POWER
POWER6 is the latest in IBM's server processor series, launched in mid-2007.
Running at 3.5, 4.2 and 4.7 GHz, the POWER6 promises to deliver twice the
speed of the previous-generation POWER5 CPU.
The most impressive part of this new processor is its memory bandwidth of
about 300 Gbps, enough to download the entire iTunes catalog in about 60
seconds.
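A quick back-of-envelope check of that claim, using only the figures quoted above:

```python
# Back-of-envelope arithmetic on the stated figures (300 Gbps, 60 seconds).
bandwidth_gbps = 300                         # gigabits per second, as quoted
seconds = 60                                 # as quoted
total_gigabits = bandwidth_gbps * seconds    # 18,000 gigabits moved
total_terabytes = total_gigabits / 8 / 1000  # bits -> gigabytes -> terabytes
print(total_terabytes)                       # 2.25
```

So the stated rate sustained for a minute moves about 2.25 TB, which is the catalog size the claim implies.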
Servers based on the POWER6 processor come with specialized hardware and
software that allow them to create many 'virtual' servers on a single box.
It is the first UNIX microprocessor able to perform decimal floating-point
arithmetic in hardware. There is also a vast improvement in the way
instructions are executed inside the chip: performance has been enhanced by
keeping the number of pipeline stages static while making each stage faster
and doing more work in parallel. The result is less execution time and
ultimately less energy consumed.
Another major advantage of POWER6 is that the processor clock can be
dynamically turned off or on depending on the requirement. IBM also plans to
give customers the ability to move live virtual machines from one physical
UNIX server to another while maintaining continuous availability. Known as
the POWER6 Live Partition Mobility function, this technology will be an
added advantage to have. And as has become the norm, with POWER6 built on
65 nm technology, it's very possible that POWER7 will be built on 45 nm
technology.
Intel's Xeon
This CPU has become a household name in the server domain. Since the first
Xeon processor, the 'Pentium II Xeon' of 1998 (codenamed "Drake"), there has
been huge demand for Xeons in the server market. Intel has designed the Xeon
processor family so that each series has a specific target segment in mind.
If the 3000 sequence (3040, 3050, 3060, etc.) is for SMBs, the 5000 sequence
(5100, 5110, 5120) is the one most commonly used by enterprises. Then there
is the 7000 sequence (7100, 7200), meant for large-scale enterprise
computing and server consolidation. If you want even more computing power in
your server, you can opt for the Itanium 9000 sequence, meant for massive,
mission-critical computing and RISC replacement. Itanium can scale up to 512
dual-core processors and a whopping 1,000 TB of RAM.
The advantage of multi-core over single core is that different applications can be handled by dedicated threads, enabling faster processing of tasks.
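That division of labor can be sketched as follows. The task names and workloads here are hypothetical, and note that CPython's interpreter lock means truly parallel CPU-bound work would need processes rather than threads; the sketch only shows the structure of handing independent tasks to dedicated threads.

```python
# Two independent tasks handed to separate threads, each of which the OS
# scheduler is free to place on its own core.
import threading

results = {}

def encode_video():
    # Stand-in workload for a compute-heavy task.
    results["video"] = sum(i * i for i in range(100_000))

def scan_files():
    # Stand-in workload for a second, unrelated task.
    results["scan"] = sum(i for i in range(100_000))

threads = [threading.Thread(target=encode_video),
           threading.Thread(target=scan_files)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # both tasks complete; on a multi-core CPU the OS may run them in parallel
print(sorted(results))
```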
Future of server processors
It's not possible to drill down into the details of each server processor,
as each upgrade brings with it a different codename or an additional socket,
a different FSB or cache design, or a different microarchitecture, and so
on. So we'll concentrate on the future plans of the processor majors and see
what architectures they come up with.
Intel released relabeled versions of its Core 2 Quad processor as the Xeon
3200 series this year, codenamed Kentsfield. This 2x2 'quad-core' comprises
two separate dual-core dies sitting next to each other in one CPU package.
The 7300 series, codenamed Tigerton, was also announced this year: a more
capable quad-core processor for four-socket systems, consisting of two
dual-core Core 2 architecture silicon chips on a single ceramic module. It
uses Intel's Caneland platform and offers twice the performance of the
previous generation of processors. With the launch of the 45 nm Penryn
processor, Intel has announced its Penryn-based Xeon (5400 series) models,
codenamed Harpertown, with a higher FSB of 1600 MHz.
The dual-core version of the CPU, codenamed Wolfdale, will be available from
1.89 to 3.4 GHz. Intel then plans to launch a quad-core processor codenamed
Gainestown, built on the same 45 nm technology but on the new Nehalem
microarchitecture. Soon after, we will see both quad- and dual-core
processors based on the Westmere and Gesher architectures, which will mark
the arrival of 32 nm technology.
Like Intel, AMD too jumped onto the quad-core bandwagon, though a little
late. Its new Barcelona is the first 'native' quad-core processor, as it is
not made up of two dual-core dies like Intel's Kentsfield. It is built on
65 nm technology, and the major change in it is the inclusion of what AMD
terms SSE128. The original K8 architecture could execute two SSE operations
in parallel, but its SSE execution units were only 64 bits wide, so when a
128-bit SSE operation dropped in, K8 had to handle it as two 64-bit
operations. With Barcelona the execution units have been widened: 128-bit
SSE operations no longer have to be broken up into two 64-bit operations,
which also frees up decode bandwidth.
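The decode-width difference can be illustrated with a toy micro-op counter. This is a simplification for illustration, not vendor microcode.

```python
# Toy model of the difference described above: a 128-bit SSE operation is
# split into as many micro-ops as it takes to cover the execution-unit width.
def micro_ops(op_width_bits, unit_width_bits):
    """Number of micro-ops needed to execute one SSE operation."""
    return max(1, op_width_bits // unit_width_bits)

print(micro_ops(128, 64))    # K8-style 64-bit units       -> 2 micro-ops
print(micro_ops(128, 128))   # Barcelona-style 128-bit units -> 1 micro-op
```

Halving the micro-op count per 128-bit operation is exactly where the extra decode bandwidth comes from.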
The AMD Opteron 2300 series, which supports up to two-processor
configurations, and the Opteron 8300 series, which supports up to
eight-processor configurations, are both quad-core offerings from AMD built
on 65 nm technology. With the launch of Phenom in the coming month, AMD's
quad-core line-up will extend to the desktop, and quad-core Opterons on
45 nm technology are expected to follow. Future processors will see an
implementation of the Montreal core on the 45 nm fabrication node,
manufactured using MCM (Multi-Chip Module) techniques. AMD even plans to
incorporate the Bulldozer core in upcoming server processors, with support
for the SSE5 instruction set to enhance HPC and cryptographic computation.
Bulldozer is AMD's codename for its next generation of CPUs, which aim to
improve the processor's performance-per-watt ratio.
The desktop kings
We have spoken about quad core in the server domain, but as soon as it was
popularized there, demand arose on the desktop as well. The Intel Core 2
Extreme quad-core QX6700 is much faster than the Core 2 Extreme X6800. This
new processor doesn't save on power by any means and is meant for heavy
computational usage: engineering analysis, financial modeling and other
applications that require serious compute power.
AMD was not far behind with its Quad FX platform: two sockets and four
processor cores designed for extreme multi-tasking, 'megatasking' for PC
enthusiasts, and power users who run the most demanding tasks
simultaneously. There was demand for more processing power from specialized
segments, and hence the Intel Core 2 Extreme found its way into most gamers'
machines. AMD's Athlon has always been a favorite when it comes to gaming,
and AMD too came up with a limited-edition 6400+ Black Edition processor
running at 3.2 GHz, probably the only one with such a high frequency in its
range. Gamers and power users need more processing power, so demand remains
high for processors with higher frequencies. The upcoming desktop processors
will be built on 45 nm technology, just like the servers, and soon we may
see quad core ruling the desktop market as well.
45 nm and its benefits
Pre-45 nm technologies used silicon dioxide as the transistor's
gate-dielectric material. With the introduction of 45 nm processors, an
entirely different material has been adopted; Intel and AMD have each
devised their own solutions to replace silicon dioxide.
Intel has devised a combination of hafnium-based high-k gate dielectrics and
a new metal material for the gates. The hafnium-based dielectric
significantly reduces electrical leakage and provides the high capacitance
necessary for good transistor performance. This helps the current lot of
processors attain higher performance while reducing the electrical leakage
from transistors that can hamper chip and PC design, size, power
consumption, and cost. The 45 nm processors will see increased transistor
switching speed, enabling higher core and bus clock frequencies and more
performance in the same power and thermal envelope.
Compared to 65 nm, 45 nm technology provides a 30% reduction in transistor
switching power and double the transistor density. Many people believe this
is the biggest change in transistor technology since the introduction of the
polysilicon-gate MOS transistor in the 1960s.
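The doubling of density follows from simple geometry: the area a feature occupies scales roughly with the square of its linear dimension.

```python
# Why a 65 nm -> 45 nm shrink roughly doubles transistor density:
# area scales with the square of the linear feature size.
old_node, new_node = 65, 45              # process nodes in nanometres
area_ratio = (new_node / old_node) ** 2  # each transistor's relative area
density_gain = 1 / area_ratio            # transistors per unit area
print(round(density_gain, 2))            # ~2.09x
```

This is a first-order scaling argument; real density gains also depend on layout and design rules, but it shows why "half-node" shrinks roughly double the transistor count.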
Did we hear Triple Core?
So far we have heard of dual cores and quad cores, and now of as many as eight cores; the trend has been to double the number of cores on a die. But no law says only doubling is possible. Demand for more versatile and distinctive products has finally prompted processor manufacturers to think out of the box, and hence the emergence of triple-core processors. AMD has decided to launch a triple-core Phenom alongside its current dual-core and quad-core processors. Whether a tri-core processor makes sense, time will tell, but for now AMD is the first to cash in on the concept.
Technologically there is nothing new or exclusive here; it is simply a cut-down version of the quad-core processor with one core disabled. The initial Socket AM2+ tri-core processor will be identical in specs to the quad-core Phenom, including 512 KB of L2 cache per core, 2 MB of shared L3 cache, and the same enhancements to processing resources, the only difference being the disabled core. For the time being it will appear only in desktops, but it could be a useful processor for notebooks too: if the disabled core is electrically isolated from the others in the parent native quad-core design, the result could be a power-efficient multi-core processor for mobile designs. What the future holds is yet to be seen, but its exclusivity makes triple core an interesting segment to watch.
Intel SSE4 Instruction with 45 nm CPUs
Intel's 45 nm processors will come with its Streaming SIMD Extensions 4
(SSE4) instructions. This new instruction set will deliver further
performance gains for SIMD (single instruction, multiple data) software,
enabling the new microprocessors to deliver superior performance and energy
efficiency across a broad range of 32- and 64-bit software. Applications
involving graphics, video encoding and processing, 3D imaging, and gaming
will surely benefit from the new instructions, as will high-performance
audio, image, and compression algorithms. It will be interesting to see how
it performs, as it promises dramatic performance gains.
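What a packed SIMD operation buys can be sketched in plain Python. Real SSE4 code would use C intrinsics; the four-lane model here is only conceptual, mimicking the packed single-precision dot product that SSE4.1 adds as a single instruction (DPPS).

```python
# Conceptual model of SIMD: one "instruction" applied to all four 32-bit
# lanes of a 128-bit register at once, instead of four scalar operations.
def simd_mul(a, b):
    """One packed multiply: every lane handled in a single step."""
    return [x * y for x, y in zip(a, b)]

def dot4(a, b):
    """4-element dot product, the kind of work SSE4.1's DPPS does in one go."""
    return sum(simd_mul(a, b))

print(dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))   # 70.0
```

In scalar code the same dot product costs four multiplies plus three adds issued separately, which is why graphics and media workloads gain so much from packed instructions.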
SOI: AMD's next choice
While Intel opted for a high-k material to replace silicon dioxide, AMD
opted for Silicon on Insulator (SOI). Here, the conventional silicon
substrate is replaced by a layered silicon-insulator-silicon substrate,
mainly to reduce parasitic device capacitance and thereby improve
performance. SOI substrates are compatible with most conventional fab
processes; the only barrier to SOI adoption is the higher substrate cost,
which raises overall manufacturing costs.
Improved cache design
The next lot of processors from Intel and AMD will sport better cache
designs. Intel's Penryn processors include a 50% larger, 24-way
set-associative L2 cache to further improve the hit rate and maximize
utilization, so dual cores will have up to 6 MB of L2 cache and quad cores
up to 12 MB. AMD plans a shared L3 cache in addition to the 512 KB of L2
cache per core, with up to 2 MB of L3 shared across four cores. The key
benefit touted for this cache is that it improves the probability of each
core finding its data on-chip, thereby improving performance.
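The benefit of a cache level that catches L2 misses can be seen with some average-memory-access-time (AMAT) arithmetic. The latencies and hit rates below are made up for illustration, not measured figures for any real CPU.

```python
# Illustrative AMAT model: an access hits L2, else L3, else main memory.
def amat(l2_hit, l3_hit, l2_lat=15, l3_lat=40, mem_lat=200):
    """Average cycles per memory access for assumed latencies/hit rates."""
    return (l2_hit * l2_lat
            + (1 - l2_hit) * l3_hit * l3_lat
            + (1 - l2_hit) * (1 - l3_hit) * mem_lat)

print(round(amat(l2_hit=0.90, l3_hit=0.0), 1))  # no L3: ~33.5 cycles/access
print(round(amat(l2_hit=0.90, l3_hit=0.7), 1))  # shared L3: ~22.3 cycles/access
```

Even a modest L3 hit rate on L2 misses cuts the average access cost noticeably, which is the effect the shared L3 is after.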
Tech beyond multi-cores
Virtualization is a key driver behind multi-core CPUs, and Intel and AMD
have independently developed their own virtualization extensions. Intel
further plans to add Virtualization Technology for Directed I/O (VT-d),
which will provide a way of configuring interrupt delivery to individual
virtual machines, and an IOMMU to prevent a virtual machine from using DMA
to break isolation.
AMD has similar plans: a specification for an I/O Memory Management Unit
(IOMMU) that would likewise provide a way of configuring interrupt delivery
to individual virtual machines. It will also play an important role in
advanced OSes. HyperTransport, whose primary use is to replace the FSB, is a
bidirectional serial/parallel, high-bandwidth, low-latency point-to-point
link. An HTX (HyperTransport eXpansion) plug-in card specification was
developed to support direct access to a CPU and DMA access to system RAM,
mainly to tackle the bandwidth bottleneck between the CPU and co-processors.
AMD has already announced an initiative named Torrenza to promote the use of
HyperTransport for plug-in cards and coprocessors. The technology is widely
used by AMD, Transmeta, NVIDIA, VIA and SiS.
Another technology that finds varied usage in servers is VMX (also known as
AltiVec), a SIMD instruction set on POWER processors that can apply a single
processing instruction to multiple data elements. Macro fusion, a term
coined by Intel, refers to a processor's ability to combine several
instructions into one, optimizing the stream and making for faster
execution. Beyond SMP (Symmetric Multiprocessing), features such as SMT
(Simultaneous Multithreading), instruction sets like 3DNow! and other SIMD
extensions, and L3 caches have all contributed to the success of multi-core
processors.
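Macro fusion can be illustrated with a toy micro-op counter that fuses adjacent compare-and-branch pairs, the classic case Intel describes. The instruction stream below is invented for illustration.

```python
# Toy illustration of macro fusion: each adjacent ('cmp', 'jne') pair in the
# decoded stream issues as a single fused micro-op instead of two.
def fused_op_count(instructions):
    """Count micro-ops after fusing each compare-and-branch pair."""
    count = 0
    i = 0
    while i < len(instructions):
        if (instructions[i] == "cmp" and i + 1 < len(instructions)
                and instructions[i + 1] == "jne"):
            count += 1   # the pair issues as one fused micro-op
            i += 2
        else:
            count += 1
            i += 1
    return count

stream = ["mov", "cmp", "jne", "add", "cmp", "jne"]
print(fused_op_count(stream))   # 4 micro-ops instead of 6
```

Fewer micro-ops in flight means less pressure on the pipeline for the same program, which is where the speedup comes from.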
In the multi-core domain, research into tera-scale computing is under way,
where terabytes of data must be handled by platforms capable of teraflops of
computing performance. Tera-scale computing aims to bring the massive
compute capabilities of supercomputers to everyday devices such as servers,
desktops and notebooks. Soon, then, we will have processors capable of
dishing out tera-scale computing power to desktops and servers alike.