In 1965, Gordon Moore predicted that the number of transistors on a chip
would double about every two years. More than four decades later, we still
see Moore's Law being followed by all the leading chip manufacturers. This
one prediction has revolutionized the way processor manufacturers look at
the future of processor technology.
Today, breaking one technological barrier after another, we have processors
built on 45 nm technology, a huge leap from the early 800 nm processors.
Over the last few years, we have also seen processor frequency reach a
saturation point beyond which further increases don't seem feasible, mainly
because of excessive power consumption and heat generation. Hence the only
practical alternative was to add more cores to a single die, which improved
both power efficiency and performance. Single-core processors ruled the
market for a long time, but the jump from single to dual and now to quad
core has been rather fast. To cater to heavy demand from all quarters of the
industry, vendors have been forced to think out of the box and dish out
solutions that meet every need. Does adding more cores help? Will shrinking
the processor die and increasing the number of transistors on a chip be of
any use? Will 45 nm suffice, or will we shrink the die further? And how much
further can we shrink it? Which applications will be able to utilize so much
processing power? These are the questions that plague most readers, and in
this story we'll make things clearer.
The shrinking die
One trend that has gained rapid momentum over the last few years is
shrinking the die and cramming more and more transistors into it. Looking
back, 90 nm was all we had a couple of years ago, but the transition from
90 nm to half of that has been remarkably fast, and we are already moving to
32 nm in the near future. Does shrinking the die and adding more transistors
make sense? Yes it does, and for good reason too. A smaller die means more
processing power in less area. It also means more efficient power
consumption and less heat dissipation. Today's data centers house hundreds
of servers powered by thousands of processor cores, so even an incremental
increase in power per core translates into a large rise in overall power
consumption. Enough has been said about shrinking processor dies and the
move to multiple cores. But how does it affect the processors from various
vendors? Let's find out.
The server champs
The server domain is not new to multi-core CPUs. In fact, vendors like Sun
and IBM have had multi-core processors for a long time. It's only recently that
the x86 CPU giants Intel and AMD have introduced their multi-core offerings.
Therefore, we'll focus on the latest developments in server processors by all
the key players.
Sun's Niagara
In 2005, Sun's UltraSPARC T1 processor (codenamed Niagara) was launched with
eight cores, each supporting four threads. This year Sun released a sequel,
the UltraSPARC T2 processor (codenamed Niagara 2), which also has eight
SPARC cores, all connected to 4 MB of shared L2 cache. Each core is capable
of eight-way simultaneous multithreading, enabling a total of 64
simultaneous threads of execution.
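The rationale behind so many hardware threads per core can be sketched with a toy utilization model: while one thread stalls on a memory access, the core switches to another. The cycle counts below are invented for illustration, not Sun's figures.

```python
# Illustrative model: each thread computes for `compute` cycles, then stalls
# on memory for `stall` cycles. With more hardware threads to interleave,
# the core spends a larger fraction of its cycles doing useful work.
def core_utilization(threads, compute, stall):
    """Fraction of cycles the core does useful work (capped at 100%)."""
    return min(1.0, threads * compute / (compute + stall))

# Hypothetical numbers: 20 compute cycles, then a 140-cycle memory stall.
for n in (1, 4, 8):
    print(n, core_utilization(n, 20, 140))
# 1 thread  -> 0.125 (core idle 87.5% of the time)
# 4 threads -> 0.5
# 8 threads -> 1.0  (stalls fully hidden)
```

With eight threads per core, the memory stalls are fully hidden in this model, which matches the throughput-oriented design philosophy described above.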
All this thread-processing power enables twice the throughput and twice the
performance per watt of Niagara 1. The UltraSPARC T2 is among the first
'systems on a chip', with the most cores and threads. Niagara 2 is
fabricated on a 65 nm process and has 503 million transistors, though we
won't be surprised if Sun also launches a 45 nm processor, i.e. Niagara 3,
sometime soon. What is expected from Niagara 3 is more processing power with
an emphasis on power consumption and better memory bandwidth; the design
aims to hide delays in accessing memory. For the time being, Sun is planning
to provide the OpenSPARC T2 RTL (register transfer level) processor design
to the open source community under the GPL license.
IBM's POWER
POWER6 is the latest in IBM's server processor series, launched in mid-2007.
Running at 3.5, 4.2 and 4.7 GHz, the POWER6 promises to deliver twice the
speed of the previous-generation POWER5 CPU.
The most impressive part of this new processor is its memory bandwidth of
about 300 Gbps, enough to download the entire iTunes catalog in about 60
seconds.
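A quick back-of-envelope check of that claim, using only the figures quoted above:

```python
# Back-of-envelope arithmetic on the stated figures (300 Gbps, 60 seconds).
bandwidth_gbps = 300                         # gigabits per second, as quoted
seconds = 60                                 # as quoted
total_gigabits = bandwidth_gbps * seconds    # 18,000 gigabits moved
total_terabytes = total_gigabits / 8 / 1000  # bits -> gigabytes -> terabytes
print(total_terabytes)                       # 2.25
```

So the stated rate sustained for a minute moves about 2.25 TB, which is the catalog size the claim implies.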
Servers based on the POWER6 processor come with specialized hardware and
software that allow them to create many 'virtual' servers on a single box.
It is the first UNIX microprocessor able to perform decimal floating-point
arithmetic in hardware. There is also a vast improvement in the way
instructions are executed inside the chip: performance has been enhanced by
keeping the number of pipeline stages static while making each stage faster
and doing more work in parallel. The result is less execution time and
ultimately less energy consumed.
Another major advantage of POWER6 is that the processor clock can be
dynamically turned off or on depending on the requirement. IBM also plans to
give customers the ability to move live virtual machines from one physical
UNIX server to another while maintaining continuous availability. Known as
the POWER6 Live Partition Mobility function, this technology will be an
added advantage to have. And as has become the norm, with POWER6 built on
65 nm technology, it's very possible that POWER7 will be built on 45 nm
technology.
Intel's Xeon
This CPU has become a household name in the server domain. Since the first
Xeon processor, the 'Pentium II Xeon' of 1998 (codenamed "Drake"), there has
been huge demand for Xeons in the server market. Intel has designed the Xeon
processor family so that each series has a specific target segment in mind.
If the 3000 sequence (3040, 3050, 3060, etc.) is for SMBs, the 5000 sequence
(5100, 5110, 5120) is the one most commonly used by enterprises. Then there
is the 7000 sequence (7100, 7200), meant for large-scale enterprise
computing and server consolidation. If you want even more computing power in
your server, you can opt for the Itanium 9000 sequence, meant for massive,
mission-critical computing and RISC replacement. Itanium can scale up to 512
dual-core processors and a whopping 1,000 TB of RAM.
The advantage of multi-core over single core is that different applications can be handled by dedicated threads, enabling faster processing of tasks.
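That division of labor can be sketched as follows. The task names and workloads here are hypothetical, and note that CPython's interpreter lock means truly parallel CPU-bound work would need processes rather than threads; the sketch only shows the structure of handing independent tasks to dedicated threads.

```python
# Two independent tasks handed to separate threads, each of which the OS
# scheduler is free to place on its own core.
import threading

results = {}

def encode_video():
    # Stand-in workload for a compute-heavy task.
    results["video"] = sum(i * i for i in range(100_000))

def scan_files():
    # Stand-in workload for a second, unrelated task.
    results["scan"] = sum(i for i in range(100_000))

threads = [threading.Thread(target=encode_video),
           threading.Thread(target=scan_files)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # both tasks complete; on a multi-core CPU the OS may run them in parallel
print(sorted(results))
```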
Future of server processors
It's not possible to drill down into the details of each server processor,
as each upgrade brings with it a different codename or an additional socket,
a different FSB or cache design, or a different microarchitecture, and so
on. So we'll concentrate on the future plans of the processor majors and see
what architectures they come up with.
Intel released relabeled versions of its Core 2 Quad processor as the Xeon
3200 series this year, codenamed Kentsfield. This 2x2 'quad-core' comprises
two separate dual-core dies sitting next to each other in one CPU package.
The 7300 series, codenamed Tigerton, was also announced this year: a more
capable quad-core processor for four-socket systems, consisting of two
dual-core Core 2 architecture silicon chips on a single ceramic module. It
uses Intel's Caneland platform and offers twice the performance of the
previous generation of processors. With the launch of the 45 nm Penryn
processor, Intel has announced its Penryn-based Xeon (5400 series) models,
codenamed Harpertown, with a higher FSB of 1600 MHz.
The dual-core version of the CPU, codenamed Wolfdale, will be available from
1.89 to 3.4 GHz. Intel then plans to launch a quad-core processor codenamed
Gainestown, built on the same 45 nm technology but on the new Nehalem
microarchitecture. Soon after, we will see both quad- and dual-core
processors based on the Westmere and Gesher architectures, which will mark
the arrival of 32 nm technology.
Like Intel, AMD too jumped onto the quad-core bandwagon, though a little
late. Its new Barcelona is the first 'native' quad-core processor, as it is
not made up of two dual-core dies like Intel's Kentsfield. It is built on
65 nm technology, and the major change in it is the inclusion of what AMD
terms SSE128. The original K8 architecture could execute two SSE operations
in parallel, but its SSE execution units were only 64 bits wide, so when a
128-bit SSE operation dropped in, K8 had to handle it as two 64-bit
operations. With Barcelona the execution units have been widened: 128-bit
SSE operations no longer have to be broken up into two 64-bit operations,
which also frees up decode bandwidth.
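The decode-width difference can be illustrated with a toy micro-op counter. This is a simplification for illustration, not vendor microcode.

```python
# Toy model of the difference described above: a 128-bit SSE operation is
# split into as many micro-ops as it takes to cover the execution-unit width.
def micro_ops(op_width_bits, unit_width_bits):
    """Number of micro-ops needed to execute one SSE operation."""
    return max(1, op_width_bits // unit_width_bits)

print(micro_ops(128, 64))    # K8-style 64-bit units       -> 2 micro-ops
print(micro_ops(128, 128))   # Barcelona-style 128-bit units -> 1 micro-op
```

Halving the micro-op count per 128-bit operation is exactly where the extra decode bandwidth comes from.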
The AMD Opteron 2300 series, which supports up to two-processor
configurations, and the Opteron 8300 series, which supports up to
eight-processor configurations, are both quad-core offerings from AMD built
on 65 nm technology. With the launch of Phenom in the coming month, AMD's
quad-core line-up will extend to the desktop, and quad-core Opterons on
45 nm technology are expected to follow. Future processors will see an
implementation of the Montreal core on the 45 nm fabrication node,
manufactured using MCM (Multi-Chip Module) techniques. AMD even plans to
incorporate the Bulldozer core in upcoming server processors, with support
for the SSE5 instruction set to enhance HPC and cryptographic computation.
Bulldozer is AMD's codename for its next generation of CPUs, which aim to
improve the processor's performance-per-watt ratio.
The desktop kings
We have spoken about quad core in the server domain, but as soon as it was
popularized there, demand arose on the desktop as well. The Intel Core 2
Extreme quad-core QX6700 is much faster than the Core 2 Extreme X6800. This
new processor doesn't save on power by any means and is meant for heavy
computational usage: engineering analysis, financial modeling and other
applications that require serious compute power.
AMD was not far behind with its Quad FX platform: two sockets and four
processor cores designed for extreme multi-tasking, 'megatasking' for PC
enthusiasts, and power users who run the most demanding tasks
simultaneously. There was demand for more processing power from specialized
segments, and hence the Intel Core 2 Extreme found its way into most gamers'
machines. AMD's Athlon has always been a favorite when it comes to gaming,
and AMD too came up with a limited-edition 6400+ Black Edition processor
running at 3.2 GHz, probably the only one with such a high frequency in its
range. Gamers and power users need more processing power, so demand remains
high for processors with higher frequencies. The upcoming desktop processors
will be built on 45 nm technology, just like the servers, and soon we may
see quad core ruling the desktop market as well.
45 nm and its benefits
Pre-45 nm technologies used silicon dioxide as the transistor's
gate-dielectric material. With the introduction of 45 nm processors, an
entirely different material has been adopted; Intel and AMD have each
devised their own solutions to replace silicon dioxide.
Intel has devised a combination of hafnium-based high-k gate dielectrics and
a new metal material for the gates. The hafnium-based dielectric
significantly reduces electrical leakage and provides the high capacitance
necessary for good transistor performance. This helps the current lot of
processors attain higher performance while reducing the electrical leakage
from transistors that can hamper chip and PC design, size, power
consumption, and cost. The 45 nm processors will see increased transistor
switching speed, enabling higher core and bus clock frequencies and more
performance in the same power and thermal envelope.
Compared to 65 nm, 45 nm technology provides a 30% reduction in transistor
switching power and double the transistor density. Many people believe this
is the biggest change in transistor technology since the introduction of the
polysilicon-gate MOS transistor in the 1960s.
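The doubling of density follows from simple geometry: the area a feature occupies scales roughly with the square of its linear dimension.

```python
# Why a 65 nm -> 45 nm shrink roughly doubles transistor density:
# area scales with the square of the linear feature size.
old_node, new_node = 65, 45              # process nodes in nanometres
area_ratio = (new_node / old_node) ** 2  # each transistor's relative area
density_gain = 1 / area_ratio            # transistors per unit area
print(round(density_gain, 2))            # ~2.09x
```

This is a first-order scaling argument; real density gains also depend on layout and design rules, but it shows why "half-node" shrinks roughly double the transistor count.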
Did we hear Triple Core?
So far we have heard of dual cores and quad cores, and now of as many as eight cores; the trend has been to double the number of cores on a die. But no law says only doubling is possible. Demand for more versatile and distinctive products has finally prompted processor manufacturers to think out of the box, and hence the emergence of triple-core processors. AMD has decided to launch a triple-core Phenom alongside its current dual-core and quad-core processors. Whether a tri-core processor makes sense, time will tell, but for now AMD is the first to cash in on the concept.
Technologically there is nothing new or exclusive here; it is simply a cut-down version of the quad-core processor with one core disabled. The initial Socket AM2+ tri-core processor will be identical in specs to the quad-core Phenom, including 512 KB of L2 cache per core, 2 MB of shared L3 cache, and the same enhancements to processing resources, the only difference being the disabled core. For the time being it will appear only in desktops, but it could be a useful processor for notebooks too: if the disabled core is electrically isolated from the others in the parent native quad-core design, the result could be a power-efficient multi-core processor for mobile designs. What the future holds is yet to be seen, but its exclusivity makes triple core an interesting segment to watch.
Intel SSE4 Instruction with 45 nm CPUs
Intel's 45 nm processors will come with its Streaming SIMD Extensions 4
(SSE4) instructions. This new instruction set will deliver further
performance gains for SIMD (single instruction, multiple data) software,
enabling the new microprocessors to deliver superior performance and energy
efficiency across a broad range of 32- and 64-bit software. Applications
involving graphics, video encoding and processing, 3D imaging, and gaming
will surely benefit from the new instructions, as will high-performance
audio, image, and compression algorithms. It will be interesting to see how
it performs, as it promises dramatic performance gains.
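What a packed SIMD operation buys can be sketched in plain Python. Real SSE4 code would use C intrinsics; the four-lane model here is only conceptual, mimicking the packed single-precision dot product that SSE4.1 adds as a single instruction (DPPS).

```python
# Conceptual model of SIMD: one "instruction" applied to all four 32-bit
# lanes of a 128-bit register at once, instead of four scalar operations.
def simd_mul(a, b):
    """One packed multiply: every lane handled in a single step."""
    return [x * y for x, y in zip(a, b)]

def dot4(a, b):
    """4-element dot product, the kind of work SSE4.1's DPPS does in one go."""
    return sum(simd_mul(a, b))

print(dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))   # 70.0
```

In scalar code the same dot product costs four multiplies plus three adds issued separately, which is why graphics and media workloads gain so much from packed instructions.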
SOI: AMD's next choice
While Intel opted for a high-k material to replace silicon dioxide, AMD
opted for Silicon on Insulator (SOI). Here, the conventional silicon
substrate is replaced by a layered silicon-insulator-silicon substrate,
mainly to reduce parasitic device capacitance and thereby improve
performance. SOI substrates are compatible with most conventional fab
processes; the only barrier to SOI adoption is the higher substrate cost,
which raises overall manufacturing costs.
Improved cache design
The next lot of processors from Intel and AMD will sport better cache
designs. Intel's Penryn processors include a 50% larger, 24-way
set-associative L2 cache to further improve the hit rate and maximize
utilization, so dual cores will have up to 6 MB of L2 cache and quad cores
up to 12 MB. AMD plans a shared L3 cache in addition to the 512 KB of L2
cache per core, with up to 2 MB of L3 shared across four cores. The key
benefit touted for this cache is that it improves the probability of each
core finding its data on-chip, thereby improving performance.
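The benefit of a cache level that catches L2 misses can be seen with some average-memory-access-time (AMAT) arithmetic. The latencies and hit rates below are made up for illustration, not measured figures for any real CPU.

```python
# Illustrative AMAT model: an access hits L2, else L3, else main memory.
def amat(l2_hit, l3_hit, l2_lat=15, l3_lat=40, mem_lat=200):
    """Average cycles per memory access for assumed latencies/hit rates."""
    return (l2_hit * l2_lat
            + (1 - l2_hit) * l3_hit * l3_lat
            + (1 - l2_hit) * (1 - l3_hit) * mem_lat)

print(round(amat(l2_hit=0.90, l3_hit=0.0), 1))  # no L3: ~33.5 cycles/access
print(round(amat(l2_hit=0.90, l3_hit=0.7), 1))  # shared L3: ~22.3 cycles/access
```

Even a modest L3 hit rate on L2 misses cuts the average access cost noticeably, which is the effect the shared L3 is after.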
Tech beyond multi-cores
Virtualization is a key driver behind multi-core CPUs, and Intel and AMD
have independently developed their own virtualization extensions. Intel
further plans to add Virtualization Technology for Directed I/O (VT-d),
which will provide a way of configuring interrupt delivery to individual
virtual machines, and an IOMMU to prevent a virtual machine from using DMA
to break isolation.
AMD has similar plans: a specification for an I/O Memory Management Unit
(IOMMU) that would likewise provide a way of configuring interrupt delivery
to individual virtual machines. It will also play an important role in
advanced OSes. HyperTransport, whose primary use is to replace the FSB, is a
bidirectional serial/parallel, high-bandwidth, low-latency point-to-point
link. An HTX (HyperTransport eXpansion) plug-in card specification was
developed to support direct access to a CPU and DMA access to system RAM,
mainly to tackle the bandwidth bottleneck between the CPU and co-processors.
AMD has already announced an initiative named Torrenza to promote the use of
HyperTransport for plug-in cards and coprocessors. The technology is widely
used by AMD, Transmeta, NVIDIA, VIA and SiS.
Another technology that finds varied usage in servers is VMX (also known as
AltiVec), a SIMD instruction set on POWER processors that can apply a single
processing instruction to multiple data elements. Macro fusion, a term
coined by Intel, refers to a processor's ability to combine several
instructions into one, optimizing the stream and making for faster
execution. Beyond SMP (Symmetric Multiprocessing), features such as SMT
(Simultaneous Multithreading), instruction sets like 3DNow! and other SIMD
extensions, and L3 caches have all contributed to the success of multi-core
processors.
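Macro fusion can be illustrated with a toy micro-op counter that fuses adjacent compare-and-branch pairs, the classic case Intel describes. The instruction stream below is invented for illustration.

```python
# Toy illustration of macro fusion: each adjacent ('cmp', 'jne') pair in the
# decoded stream issues as a single fused micro-op instead of two.
def fused_op_count(instructions):
    """Count micro-ops after fusing each compare-and-branch pair."""
    count = 0
    i = 0
    while i < len(instructions):
        if (instructions[i] == "cmp" and i + 1 < len(instructions)
                and instructions[i + 1] == "jne"):
            count += 1   # the pair issues as one fused micro-op
            i += 2
        else:
            count += 1
            i += 1
    return count

stream = ["mov", "cmp", "jne", "add", "cmp", "jne"]
print(fused_op_count(stream))   # 4 micro-ops instead of 6
```

Fewer micro-ops in flight means less pressure on the pipeline for the same program, which is where the speedup comes from.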
In the multi-core domain, research into tera-scale computing is under way,
where terabytes of data must be handled by platforms capable of teraflops of
computing performance. Tera-scale computing aims to bring the massive
compute capabilities of supercomputers to everyday devices such as servers,
desktops and notebooks. Soon, then, we will have processors capable of
dishing out tera-scale computing power to desktops and servers alike.