Intel's Dunnington Six-core Processor

PCQ Bureau

Just a few months ago we reviewed Harpertown, Intel's 45 nm Penryn-based processor sold as the Xeon 5400 series. This time we had the opportunity to test the next in line from Intel: the Dunnington processor. The server we received had 4 distinct processors with 6 cores each, 24 cores in all. As not many applications can use all the available cores, this processor is meant for very high-end computing and for virtualization in large enterprises. It can also be used in cloud computing or in rack-optimized and ultra-dense SKUs.

Technology in Dunnington



To understand and appreciate the technology used in Dunnington, we'll start with a bit of history of the previous generation of server processors. The Xeon 5100 series, codenamed Woodcrest and based on Intel's Core microarchitecture, was the server and workstation version of the Intel Core 2 processor. The fastest processor in this category operated at 3.0 GHz, and claimed better performance and lower energy consumption than previous processors. In January 2007, Intel launched its quad-core counterpart of the Core 2 Quad as the Xeon 3200 series, which comprised two separate dual-core dies placed next to each other in one CPU package. This was targeted at blade servers. The 3300 series was similar to the 3200 series but was manufactured on a 45 nm process and featured the XD bit and virtualization technology.

Direct Hit!
Applies To: Data centers
Price: Yet to be released
USP: How six-core processors rev up the performance of your server
Primary Link: www.intel.com
Keywords: dunnington


True to Intel's tick-tock release cycle of processors, where a tick is a refresh of the current architecture on a smaller manufacturing process and a tock is a brand new architecture, the clock ticked and the Harpertown Xeons were released in late 2007. This family consisted of dual-die quad-core processors manufactured on a 45 nm process, with a 1333 to 1600 MHz front side bus and lower TDPs, rated between 50 W and 150 W depending on the model.

And now Intel has become the first in the x86 processor market to launch a processor with six cores. Its offering before this was the 7300 series, codenamed Tigerton, which consisted of two dual-core silicon chips on a single ceramic module. Boasting greater processing capabilities, Tigerton was based on Intel's Caneland (Clarksboro) platform. Dunnington, or the 7400 series, features a single-die six-core design and is based on Intel's 45 nm Penryn architecture. Where Harpertown packages two dual-core dies together, Dunnington clubs three dual-core pairs together on one die. Compared to its predecessor, this processor has significantly more cache: a 16 MB L3 cache shared among all six cores, a 3 MB L2 cache shared by each pair of cores, and 96 KB of L1 cache. The increase in cache size should improve performance, mainly by reducing the latency of access to frequently used data. The clock speed remains approximately the same, ranging from 2.13 to 2.66 GHz. Dunnington's FSB, at 1066 MHz, is much slower than the 1600 MHz FSB of the Harpertown we reviewed in Jan 2008. This can be a bottleneck, but the larger cache is said to reduce it to a great extent. One good thing is that if you already have a Tigerton system with an mPGA604 socket, you just have to plug this CPU into it. It is compatible with Caneland chipsets too.

The above image shows six cores in the Dunnington CPU, having 16 MB L3 cache

shared among all the cores.

The 7400 series also supports Intel's VT-x virtualization technology, including Intel FlexMigration and FlexPriority. Earlier, a successful live virtual machine migration depended on the compatibility of the two CPUs between which the VM was being moved, and the VM also had to remain stable after the migration. The new FlexMigration technology takes care of these issues. It also removes the need to buy matching CPUs to keep a resource pool compatible across multiple generations of Xeon processors. This gives you the option of choosing the right server platform with respect to performance, cost and power for your enterprise. FlexPriority is another hardware feature that helps optimize virtualization by improving virtual machine access to the task priority register.

How we tested



To test the performance of the server we ran different benchmarks such as SunGard, Linpack, POVRay and Cinebench. We ran these benchmarks on Windows Server 2008 64-bit, first with 2 processors and 8 GB RAM, and then with 4 processors and 16 GB RAM. The hard drives were configured in RAID 0 so that disk I/O would not become a bottleneck during benchmarking. For the first run we simply took out 2 processors and 8 GB of RAM and ran the benchmarks. Then we put the processors and RAM back and ran the benchmarks again to get full system performance. To check power consumption, we connected the server to the main power supply through a wattmeter and noted the maximum, minimum and average power drawn.

The performance graph while running SunGard on Dunnington (left) and

Harpertown (right). Dunnington took 95 secs less than Harpertown.

Benchmark results



We started with Cinebench 10, which measures the performance of the processor and the graphics card and reports a Cinebench score for each. The test consists of two parts: the first is processor intensive and the second is graphics intensive. The processor test first renders using a single CPU and then repeats the render using all the cores. The graphics test runs inside a 3D window: an animated scene is played, starting with a low graphics load that is increased later on. The processor works at maximum speed so that the scene is displayed properly, and a score is generated at the end. The higher the scores, the better the server performance.

Results: With 2 processors and 8 GB RAM, Cinebench scored 3262 CB-CPU when rendering with 1 CPU and 26816 CB-CPU when rendering with all the CPUs. The GPU score came to 190 CB-GFX, which is good for a server like this one. With the full-blown configuration of 4 CPUs and 16 GB RAM, this monster scored 3266 CB-CPU when rendering with 1 CPU, which is of course practically the same as before. When rendering with all the CPUs, the score rose to 31372 CB-CPU, about 17% higher than with the 2-processor configuration. Please note, however, that the 'CINEBENCH 10 64-bit' benchmark didn't use more than 16 cores.
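
The scaling can be checked with a few lines of arithmetic. The sketch below is not part of our test suite; it only reuses the Cinebench scores quoted above, and the names `SCORES`, `percent_gain` and `multi_speedup` are made up for illustration.

```python
# Cinebench 10 scores quoted in the results above (CB-CPU).
SCORES = {
    "2P/8GB": {"single": 3262, "multi": 26816},
    "4P/16GB": {"single": 3266, "multi": 31372},
}


def percent_gain(old, new):
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100.0


def multi_speedup(cfg):
    """How many times higher the all-CPU score is than the 1-CPU score."""
    return cfg["multi"] / cfg["single"]


gain = percent_gain(SCORES["2P/8GB"]["multi"], SCORES["4P/16GB"]["multi"])
print(f"4P over 2P, all-CPU score: {gain:.1f}% higher")

for name, cfg in SCORES.items():
    print(f"{name}: all-CPU score is {multi_speedup(cfg):.2f}x the 1-CPU score")
```

Measured against the 2-processor score, the gain is roughly 17%; the same 4556-point gap is about 14.5% of the 4-processor score. Either way, doubling the processors falls well short of doubling the score, which fits a benchmark that stops at 16 cores.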

For the next benchmark we used POVRay, a ray tracing program that is also used for CPU benchmarking. It uses the ray tracing rendering technique to calculate an image by simulating how light travels in the real world. For benchmarking with POVRay, we used the standard 'benchmark.pov' file, as it uses every internal feature of POVRay and stresses the CPU to its limits. Another reason for using this file is that it is the standard for all processors, which makes it easier for others to compare scores.

All 24 cores being utilized while running the SunGard benchmark. Apart from

Linpack, this benchmark was the only one to stretch all 24 cores.

Results: With 2 processors and 8 GB RAM, POVRay rendered an average of 120.38 PPS (pixels per second) over 147456 pixels, and with 4 processors and 16 GB RAM it rendered an average of 120.25 PPS. POVRay used a maximum of 3 cores for the benchmark, so the extra processors made no difference.

Then we used SunGard Adaptive Analytics, a component of SunGard's suite of risk management products. More precisely, the benchmark is a stripped-down version of the actual product. It uses a Monte Carlo financial engine to predict the future of a fictitious portfolio. It requires two files to run: the first contains sample data that represents actual market conditions, and the second contains a sample customer's investment portfolio. The benchmark score is the run time in seconds, so the less time it takes to run, the better the server has performed.
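
To give a feel for what a Monte Carlo engine of this kind does, here is a minimal sketch. It is not SunGard's code or its file format: the `market` and `holdings` dictionaries merely stand in for the two input files, the asset names and figures are invented, and the daily-return model (independent normally distributed returns) is an assumption chosen for brevity.

```python
import random
import statistics


def simulate_portfolio(holdings, market, days, trials, seed=0):
    """Return one simulated end-of-horizon portfolio value per trial.

    holdings: {asset: units held}
    market:   {asset: (start price, mean daily return, daily volatility)}
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    outcomes = []
    for _trial in range(trials):
        total = 0.0
        for asset, units in holdings.items():
            price, mu, sigma = market[asset]
            for _day in range(days):
                # Apply one random daily return to the running price.
                price *= 1.0 + rng.gauss(mu, sigma)
            total += units * price
        outcomes.append(total)
    return outcomes


# Hypothetical inputs standing in for the market-data and portfolio files.
market = {"AAA": (100.0, 0.0004, 0.012), "BBB": (50.0, 0.0002, 0.020)}
holdings = {"AAA": 10, "BBB": 40}

values = sorted(simulate_portfolio(holdings, market, days=30, trials=2000))
print("mean value after 30 days:", round(statistics.mean(values), 2))
print("5th percentile value:", round(values[len(values) // 20], 2))
```

Each trial is independent of the others, so work of this shape can be split across many cores, which is likely why this benchmark was able to keep all 24 busy.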

Results: In the first test, with 2 processors and 8 GB RAM, the benchmark took 156.2 seconds to run, and with 4 processors and 16 GB RAM it took only 105.9 seconds. Harpertown with 8 cores and 16 GB RAM took around 200 seconds, so this Dunnington server finished in about 47% less time.
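
The comparison between the runs can be reproduced from the timings above. This is only an illustrative calculation: the names `TIMES_S`, `time_saved_pct` and `speedup` are ours, and the Harpertown figure is the approximate 200 seconds quoted in the text.

```python
# SunGard run times in seconds, as quoted above.
TIMES_S = {
    "dunnington_2p": 156.2,  # 2 processors, 8 GB RAM
    "dunnington_4p": 105.9,  # 4 processors, 16 GB RAM
    "harpertown": 200.0,     # approximate figure ("around 200 seconds")
}


def time_saved_pct(baseline, t):
    """Percentage by which run time t is shorter than the baseline."""
    return (baseline - t) / baseline * 100.0


def speedup(baseline, t):
    """How many times faster run time t is than the baseline."""
    return baseline / t


vs_harpertown = time_saved_pct(TIMES_S["harpertown"], TIMES_S["dunnington_4p"])
vs_two_cpus = speedup(TIMES_S["dunnington_2p"], TIMES_S["dunnington_4p"])

print(f"Dunnington 4P vs Harpertown: {vs_harpertown:.0f}% less time")
print(f"Dunnington 4P vs 2P: {vs_two_cpus:.2f}x faster")
```

On these numbers the 24-core Dunnington server takes about 47% less time than Harpertown, while going from 2 to 4 processors makes the run roughly 1.5 times faster rather than twice as fast.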

Comparison of the time taken by different machines to execute SunGard.

Next we ran Linpack, which brings almost any server to its knees. It measures a system's floating point computing power by making the system solve an N by N system of linear equations (i.e. Ax = b), and reports how many GFlops the system delivers. The greater the number of GFlops generated, the better the system is.
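
The idea behind the measurement can be sketched in a few lines. This is not the Linpack binary we ran: it is a plain-Python Gaussian elimination on a small random system, and it assumes the operation count conventionally used for Linpack-style results, 2/3·N³ + 2·N². The names `solve` and `linpack_flops` are illustrative, and an interpreted script like this will report a tiny fraction of what an optimized build achieves.

```python
import random
import time


def solve(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def linpack_flops(n):
    """Conventional floating point operation count for an n by n solve."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


n = 100
rng = random.Random(1)
a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
b = [rng.uniform(-1, 1) for _ in range(n)]

start = time.perf_counter()
x = solve(a, b)
elapsed = time.perf_counter() - start

# Largest error in any equation when x is substituted back into Ax = b.
residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
gflops = linpack_flops(n) / elapsed / 1e9
print(f"N={n}  time={elapsed:.4f}s  {gflops:.4f} GFlops  residual={residual:.2e}")
```

The reported figure is simply the operation count divided by the elapsed time, which is why a build tuned for one processor can flatter that processor over another.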

Results: With 2 processors and 8 GB RAM the system generated 53.69 GFlops, and with 4 processors and 16 GB RAM it gave 62.02 GFlops, which is lower than the 65 GFlops generated by Harpertown. Dunnington scored lower because the Linpack build we had was customized for Harpertown.

To check the minimum power consumption we kept the system idle, and it drew 438 W. The maximum power drawn, as shown by the wattmeter, was 715 W.
