The Nehalem-based Xeon 5500 series is Intel's first dual-socket server
processor with a native quad-core design. Intel sent us a 1U rack server with
two 2.9 GHz Nehalem processors and 24GB of RAM. We ran quite a few tests on it
and, as expected, the results were mind-boggling. We will discuss the results
later; first, let's take a quick look at some of the key features of the
processor. Its basic features are much the same as those of the desktop Core i7
architecture and include the following:
Native quad-core
Until now, Intel's quad-core Xeon processors were built as multi-chip modules
(MCMs): two dual-core dies placed together in a single package. The Xeon 5500
series instead puts all four cores on a single die, much like AMD's Phenom X4
CPUs. The same design is used in Intel's Nehalem-based desktop processors, the
Core i7.
Price: On request
Meant for: Data centers
Key specs: Native quad-core, inclusive level 3 cache, integrated memory controller, Hyper-Threading
Cons: None
SMS: Buy 130569 to 56677
A native quad-core design has significant advantages over a multi-chip module
(MCM) in processor energy efficiency, performance and scalability. We will see
some of these in our benchmark results.
Inclusive level 3 cache
First showcased on earlier Xeon server chips and then on the desktop Core i7,
the Xeon 5500 family features up to a massive 8MB of level 3 cache, shared
between all four cores, compared with 2MB on the Phenom X4. This cache is
inclusive: it also holds a copy of the data in each core's private caches.
Intel claims an inclusive cache is more efficient than an 'exclusive' design,
because a miss in the level 3 cache means the data is in no core's private
cache, so the other cores do not have to be checked. The cost is that 1MB of
Nehalem's 8MB level 3 cache is taken up by copies of the four cores' 256KB
level 2 caches.
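The 1MB figure is simple arithmetic: four cores, each with a 256KB level 2
cache. The short Python sketch below works it out. The helper names are
hypothetical, and it assumes, as the text does, that only the L2 copies are
counted and that all four L2 caches are full.

```python
# Sketch: how much of an inclusive L3 is occupied by copies of the
# per-core L2 caches. Sizes are in KB; values follow the text above.

def duplicated_kb(cores: int, l2_kb: int) -> int:
    """Worst case: every L2 line is also held in the inclusive L3."""
    return cores * l2_kb

def unique_l3_kb(l3_kb: int, cores: int, l2_kb: int) -> int:
    """L3 capacity left for data not already sitting in some core's L2."""
    return l3_kb - duplicated_kb(cores, l2_kb)

L3_KB = 8 * 1024   # 8MB shared level 3 cache
L2_KB = 256        # 256KB private level 2 cache per core
CORES = 4

dup = duplicated_kb(CORES, L2_KB)
print(f"L3 used for L2 copies: {dup} KB ({dup / L3_KB:.1%} of L3)")
print(f"L3 left for other data: {unique_l3_kb(L3_KB, CORES, L2_KB)} KB")
```

With these values, 1024KB (12.5%) of the level 3 cache mirrors the L2
contents, leaving 7168KB for other data.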
Integrated memory controller
Nehalem moves the memory controller, previously part of the Northbridge
chipset, onto the CPU die, so DDR3 memory connects directly to the processor.
The conventional front side bus (FSB) is replaced by a new point-to-point link
called the QuickPath Interconnect (QPI), which connects the processors to each
other and to the I/O hub that hosts controllers such as PCI Express. Together,
these changes reduce memory latency and improve performance considerably.
This figure shows the time taken for different problem sizes on the two-socket
Nehalem server.
Hyper-threading
Another feature worth mentioning is Hyper-Threading. By using a core's spare
resources to execute a second thread, Hyper-Threading lets a quad-core Nehalem
processor run eight threads simultaneously, making it far more parallel than
the current Core 2 Quad CPUs, which run four.
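The thread counts come from multiplying sockets, cores per socket and threads
per core. A minimal Python sketch with a hypothetical `logical_processors`
helper, assuming two hardware threads per core with Hyper-Threading enabled and
one without; it also gives the 16 logical processors a two-socket Nehalem
server exposes.

```python
import os

def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Hardware threads the OS sees: each core exposes threads_per_core logical CPUs."""
    return sockets * cores_per_socket * threads_per_core

# Configurations discussed in this review (2 threads per core with Hyper-Threading)
print("One Xeon 5500:", logical_processors(1, 4, 2))          # 8
print("Two-socket server:", logical_processors(2, 4, 2))      # 16
print("Core 2 Quad (no HT):", logical_processors(1, 4, 1))    # 4

# On a real machine Python reports the logical CPU count directly;
# os.cpu_count() may return None if the count cannot be determined.
print("This machine:", os.cpu_count())
```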
Performance results
We ran three benchmarks on the server: LINPACK, SunGard and Cinebench. We also
recorded its power consumption at different load levels. All the benchmarks
were run on Windows Server 2008. Here is what we found.
A dual-socket Nehalem server shows up as 16 processors, thanks to Hyper-Threading.
Linpack
This test produced some really interesting results. Nehalem clearly gave a
better result than Intel's Harpertown and Dunnington processors. Surprisingly,
it beat even the 24-core Dunnington server, despite having only 8 cores and 16
threads. The final result was a whopping 76 GFlops, 14 GFlops more than the
24-core (4 sockets x 6 cores) Dunnington. This result was achieved with a
problem size of 50,000 and 16 threads in LINPACK.
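A rough Python sketch of what a LINPACK problem size of 50,000 implies. It uses
the standard LINPACK operation count of 2/3 N^3 + 2 N^2 and 8 bytes per
double-precision matrix element. The theoretical peak assumes 4
double-precision flops per cycle per core and ignores Turbo Boost; that is our
assumption, not a number from the test. The function names are hypothetical.

```python
# Back-of-the-envelope LINPACK numbers for the run described above.

def hpl_matrix_bytes(n: int) -> int:
    """A dense N x N matrix of 8-byte doubles."""
    return 8 * n * n

def hpl_flops(n: int) -> float:
    """Standard LINPACK operation count: 2/3 N^3 + 2 N^2."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def peak_gflops(sockets: int, cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak; flops_per_cycle is an assumption, not a measurement."""
    return sockets * cores * ghz * flops_per_cycle

N = 50_000
MEASURED_GFLOPS = 76.0

# Assumption: 4 DP flops/cycle/core (one SSE add + one SSE multiply), no Turbo Boost.
peak = peak_gflops(2, 4, 2.9, 4)

print(f"Matrix memory: {hpl_matrix_bytes(N) / 1e9:.1f} GB of 24 GB RAM")
print(f"Run time at {MEASURED_GFLOPS:.0f} GFlops: "
      f"{hpl_flops(N) / (MEASURED_GFLOPS * 1e9) / 60:.1f} min")
print(f"Theoretical peak: {peak:.1f} GFlops, efficiency {MEASURED_GFLOPS / peak:.0%}")
```

Under these assumptions, the matrix needs 20GB of the server's 24GB of RAM, a
run at 76 GFlops takes roughly 18 minutes, and the result is about 82% of the
assumed peak.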
Another interesting observation is that Hyper-Threading gave the processor a
real edge. When we ran the same problem with 8 threads, matching its number of
physical cores, performance was much lower.
SunGard
Next, we ran SunGard Adaptive Analytics, a component of SunGard's suite of risk
management products; more precisely, a stripped-down version of the actual
product. This benchmark uses a Monte Carlo financial engine to predict the
future of a fictitious portfolio. It needs two files to run: the first
contains sample data representing actual market conditions, and the second
contains a sample customer's investment portfolio. The score is the time taken
in seconds, so the less time it takes, the better the performance. Our server
finished the task in 130.5 seconds. Dunnington, with three times as many cores,
finished the same test in 105.9 seconds, only about 19% less time than
Nehalem. If we multiply each result by the number of cores in each server, we
get 1044 core-seconds for Nehalem (130.5 x 8) and about 2542 for Dunnington
(105.9 x 24). On a per-core basis, Nehalem therefore delivered about 2.4 times
the performance of Dunnington. This is indeed a brilliant score.
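The per-core normalization is easy to check. A minimal Python sketch with a
hypothetical `core_seconds` helper; it counts physical cores only (8 for
Nehalem, 24 for Dunnington), not Hyper-Threading threads.

```python
# Normalizing the SunGard run times by core count.

def core_seconds(run_time_s: float, cores: int) -> float:
    """Run time multiplied by cores: lower means more work done per core."""
    return run_time_s * cores

nehalem = core_seconds(130.5, 8)      # 2 sockets x 4 cores
dunnington = core_seconds(105.9, 24)  # 4 sockets x 6 cores

print(f"Nehalem: {nehalem:.0f} core-seconds")
print(f"Dunnington: {dunnington:.0f} core-seconds")
print(f"Per-core advantage for Nehalem: {dunnington / nehalem:.1f}x")
print(f"Dunnington's time saving: {1 - 105.9 / 130.5:.0%}")
```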
CineBench
Finally, we ran CineBench 10 x64, which measures the performance of both the
processor and the graphics card. The test has two parts: the first is processor
intensive and the second graphics intensive. The processor test renders a scene
first on a single CPU and then on all cores. The graphics test runs in a 3D
window, playing an animated scene whose graphics demand starts low and
increases, and generates a score based on how fast the scene can be displayed
properly. Higher scores mean better performance. Nehalem scored 4429 CB-CPU on
a single CPU, about 36% better than Dunnington's score of around 3266 CB-CPU.
With all CPUs, Nehalem scored 28667 CB-CPU.
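A short Python sketch that computes the single-CPU lead and the multi-CPU
speedup from the CineBench scores quoted above; `pct_better` is a hypothetical
helper.

```python
# Ratios behind the CineBench 10 x64 scores (units: CB-CPU).

NEHALEM_1CPU = 4429
NEHALEM_ALL = 28667
DUNNINGTON_1CPU = 3266

def pct_better(a: float, b: float) -> float:
    """How much higher score a is than score b, as a fraction."""
    return a / b - 1

single_thread_gain = pct_better(NEHALEM_1CPU, DUNNINGTON_1CPU)
multi_speedup = NEHALEM_ALL / NEHALEM_1CPU

print(f"Single-CPU lead over Dunnington: {single_thread_gain:.0%}")
print(f"All-CPU vs single-CPU speedup: {multi_speedup:.2f}x (8 cores, 16 threads)")
```

With these scores, Nehalem's single-CPU lead works out to about 36%, and its
all-CPU score is roughly 6.5 times its single-CPU score.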
Bottom line: If you are planning to scale your data center and want servers
that can cope with mission-critical virtualization and parallel processing,
then no doubt this architecture is for you.