We got a double treat in the labs this time. The first was Intel's latest
Nehalem-EX CPU, the chipmaker's bet for high-capacity SMP servers, which boasts
many of the RAS (reliability, availability, and serviceability) features that
were previously found only in the upper spectrum of server CPUs, like the
Itanium. The other treat was the server Intel sent with this processor inside:
the latest PowerEdge R810 rack server from Dell. You'll know why that was a
treat when you hear about the specs of this 2U rack server.
Nehalem-EX CPU
The Nehalem-EX is designed for SMP environments, and can therefore scale
from 2 to 256 CPUs. That makes it suited to very large workload environments
like decision support systems, virtualization, HPC, databases, ERP, CRM, and
other mission-critical applications. Intel claims that a single Nehalem-EX based
server can replace 20 single-core servers. The LGA 1567 socket based Nehalem-EX
is built on the 45 nm fabrication process, and supports up to eight cores in a
single CPU. Moreover, each core supports two threads with the help of
Hyper-Threading technology, so each eight-core Nehalem-EX provides 16 logical
cores.
While each core has 3 MB of L3 cache, the cache interconnect allows each core
to share all of the L3 cache. As a result, an eight-core Nehalem-EX makes a
whopping 24 MB of shared L3 cache available to each core. Each CPU has two
integrated memory controllers, which support up to 16 DIMMs. So a 4-socket
server based on the Nehalem-EX will support up to 64 DIMM slots, which can take
up to 1 TB of memory. The processor has a base clock speed of 2.26 GHz, which
goes up to 2.66 GHz with Turbo Boost.
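The figures quoted above follow from simple multiplication, and a short Python
sketch makes the arithmetic explicit. The variable names are ours, chosen for
illustration, and the per-DIMM size is not a quoted spec but the value implied
by spreading 1 TB evenly across 64 slots.

```python
# Back-of-the-envelope check of the Nehalem-EX figures quoted above.
# All names are illustrative; the inputs come from the article text.

CORES_PER_CPU = 8        # maximum cores in one Nehalem-EX CPU
THREADS_PER_CORE = 2     # two threads per core with Hyper-Threading
L3_MB_PER_CORE = 3       # L3 cache associated with each core
DIMMS_PER_CPU = 16       # DIMMs supported by the two memory controllers
SOCKETS = 4              # the 4-socket server used as the example
MAX_MEMORY_GB = 1024     # the 1 TB quoted for that server

logical_cores_per_cpu = CORES_PER_CPU * THREADS_PER_CORE
shared_l3_mb = CORES_PER_CPU * L3_MB_PER_CORE
dimm_slots = SOCKETS * DIMMS_PER_CPU

# Assumption: memory is spread evenly over all slots, so this is only the
# DIMM size implied by the 1 TB figure.
implied_dimm_gb = MAX_MEMORY_GB // dimm_slots

print(f"Logical cores per CPU: {logical_cores_per_cpu}")
print(f"Shared L3 cache per CPU: {shared_l3_mb} MB")
print(f"DIMM slots in a 4-socket server: {dimm_slots}")
print(f"Implied DIMM size for 1 TB: {implied_dimm_gb} GB")
```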
The other thing we mentioned about the Nehalem-EX was its RAS features. With
these, a Nehalem-EX based system works with the firmware and OS to recover from
hardware errors, automatically attempting to recover or restart processes so
that the machine continues to function normally.
About the Dell R810 Server
As is standard in Dell servers, the R prefix denotes a rack server, and the
0 at the end of the model number, 810, denotes that it's based on Intel CPUs
(a '5' at the end denotes AMD CPUs). The one we received was powered by two
Intel Xeon X7560 processors. This server is meant for virtualization or
workload consolidation applications, as well as for mid-size databases. The 2U
PowerEdge R810 weighs around 26 kg when fully populated. Its front panel has a
DVD-ROM drive and two USB 2.0 ports for connecting flash drives or other USB
devices. There is also an LCD panel that shows basic troubleshooting info to
the server admin. Inside the server are two SD card slots, which provide
redundancy. Six hot-pluggable redundant cooling fans keep the server's
internals cool.
The PowerEdge R810's motherboard is based on the Intel 7500 chipset and has
four CPU sockets. These can be populated with either four Xeon 7500 or two Xeon
6500 series CPUs, in quad-, six-, or eight-core versions. The server came with
two eight-core Xeon X7560 CPUs, meaning we had the power of 16 physical cores,
or 32 logical cores, in our hands. The server has 32 DIMM slots that support
DDR3 memory, and luckily for us, they were all populated with 4 GB DDR3 DIMMs,
meaning a whopping 128 GB of RAM in a single 2U box! If that's not sufficient
for your requirements, each DIMM slot can take up to 16 GB, so the maximum
memory capacity supported by this server is 512 GB. The server reached us
populated with six 146 GB, 15k RPM SAS drives. You can plug in six 2.5-inch SAS
or SATA hard drives, including SSDs, and it supports up to 3 TB of storage. The
server is powered by two hot-pluggable 1100 watt power supplies, configured for
1+1 redundancy. We tested the server with Windows Server R2 64-bit.
Performance
Considering that this is a fairly high-end configuration, we focused on two
aspects of its performance. One, we tried to see how performance scales as the
number of cores in use increases. Two, we measured the power consumed by the
server as more cores were activated, to see whether power consumption increases
linearly with the CPU power being used. We also ran the CineBench benchmark,
which is a 3D content creation benchmark that measures the performance of the
CPU and graphics sub-system of a machine.
SunGard & CineBench benchmark results
We used a financial risk analysis application benchmark called SunGard for
the job. The application allows you to select the number of threads to use,
which essentially controls how much CPU power to extract from the system. It
uses a Monte Carlo based financial engine to determine the future value of a
fictitious portfolio. We ran this test with 8, 16, and 32 threads, and the time
taken to determine the future value of the portfolio dropped significantly as
the number of threads increased. When we moved from 8 to 16 threads, there was
a 48% jump in performance, and as we moved further to 32 threads, we saw
another 34% jump. We then measured the power consumed by the server with 8, 16,
and 32 threads. With 8 threads in use, the server's average power consumption
came to 521 watts. As we moved from 8 to 16 threads, it increased to 600 watts,
which is a 15% increase. Further, as we moved from 16 to 32 threads, it
increased to 623 watts, which is a mere 4% increase. So essentially, the jump
in performance as more threads are put to work is far greater than the jump in
power consumption. Since power consumption in the data center is something that
worries most CIOs, this statistic would come in handy while choosing a new
server platform.
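To put the two sets of numbers side by side, here is a minimal Python sketch
that works out the power increase and the change in performance per watt at
each step. It assumes that each reported "jump in performance" is a throughput
gain (so 48% means 1.48 times the work per unit time); the function and
variable names are ours.

```python
# Compare the reported performance jumps with the measured power draw.
# Assumption: a "48% jump in performance" is read as 1.48x throughput.

power_watts = {8: 521, 16: 600, 32: 623}        # average draw per thread count
perf_gain = {(8, 16): 0.48, (16, 32): 0.34}     # reported performance jumps


def step_summary(lo, hi):
    """Summarise the move from `lo` threads to `hi` threads."""
    power_ratio = power_watts[hi] / power_watts[lo]
    perf_ratio = 1 + perf_gain[(lo, hi)]
    return {
        "power_increase_pct": (power_ratio - 1) * 100,
        "perf_increase_pct": (perf_ratio - 1) * 100,
        "perf_per_watt_ratio": perf_ratio / power_ratio,
    }


for lo, hi in perf_gain:
    s = step_summary(lo, hi)
    print(
        f"{lo} -> {hi} threads: "
        f"performance +{s['perf_increase_pct']:.0f}%, "
        f"power +{s['power_increase_pct']:.0f}%, "
        f"performance per watt x{s['perf_per_watt_ratio']:.2f}"
    )
```

Read this way, each doubling of the thread count improves performance per watt
by a factor of roughly 1.3.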
We also ran the CineBench benchmark on the Dell R810, and compared it against a
4-way, six-core Dunnington processor based server, which had 24 cores in all.
The good thing about CineBench is that apart from showing results for the
current system, it also shows results of tests that others have conducted on
similar systems. Initially, we were a little disappointed by the R810's
CineBench results: 14.81 CPU points against the Dunnington's 18.58.
After some pondering over why its performance was lower, we found some probable
answers. For one, the Dunnington system had 24 cores, against 16 cores in the
R810. Plus, the Dunnington CPUs were running at 2.66 GHz vs 2.26 GHz in the
R810. Another interesting outcome we observed for Intel-based servers was that
a higher number of threads doesn't give a huge performance jump in this
benchmark. That's why, even though the R810 had 32 threads against only 24
running in the Dunnington, its score was lower. Higher CPU frequency, for its
part, provides only a minor advantage. Interestingly, CineBench reported far
lower results for a 12C/12T AMD Opteron based system.
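One rough way to check this explanation is to normalise the two CineBench
scores by core count and by aggregate clock. The Python sketch below does that
under simplifying assumptions of our own: scores scale roughly with cores
times clock, Turbo Boost and Hyper-Threading are ignored, and the R810 runs at
the 2.26 GHz base clock quoted for the processor. The names are illustrative.

```python
# Normalise the CineBench CPU scores by core count and aggregate clock.
# Assumptions: score scales roughly with cores x clock; Turbo Boost and
# Hyper-Threading are ignored; the R810 runs at its 2.26 GHz base clock.

systems = {
    "R810 (2 x 8-core Nehalem-EX)": {"score": 14.81, "cores": 16, "ghz": 2.26},
    "Dunnington (4 x 6-core)": {"score": 18.58, "cores": 24, "ghz": 2.66},
}


def normalise(spec):
    """Return the score per core and per core-GHz for one system."""
    core_ghz = spec["cores"] * spec["ghz"]
    return {
        "core_ghz": core_ghz,
        "score_per_core": spec["score"] / spec["cores"],
        "score_per_core_ghz": spec["score"] / core_ghz,
    }


for name, spec in systems.items():
    n = normalise(spec)
    print(
        f"{name}: {n['core_ghz']:.2f} core-GHz, "
        f"{n['score_per_core']:.3f} points per core, "
        f"{n['score_per_core_ghz']:.3f} points per core-GHz"
    )
```

On these assumptions, the R810 delivers more CineBench points per core and per
core-GHz than the Dunnington system, so its lower total is consistent with
having fewer cores at a lower clock rather than with weaker cores.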
Bottom line: Whether the Nehalem-EX platform is worth shifting to depends on
two things: price and performance. Since we did not know the price of this
system, we can't comment on it. In terms of performance, the system is
definitely worth evaluating.