August 5, 2013

Enterprises have started looking to mega data centers (MDCs) for guidance, emulating MDC architectures for their own private clouds, large-scale compute clusters and big data analytics applications. MDCs have become classrooms for lessons in efficiency, cost effectiveness, deployment at scale and monetization of data.

Anatomy of an MDC

MDCs such as those at Facebook, Amazon, and Google feature distinct platforms optimized for storage, database, analytics/search/graph analysis or web servers. Their scale is jaw-dropping: they house 200,000 to 1,000,000 servers and anywhere from 1.5 million to 10 million disk drives. MDC servers are clustered in groups of between 20 and 2,000 server nodes per cluster.

A server may contain only boot storage, unprotected direct-mapped drives with data replication across geographic locations, or RAID-protected storage for database and transactional data. Because applications are clustered, a misbehaving node can affect the performance of the entire cluster, so it’s more efficient to take a server offline and allow the other 99% to run at full speed.

MDC operating systems and infrastructure are open source and most improvements have been given back to the community. Applications are self-built, and many of these have been offered back to the open source community too. The hardware infrastructure is often self-built or at least self-specified.

Since MDC networks tend to be static configurations focused on minimizing transaction latency, MDC architects are also deploying software-defined networking (SDN) infrastructure to improve performance and reduce cost.

MDCs deploy “lights out” infrastructure maintained by automated scripts and by technicians performing simple maintenance tasks. There is a ruthless focus on minimizing infrastructure cost.

MDCs are careful to eliminate anything not central to core applications, even if provided for free. Adding one unnecessary LED to each of 200,000 servers is an excess that adds $10,000 in LED costs and 26,000 watts of power consumption, equal to 26 handheld hairdryers running 24/7.
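The LED figures above can be broken down per server with a few lines of arithmetic. This is only a sketch: the totals come from the text, while the 1,000-watt hairdryer rating is an assumption implied by the "26 hairdryers" comparison, and the yearly energy figure is derived here rather than taken from the article.

```python
# Totals quoted in the article.
SERVERS = 200_000
TOTAL_LED_COST_USD = 10_000
TOTAL_LED_POWER_W = 26_000

# Assumption: a handheld hairdryer draws about 1,000 W, which is what
# the article's "26 hairdryers" comparison implies.
HAIRDRYER_W = 1_000


def per_server(total, servers=SERVERS):
    """Spread a fleet-wide total evenly across every server."""
    return total / servers


cost_per_led_usd = per_server(TOTAL_LED_COST_USD)
watts_per_led = per_server(TOTAL_LED_POWER_W)
hairdryer_equivalents = TOTAL_LED_POWER_W / HAIRDRYER_W

# Energy used over one year of 24/7 operation, in kilowatt-hours.
annual_kwh = TOTAL_LED_POWER_W * 24 * 365 / 1000

print(f"Cost per LED:     ${cost_per_led_usd:.2f}")
print(f"Power per LED:    {watts_per_led:.2f} W")
print(f"Hairdryers:       {hairdryer_equivalents:.0f}")
print(f"Energy per year:  {annual_kwh:,.0f} kWh")
```

A five-cent part drawing about an eighth of a watt looks harmless on one server; multiplied across the fleet it becomes a line item.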

Learning from today’s mega data center

Like MDCs, enterprises can deploy homogeneous infrastructure that is easy to maintain and manage. Optimizing for efficiency and incorporating lights-out-style self-management can support more capability with fewer resources.

Rather than maintain five-nines reliability, which is expensive and almost impossible architecturally in a large-scale infrastructure, enterprises can architect a resilient data center where subsystems can fail but the entire system continues to operate.
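To put numbers on that trade-off, here is a rough sketch of the availability arithmetic. It assumes node failures are independent, which real data centers only approximate, and the 99% node availability and three replicas are hypothetical values chosen for illustration, not figures from any MDC.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability):
    """Expected yearly downtime for a given availability (0.0 to 1.0)."""
    return (1 - availability) * MINUTES_PER_YEAR


def replicated_availability(node_availability, replicas):
    """Probability that at least one of `replicas` nodes is up.

    Assumption: nodes fail independently of one another.
    """
    return 1 - (1 - node_availability) ** replicas


five_nines = 0.99999
print(f"Five nines allows {downtime_minutes_per_year(five_nines):.2f} "
      "minutes of downtime per year")

# Hypothetical example: three replicas of a node that is only 99% available.
combined = replicated_availability(0.99, 3)
print(f"Three 99% replicas give {combined:.6%} availability")
```

Under these assumptions, five nines leaves roughly 5.3 minutes of downtime per year for a single system, while three modest replicas already exceed that target as a group. That is the logic behind letting individual subsystems fail.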

One of the most important infrastructure subsystems affecting application performance and server utilization is storage. MDCs are leaders in optimizing data center storage efficiency, managing immense scale and unprecedented data traffic while providing high-availability operation and satisfying legal requirements for data retention and country location.

All MDCs rely exclusively on direct-attached storage (DAS), which is simpler than SAN or NAS storage, less expensive to purchase and maintain, and offers lower latency to the processor and higher performance.

Many MDCs use consumer SATA hard disk drives (HDDs) and solid state drives (SSDs) for direct-attached storage, but they almost always deploy Serial-Attached SCSI (SAS) infrastructure to maximize storage system performance and simplify management.

When evaluating storage, enterprises have long focused on IOPS and MByte/s measurements. MDCs have discovered that applications driving IOPS to SSDs quickly reach internal self-limits, often peaking below 200,000 IOPS, and that MByte/s performance has only a modest impact on work done.

I/O latency is far more important to application performance, work done and server utilization, and write latency affects database performance profoundly.

MDCs are increasing work-per-dollar by deploying more SSDs, solid state caching or both. Typical HDD read/write latency is around 10 milliseconds. SSD read latencies average 200 microseconds and write latencies around 100 microseconds. Specialized PCIe cards can reduce latency to tens of microseconds.
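One way to see why latency matters so much is to ask how many operations a single thread can complete per second when it must wait for each I/O before issuing the next. The sketch below uses the latencies quoted above and assumes exactly one outstanding request at a time (queue depth 1); real workloads overlap requests, so treat the results as a floor on throughput rather than a benchmark.

```python
# Latencies quoted in the article, in seconds.
LATENCY_S = {
    "hdd": 10e-3,         # ~10 milliseconds read/write
    "ssd_read": 200e-6,   # ~200 microseconds
    "ssd_write": 100e-6,  # ~100 microseconds
}


def serial_ops_per_second(latency_s):
    """Operations per second when each I/O waits for the previous one.

    Assumption: a single outstanding request (queue depth 1).
    """
    return 1.0 / latency_s


for device, latency in LATENCY_S.items():
    print(f"{device:>9}: {serial_ops_per_second(latency):>8,.0f} ops/s")

# How many times faster a dependent chain of reads completes on SSD.
read_speedup = LATENCY_S["hdd"] / LATENCY_S["ssd_read"]
print(f"SSD reads finish about {read_speedup:.0f}x sooner than HDD reads")
```

Under that assumption an HDD sustains about 100 dependent operations per second, against about 5,000 reads or 10,000 writes for an SSD.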

SSDs can supplement or replace HDDs to improve application performance, enabling servers and applications to accomplish four to ten times more work. Solid state caching delivers the lowest latency when plugged directly into a server’s PCIe bus.

Intelligent caching places hot data (the most frequently accessed or temporally important) in low-latency flash storage, which is transparent to applications. Some flash cache acceleration cards support multiple terabytes of solid state storage, holding entire databases or working datasets as hot data.
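The idea of keeping hot data in flash can be illustrated with a toy frequency-based cache. This is a minimal sketch, not how any vendor's acceleration card works: the `HotDataCache` class, its promotion rule and the dictionary standing in for the HDD tier are all hypothetical, chosen only to show the principle that frequently read blocks migrate to the fast tier while the application keeps calling the same `read` method.

```python
from collections import Counter


class HotDataCache:
    """Toy two-tier store: a small 'flash' tier in front of a slow one."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity      # number of blocks the flash tier holds
        self.backing = backing_store  # dict standing in for the HDD tier
        self.flash = {}               # blocks currently held in flash
        self.hits = Counter()         # access count per block
        self.flash_reads = 0
        self.disk_reads = 0

    def read(self, key):
        """Return a block; the caller never sees which tier served it."""
        self.hits[key] += 1
        if key in self.flash:
            self.flash_reads += 1
            return self.flash[key]
        self.disk_reads += 1
        value = self.backing[key]
        self._maybe_promote(key, value)
        return value

    def _maybe_promote(self, key, value):
        """Move a block into flash if there is room or it is hotter
        than the coldest block already cached."""
        if len(self.flash) < self.capacity:
            self.flash[key] = value
            return
        coldest = min(self.flash, key=lambda k: self.hits[k])
        if self.hits[key] > self.hits[coldest]:
            del self.flash[coldest]
            self.flash[key] = value


# Usage: block "b" is read once, "a" and "c" repeatedly.
cache = HotDataCache(capacity=2, backing_store={"a": 1, "b": 2, "c": 3})
for block in ["a", "a", "a", "b", "c", "c", "c"]:
    cache.read(block)

print("Blocks in flash:", sorted(cache.flash))
print("Flash reads:", cache.flash_reads, "Disk reads:", cache.disk_reads)
```

In this run the rarely read block is displaced once a hotter block has been read more often. Production caches also weigh recency and write behavior, which this sketch ignores.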

Enterprise decisions on SSDs versus HDDs focus on the storage layer and cost per GB or per IOPS. But solid state storage is more reliable, less disruptive, easier to manage, faster to replicate and rebuild, and more power-thrifty than HDDs. The superior performance of SSDs enables more work with fewer servers, software licenses and service contracts, reducing TCO.

Pioneering the data center of the future

MDCs have pioneered application solutions that can scale far beyond any commercial product, such as Hadoop analytics and clustered query engines and databases like Cassandra and Google’s Dremel. Enterprises are adopting these applications and inspiring new commercial solutions.

Two recent initiatives seek to bring MDC architectures, cost and management efficiencies to the enterprise market.

OpenCompute promises a minimalist, cost-effective, easy-to-scale hardware infrastructure for clustered compute datacenters. It could also lead to an open hardware support services business model similar to the one for open source software.

OpenStack software seeks to deliver MDC-style clustered, automated management to enterprise datacenters, creating pools of automatically managed compute, storage, and networking: the holy grail of the software-defined datacenter.

Some architects make aggressive estimates that these solutions will reduce TCO by a staggering 70 percent. There are plans to disaggregate servers at the rack level, separating the processor from memory, storage, networking and power while managing the lifecycle of each separately, to improve work-per-dollar.

Optimizing work-per-dollar at the rack or data center level can enable more work for less spend and drive cascading benefits such as reduced management and maintenance costs.

(Picture caption: OpenCompute holds the promise of a minimalist, cost-effective, easy-to-scale hardware infrastructure for clustered compute datacenters.)
