
Striking the Right Balance Between Data Storage and Latency in a Datacenter

- Jeff Richardson, Executive Vice President and Chief Operating Officer at LSI

It's a curse in any network infrastructure, especially in the datacenter: Clear one performance bottleneck and another drag on data or application speed surfaces elsewhere in a never-ending game of “Whack-A-Mole.” In today's datacenters, the “Whack-A-Mole” mallet is swinging like never before as these bottlenecks pop up with increasing frequency in the face of the data deluge, the exponential growth of digital information worldwide.

Some of these choke points are familiar, such as the timeworn Input/Output (I/O) path between servers and disk storage, whether directly attached or in a storage area network, where microprocessor capability and speed have outpaced storage. Other, newer bottlenecks are cropping up with the growing consolidation and virtualization of servers and storage in datacenter clouds, as more organizations deploy cloud architectures to pool storage, processing and networking to increase computing resource efficiency and utilization, improve resiliency and scalability, and reduce costs.

Improving datacenter efficiency has always come down to balancing and optimizing these resources, but this calibration is being radically disturbed today by major transitions in the network: the migration from 1 Gbps Ethernet to 10 Gbps, and soon to 40 Gbps; the emergence of multi-core and other ever-faster processors; and the rising deployment of solid state storage. As virtualization increases server utilization, and therefore efficiency, it also exacerbates resource conflicts in memory and I/O. Even more resource conflicts are bound to emerge as big data applications evolve to run over ever-growing clusters of tens of thousands of computers that process, manage and store petabytes of data.

With these dynamic changes to the datacenter, maintaining acceptable levels of performance is becoming a greater challenge. But there are proven ways to address the most common bottlenecks today, ways that will give IT managers a stronger hand in the high-stakes bottleneck reduction contest.



Bridging the I/O gap between memory and hard disk drives

Hard disk drive (HDD) I/O is a major bottleneck in direct-attached storage (DAS) servers, storage area networks (SANs) and network-attached storage (NAS) arrays. Specifically, I/O to memory in a server takes about 100 nanoseconds, whereas I/O to a Tier 1 HDD takes about 10 milliseconds, a difference of roughly 100,000 times that chokes application performance. Latency in a SAN or NAS is often even higher because of data traffic congestion on the intervening Fibre Channel (FC), FC over Ethernet or iSCSI network.
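
To put that gap in perspective, here is a quick back-of-the-envelope calculation using the figures cited above; it is an illustrative sketch in Python, not a benchmark.

    # Back-of-the-envelope comparison of the latency figures cited above.
    DRAM_LATENCY_S = 100e-9  # ~100 nanoseconds for I/O to server memory
    HDD_LATENCY_S = 10e-3    # ~10 milliseconds for I/O to a Tier 1 HDD

    ratio = HDD_LATENCY_S / DRAM_LATENCY_S
    print(f"HDD access is {ratio:,.0f}x slower than memory")  # -> 100,000x

    # A workload issuing 1,000 dependent (serial) disk reads spends
    # roughly 10 seconds waiting on rotating media alone.
    print(f"1,000 serial HDD reads: {1000 * HDD_LATENCY_S:.0f} s of pure wait")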

These bottlenecks have grown over the years as increases in drive capacity have outstripped the latency reductions of faster-spinning drives and, in confronting the data deluge, IT managers have had to add more hard disks and deeper queues just to keep pace. As a result, the performance limitations of most applications have become tied to latency rather than bandwidth or I/Os per second (IOPS), and this problem threatens to worsen as the need for storage capacity continues to grow at 50 to 100 percent per year. Keep in mind that the last three decades have seen only a 30x reduction in latency, while network bandwidth has improved 3,000x over the same period; processor throughput, disk capacity and memory capacity have also seen large gains.

Caching content to memory in a server, or in the SAN on a Dynamic RAM (DRAM) cache appliance, can help reduce latency and therefore improve application-level performance. But because the amount of memory that fits in a server or cache appliance, measured in gigabytes, is only a small fraction of the capacity of even a single hard disk drive, measured in terabytes, the performance gains from caching are often inadequate.
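
Why caching falls short can be seen in the standard weighted-average latency formula: effective latency equals the hit rate times the cache latency plus the miss rate times the disk latency. The sketch below, with illustrative (assumed) hit rates, shows how even a high hit rate leaves average latency dominated by the misses that go to disk.

    # Average access latency for a small DRAM cache in front of HDDs,
    # using the standard weighted-average (hit/miss) formula.
    # The hit rates below are illustrative assumptions, not measurements.
    T_DRAM_S = 100e-9  # ~100 ns to DRAM
    T_HDD_S = 10e-3    # ~10 ms to a Tier 1 HDD

    def effective_latency_s(hit_rate: float) -> float:
        return hit_rate * T_DRAM_S + (1.0 - hit_rate) * T_HDD_S

    for hit_rate in (0.50, 0.90, 0.99):
        print(f"hit rate {hit_rate:.0%}: "
              f"~{effective_latency_s(hit_rate) * 1e3:.2f} ms on average")
    # Even a 99% hit rate leaves an average near 0.10 ms, about 1,000x
    # slower than DRAM itself, because the rare misses dominate.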

Solid state storage in the form of NAND flash memory is particularly effective at bridging this significant latency gap: in both capacity and latency, flash memory sits between DRAM caching and HDDs. Traditionally, flash has been very expensive to deploy and difficult to integrate into existing storage architectures. Today, decreases in the cost of flash, coupled with hardware and software innovations that ease deployment, have made the ROI for flash-based storage far more compelling.

Solid state memory typically delivers the highest performance gains when a flash acceleration card is placed directly in the server on the PCI Express (PCIe) bus. Embedded or host-based intelligent caching software places “hot data” in the flash memory, where it can be accessed in about 20 microseconds, some 140 times faster than the 2,800 microseconds of a Tier 1 HDD, giving users the data they care about far sooner. Some of these cards support multiple terabytes of solid state storage, and a new class of solution now also offers both internal flash and Serial-Attached SCSI (SAS) interfaces to combine high-performance solid state storage with RAID HDD storage. A PCIe-based flash acceleration card can improve database application-level performance by 5 to 10 times in either a DAS or SAN environment.
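
As a rough illustration of the “hot data” placement idea, the sketch below promotes frequently read blocks into a small flash tier; the thresholds, capacities and names are hypothetical, not the design of any particular vendor's caching software.

    # Minimal sketch of frequency-based "hot data" placement: blocks read
    # often enough are promoted into a fixed-size flash tier; cold blocks
    # stay on HDD. Thresholds, sizes and names are hypothetical, not the
    # design of any particular vendor's caching software.
    from collections import Counter

    FLASH_CAPACITY_BLOCKS = 4  # deliberately tiny, for demonstration
    PROMOTE_AFTER_READS = 3    # reads before a block counts as "hot"

    access_counts = Counter()
    flash_cache = set()

    def read_block(block_id):
        access_counts[block_id] += 1
        if block_id in flash_cache:
            return "flash (~20 us)"
        # Promote a hot block once there is room in the flash tier;
        # this read is still served from disk.
        if (access_counts[block_id] >= PROMOTE_AFTER_READS
                and len(flash_cache) < FLASH_CAPACITY_BLOCKS):
            flash_cache.add(block_id)
        return "HDD (~2,800 us)"

    for block in (7, 7, 7, 7, 9, 7):
        print(f"block {block}: served from {read_block(block)}")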



Scaling the virtualized datacenter network

Among the most common bottlenecks in virtualized datacenters today is the switching control plane, a potential choke point that can limit network performance as the number of virtual machines grows. Control plane workloads increase in four, sometimes related, ways:

- Server virtualization adds considerable control overhead, especially when migrating virtual machines (VMs);

- More and larger server clusters, such as for analyzing big data, substantially increase the traffic flow for inter-node communications;

- The explosion in CPU cores, driven by the need to avert bottlenecks in server processing power, increases both the number of VMs per server and the size of server clusters; and

- Datacenter networks flatten as they grow to accommodate these changes and to maintain latency and throughput performance in the face of relentless growth.

These changes are severely stressing the control plane. During a VM migration, for example, rapid changes in connections, address resolution protocol (ARP) messages and routing tables can overwhelm existing control plane solutions, especially in large-scale virtualized environments. As a result, large-scale VM data migration is often impractical because of the overhead involved.

To enable large-scale VM migration, the control plane needs to scale either up or out. In the traditional scale-up approach, the existing control plane solutions within networking platforms are supplemented by additional or more powerful compute engines, acceleration engines or both to help scale up control plane performance. These supplemental resources free up CPU cycles for other tasks, improving overall network performance.





In emerging scale-out architectures, the control plane is separated from the data plane and typically executed on standard servers. In some cases, control plane tasks are divided into sub-tasks, such as discovery, dissemination and recovery, which are then distributed across these servers. Architectures such as Software-Defined Networking (SDN) leverage scale-out approaches for greater control plane scalability; they also enable IT managers to virtualize the network substrate and to better manage and secure datacenter traffic.
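
A minimal sketch of the scale-out idea follows: control-plane sub-tasks (the discovery, dissemination and recovery tasks named above) are deterministically spread across a pool of standard servers. The hashing scheme and server names are illustrative assumptions, not the mechanism of any specific SDN controller.

    # Minimal sketch of scale-out control-plane distribution: sub-tasks
    # such as discovery, dissemination and recovery (named in the text)
    # are spread deterministically across standard servers. The hashing
    # and server names are illustrative, not any specific SDN controller.
    import hashlib

    CONTROL_SERVERS = ["ctrl-01", "ctrl-02", "ctrl-03"]  # hypothetical pool

    def assign_server(subtask: str, key: str) -> str:
        """Map a (sub-task, key) pair to one control-plane server."""
        digest = hashlib.sha256(f"{subtask}:{key}".encode()).hexdigest()
        return CONTROL_SERVERS[int(digest, 16) % len(CONTROL_SERVERS)]

    # e.g. ARP handling for a migrating VM always lands on the same
    # server, so its state can be found without flooding the fabric.
    for subtask in ("discovery", "dissemination", "recovery"):
        print(subtask, "->", assign_server(subtask, "vm-42"))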

In both scale-up and scale-out architectures, intelligent multi-core communications processors, which combine general-purpose processors with specialized hardware acceleration engines for specific functions, can produce dramatic improvements in control plane performance. Some functions, such as packet processing and traffic management, can often be offloaded entirely to line cards equipped with such purpose-built communications processors.



Near-term advances that promise to improve both server I/O and network performance

In many organizations today, milliseconds matter, driving strong demand for shorter response times. For some, such as trading firms, latency can be measured in millions of dollars per millisecond. For others, such as online retailers, every millisecond of delay can compromise customer satisfaction and competitiveness, and ultimately affect revenue directly.

As more digital information is driven throughout the datacenter, fast solid state storage will be increasingly deployed for storage server caching, and for solid state drives (SSDs) in tiered DAS and SAN configurations. The growth of SSD capacity and shipment volumes continues, reducing the cost per gigabyte through economies of scale, while smart flash storage processors with sophisticated garbage collection, wear-leveling and enhanced error correction algorithms continue to improve SSD endurance.

Increasing use of 10 Gbps and 40 Gbps Ethernet, and broad deployment of 12 Gbps SAS technology, will also contribute to higher data rates. Besides doubling the throughput of existing 6 Gbps SAS technology, 12 Gbps SAS will leverage performance improvements in PCIe 3.0 to achieve more than 1 million IOPS.
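
The arithmetic behind those data rates is straightforward. The sketch below assumes both SAS generations use 8b/10b line encoding (8 data bits carried per 10 line bits) and shows the theoretical per-lane throughput, plus why 1 million IOPS calls for a PCIe 3.0-class host interface.

    # Rough per-lane throughput for the SAS generations discussed above,
    # assuming 8b/10b line encoding (8 data bits per 10 line bits);
    # figures are theoretical maxima, not benchmarks.
    def sas_lane_mb_per_s(line_rate_gbps: float) -> float:
        return line_rate_gbps * 1e9 * (8 / 10) / 8 / 1e6  # megabytes/s

    for rate in (6.0, 12.0):
        print(f"{rate:>4.0f} Gbps SAS lane: ~{sas_lane_mb_per_s(rate):,.0f} MB/s")

    # 1 million IOPS at a 4 KB transfer size implies ~4 GB/s of bandwidth,
    # which is why 12 Gbps SAS pairs with the faster PCIe 3.0 host bus.
    iops, io_kb = 1_000_000, 4
    print(f"{iops:,} x {io_kb} KB IOPS = {iops * io_kb / 1e6:.1f} GB/s")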

As datacenter networks continue to flatten, new forms of acceleration and programmability in both the control and data planes will be needed. Greater use of hardware acceleration for both packet processing and traffic management will deliver deterministic performance under varying traffic loads in these flat, scaled-up or scaled-out networks.



More bottlenecks to come

As servers migrate to 10 Gbps Ethernet, the rack will become its own bottleneck. To help clear this bottleneck, solid state storage will shuttle data among servers at high speed; purpose-built PCIe cards will enable fast inter-server communications; and all components within a rack will likely be restructured to optimize performance and cost. As datacenters begin to resemble private clouds and increasingly leverage public cloud services in a multi-tenant, hybrid arrangement, the switching services plane will need to more intelligently classify and manage traffic to improve application-level performance and enhance security. With the increasing use of encrypted and tunneled traffic, these and other CPU-intensive packet processing tasks will need to be offloaded to function-specific acceleration engines to enable a fully distributed intelligent fabric.

High-speed communications processors, acceleration engines, solid state storage and other technologies that increase performance and reduce latency in datacenter networks will take on increasing importance as networks and datacenters continue to struggle with massive data growth, and as IT managers race to increase data speed within their architectures just to keep up with relentless demand for faster access to digital information.
