The challenges in data replication over WAN are varied and differ from those of LAN replication, and hence call for appropriate data reduction and bandwidth optimization techniques.
As the saying goes, "A chain is only as strong as its weakest link"; likewise, a network is only as fast as its slowest link. Any enterprise looking to extract the best performance for its applications through an effective network optimization strategy must look at the communication media it uses. However, any application expected to work over the Internet or other Wide Area Networks must accept that the medium, and hence the maximum achievable bandwidth, lies outside its control and scope. What it can do is adopt strategies for using the available bandwidth as efficiently as possible, and this holds true for any effective data replication strategy over the WAN as well.
Business must continue under any and all circumstances, and any downtime is likely to cost an enterprise dearly. The modern emphasis on Disaster Recovery (DR) strategies such as data replication has grown out of this need felt by all enterprise owners. Often, such DR strategies sacrifice some minimal amount of data for the ability to recover from even major site disasters, by ensuring that the DR site and the primary data center are separated geographically, often across continents. This means that all such data replication must happen across the WAN, unless the enterprise can bear the prohibitive cost of dedicated lines.
Data Reduction Techniques
Often, Data Reduction techniques are used to minimize the amount of traffic
transferred over the WAN. The higher the data reduction ratios achieved, the
faster the data can be transferred across the network. However, common data
reduction techniques such as data deduplication and data compression are CPU
intensive operations that could impact the performance of the applications
hosted on the primary storage server. Also, employing elaborate data
deduplication techniques on actively used online data would defeat the purpose
of data reduction, since the blocks referenced as duplicates could soon be
modified by the application and would then need retransmission. Thus, data
reduction strategies are best suited to WAN-based asynchronous replication
strategies that operate on a non-changing consistent image such as a snapshot.
Snapshots in a storage system are, by themselves, incremental delta images that
lend themselves to deduplication efforts. In addition, a simple but effective
deduplication mechanism can leverage the fact that applications often over-write
nearly identical data; deduplicating the delta blocks is therefore the most
natural approach. More elaborate strategies that perform across-the-board
deduplication can be employed depending on the deduplication ratios expected.
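The block-level deduplication idea above can be sketched as fixed-size blocks identified by a strong hash, where only unique payloads cross the WAN and the remote side rebuilds the image from a "recipe" of hashes. This is a minimal illustration under assumed parameters (a 4 KiB block size, SHA-256 fingerprints), not any particular product's implementation:

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def dedupe_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep only unique ones.

    Returns the ordered list of block hashes (the "recipe") and a dict
    mapping hash -> block payload (what would actually be transmitted).
    """
    recipe, store = [], {}
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        store.setdefault(digest, block)  # each unique block is sent once
    return recipe, store


# An over-write of mostly identical data deduplicates almost entirely:
snapshot = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
recipe, store = dedupe_blocks(snapshot)
# 4 blocks in the recipe, but only 2 unique payloads to replicate.
rebuilt = b"".join(store[digest] for digest in recipe)
```

The remote site needs only the recipe plus any blocks whose hashes it has not already stored, which is what makes repeated snapshots of slowly changing data so cheap to replicate.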
Figure: The impact of round-trip latencies and packet losses in WAN on the effective bandwidth obtained in data replication.
Apart from data deduplication, data compression techniques can be used, where
data is encoded using fewer information-carrying units, thereby reducing the
amount of data transferred. The remote servers then decode the compressed data
to regenerate the original data. Several standard data compression algorithms
available today can reduce data transmission loads. The
amount of compression can be configured by setting different compression levels.
The higher the compression level, the harder the algorithm attempts to reduce
the data. While this might result in less data being transmitted, it is not
always the best approach, as higher levels of compression consume more
processor cycles and hence can impact application performance significantly.
Alternatively, data replication WAN services in storage servers can use an
adaptive compression mode, where the depth of compression is determined by an
analysis of the current and historical load on the system. Thus, by employing
the right data reduction strategies, data replication solutions can keep the
amount of data transferred over the WAN to a minimum.
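The level-versus-CPU trade-off and the adaptive idea can be illustrated with zlib, whose levels 1-9 trade compression depth for processor time. The load thresholds in `pick_level` are invented for illustration; a real adaptive mode would use the product's own load statistics:

```python
import zlib

# Hypothetical repetitive replication payload (log-like records compress well).
payload = b"timestamp=2024-01-01T00:00:00 level=INFO msg=heartbeat ok\n" * 2000

# Higher levels shrink the data more but burn more CPU per byte.
sizes = {level: len(zlib.compress(payload, level)) for level in (1, 6, 9)}


def pick_level(cpu_load: float) -> int:
    """Crude adaptive policy (an assumption, not any vendor's logic):
    back off to cheap compression when the server is busy."""
    if cpu_load > 0.8:
        return 1   # busy: cheapest compression, least CPU impact
    if cpu_load > 0.5:
        return 6   # moderate load: zlib's default trade-off
    return 9       # idle: maximum data reduction
```

An adaptive sender would re-evaluate `pick_level` periodically, so a lightly loaded server squeezes the data hard while a busy one protects its applications.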
Bandwidth Optimization
Whereas data reduction strategies help reduce the amount of data transferred
over the WAN, link latencies and reliability make a significant dent in the
throughput that is achieved. Even data traveling as optical or electrical
signals at close to the speed of light shows noticeable lag between
geographically distant locations, unlike over the small distances of a LAN.
These round-trip delays range from a couple of milliseconds for inter-city
connections to around 80-100 ms coast-to-coast, and as much as 250-300 ms for
submarine links across the globe. When geostationary satellites are used, the
distances covered are naturally much greater, resulting in delays of about
700 ms.
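These delay figures follow directly from physics: light in optical fiber travels at roughly two-thirds of c, about 200,000 km/s, so cable distance alone puts a hard floor on round-trip time. The distances below are illustrative assumptions:

```python
FIBER_SPEED_KM_S = 200_000  # ~2/3 of c; light is slower in glass than vacuum


def min_rtt_ms(cable_km: float) -> float:
    """Lower bound on RTT from propagation delay alone,
    ignoring queuing, routing, and processing overheads."""
    return 2 * cable_km / FIBER_SPEED_KM_S * 1000


# A rough 8,000 km coast-to-coast fiber path gives an 80 ms floor,
# consistent with the 80-100 ms figure quoted above; a 250 km
# inter-city hop gives only 2.5 ms.
rtt_coast = min_rtt_ms(8000)
rtt_city = min_rtt_ms(250)
```

Real paths add routing detours and queuing on top of this floor, which is why measured WAN RTTs exceed the pure propagation delay.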
While there is not much that can be done about these latencies, a number of
bandwidth optimization techniques can be used to ensure effective utilization
of the available bandwidth. Typical data replication solutions over the WAN use
the connection-oriented TCP/IP protocol, which relieves the application of the
burdens of reliability, flow control, congestion control, and so on. While this
works well on LAN networks, the choice of transport protocol makes a
significant impact on bandwidth utilization over the WAN. Due to the very
nature of the TCP protocol and its dependence on round-trip acknowledgments and
sliding windows, the round-trip time (RTT) it incurs plays a dominant role.
A second major factor that adds additional challenges is the issue of packet
loss. At such significant transport distances, packets can be dropped due to
congestion or bit errors. While recovering from these hiccups, the TCP protocol
gets into a 'slow start' mode, where it carries out more conservative corrective
actions, resulting in even more restricted performance. In essence, the
throughput achieved in long distance replication depends on two basic
parameters: the link bandwidth and the transport delays and losses.
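The combined effect of RTT and packet loss is often estimated with the well-known Mathis et al. rule of thumb, which bounds steady-state TCP throughput at roughly MSS / (RTT x sqrt(p)). It is an approximation, not an exact model, but it shows how even tiny loss rates throttle long-haul TCP:

```python
import math


def mathis_limit_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. approximation for loss-limited TCP throughput:
    rate <= MSS / (RTT * sqrt(p)). A rule of thumb, not an exact model."""
    return mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


# 1460-byte segments, 100 ms RTT, 0.01% loss: roughly 11.7 Mbit/s,
# far below what a gigabit long-haul link could physically carry.
limit = mathis_limit_bps(1460, 0.100, 0.0001)
```

Note how the bound degrades with the square root of the loss rate and linearly with RTT, so long, slightly lossy paths are hit from both sides at once.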
Various remedies exist to counteract the inefficiency of TCP over long
distances; specific tunings or accelerated protocols can sometimes be
implemented to alleviate this problem. Another fairly common, though
expensive, solution is to place pairs of special dedicated appliances along the
transport path to boost its throughput. As pointed out earlier, although the
performance of the TCP stack can be tweaked, it is a fairly accurate
generalization that TCP is more suitable for the LAN environment than for
long-haul networks.
In iSCSI storage servers it is much more desirable to have the TCP stack
optimized for the LAN environment. This optimization is preferable, since the
server serves I/Os over the iSCSI interconnect to the storage network (SAN),
which is LAN-like in behavior. For replication over long distances, the data
replication can instead use an intelligent combination of several standard IP
transport protocols, mixing connection-oriented, acknowledgement-based traffic
for certain control packets with connectionless protocols that use the
bandwidth much more efficiently. Lost packets can then be detected by this
custom protocol and requested for retransmission over either connectionless or
connection-oriented modes as desired. Thus, by taking control of bandwidth
optimization rather than leaving it to generic transport protocols, storage
servers can ensure that data replication is performed at near line-speed
bandwidth rates. The bandwidth allocated for data replication can also be
configured so that it does not flood the network and starve other applications
of bandwidth.
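A common way to enforce such a configured cap on replication traffic is a token-bucket limiter: tokens accrue at the allowed rate, and data may be sent only while tokens remain. This is a generic sketch of the technique, not any storage product's actual API:

```python
import time


class TokenBucket:
    """Token-bucket limiter: caps replication traffic at a configured
    rate so it does not starve other applications of bandwidth."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s      # sustained allowance
        self.capacity = burst_bytes       # short-term burst allowance
        self.tokens = burst_bytes         # start with a full bucket
        self.last = time.monotonic()

    def try_send(self, nbytes: int) -> bool:
        """Return True if nbytes may be sent now; otherwise the sender
        should back off, leaving bandwidth for other traffic."""
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A replication sender would call `try_send` before each chunk and sleep briefly on refusal, smoothing its traffic to the configured rate instead of flooding the link.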