The challenges in data replication over WAN are varied and differ from those of LAN replication, and hence call for appropriate data reduction and bandwidth optimization techniques.
As the saying goes, "A chain is only as strong as its weakest link"; likewise, a network is only as fast as its slowest link. Any enterprise looking to extract the best performance for its applications through an effective network optimization strategy must look at the communication media it uses. However, any application expected to work over the Internet or other Wide Area Networks must accept that the medium, and hence the maximum achievable bandwidth, lies outside its control and scope. What it can do is adopt strategies for using the available bandwidth as efficiently as possible, and this holds true for any effective data replication strategy over the WAN as well.
Business must continue under any and all circumstances, and any downtime is likely to cost an enterprise dearly. The modern emphasis on Disaster Recovery (DR) strategies such as data replication has grown out of this need felt by all enterprise owners. Often, such DR strategies sacrifice some minimal amount of data for the ability to recover from even major site disasters, by ensuring that the DR site and the primary data center are separated geographically, often across continents. This means that all such data replication must happen across the WAN, unless the enterprise can bear the prohibitive cost of dedicated lines.
Data Reduction Techniques
Often, Data Reduction techniques are used to minimize the amount of traffic
transferred over the WAN. The higher the data reduction ratios achieved, the
faster the data can be transferred across the network. However, common data
reduction techniques such as data deduplication and data compression are CPU
intensive operations that could impact the performance of the applications
hosted on the primary storage server. Also, employing elaborate data
deduplication techniques on actively used online data would defeat the purpose
of data reduction, since the blocks referenced as duplicates could soon be
modified by the application and would then need retransmission. Thus, data
reduction strategies are best suited to WAN-based asynchronous replication
strategies that operate on a non-changing consistent image such as a snapshot.
Snapshots in a storage system are, by themselves, incremental delta images that
lend themselves to deduplication efforts. In addition, a simple but effective
deduplication mechanism can leverage the fact that applications often over-write
nearly identical data; deduplicating the delta blocks is therefore the most
natural approach. More elaborate strategies that perform across-the-board
deduplication can be employed depending on the deduplication ratios expected.
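The block-level deduplication idea above can be sketched as fixed-size blocks identified by a strong hash, where only unique payloads cross the WAN and the remote side rebuilds the image from a "recipe" of hashes. This is a minimal illustration under assumed parameters (a 4 KiB block size, SHA-256 fingerprints), not any particular product's implementation:

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def dedupe_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep only unique ones.

    Returns the ordered list of block hashes (the "recipe") and a dict
    mapping hash -> block payload (what would actually be transmitted).
    """
    recipe, store = [], {}
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        store.setdefault(digest, block)  # each unique block is sent once
    return recipe, store


# An over-write of mostly identical data deduplicates almost entirely:
snapshot = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
recipe, store = dedupe_blocks(snapshot)
# 4 blocks in the recipe, but only 2 unique payloads to replicate.
rebuilt = b"".join(store[digest] for digest in recipe)
```

The remote site needs only the recipe plus any blocks whose hashes it has not already stored, which is what makes repeated snapshots of slowly changing data so cheap to replicate.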
Figure: The impact of round-trip latencies and packet losses in WAN on the effective bandwidth obtained in data replication.
Apart from data deduplication, data compression techniques can be used, where
data is encoded using fewer information-carrying units, thereby reducing the
amount of data transferred. The remote servers then decode the compressed data
to regenerate the original data. Several standard data compression algorithms
available today can reduce data transmission loads. The
amount of compression can be configured by setting different compression levels.
The higher the compression level, the harder the algorithm attempts to reduce
the data. While this might result in less data being transmitted, it is not
always the best approach, as higher levels of compression consume more
processor cycles and hence can impact application performance significantly.
Alternatively, data replication WAN services in storage servers can use an
adaptive compression mode, where the depth of compression is determined by an
analysis of the current and historical load on the system. Thus, by employing
the right data reduction strategies, data replication solutions can keep the
amount of data transferred over the WAN to a minimum.
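The level-versus-CPU trade-off and the adaptive idea can be illustrated with zlib, whose levels 1-9 trade compression depth for processor time. The load thresholds in `pick_level` are invented for illustration; a real adaptive mode would use the product's own load statistics:

```python
import zlib

# Hypothetical repetitive replication payload (log-like records compress well).
payload = b"timestamp=2024-01-01T00:00:00 level=INFO msg=heartbeat ok\n" * 2000

# Higher levels shrink the data more but burn more CPU per byte.
sizes = {level: len(zlib.compress(payload, level)) for level in (1, 6, 9)}


def pick_level(cpu_load: float) -> int:
    """Crude adaptive policy (an assumption, not any vendor's logic):
    back off to cheap compression when the server is busy."""
    if cpu_load > 0.8:
        return 1   # busy: cheapest compression, least CPU impact
    if cpu_load > 0.5:
        return 6   # moderate load: zlib's default trade-off
    return 9       # idle: maximum data reduction
```

An adaptive sender would re-evaluate `pick_level` periodically, so a lightly loaded server squeezes the data hard while a busy one protects its applications.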
Bandwidth Optimization
Whereas data reduction strategies help reduce the amount of data transferred
over the WAN, link latencies and reliability make a significant dent in the
throughput that is achieved. Even data traveling as optical or electrical
signals at close to the speed of light shows noticeable lag between
geographically distant locations, unlike over the small distances of a LAN.
These round-trip delays range from a couple of milliseconds for inter-city
connections to around 80-100 ms coast-to-coast, and as much as 250-300 ms for
submarine links across the globe. When geostationary satellites are used, the
distances covered are naturally much greater, resulting in delays of about
700 ms.
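These delay figures follow directly from physics: light in optical fiber travels at roughly two-thirds of c, about 200,000 km/s, so cable distance alone puts a hard floor on round-trip time. The distances below are illustrative assumptions:

```python
FIBER_SPEED_KM_S = 200_000  # ~2/3 of c; light is slower in glass than vacuum


def min_rtt_ms(cable_km: float) -> float:
    """Lower bound on RTT from propagation delay alone,
    ignoring queuing, routing, and processing overheads."""
    return 2 * cable_km / FIBER_SPEED_KM_S * 1000


# A rough 8,000 km coast-to-coast fiber path gives an 80 ms floor,
# consistent with the 80-100 ms figure quoted above; a 250 km
# inter-city hop gives only 2.5 ms.
rtt_coast = min_rtt_ms(8000)
rtt_city = min_rtt_ms(250)
```

Real paths add routing detours and queuing on top of this floor, which is why measured WAN RTTs exceed the pure propagation delay.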
While there is not much that can be done about these latencies, a number of
bandwidth optimization techniques can be used to ensure effective utilization
of the available bandwidth. Typical data replication solutions over the WAN use
the connection-oriented TCP/IP protocol, which relieves the application of the
burdens of reliability, flow control, congestion control, and so on. While this
works well on LAN networks, the choice of transport protocol makes a
significant impact on bandwidth utilization over the WAN. Due to the very
nature of the TCP protocol and its dependence on round-trip acknowledgments and
sliding windows, the round-trip time (RTT) it incurs plays a dominant role.
A second major factor that adds additional challenges is the issue of packet
loss. At such significant transport distances, packets can be dropped due to
congestion or bit errors. While recovering from these hiccups, the TCP protocol
gets into a 'slow start' mode, where it carries out more conservative corrective
actions, resulting in even more restricted performance. In essence, the
throughput achieved in long distance replication depends on two basic
parameters: the link bandwidth and the transport delays and losses.
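The combined effect of RTT and packet loss is often estimated with the well-known Mathis et al. rule of thumb, which bounds steady-state TCP throughput at roughly MSS / (RTT x sqrt(p)). It is an approximation, not an exact model, but it shows how even tiny loss rates throttle long-haul TCP:

```python
import math


def mathis_limit_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. approximation for loss-limited TCP throughput:
    rate <= MSS / (RTT * sqrt(p)). A rule of thumb, not an exact model."""
    return mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


# 1460-byte segments, 100 ms RTT, 0.01% loss: roughly 11.7 Mbit/s,
# far below what a gigabit long-haul link could physically carry.
limit = mathis_limit_bps(1460, 0.100, 0.0001)
```

Note how the bound degrades with the square root of the loss rate and linearly with RTT, so long, slightly lossy paths are hit from both sides at once.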
Various remedies exist to counteract the inefficiency of TCP over long
distances; specific tunings or accelerated protocols can sometimes be
implemented to alleviate this problem. Another fairly common, though
expensive, solution is to place pairs of special dedicated appliances along the
transport path to boost its throughput. As pointed out earlier, although the
performance of the TCP stack can be tweaked, it is a fairly accurate
generalization that TCP is more suitable for the LAN environment than for
long-haul networks.
In iSCSI storage servers it is much more desirable to have the TCP stack
optimized for the LAN environment. This optimization is preferable, since the
server serves I/Os over the iSCSI interconnect to the storage network (SAN),
which is LAN-like in behavior. For replication over long distances, the data
replication can instead use an intelligent combination of several standard IP
transport protocols, mixing connection-oriented, acknowledgement-based traffic
for certain control packets with connectionless protocols that use the
bandwidth much more efficiently. Lost packets can then be detected by this
custom protocol and requested for retransmission over either connectionless or
connection-oriented modes as desired. Thus, by taking control of bandwidth
optimization rather than leaving it to generic transport protocols, storage
servers can ensure that data replication is performed at near line-speed
bandwidth rates. The bandwidth allocated for data replication can also be
configured so that it does not flood the network and starve other applications
of bandwidth.
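A common way to enforce such a configured cap on replication traffic is a token-bucket limiter: tokens accrue at the allowed rate, and data may be sent only while tokens remain. This is a generic sketch of the technique, not any storage product's actual API:

```python
import time


class TokenBucket:
    """Token-bucket limiter: caps replication traffic at a configured
    rate so it does not starve other applications of bandwidth."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s      # sustained allowance
        self.capacity = burst_bytes       # short-term burst allowance
        self.tokens = burst_bytes         # start with a full bucket
        self.last = time.monotonic()

    def try_send(self, nbytes: int) -> bool:
        """Return True if nbytes may be sent now; otherwise the sender
        should back off, leaving bandwidth for other traffic."""
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A replication sender would call `try_send` before each chunk and sleep briefly on refusal, smoothing its traffic to the configured rate instead of flooding the link.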