Today, data centers face a formidable challenge: provisioning enormous amounts
of storage capacity at an affordable price while meeting ever-increasing
performance demands. The biggest threats facing data centers and the storage
industry today are power consumption, floor space and environmental concerns.
However, many administrators and planners fail to recognize that the value of
data to an organization decreases over time, as it loses its relevance,
freshness, and “popularity”. One question administrators should be asking
themselves is: why should data that is decreasing in value remain in expensive
front-line storage, subject to the same backup, replication, and recovery
policies and procedures as key data? Would it not be useful to have a system or
methodology in place for analyzing and tracking data freshness, so that storage
space could be freed for fresher, more relevant data, and time- and
bandwidth-consuming data protection policies could be relaxed as data loses its
value?
Direct Hit!
Applies To: Database managers
The noteworthy leap the storage industry is being forced to take in this regard
is Storage Tiering. Here, the capacity to be provisioned is divided into
separate pools of storage, each with different cost/performance attributes. At
the top resides the Tier 1 pool, which is the most expensive but also the
highest performing. The bottom tier is occupied by more cost-effective storage
arrays. The next challenge is to devise a sophisticated software layer that
intelligently places data into the different tiers according to its value. This
concept is variously known as data classification or Information Lifecycle
Management (ILM).
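To make the cost side of tiering concrete, here is a minimal Python sketch that compares provisioning capacity entirely on Tier 1 with a tiered split. The tier names, per-GB prices and the split are hypothetical placeholders chosen for illustration, not vendor figures.

```python
# Hypothetical per-GB prices for illustration only; real prices vary by vendor and year.
COST_PER_GB = {
    "tier1": 10.0,  # fastest, most expensive pool
    "tier2": 3.0,   # mid-range pool
    "tier3": 0.5,   # most cost-effective pool
}


def provisioning_cost(capacity_gb: float, split: dict) -> float:
    """Cost of provisioning capacity_gb when split[tier] is the fraction placed on that tier."""
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("tier fractions must sum to 1")
    return sum(capacity_gb * fraction * COST_PER_GB[tier] for tier, fraction in split.items())


if __name__ == "__main__":
    all_tier1 = provisioning_cost(10_000, {"tier1": 1.0})
    tiered = provisioning_cost(10_000, {"tier1": 0.2, "tier2": 0.3, "tier3": 0.5})
    print(f"All Tier 1: {all_tier1:,.0f}  Tiered: {tiered:,.0f}")
```

With these assumed prices, moving 80% of the capacity off Tier 1 cuts the provisioning cost by roughly two-thirds; the real saving depends entirely on actual prices and on how much data can safely live on the lower tiers.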
What is ILM?
ILM is a concept that encompasses the discovery, classification, analysis, and
maintenance of data, across the entire period of its useful life. It adds
structure and context to data, marking the transition from data to information.
ILM is a part of the larger concept of Business Continuity Planning, but has
become increasingly prominent in the storage arena in recent years thanks to
several factors: advances in data storage management techniques and the
technology that underpins them, and the evolution of the storage environment,
including:
- Coexistence of Fibre Channel and iSCSI (IP storage) in the data center
- Coexistence of SAS and SATA storage within storage systems
- Storage consolidation practices that reduce the use of isolated “islands of
data” in direct attached storage (DAS)
- Regulatory requirements for data archiving and recall (SOX, etc.)
Though many vendors offer ILM services or modules as part
of their products, ILM is above all a concept or a strategy, rather than a
product. However, for a practical explanation of what the concept embodies, we
can safely generalize that many implementations of ILM encompass such components
as:
- Database Management
- Storage System Performance and Monitoring
- Storage Capacity Planning and Management
- Business Controls for Data Degradation and End of Life (EOL)
How is this done?
In a tiered storage system, storage is not seen merely as a container for
data. An additional layer of intelligence is attached to every block, turning
blocks of data into blocks of information.
Data + Intelligence = Information
This intelligence, associated with every block of data, forms vital metadata
that automatically tracks the access patterns of these blocks. Data is first
classified, then moved at the block level from tier to tier based on how
frequently it is accessed. At the peak of its popularity, data is stored in the
fastest, most responsive top-tier storage on hand and is subject to the most
stringent replication and backup controls. The ILM system constantly monitors
the value of each piece of data relative to other data; as that value declines,
the data is migrated down the chain to less expensive, less powerful storage,
where it may be accessed less frequently and protected less rigorously. In the
final stage, it is migrated out of the storage system completely: data of the
lowest value is either purged from the system or transferred to other media
(e.g., written to tape and delivered to offsite storage), depending on the
organization's policy and regulatory requirements for data end-of-life.
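The classify-then-migrate cycle described above can be sketched as follows. This is a simplified, hypothetical model rather than any vendor's algorithm: it assumes the system has counted accesses per block over a monitoring window, ranks blocks against one another by that count, and fills tiers from the top down. Blocks that fit nowhere are marked for the end-of-life stage (tape or purge, per policy). The tier names and per-tier capacities (in blocks) are illustrative assumptions.

```python
from collections import Counter

# Hypothetical tier chain, fastest and most heavily protected first.
TIER_CHAIN = ["tier1", "tier2", "tier3"]


def record_accesses(access_log):
    """Build per-block access counts (the 'intelligence' metadata) from a log of block IDs."""
    return Counter(access_log)


def plan_placement(access_counts, capacities):
    """Map each block ID to a tier, most frequently accessed blocks first.

    capacities[tier] is how many blocks that tier can hold. Blocks left over
    after every tier is full are marked 'end_of_life' (archive to tape or purge).
    """
    # Rank blocks relative to each other; ties are broken by block ID for determinism.
    ranked = sorted(access_counts, key=lambda block: (-access_counts[block], block))
    placement = {}
    start = 0
    for tier in TIER_CHAIN:
        end = start + capacities.get(tier, 0)
        for block in ranked[start:end]:
            placement[block] = tier
        start = end
    for block in ranked[start:]:
        placement[block] = "end_of_life"
    return placement


if __name__ == "__main__":
    counts = record_accesses([3, 1, 3, 7, 3, 1, 9])
    print(plan_placement(counts, {"tier1": 1, "tier2": 1, "tier3": 1}))
```

Here block 3 (three accesses) lands on Tier 1, block 1 on Tier 2, block 7 on Tier 3, and block 9 is marked for end-of-life. Running the plan periodically promotes and demotes blocks as their counts change. Note that this sketch only sees blocks accessed during the window; blocks with no accesses would need to be added with a count of zero so they drift down toward end-of-life.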
Why ILM?
Having examined how an ILM system can be implemented, we should next look
more closely at the reasons why more and more organizations are accepting the
need for a comprehensive ILM strategy.
Exponential growth of data
With data growth averaging near 80% to 100% every year, managing storage
effectively has become a challenging task. Storage administrators face limited
budgets, and are charged not only with expanding capacity by wisely purchasing
new hardware to meet projected storage needs, but also with optimizing the use
of existing capacity to maximize the return on current storage hardware.
Moreover, any changes or additions must be considered carefully, as the
downstream effects of new hardware are often unforeseen and can quickly wipe out
any short-term cost gains.
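Compound growth is what makes these planning decisions so pressing. The sketch below projects used capacity under a constant annual growth rate and counts the years until it reaches the installed capacity. The starting figures are hypothetical inputs, and real growth is rarely this smooth.

```python
def projected_capacity(current_tb: float, annual_growth: float, years: int) -> float:
    """Used capacity after `years` of compound growth (0.8 means 80% per year)."""
    return current_tb * (1 + annual_growth) ** years


def years_until_full(current_tb: float, installed_tb: float, annual_growth: float) -> int:
    """Whole years until used capacity reaches or exceeds installed capacity."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    used = current_tb
    while used < installed_tb:
        used *= 1 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical: 10 TB in use today, 50 TB installed.
    for growth in (0.8, 1.0):
        print(growth, round(projected_capacity(10, growth, 3), 2), years_until_full(10, 50, growth))
```

At either rate, a hypothetical 50 TB installation holding 10 TB today runs out of room within three years, which is why simply buying more Tier 1 capacity does not scale.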
Data accessibility/freshness
As mentioned at the beginning of this article, data does not have a constant
value; rather, its value changes, whether due to time, relevance, security, or
popularity. Policies and procedures must therefore be put in place to
continuously monitor and shift the location (and therefore the accessibility) of
data, so that the information in highest demand sits in the most accessible
location.
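One way to model value that changes over time is an exponentially decayed popularity score: each access counts fully when it happens, and its weight halves after every half-life. This is an assumption for illustration rather than a standard ILM metric, and the 30-day half-life is a hypothetical tuning parameter; a score like this could feed a placement policy in place of raw access counts.

```python
def popularity(access_times, now, half_life):
    """Decayed access count: an access at time t contributes 0.5 ** ((now - t) / half_life).

    Times and half_life share one unit (days here). Accesses after `now` are ignored.
    """
    return sum(0.5 ** ((now - t) / half_life) for t in access_times if t <= now)


if __name__ == "__main__":
    HALF_LIFE_DAYS = 30  # hypothetical tuning parameter
    stale = popularity([0] * 10, now=300, half_life=HALF_LIFE_DAYS)  # ten accesses, 300 days ago
    fresh = popularity([299, 300], now=300, half_life=HALF_LIFE_DAYS)  # two recent accesses
    print(f"stale={stale:.4f} fresh={fresh:.4f}")
```

Ten accesses from 300 days ago score about 0.01, while two accesses in the last day score almost 2, so the recently used data ranks higher even though it was accessed less often overall.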
Figure: Carbon dioxide emissions of traditional storage servers versus tiered storage servers
Cost (TOC) issues
The overall cost of a storage system is not measured just by the initial price
paid for the hardware and its commissioning. The total operating cost (TOC)
includes maintenance, power and cooling expenses, together with the cost of
staffing and training administrators. As storage arrays grow, power usage (for
server operation and cooling) is just one of the factors that have an enormous
impact on the TOC of a storage solution. If less expensive solutions are
available, administrators should by all means devise a careful plan to
incorporate these components, subject to a few conditions. Where possible, any
additional storage technology adopted should not require a significant
investment of time and resources to learn to operate. New solutions that are
more power- or space-efficient should be integrated into the array.
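The cost components listed above can be combined into a simple model. In the sketch below, every figure is a hypothetical placeholder, and the `cooling_multiplier` (total facility energy divided by the storage hardware's own energy, similar in spirit to a PUE ratio) is an assumption used to fold cooling into the power bill.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class StorageOption:
    name: str
    purchase: float              # hardware price plus commissioning
    maintenance_per_year: float
    staff_per_year: float        # staffing and training administrators
    avg_power_watts: float       # average draw of the storage hardware itself


def total_operating_cost(opt: StorageOption, years: int, price_per_kwh: float,
                         cooling_multiplier: float) -> float:
    """Maintenance, staff, and power-plus-cooling costs over `years`."""
    energy_kwh = opt.avg_power_watts * HOURS_PER_YEAR * years / 1000 * cooling_multiplier
    return years * (opt.maintenance_per_year + opt.staff_per_year) + energy_kwh * price_per_kwh


def overall_cost(opt: StorageOption, years: int, price_per_kwh: float,
                 cooling_multiplier: float) -> float:
    """Initial purchase plus the operating cost over the same period."""
    return opt.purchase + total_operating_cost(opt, years, price_per_kwh, cooling_multiplier)


if __name__ == "__main__":
    option = StorageOption("example array", purchase=10_000, maintenance_per_year=1_000,
                           staff_per_year=2_000, avg_power_watts=1_000)
    print(overall_cost(option, years=1, price_per_kwh=0.10, cooling_multiplier=2.0))
```

Running the same model for two candidate arrays over a realistic service life shows how quickly a lower purchase price can be outweighed by higher power, cooling and administration costs.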
Ability to protect and recover lost data
Because key data has to be protected against loss to ensure business
continuity, the term Continuous Data Protection (CDP) has come into use. It
describes a scheme for ensuring data survival in the face of disasters such as
power/network outages and natural catastrophes, using techniques such as
backups, data snapshots and remote replication. Adding to the challenges
surrounding data protection, regulatory requirements for the preservation and
archiving of several types of corporate data continue to mount.
Data of a particularly sensitive or critical nature must be available for
recall within clearly established time limits if circumstances demand it, and
must be kept secure as well. A successful ILM implementation therefore
integrates with an organization's backup and recovery solutions at several touch
points. ILM dictates that as items age they can be taken offline completely and
migrated to tape storage, for example, yet some data must still be available for
recall even at this point. Since not all data has to be protected in the same
manner, the ILM solution must be flexible enough to manage varying CDP
requirements.
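Varying CDP requirements can be expressed as per-tier protection defaults with overrides for regulated data. The sketch below is a hypothetical policy table: the tier names, snapshot intervals, and the 48-hour recall limit are illustrative assumptions, not regulatory values.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ProtectionPolicy:
    snapshot_every_hours: Optional[int]  # None: no snapshots at this tier
    remote_replication: bool
    online: bool                         # False: taken offline, e.g. to tape
    recall_within_hours: Optional[int]   # None: no recall guarantee once offline


# Hypothetical defaults: protection is relaxed as data moves down the tiers.
TIER_DEFAULTS = {
    "tier1": ProtectionPolicy(1, True, True, None),
    "tier2": ProtectionPolicy(24, False, True, None),
    "tier3": ProtectionPolicy(None, False, True, None),
    "archive": ProtectionPolicy(None, False, False, None),
}

# Hypothetical recall limit for records that must remain retrievable.
REGULATED_RECALL_HOURS = 48


def policy_for(tier: str, regulated: bool) -> ProtectionPolicy:
    """Return the protection policy for data on `tier`, tightened for regulated data."""
    policy = TIER_DEFAULTS[tier]
    if regulated and not policy.online:
        # Offline regulated data must still be recallable within the agreed limit.
        policy = replace(policy, recall_within_hours=REGULATED_RECALL_HOURS)
    return policy


if __name__ == "__main__":
    print(policy_for("archive", regulated=True))
    print(policy_for("archive", regulated=False))
```

Keeping the defaults in one table and applying overrides in a single function makes it straightforward to add further data classes later without touching the per-tier settings.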
Green data centers
As mentioned earlier, one of the primary challenges facing data centers today is
power consumption. While the initial cost of acquiring the storage might have
been low, the ongoing cost of power and cooling can make the TOC very high. In
addition to this tangible financial burden, the other, often less tangible,
concern in such a data center is its environmental impact. Today, global warming
and pollution are major hazards that cannot be ignored. There are both
regulatory and financial incentives for reducing carbon dioxide emissions, which
often result in direct cost savings through additional carbon credits.
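The emissions side follows from simple arithmetic: energy (kWh) = average power (W) × hours ÷ 1000, and emissions = energy × the grid's emission factor. In the sketch below, the power draws, the cooling multiplier and the emission factor of 0.5 kg CO2 per kWh are all hypothetical inputs; actual emission factors depend heavily on the local electricity mix.

```python
HOURS_PER_YEAR = 24 * 365


def annual_co2_kg(avg_power_watts: float, kg_co2_per_kwh: float,
                  cooling_multiplier: float = 1.0) -> float:
    """Yearly CO2 emissions for equipment drawing avg_power_watts continuously.

    cooling_multiplier scales the hardware's energy to include cooling overhead.
    """
    energy_kwh = avg_power_watts * HOURS_PER_YEAR / 1000 * cooling_multiplier
    return energy_kwh * kg_co2_per_kwh


if __name__ == "__main__":
    EMISSION_FACTOR = 0.5  # hypothetical kg CO2 per kWh
    # Hypothetical average draws for a traditional and a tiered configuration.
    traditional = annual_co2_kg(5_000, EMISSION_FACTOR, cooling_multiplier=2.0)
    tiered = annual_co2_kg(3_000, EMISSION_FACTOR, cooling_multiplier=2.0)
    print(f"Avoided: {traditional - tiered:,.0f} kg CO2 per year")
```

Under these assumed figures, cutting the average draw from 5 kW to 3 kW avoids 17,520 kg of CO2 per year; the same kWh figure multiplied by the electricity price gives the matching cost saving.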
Conclusion
Storage Tiering in enterprise-class storage is becoming a highly desirable
feature today. It is only a matter of time before the cost, environmental and
performance benchmarks of a tiered system become critical parameters on which
decisions of storage system procurement will be based.
Tiered storage servers implementing ILM offer both a cost advantage and better
performance. Just as importantly, storage servers with Tiered Storage and ILM
enable data centers to reduce their footprint, electricity costs, and CO2
emissions, creating a greener and more eco-friendly data center.