Data Problems in the Age of Virtualization
01/04/2019

Data is one of the most important and most expensive resources in the data center. Applications in the IT infrastructure are driving accelerated virtualization while generating an ever-growing volume of critical data, so the infrastructure cannot keep pace simply by stacking more servers.

Data centers therefore typically require many different products to give enterprise applications what they need: performance, data protection, data efficiency, and centralized global management.

Why so many different products?

Enterprise applications need these capabilities. As the data center expands and the volume of data grows, adding yet another specialized technology has often been the only option available to meet application needs. At first, an organization might need only a deduplication appliance for backup, or a device for WAN optimization. But over time the number of servers grows, and with it the number of these point products. Many were never designed for virtualized applications and workloads, and most were not designed to work with the other products.

Typically, each of these products is purchased from a different vendor, requires its own training, is managed from its own management console, carries its own support and maintenance contract, and is bought on its own refresh cycle.

Data efficiency, which includes deduplication, compression, and write optimization, is a specific example of what drives the proliferation of different devices and technologies in the data center. When it was introduced in the mid-2000s, deduplication was designed specifically for data backup. In that use case, capacity optimization is critical, given the enormous redundancy in backup data and the ever-growing volume of data to be backed up and retained. Deduplication then spread to other stages of the data lifecycle as IT organizations recognized its benefits:

  • Enhanced data mobility - VM mobility is a basic premise of server virtualization, but managing data in inflexible structures such as LUNs can significantly hamper that mobility in traditional infrastructure environments. When data is deduplicated, it is easier to move VMs from one data center to another.
  • Performance improvements - When data is deduplicated, less data needs to be written to and read from disk. This especially benefits application environments such as virtual desktop infrastructure (VDI), where boot storms can generate many gigabytes of writes to disk.
  • Efficient storage usage - The capacity required for primary workloads can be reduced by 2-3x through the effective use of deduplication, compression, and optimization (a minimal sketch of block-level deduplication follows this list).
  • Longer flash storage lifespan - Deduplication performed at the right point in the data stream reduces the amount of data written to solid-state drives (SSDs), which have a limited lifespan determined by write volume. Write optimization can further extend SSD life by spreading writes evenly across the entire device.
  • Drastic bandwidth reduction for replication between sites - Twenty years ago an IT organization typically ran a single main data center, but today almost every IT manager is responsible for multiple sites. Efficient data transfer between sites is a fundamental infrastructure requirement for multi-site operations. Deduplicating data before it is sent to a remote site makes the transfer itself more efficient and conserves bandwidth.
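
The core mechanism behind block-level deduplication can be illustrated with a short sketch. The code below is illustrative only and is not tied to any specific product; the fixed 4 KB block size, the SHA-256 fingerprints, and the `DedupStore` class are assumptions chosen for clarity, since real systems often use variable-length blocks and additional metadata.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; production systems may use variable-length blocks


class DedupStore:
    """Minimal content-addressed block store: each unique block is kept only once."""

    def __init__(self):
        self.blocks = {}      # fingerprint -> block data (stored once)
        self.refcounts = {}   # fingerprint -> number of logical references

    def write(self, data: bytes) -> list:
        """Split data into blocks, store only unseen blocks, return their fingerprints."""
        fingerprints = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:          # only new, unique blocks consume capacity
                self.blocks[fp] = block
            self.refcounts[fp] = self.refcounts.get(fp, 0) + 1
            fingerprints.append(fp)
        return fingerprints

    def logical_bytes(self) -> int:
        return sum(self.refcounts[fp] * len(self.blocks[fp]) for fp in self.refcounts)

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


# Two "VM images" that share most of their content deduplicate very well.
store = DedupStore()
base_image = b"A" * BLOCK_SIZE * 100
store.write(base_image)                            # 100 logical blocks, 1 unique block stored
store.write(base_image + b"B" * BLOCK_SIZE)        # adds only 1 new unique block
print(store.logical_bytes(), store.physical_bytes())  # 823296 logical bytes vs 8192 physical
```

The ratio of logical to physical bytes is the deduplication ratio; in practice, compression is applied to the unique blocks before they are persisted, improving that ratio further.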

Despite the advances deduplication has made over the past decade, and the magnitude of the capacity and performance benefits it delivers, older technology still cannot fulfill the promise of data efficiency across the entire data lifecycle. As with the original use of deduplication in backup appliances, data efficiency is applied only at particular points, by individual products or devices, at individual stages of the data lifecycle. Some products apply deduplication to only a portion of the data, so the overall efficiency gains are limited. Other products apply only compression yet misuse the term "deduplication." In primary storage systems, concern about the latency deduplication might introduce has led to data efficiency being performed post-process, after the data has already been written, which greatly limits its benefit to other operations such as replication and backup. Most of these compromises are the result of bolting deduplication onto an existing architecture rather than building it in as the foundation of the overall architecture.
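
The difference between inline and post-process data efficiency, and why the latter limits replication and backup, can be sketched as follows. This is a simplified illustration under assumed behavior, not any vendor's implementation; the `inline_write`, `post_process_write`, and `post_process_scan` functions are hypothetical names.

```python
import hashlib

def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def inline_write(blocks, index, disk):
    """Inline: duplicates are detected before anything reaches the disk."""
    for block in blocks:
        fp = fingerprint(block)
        if fp not in index:
            index[fp] = True
            disk.append(block)     # only unique blocks are ever written
        # duplicates cost no disk writes, and replication only ever sees unique blocks

def post_process_write(blocks, disk):
    """Post-process: all data lands on disk at full size first."""
    disk.extend(blocks)            # full-size writes hit the disk (and any replica)

def post_process_scan(disk, index):
    """A later background job removes duplicates that are already on disk."""
    unique = []
    for block in disk:
        fp = fingerprint(block)
        if fp not in index:
            index[fp] = True
            unique.append(block)
    disk[:] = unique               # capacity is reclaimed only after the scan runs


data = [b"A" * 4096] * 100         # a highly redundant workload

inline_disk, inline_index = [], {}
inline_write(data, inline_index, inline_disk)
print(len(inline_disk))            # 1 block stored; a replica made now transfers 1 block

pp_disk, pp_index = [], {}
post_process_write(data, pp_disk)
print(len(pp_disk))                # 100 blocks stored; a backup taken now copies all 100
post_process_scan(pp_disk, pp_index)
print(len(pp_disk))                # 1 block remains, but only after the background scan
```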

These compromises in existing technology deliver varying levels of benefit, but they do not address the underlying problem, and ultimately they cannot provide an optimal, data-mobile infrastructure. IT teams end up paying higher acquisition costs and taking on more complexity, patching together incomplete data efficiency solutions on top of their other infrastructure burdens.

Many IT organizations have invested in nine or more separate products, each designed to provide some level of data efficiency (deduplication, compression, and/or optimization) at a specific stage of the data lifecycle. These stages include:

  1. Flash cache on the server
  2. DRAM and/or flash cache/tier in the storage array
  3. Disk in the storage array
  4. All-flash arrays
  5. Backup appliance in the main data center
  6. Archive or secondary storage array
  7. DR storage array
  8. WAN optimization appliance
  9. Cloud gateway appliance

What is needed is comprehensive data efficiency applied across the entire data lifecycle. Deduplicating, compressing, and optimizing data before it becomes a problem delivers performance and optimizes capacity before redundant data is ever created and resources are consumed across the infrastructure.