Traditional Data Efficiency Technology
01/04/2019

In today's data center, the main concern is IOPS (input/output operations per second), not disk capacity.

With hard drive-based storage equipment, high storage performance is very difficult to achieve. Here is why: when a user saves a document, the mechanical components of the hard drive that will hold the document have to go to work. The write head, which is responsible for storing the data, positions itself about the width of a strand of hair above a magnetic platter spinning at between 7,200 and 15,000 revolutions per minute. Once in position, the head writes the data. There may not be enough contiguous space on the platter to write the whole file in one place, so the head scatters portions of the file across the disk and maintains an index so the drive knows where each fragment is stored.
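As a rough illustration of where that time goes, average rotational latency alone can be estimated as the time for half a platter revolution. The short Python sketch below uses the rotation speeds mentioned above; it is a back-of-the-envelope calculation only, and seek time (moving the head into position) is ignored entirely.

```python
# Back-of-the-envelope rotational latency for the spindle speeds above.
# Average rotational latency is roughly the time for half a revolution;
# seek time would add several more milliseconds on top of this.
for rpm in (7_200, 15_000):
    revolutions_per_second = rpm / 60
    full_rotation_ms = 1_000 / revolutions_per_second
    avg_rotational_latency_ms = full_rotation_ms / 2
    print(f"{rpm:>6} RPM: ~{avg_rotational_latency_ms:.2f} ms average rotational latency")
# 7,200 RPM -> ~4.17 ms; 15,000 RPM -> ~2.00 ms
```

Even a few milliseconds per operation adds up quickly when an application issues thousands of I/O requests per second.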

The more time it takes to write a file, the longer the user has to wait for the save to complete. The time between the command to write a file and the confirmation that the file has been written is called latency. Latency is one of the most serious problems in today's data center, and the problem is compounded by the new kinds of applications entering the business market. Many of these applications place performance demands on storage systems that drive latency even higher. Eventually, latency becomes bad enough to harm ongoing business operations.

To overcome this growing problem, a new type of storage has come to market. Called flash storage or solid-state storage, it does not suffer from the latency problems that plague traditional hard drives. However, while modern hard drives can each store 6 terabytes (6 trillion bytes) or more per disk, solid-state disks typically hold only a fraction of that amount (1 TB, for example). Moreover, solid-state disks remain relatively expensive compared to capacity-oriented hard drives.

The data efficiency technologies developed for the devices described in the previous section have historically been designed to address capacity problems. Because HDD capacity carries real costs, a range of techniques was developed to save as much capacity on the HDD as possible, at the expense of CPU resources. But as we have just discussed, the real challenge is IOPS.

Just as each choice of disk technology has its drawbacks, so does each of the data efficiency technologies described below.

 

Compression
Compression is the process of reducing the size of a given data element. Not all data compresses well - most video and audio files, for example, are already compressed and gain little - while text compresses very well. The challenge is that there is no way to know exactly how well data will compress without actually compressing it first.
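To make that concrete, the hypothetical Python sketch below compresses a highly repetitive text buffer and a buffer of random bytes (a stand-in for already-compressed video or audio) with zlib; the exact ratios will vary, but the gap between the two is the point.

```python
import os
import zlib

# Repetitive text compresses dramatically; random bytes (like already-
# compressed media) barely shrink at all, and may even grow slightly.
samples = {
    "repetitive text": b"the quick brown fox jumps over the lazy dog " * 200,
    "random bytes": os.urandom(8_000),
}

for name, data in samples.items():
    compressed = zlib.compress(data, 6)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes "
          f"(~{len(data) / len(compressed):.1f}x)")
```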

 

Inline Compression

Inline compression occurs before data is written to disk. Although it requires fewer IOPS and less capacity, it adds latency and consumes significant CPU. (A sketch of this write path follows the list below.)

  • Resource intensive: requires a large amount of CPU
  • The time-consuming process adds latency to every write
  • Benefit: reduces the HDD capacity used
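The trade-off is easiest to see in a minimal, hypothetical sketch of an inline write path: the compression work runs on the CPU, inside the latency path of every write, before any bytes reach the disk. The `storage` file object here simply stands in for the backing device.

```python
import time
import zlib

def inline_compressed_write(storage, data: bytes) -> float:
    """Compress in the write path, then persist; returns the CPU time added (ms)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, 6)   # CPU cost paid before anything is written
    cpu_ms = (time.perf_counter() - start) * 1_000
    storage.write(compressed)             # fewer bytes (and IOPS) reach the disk
    return cpu_ms

# Usage: every single write pays the compression cost up front.
with open("/tmp/inline_demo.bin", "wb") as disk:
    added = inline_compressed_write(disk, b"example record " * 1_000)
    print(f"compression added ~{added:.3f} ms of latency to this write")
```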

 

Post-process compression

Post-process compression defers compression until after the data has been written, while still reducing storage capacity needs significantly. The system writes data to disk, then re-reads it from disk and compresses it. This significantly increases IOPS and CPU consumption. (A sketch of this flow follows the list below.)

  • Additional disk IOPS are needed after the initial write to read, and then rewrite, the data
  • CPU is required after the initial write to read the data back
  • CPU is needed to compress the data
  • Additional IOPS are needed to write the compressed data back to disk
  • Benefit: saves disk storage capacity
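A hypothetical sketch of the post-process flow makes the extra work visible: the data is written once at full size, read back later, compressed, and written again, so the capacity savings arrive only after additional IOPS and CPU have been spent.

```python
import os
import zlib

def post_process_compress(path: str) -> None:
    """Re-read a file that was already written, compress it, write it back."""
    with open(path, "rb") as f:          # extra read IOPS after the initial write
        raw = f.read()
    compressed = zlib.compress(raw, 6)   # CPU spent long after the original write
    with open(path, "wb") as f:          # extra write IOPS for the smaller copy
        f.write(compressed)

# Usage: data lands on disk uncompressed first; a later pass shrinks it.
path = "/tmp/post_process_demo.bin"
with open(path, "wb") as f:
    f.write(b"example record " * 1_000)  # initial, full-size write
post_process_compress(path)
print(f"on-disk size after the pass: {os.path.getsize(path)} bytes")
```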

 

Deduplication

Deduplication is a specialized data efficiency technique used to increase storage utilization. It identifies and stores only unique pieces of data, or byte patterns, eliminating duplicate copies. Most deduplication systems operate at only one phase of the data lifecycle, and the data usually has to be rehydrated out of its deduplicated state before it can move to the next phase.
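Conceptually, most implementations hash each chunk of data and keep only one physical copy per unique hash, plus a list of references needed to rebuild the original. The sketch below is a simplified, hypothetical fixed-size-chunk version of that idea, not any particular vendor's design.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Fixed-size-chunk deduplication sketch: returns the unique chunk store
    and the ordered list of hashes ("recipe") needed to reconstruct the data."""
    store = {}    # content hash -> chunk bytes (each unique chunk stored once)
    recipe = []   # ordered hashes describing the original byte stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe

# Usage: ten identical copies of the same data occupy the space of one.
data = (b"base image block" * 256) * 10          # 40,960 logical bytes
store, recipe = deduplicate(data)
physical = sum(len(c) for c in store.values())
print(f"logical {len(data)} bytes -> physical {physical} bytes "
      f"({len(recipe)} chunk references)")
```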

 

Inline Deduplication

Like inline compression, inline deduplication occurs before data is written to disk. The aim is to eliminate redundant data, and the capacity that would otherwise be needed to store it, before it ever lands on storage.

  • Imposes a performance penalty on all I/O because it requires significant CPU and memory resources
  • Storage systems: data must first be read by the application on the server, then transferred across the storage area network to the array
  • Backup systems: data must first be read from the server and storage before being backed up and deduplicated on the backup device
  • Benefit: reduces disk capacity used

 

Post-process deduplication

In post-process deduplication, data is first written to disk and then deduplicated at a later time. Again, the goal is to reduce disk capacity requirements, but at a significant cost in CPU and IOPS, as the sketch after the list below illustrates.

  • Requires enough capacity up front to land the data before deduplication
  • Requires additional IOPS to write the data, read it back for deduplication, and then write it again in deduplicated form
  • Requires additional SAN bandwidth to move blocks across the wire before deduplication occurs
  • Benefit: the reduction in disk capacity used is eventually realized
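The cost profile shows up clearly in a hypothetical sketch of such a background pass: every block that was already written at full size is read back and hashed, and duplicates are replaced with references to the surviving copy.

```python
import hashlib

def post_process_dedup(blocks):
    """Background pass over previously written blocks: hash each one and
    keep only the first copy of each distinct block, plus a reference map."""
    seen = {}      # content hash -> index into the surviving block list
    unique = []    # blocks that remain after deduplication
    layout = []    # per-original-block reference into 'unique'
    for block in blocks:                              # extra read IOPS
        digest = hashlib.sha256(block).hexdigest()    # extra CPU, after the fact
        if digest not in seen:
            seen[digest] = len(unique)
            unique.append(block)
        layout.append(seen[digest])
    return unique, layout                             # rewriting costs still more IOPS

# Usage: 100 full-size blocks with only two distinct patterns collapse to two.
blocks = [b"A" * 4096, b"B" * 4096] * 50
unique, layout = post_process_dedup(blocks)
print(f"{len(blocks)} blocks written initially -> {len(unique)} unique blocks kept")
```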

 

Each data efficiency technology described above shares some of the same fundamental weaknesses:

  • They require sacrifices - whether it is CPU up front or IOPS afterward, expensive resources are consumed, potentially at the expense of application performance.
  • They have hidden costs - in a hyperconverged environment, more CPU resources may be needed to sustain application performance, which can mean additional hypervisor and database license fees.
  • They are designed for capacity - these technologies were built to solve capacity problems, not IOPS problems, so performance is often sacrificed.
  • They are designed for a single stage of the data lifecycle - each technology is applied on one purpose-built device at a time. Every time the data moves to the next stage of its lifecycle, it has to be processed all over again.

 

All of this points to a clear solution to the data problem: to be truly efficient, data must be deduplicated, compressed, and optimized at the moment it is first written, and kept in that state throughout its entire lifecycle. Performing deduplication and compression once, at inception, rather than over and over at each stage, frees significant resources downstream and enables the advanced functionality needed in today's virtualized world.