Deduplication Methods – Client-side Deduplication
With Backup Exec 2012 SP2, new possibilities for remote office protection are available. With client-side deduplication, the remote system performs the deduplication calculations and sends backup data over the network in its deduplicated form, which can make protecting remote offices a much more streamlined experience. Remote offices can be challenging to protect effectively: WAN links may offer only a fraction of the bandwidth available to a LAN backup, so backups over the WAN can be difficult to set up and to complete. In addition, some environments include backup servers that are less powerful than the application servers they protect; often, the SQL server or the Exchange server is the most powerful machine available in terms of processor speed or disk throughput. Where appropriate, why not leverage some of this remote computing power to achieve faster backups? Client-side deduplication can address both of these situations.
Generally, remote office backup strategies have two basic architectures. First, there are remote offices which do not have local storage, and where backup data is sent directly over the LAN or WAN to the central data center for storage. Second, there are remote offices that employ local storage and then “forward” that locally stored backup data to the central data center for protection. Both of these configurations can use the Backup Exec 2012 SP2 Deduplication Option to streamline and improve backup and recovery for remote offices.
Client-side deduplication is the act of skipping redundant data blocks at the backup source before transmitting the backup stream to the Backup Exec server. Data from the source system is refined into smaller deduplication blocks, and only the unique blocks (that is, the data the Backup Exec server doesn’t yet contain) are sent to the Backup Exec server’s deduplication disk storage device.
A deduplication disk storage device is a special type of disk storage configured by Backup Exec where all deduplication data blocks are stored. With the client-side deduplication method, the majority of the processing necessary for deduplication is done on the remote system rather than as the data arrives at the Backup Exec server. Client-side deduplication is the default deduplication method Symantec recommends for several reasons:
Greater Scalability
Client-side deduplication enables greater scalability by spreading processor usage out across all clients running backups, enabling the Backup Exec server to process more concurrent backups.
Reduced Network Data Transfers
Client-side deduplication minimizes network data transfers as only unique data blocks – not yet stored by the Backup Exec server – are transferred. Most environments – either LAN or WAN environments – can benefit from less data being sent across the network.
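The exchange described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Backup Exec implements it: the `Server` class, the `client_side_backup` function and the use of SHA-256 fingerprints are all assumptions made for the example. The client fingerprints its blocks, asks the server which fingerprints it does not yet hold, and sends only those blocks.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Identify a block by its SHA-256 digest (an assumption for this sketch)."""
    return hashlib.sha256(block).hexdigest()


class Server:
    """Hypothetical stand-in for the backup server's deduplication storage."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # fingerprint -> stored block

    def missing(self, fingerprints) -> set[str]:
        """Return the fingerprints the server does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.store}

    def receive(self, blocks: dict[str, bytes]) -> None:
        """Store the blocks the client chose to send."""
        self.store.update(blocks)


def client_side_backup(server: Server, blocks: list[bytes]) -> int:
    """Send only unseen blocks; return the block payload bytes sent.

    Fingerprint traffic is ignored to keep the sketch small.
    """
    by_fp = {fingerprint(b): b for b in blocks}   # duplicates collapse here
    needed = server.missing(by_fp.keys())         # ask before sending any data
    to_send = {fp: by_fp[fp] for fp in needed}
    server.receive(to_send)
    return sum(len(b) for b in to_send.values())


server = Server()
print(client_side_backup(server, [b"alpha", b"beta", b"alpha"]))  # 9
print(client_side_backup(server, [b"alpha", b"beta", b"gamma"]))  # 5
```

In the second backup only the new block crosses the network, and the fingerprinting work is done on the client rather than on the server.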
Each Backup Exec Agent for Windows and Agent for Linux has the built-in capability to perform client-side deduplication calculations. Note that all deduplication operations require the Deduplication Option to be licensed on the Backup Exec server.
| Backup Exec 2012 SP2 Agent | Client Deduplication Support |
| Agent for Windows | Yes |
| Agent for Linux | Yes |
| Agent for Mac | No |
| Agent for Applications and Databases | Yes |
| Agent for VMware and Hyper-V (VMware) | No* |
| Agent for VMware and Hyper-V (Hyper-V) | Yes** |
*While it is possible to utilize client-side deduplication when protecting VMware virtual machines, this configuration requires that backups be processed by locally installed agents within the virtual machines themselves (the Agent for Windows or the Agent for Linux). This configuration bypasses the optimized, image-level backup capabilities of the Agent for VMware and Hyper-V in VMware environments. For these reasons, using client-side deduplication in VMware environments is generally not recommended. Backup Exec server-side deduplication is usually optimal.
**Client-side deduplication can be used when protecting Hyper-V environments using the Agent for VMware and Hyper-V. In this configuration, optimized, image-level backups of virtual machines are captured and deduplicated through the Backup Exec Agent for Windows installed locally to the Hyper-V host. It is not necessary to install an individual agent into each Hyper-V virtual machine in order to realize client-side deduplication in Hyper-V environments.
Modern Data Management and Protection Challenges
Customers of all types and sizes are seeking new and innovative ways to overcome challenges associated with data growth and storage management. While these challenges are not necessarily new, they continue to become more complex and more difficult to overcome due to the following:
- Pace of data growth has accelerated
- Location of data has become more dispersed
- Linkages between data sets have become more complex
Data and storage management challenges are compounded by the need for companies to protect critical data assets against disaster through backup and recovery solutions. In order to maintain backups of critical data assets, additional secondary storage resources are required. This additional layer of backup storage must be implemented wherever backups occur, including central data centers and remote offices.
Storage Efficiencies through Data Deduplication
Backup Exec 2012 includes advanced data deduplication technology that allows companies to dramatically reduce the amount of storage required for backups, and to more efficiently centralize backup data from multiple sites for assured disaster recovery. These data deduplication capabilities are available in the Backup Exec 2012 Deduplication Option.
Backup Exec 2012 Data Deduplication Technology
The data deduplication technology within Backup Exec 2012 breaks down streams of backup data into “blocks.” Each data block is identified as either unique or non-unique, and a tracking database is used to ensure that only a single copy of a data block is saved to storage by that Backup Exec server. For subsequent backups, the tracking database identifies which blocks have already been protected, and only blocks that are new are stored. For example, if five different client systems are sending backup data to a Backup Exec server and a data block is found in the backup streams from all five of those client systems, only a single copy of the data block is actually stored by the Backup Exec server. Skipping redundant data blocks in this way leads to a significant reduction in the storage space needed for backups.
Figure 1: Deduplication Process
The deduplication technology within Backup Exec is applied across all backups managed by a deduplication-enabled Backup Exec server.
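To make the five-client example concrete, here is a minimal sketch of a block store with a tracking index. It is a toy model under stated assumptions, not Backup Exec's implementation: the `DedupStore` class, the fixed 4-byte block size and the SHA-256 fingerprints are all invented for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a backup stream into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical deduplication storage plus its tracking database."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}        # fingerprint -> single stored copy
        self.backups: dict[str, list[str]] = {}   # backup name -> ordered fingerprints

    def backup(self, name: str, stream: bytes) -> int:
        """Record a backup; return how many of its blocks were new to the store."""
        new_blocks = 0
        recipe = []
        for block in split_blocks(stream):
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:             # unique block: store it once
                self.blocks[fp] = block
                new_blocks += 1
            recipe.append(fp)                     # non-unique block: keep a pointer only
        self.backups[name] = recipe
        return new_blocks

    def restore(self, name: str) -> bytes:
        """Rebuild a backup from its list of fingerprints."""
        return b"".join(self.blocks[fp] for fp in self.backups[name])


store = DedupStore()
counts = [store.backup(f"client{i}", b"AAAABBBB") for i in range(1, 6)]
print(counts)             # [2, 0, 0, 0, 0]
print(len(store.blocks))  # 2
```

Five clients send identical data, but only the first backup adds blocks to storage; the other four are recorded as pointers, and each can still be restored in full.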
Deduplication Methods within Backup Exec 2012
The Backup Exec 2012 Deduplication Option gives administrators the flexibility to choose when and where deduplication calculations take place. Three deduplication methods are supported by Backup Exec 2012. These are as follows:
Client-side Deduplication
The client-side deduplication method is a software-driven process. Deduplication takes place at the source or protected client, and backup data is sent over the network in deduplicated form to the Backup Exec server. Only unique blocks of backup data are sent to the backup server and saved to backup storage; non-unique blocks are skipped.
Backup Exec Server-side Deduplication
The Backup Exec server-side deduplication method is also a software-driven process. Deduplication takes place after backup data has arrived at the Backup Exec server and just before data is stored to disk (also known as inline deduplication). Only unique blocks of backup data are stored; non-unique blocks are skipped.
Appliance Deduplication
The appliance deduplication method is a hardware-driven process. Deduplication takes place on a third-party deduplication appliance (for example, from ExaGrid or Quantum) and can be either inline or post-process. The third-party device handles all aspects of deduplication.
Administrators can mix and match deduplication methods to fit their unique needs. For example, a single Backup Exec server enabled for deduplication can simultaneously use client-side deduplication for some jobs, Backup Exec server-side deduplication for others, and appliance deduplication for yet another set of jobs.
Figure 2: Deduplication Methods
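One way to picture the mix-and-match idea is as a per-job setting that decides where deduplication runs and, as a consequence, how much data leaves the protected client. The sketch below is a simplified model based only on the descriptions above; the `Method`, `Job`, `network_bytes` and `dedup_location` names and the job figures are hypothetical, and fingerprint and metadata traffic is ignored.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    CLIENT_SIDE = "client-side"
    SERVER_SIDE = "server-side"
    APPLIANCE = "appliance"


@dataclass
class Job:
    name: str
    method: Method
    total_bytes: int    # size of the data selected for backup
    unique_bytes: int   # portion the deduplication store has not seen before


def network_bytes(job: Job) -> int:
    """Approximate bytes leaving the protected client for this job."""
    # Only client-side deduplication skips non-unique blocks before transmission.
    return job.unique_bytes if job.method is Method.CLIENT_SIDE else job.total_bytes


def dedup_location(job: Job) -> str:
    """Where the deduplication calculations run for this job."""
    return {
        Method.CLIENT_SIDE: "protected client",
        Method.SERVER_SIDE: "Backup Exec server",
        Method.APPLIANCE: "deduplication appliance",
    }[job.method]


# Hypothetical jobs on one deduplication-enabled server, each using a different method.
jobs = [
    Job("remote-office-files", Method.CLIENT_SIDE, total_bytes=1000, unique_bytes=50),
    Job("vmware-images", Method.SERVER_SIDE, total_bytes=1000, unique_bytes=50),
    Job("appliance-target", Method.APPLIANCE, total_bytes=1000, unique_bytes=50),
]
for job in jobs:
    print(job.name, dedup_location(job), network_bytes(job))
```

In this model all three methods store the same unique data; they differ in which machine does the work and in how much data has to travel from the client.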
The different deduplication methods supported by Backup Exec 2012 have various configurations for which they are best suited. The benefits of each method, as well as the configurations for which each method is best suited, will be detailed in the following weeks.
Very loosely, we were instructed to delete everything from before the dot-com bubble burst (2000) and keep everything after it. Now we are fast running out of data centre disk space. Err?
In fact, it’s a wonder we manage to do anything given the amount of information we need to process. As a consequence we are now facing a greater threat: too much information. Somewhere between 60 and 160 billion emails are sent around the world every single day. These emails include attachments such as reports, presentations, letters and pictures. In spite of limitations such as privacy concerns and too much unwanted mail, email is the best way to communicate efficiently, quickly and cheaply. The danger with email, as with any other way of sharing information, is that too much information simply clogs the system up and becomes a bottleneck to productivity.
Here are some useful top tips that may help:
- Understand the new business user – organisations must better understand the challenges employees are facing when navigating the world of information management. Look at when and how employees are accessing their information, make sure that data is indexed and categorised, and that intelligent archiving and search tools are available
- Prepare the infrastructure – with the relentless flow of information only set to continue, IT infrastructure must be able to cost effectively manage the increasing requirements for storage by implementing solutions able to dedupe and archive appropriately, automate processes and monitor and report on system status across all different devices and environments
- Prepare people – create IT policies that educate employees on how to manage their information, from email practices like limiting the ‘CC’ and ‘reply to all’ culture, to saving only the latest document version and overcoming the fear of the delete button. Help employees understand the company’s information retention strategy so they know what information is recoverable. This will empower them to take charge of information control and maintain productivity and efficiency
- Keep security front of mind – it seems like an obvious statement, but reinforcing company security policies around mobile devices could protect against significant and damaging data loss. Make sure employees know the company processes and take advantage of technologies that enable the IT department to see where the most important information is, at all times
- Encourage staff to switch off – with the information era in full swing and with more and more opportunity for employees to stay connected at all times, it’s important that organisations support staff welfare and encourage them to switch off every once in a while
Seriously consider optimising your storage to reduce overall front-end storage usage. You can improve capacity through integrated archiving and deduplication, as well as by tiering your storage. Archiving moves old data to a separate store so you don’t have to back up the same data day in, day out, forever. Deduplication stores each block of data only once and uses a pointer to that single copy wherever the block recurs. Together, archiving and data deduplication can reduce the amount of data you back up and dramatically shorten your backup window.
But, I hear you say, if I implement deduplication technology what are the benefits? Well, Backup Exec can help with that too. Read all about the Backup Exec Deduplication Assessment Tool in Part III.
Keep your business up and running: by discovering backup and storage management inefficiencies you can cut costs while making sure that your data is fully protected. That is highly beneficial at a time when budgets are under strain.
It is really useful to go through the process of trying to find out:
- How well your data is protected
- If you are missing backing up critical data
- How prepared you are for increasing data volumes
- Whether your strategy supports business growth or lowers performance
- If you are taking advantage of the most cost-effective solutions available
It never ceases to amaze me how little we know ourselves. To quote Polonius, “to thine own self be true”: the more honest you are with yourself, the more accurate and useful the results will be. We know our business, don’t we? We are doing the best we can, aren’t we? It’s not as if someone is trying to catch you out, so give it a go. There are some pretty simple questions you can ask yourself to get going, because the world has moved on and the drivers for improving backup and recovery operations are ever stricter:
- How can you keep business-critical applications running, delivering improved ROI, while complying with regulations?
- How can you justify spending in times of budgetary constraint by demonstrating the quality and effectiveness of your systems?
- What is the best way to convince business users of the importance of investing in backup solutions, before data is lost, while also establishing what should be backed up, and why?
- How confident are you that you can cover all your IT service requirements? If you are not fully confident, how confident are you: 25%, 50%? Do you meet all your business service agreements?
- What level of backup reporting do you have that allows you to justify future IT investment to optimise your recovery time objectives? Is there a requirement for reporting metrics, occasionally, or more regularly?
- How confident are you that your main business managers understand the importance of backup? Most of us take backup for granted but how confident are you that your backup policy covers all areas of the business?
Are you confident that you have the right backup and recovery systems in place and are getting the most out of them? A backup and recovery, or storage, assessment will highlight areas of weakness but also help to identify where Backup Exec can improve efficiencies and save you money.