Backup Exec Server-side Deduplication
Do you have VMware ESX or vSphere servers with high average processor utilization? If so, the Backup Exec server-side deduplication method can be a useful and effective deduplication solution for these environments. This method of deduplication is performed entirely on the Backup Exec server and does not impact source systems any more than a typical backup would.
The Backup Exec server-side deduplication method performs the deduplication processes against data when it arrives at the Backup Exec server – that is, just before the data is laid down on disk. Data is transmitted in its whole, un-deduplicated form, and then decomposed into deduplication blocks in-line by the Backup Exec server. Only the unique data blocks (that is, the data that the deduplication disk storage device doesn’t yet contain) are stored.
Figure 4: Backup Exec Server-side Deduplication
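The in-line flow described above can be sketched in a few lines of Python. This is only an illustration: the fixed 4-byte block size, the SHA-256 fingerprints, and the `DedupStore` class are assumptions made for the example, not a description of how Backup Exec segments or fingerprints data.

```python
import hashlib

BLOCK_SIZE = 4  # tiny illustrative block size (assumption); real blocks are far larger


class DedupStore:
    """Toy stand-in for a deduplication disk storage device (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, one copy per unique block

    def ingest(self, stream):
        """Split a whole, un-deduplicated stream into blocks and store only new ones.

        Returns the ordered list of fingerprints needed to rebuild the stream.
        """
        recipe = []
        for offset in range(0, len(stream), BLOCK_SIZE):
            block = stream[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:  # unique block: lay it down on disk
                self.blocks[fingerprint] = block
            recipe.append(fingerprint)          # duplicates only add a reference
        return recipe

    def restore(self, recipe):
        """Rebuild the original stream from its fingerprints."""
        return b"".join(self.blocks[f] for f in recipe)


store = DedupStore()
backup = b"AAAABBBBAAAACCCC"   # 16 bytes arrive at the server in full
recipe = store.ingest(backup)
stored_bytes = sum(len(b) for b in store.blocks.values())
```

In this sketch the server receives all 16 bytes but keeps only the three unique blocks, which mirrors the point that server-side deduplication saves storage rather than network bandwidth.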
The Backup Exec server-side deduplication method is optimal for situations where:
• High Processor Utilization on Remote Servers
If the remote system has no processor cycles to spare for deduplication calculations, Backup Exec server deduplication can take the load and still perform deduplication.
• VMware Environments
When using the Agent for VMware and Hyper-V to capture image-level backups of VMware virtual machines, Backup Exec server-side deduplication must be used.
Backup Exec server-side deduplication is not recommended for the following environments:
• Remote Office Protection Over a WAN
With Backup Exec server-side deduplication, the Backup Exec server receives the entire data set before deduplication takes place. This is not a WAN-friendly method of deduplication. Generally, remote office protection without local storage should use client-side deduplication.
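The guidance above can be summarised as a small decision helper. The sketch below is hypothetical: `BackupSource` and `suggest_method` are illustrative names, not Backup Exec objects or settings, and the order in which the rules are checked is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class BackupSource:
    """Hypothetical description of a protected system (not a Backup Exec object)."""
    name: str
    cpu_headroom: bool          # spare processor cycles for deduplication work?
    vmware_image_backup: bool   # image-level backup via the Agent for VMware and Hyper-V?
    over_wan: bool              # does backup data travel over a WAN link?
    has_local_storage: bool     # does the remote office keep local backup storage?


def suggest_method(src):
    """Return 'server-side' or 'client-side' following the guidance above."""
    if src.vmware_image_backup:
        return "server-side"    # image-level VMware backups must use server-side
    if src.over_wan and not src.has_local_storage:
        return "client-side"    # avoid sending the entire data set over the WAN
    if not src.cpu_headroom:
        return "server-side"    # keep deduplication work off the busy source
    return "client-side"        # otherwise fall back to client-side (assumption)


esx = BackupSource("esx01", False, True, False, True)
branch = BackupSource("branch-fs", True, False, True, False)
busy = BackupSource("sql01", False, False, False, True)
```

Here `esx` and `busy` map to server-side deduplication, while `branch` maps to client-side deduplication.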
Any Backup Exec server that has the Deduplication Option licensed can utilize the Backup Exec server-side deduplication method. Most agents and backup types supported by Backup Exec can take advantage of the space savings inherent with Backup Exec server-side deduplication.
|Backup Exec 2012 SP2 Agent||Backup Exec Server-side Deduplication Support|
|Agent for Windows||Yes|
|Agent for Linux||Yes|
|Agent for Mac||Yes|
|Agent for Applications and Databases||Yes|
|Agent for VMware and Hyper-V (VMware)||Yes|
|Agent for VMware and Hyper-V (Hyper-V)||Yes|
Appliance Deduplication
Some Backup Exec customer environments have an existing investment in deduplication-enabled appliances for onsite backup, offsite storage (disaster recovery), and remote office protection. The appliance deduplication method is an excellent fit for these environments.
The appliance deduplication method uses Symantec’s OpenStorage (OST) technology in conjunction with both a 3rd-party deduplication appliance and a manufacturer-developed OST plug-in. Together, these components enable the following:
• Intelligent Replication Tracking
Many 3rd-party deduplication appliances include a replication feature that enables data to be efficiently copied from one device to another downstream device. When backup data is transferred by a Backup Exec server to a deduplication appliance through the OST plug-in, the Backup Exec server can track when data is replicated to additional appliances. This allows the Backup Exec server to restore data from either the original deduplication appliance or any of the additional appliance replication destinations.
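The idea of replication tracking can be illustrated with a toy catalog that remembers every appliance holding a given backup set. This is a conceptual sketch only: `ReplicationCatalog` and its methods are hypothetical names and do not represent the OST plug-in interface or Backup Exec's catalog.

```python
from collections import defaultdict


class ReplicationCatalog:
    """Toy catalog recording which appliances hold each backup set (hypothetical)."""

    def __init__(self):
        self._locations = defaultdict(list)  # backup set id -> appliance names

    def record_backup(self, backup_id, appliance):
        """Note the appliance that first received the backup set."""
        self._locations[backup_id].append(appliance)

    def record_replication(self, backup_id, source, target):
        """Note that the appliance replicated the backup set downstream."""
        holders = self._locations.get(backup_id, [])
        if source not in holders:
            raise ValueError(f"{source} does not hold {backup_id}")
        if target not in holders:
            holders.append(target)

    def restore_sources(self, backup_id):
        """Every appliance the backup set could be restored from."""
        return list(self._locations.get(backup_id, []))


catalog = ReplicationCatalog()
catalog.record_backup("set-1", "dc-appliance")
catalog.record_replication("set-1", "dc-appliance", "dr-appliance")
sources = catalog.restore_sources("set-1")
```

After the replication is recorded, both the original appliance and the downstream appliance are listed as restore sources for `set-1`.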
Appliance deduplication requires that the Backup Exec server be paired with one or more supported OST-based deduplication appliances. Symantec Backup Exec is committed to expanding the breadth and depth of OST partners certified to work with Backup Exec, so additional OST devices are being certified and supported as they complete Backup Exec’s internal qualification processes.
For more information on supported 3rd-party appliances compatible with the OST-based appliance deduplication technology within Backup Exec 2012 SP2, please refer to the Backup Exec 2012 SP2 Hardware Compatibility List (HCL) available online.
Deduplication Methods – Client-side Deduplication
With Backup Exec 2012 SP2, exciting possibilities for remote office protection are available. The concept of client-side deduplication – where the remote system is responsible for deduplication calculations and where backup data is sent over the network in its deduplicated form – can make the process of protecting remote offices a much more streamlined experience. Remote offices can be challenging to protect effectively; backups over a WAN may have only a fraction of the bandwidth available to a LAN backup, and they can be a challenge to set up as well as to complete. Some environments include backup servers that are not as powerful as the application servers they are protecting – often, the SQL server or the Exchange server in the environment is the most powerful machine available in terms of processor speed or disk throughput. Where appropriate, why not leverage some of this remote computing power to achieve faster backups? In both of these situations, client-side deduplication can offer a comprehensive solution to the data protection challenges brought on by the environment.
Generally, remote office backup strategies have two basic architectures. First, there are remote offices which do not have local storage, and where backup data is sent directly over the LAN or WAN to the central data center for storage. Second, there are remote offices that employ local storage and then “forward” that locally stored backup data to the central data center for protection. Both of these configurations can use the Backup Exec 2012 SP2 Deduplication Option to streamline and improve backup and recovery for remote offices.
Client-side deduplication is the act of skipping redundant data blocks at the backup source before transmitting the backup stream to the Backup Exec server. Data from the source system is refined into smaller deduplication blocks, and only the unique blocks (that is, the data the Backup Exec server doesn’t yet contain) are sent to the Backup Exec server’s deduplication disk storage device.
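A minimal sketch of this exchange is shown below. It assumes a simple two-step protocol (the client asks which fingerprints the server lacks, then sends only those blocks), fixed 4-byte blocks, and SHA-256 fingerprints; all of these, along with the `Server` class and `client_backup` function, are illustrative assumptions rather than Backup Exec's actual protocol.

```python
import hashlib

BLOCK_SIZE = 4  # tiny illustrative block size (assumption)


def fingerprint(block):
    return hashlib.sha256(block).hexdigest()


class Server:
    """Toy Backup Exec server holding the deduplication store (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes

    def missing(self, fingerprints):
        """Tell the client which fingerprints are not yet stored."""
        return {f for f in fingerprints if f not in self.blocks}

    def receive(self, new_blocks):
        self.blocks.update(new_blocks)


def client_backup(data, server):
    """Deduplicate on the client; return the number of block bytes sent."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    by_fingerprint = {fingerprint(b): b for b in blocks}
    needed = server.missing(by_fingerprint.keys())   # only fingerprints cross the wire here
    payload = {f: by_fingerprint[f] for f in needed}
    server.receive(payload)                          # only unique blocks are transmitted
    return sum(len(b) for b in payload.values())


server = Server()
first = client_backup(b"AAAABBBBCCCC", server)    # nothing stored yet: all blocks sent
second = client_backup(b"AAAABBBBDDDD", server)   # only the new DDDD block is sent
```

The byte counts ignore the fingerprints themselves, which are small compared with the blocks they describe. In this example the second backup transmits a third of what the first one did, because two of its three blocks are already on the server.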
A deduplication disk storage device is a special type of disk storage configured by Backup Exec where all deduplication data blocks are stored. With the client-side deduplication method, the majority of the processing necessary for deduplication is done on the remote system rather than as the data arrives at the Backup Exec server. Client-side deduplication is the default deduplication method Symantec recommends for several reasons:
Client-side deduplication enables greater scalability by spreading processor usage out across all clients running backups, enabling the Backup Exec server to process more concurrent backups.
Reduced Network Data Transfers
Client-side deduplication minimizes network data transfers as only unique data blocks – not yet stored by the Backup Exec server – are transferred. Most environments – either LAN or WAN environments – can benefit from less data being sent across the network.
Each Backup Exec Agent for Windows and Agent for Linux has the built-in capability to perform client-side deduplication calculations. Note that all deduplication operations require the Deduplication Option to be licensed on the Backup Exec server.
|Backup Exec 2012 SP2 Agent||Client Deduplication Support|
|Agent for Windows||Yes|
|Agent for Linux||Yes|
|Agent for Mac||No|
|Agent for Applications and Databases||Yes|
|Agent for VMware and Hyper-V (VMware)||No*|
|Agent for VMware and Hyper-V (Hyper-V)||Yes**|
|*While it is possible to utilize client-side deduplication when protecting VMware virtual machines, this configuration requires that backups be processed by locally installed agents within the virtual machines themselves (the Agent for Windows or the Agent for Linux). This configuration bypasses the optimized, image-level backup capabilities of the Agent for VMware and Hyper-V in VMware environments. For these reasons, using client-side deduplication in VMware environments is generally not recommended. Backup Exec server-side deduplication is usually optimal.|
|**Client-side deduplication can be used when protecting Hyper-V environments using the Agent for VMware and Hyper-V. In this configuration, optimized, image-level backups of virtual machines are captured and deduplicated through the Backup Exec Agent for Windows installed locally to the Hyper-V host. It is not necessary to install an individual agent into each Hyper-V virtual machine in order to realize client-side deduplication in Hyper-V environments.|
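For scripting or documentation purposes, the two support tables in this document can be captured as a simple lookup. The sketch below encodes only the Yes/No values from the tables (with "No*" treated as No and "Yes**" as Yes); the `SUPPORT` dictionary and `supported_methods` function are hypothetical names, not part of any Backup Exec interface.

```python
# Values transcribed from the server-side and client-side support tables.
SUPPORT = {
    "Agent for Windows": {"server-side": True, "client-side": True},
    "Agent for Linux": {"server-side": True, "client-side": True},
    "Agent for Mac": {"server-side": True, "client-side": False},
    "Agent for Applications and Databases": {"server-side": True, "client-side": True},
    # "No*": client-side is only possible with in-guest agents, see the footnote
    "Agent for VMware and Hyper-V (VMware)": {"server-side": True, "client-side": False},
    # "Yes**": deduplicated through the Agent for Windows on the Hyper-V host
    "Agent for VMware and Hyper-V (Hyper-V)": {"server-side": True, "client-side": True},
}


def supported_methods(agent):
    """Return the deduplication methods an agent supports, sorted by name."""
    return sorted(method for method, ok in SUPPORT[agent].items() if ok)


mac = supported_methods("Agent for Mac")
hyperv = supported_methods("Agent for VMware and Hyper-V (Hyper-V)")
```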
Modern Data Management and Protection Challenges
Customers of all types and sizes are seeking new and innovative ways to overcome challenges associated with data growth and storage management. While these challenges are not necessarily new, they continue to become more complex and more difficult to overcome due to the following:
- Pace of data growth has accelerated
- Location of data has become more dispersed
- Linkages between data sets have become more complex
Data and storage management challenges are compounded by the need for companies to protect critical data assets against disaster through backup and recovery solutions. In order to maintain backups of critical data assets, additional secondary storage resources are required. This additional layer of backup storage must be implemented wherever backups occur, including central data centers and remote offices.
Storage Efficiencies through Data Deduplication
Backup Exec 2012 includes advanced data deduplication technology that allows companies to dramatically reduce the amount of storage required for backups, and to more efficiently centralize backup data from multiple sites for assured disaster recovery. These data deduplication capabilities are available in the Backup Exec 2012 Deduplication Option.
Backup Exec 2012 Data Deduplication Technology
The data deduplication technology within Backup Exec 2012 breaks down streams of backup data into “blocks.” Each data block is identified as either unique or non-unique, and a tracking database is used to ensure that only a single copy of a data block is saved to storage by that Backup Exec server. For subsequent backups, the tracking database identifies which blocks have been protected and only stores the blocks that are new or unique. For example, if five different client systems are sending backup data to a Backup Exec server and a data block is found in backup streams from all five of those client systems, only a single copy of the data block is actually stored by the Backup Exec server. This process of reducing redundant data blocks that are saved to backup storage leads to significant reduction in storage space needed for backups.
Figure 1: Deduplication Process
The deduplication technology within Backup Exec is applied across all backups managed by a deduplication-enabled Backup Exec server.
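The five-client example above can be modelled with a toy tracking database. The sketch assumes SHA-256 fingerprints and an in-memory dictionary; the `TrackingDatabase` class is a hypothetical illustration, not the structure Backup Exec uses internally.

```python
import hashlib
from collections import Counter


class TrackingDatabase:
    """Toy tracking database: one stored copy per unique block (hypothetical)."""

    def __init__(self):
        self.stored = {}              # fingerprint -> the single stored copy
        self.references = Counter()   # fingerprint -> times seen in backup streams

    def add_block(self, block):
        """Record a block from a backup stream; return True if it was new."""
        fp = hashlib.sha256(block).hexdigest()
        is_new = fp not in self.stored
        if is_new:
            self.stored[fp] = block   # first sighting: save the block
        self.references[fp] += 1      # later sightings only add a reference
        return is_new


db = TrackingDatabase()
shared = b"block found on every client"
# Five different client systems each send the same block.
results = [db.add_block(shared) for _client in range(5)]
```

Only the first client's copy is stored; the other four sightings are recorded as references to it.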
Deduplication Methods within Backup Exec 2012
The Backup Exec 2012 Deduplication Option gives administrators the flexibility to choose when and where deduplication calculations take place. Three deduplication methods are supported by Backup Exec 2012. These are as follows:
Client-side Deduplication
The client-side deduplication method is a software-driven process. Deduplication takes place at the source or protected client, and backup data is sent over the network in deduplicated form to the Backup Exec server. Only unique blocks of backup data are sent to the backup server and saved to backup storage; non-unique blocks are skipped.
Backup Exec Server-side Deduplication
The Backup Exec server-side deduplication method is also a software-driven process. Deduplication takes place after backup data has arrived at the Backup Exec server and just before data is stored to disk (also known as inline deduplication). Only unique blocks of backup data are stored; non-unique blocks are skipped.
Appliance Deduplication
The appliance deduplication method is a hardware-driven process. Deduplication takes place on the deduplication appliance and can be in-line or post-process, depending on the device (for example, ExaGrid or Quantum). 3rd-party deduplication devices handle all aspects of deduplication.
Administrators can mix and match deduplication methods to fit their unique needs. For example, a single Backup Exec server enabled for deduplication can simultaneously use client-side deduplication for some jobs, Backup Exec server-side deduplication for others, and appliance deduplication for yet another set of jobs.
Figure 2: Deduplication Methods
The different deduplication methods supported by Backup Exec 2012 have various configurations for which they are best suited. The benefits of each method, as well as the configurations for which each method is best suited, will be detailed in the following weeks.
Someone told me the other day that they thought Backup Exec had become pretty complicated, and it struck me that it wasn’t so much Backup Exec that had become more complicated as the infrastructure around it. I was rooting through my desk drawers not so long ago and came across a copy of NetBackup 3.2: a single CD that included Media Management, Clients and all Robotic Support. A single CD! It now takes a few more than that to ship a backup product. Backup has become largely distributed throughout most organisations in order to deal with the demands of modern business. But this does mean that we need new ways to automate deployment, updates, upgrades, and licensing efficiently across the environment.
Many organisations run a mixed environment of many different versions of Backup Exec at different patch levels. When managing a large Backup Exec installation, it may not be clear:
- Which versions of BE do I have and where are they?
- What BE license keys have been installed and which Agents and Options?
- Are the patch levels for BE up-to-date?
- What data or machines are not protected?
- How can I update and upgrade multiple BE installations?
A significant deployment or upgrade of Backup Exec really does need careful planning and subsequent management, and now we have a tool to help manage remote backup servers from a single place.
Backup Exec Infrastructure Manager (BEIM) will be available from early April. Based on the Altiris delivery technology from Symantec, it will enable organisations to manage almost all BE operations through a web browser. This means you will be able to manage the:
- Discovery and Inventory of All Servers, Agents, and Options
- Creation of Custom Backup Exec Installations
- View of Protected vs. Unprotected Systems
- Creation of Backup Exec 9.1-12.5 Version Upgrades
- Creation of Backup Exec Patch Deployments
- Backup Exec License Management
- Backup Exec Disk Consumption Monitoring for Catalog and Disk-based Backup Data
- Command-line Script Management and Diagnostic Log Gathering
The new tool can help organisations reduce management costs by cutting the time spent on deploying, patching, upgrading, troubleshooting, and monitoring the various components of Backup Exec. The combination of Backup Exec Central Administration Option (CASO) and BEIM is ideal for remote branch offices where network connectivity may be intermittent, but standardisation is needed.
Keep your business up and running: by discovering backup and storage management inefficiencies, you can cut costs while making sure that your data is fully protected. That is highly beneficial at a time when budgets are under strain.
It is really useful to go through the process of trying to find out:
- How well your data is protected
- If you are missing backing up critical data
- How prepared you are for increasing data volumes
- Whether your strategy supports business growth or lowers performance
- If you are taking advantage of the most cost-effective solutions available
It never ceases to amaze me how well we don’t know ourselves. To quote Polonius, “to thine own self be true”: the more honest you are with yourself, the more accurate and the more useful the results will be. We know our business, don’t we? We know that we are doing the best we can, aren’t we? It’s not like someone is trying to catch you out, so give it a go. There are some pretty simple questions you can ask yourself just to get going, because the world has moved on and the drivers for improving backup and recovery operations are ever stricter:
- How can you keep business-critical applications running, delivering improved ROI, while complying with regulations
- How can you justify spending in times of budgetary constraint by demonstrating the quality and effectiveness of your systems
- What is the best way to convince business users of the importance of investing in backup solutions, before data is lost, while also establishing what should be backed up – and why
- How confident are you that you can cover all your IT service requirements? If you are not fully confident, how confident are you: 25%, 50%? Or do you meet all of your business service agreements?
- What level of backup reporting do you have that allows you to justify future IT investment to optimise your recovery time objectives? Is there a requirement for reporting metrics, occasionally, or more regularly?
- How confident are you that your main business managers understand the importance of backup? Most of us take backup for granted but how confident are you that your backup policy covers all areas of the business?
Are you confident that you have the right backup and recovery systems in place and are getting the most out of them? A backup and recovery, or storage, assessment will highlight areas of weakness but also help to identify where Backup Exec can improve efficiencies and save you money.
You may think that data leaks have precious little to do with backup, but there you would be wrong. Interestingly, I am finding it increasingly difficult to talk about data leaks/loss in isolation from storage management and backup, security and network access control, or, for that matter, to talk about backup without talking about data leaks. So, as with all things, backup is joined to the whole of the computer infrastructure and remains the foundation of IT.
The book offers practical advice on how to protect your customers, the reputation of your business, and your bottom line. It is designed for pretty much everyone and anyone from the CEO to the backup manager or data capture personnel – or anyone who deals in sensitive or confidential information, that’s everyone, of course.
One of the really cool functions of BE is the Granular Recovery Technology (GRT). By the way, anytime you need more information on any aspect of BE please see the Backup Exec for Windows Servers Administrator’s Guide. In fact, don’t take my word for it, download from here:
Just a few tips to help you get the best out of BE’s GRT:
- Review the requirements for staging locations in the Administrator’s Guide.
- You must use a staging location for GRT-enabled jobs in the following scenarios:
  - You back up to or restore from a volume with file size limitations.
  - You restore granular items from tape.
  - You run an off-host backup job.
- You are better off creating a separate backup-to-disk folder specifically for all GRT enabled backup jobs – this really simplifies media management. You will need to manage the IMG media that GRT enabled jobs create differently than other backup-to-disk media.
- Don’t allocate a maximum size for backup-to-disk files. If you do, you are in danger of jobs failing because of low disk space. GRT information is stored in IMG media, so the backup-to-disk file often occupies extra space, and Backup Exec will only create a backup-to-disk file that is as large as the size that you specified.
- If you are using frequent incremental GRT enabled jobs it is a really good idea to run a full GRT enabled backup job every so often. This is because each incremental GRT enabled job requires a small amount of internal storage. If this storage amount increases too much, it can affect system resources. When you run the full GRT enabled backup job, you make available the storage space that has accumulated from incremental jobs.
Backup Exec 12.5 delivers GRT for Exchange, Active Directory, SharePoint Server, and SharePoint Services, which gives you the ability to recover granular data quickly and efficiently from a single-pass backup. It means, for example, that you do not have to run separate Exchange mailbox backups to recover granular data, including documents, list items, and user attributes or properties.