Backup Strategies for IaaS Cloud Services
Designing a backup policy for offsite data backup, as opposed to traditional on-premises backup, can pose a challenge for organizations. Our Recovery & Remediation team has said repeatedly that backups are key to reducing downtime in the event of a cyber attack, and that data protection is key to business continuity. The particulars of data transfer costs and the pros and cons of various disaster recovery solutions are complex. However, with a properly designed backup policy, an online backup service such as Microsoft Azure or Google Cloud can be a cost-effective way to ensure business continuity by mitigating the risk of lost data.
What’s the Best Approach to Using a Cloud Services Provider for Backup and Restore?
- The most significant differences between traditional on-premises backups and cloud services backup solutions regarding frequency, retention, full versus incremental backups, etc.
- IaaS backup strategies versus a backup strategy for traditional on-premises backups
- Which responsibilities the cloud provider owns, and which responsibilities the cloud customer owns
What Is the Purpose of Backups?
As you develop your backup policy, start by considering what a backup policy is for. The whole reason for data backup is to ensure business continuity by providing recovery capability in three main scenarios:
- Accidental deletion or corruption of files
- Catastrophic (unrecoverable) operating system or filesystem failure
- Catastrophic (unrecoverable) hardware failure
Backups serve other purposes besides disaster recovery and individual file retrieval, but such purposes are generally some variation of one of these primary purposes.
Cloud Services Backup Strategies Versus On-Premises Backup
Creating a backup strategy for a cloud service provider (CSP) environment is challenging. On the one hand, the likelihood of a catastrophic failure requiring the use of backups to recover is substantially lower. On the other hand, the few documented cases of catastrophic CSP failures were severe.
To avoid unmanageable costs for a backup solution, organizations must adopt different backup strategies for cloud services than those used for traditional on-premises backups. Applying a conventional on-premises backup policy to cloud services will also fail to account for the failure scenarios that actually apply there.
Cloud Service Providers (CSPs) offer a lower risk of hardware failure. Still, organizations must balance this with the costs of performing the same type and number of backup tasks as they traditionally do for on-premises backups. Using cloud services, particularly IaaS, for a backup solution requires an organization to shift its thinking from on-premises backup strategies.
Accurate Assessment of Availability Bias of CSP Backup Costs
Many organizations fall victim to the misconception that moving to a CSP backup solution from a traditional on-premises solution comes with a much higher cost. CSP backups, in most cases, do not cost significantly more. The perception of higher cost is a by-product of how easy it is to track the actual cost of CSP backups.
Part of the misconception around the higher cost of using a CSP comes from the difficulty in calculating the cost of individual on-premises backups. On-premises backups involve an organization licensing backup services and buying the hardware necessary to support them, which falls into capital expenditures (CAPEX). However, operating that hardware also incurs power, cooling, and space (the highest cost) expenditures, which are operating expenditures (OPEX). This data center OPEX is hard to estimate, and most organizations don’t track it.
In contrast, CSP backup costs are much easier to calculate because the numbers are readily available. Don’t fall into the trap of thinking CSP backups cost more just because it’s easier to see the numbers.
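To make the comparison concrete, a rough sketch like the following puts both options on the same monthly basis. Every figure, every field name, and the straight-line amortization are illustrative assumptions, not numbers from any provider.

```python
from dataclasses import dataclass


@dataclass
class OnPremBackupCosts:
    """Hypothetical cost inputs for an on-premises backup setup."""
    capex: float                 # licensing plus hardware purchase (assumed one-time)
    lifetime_years: float        # assumed useful life used for amortization
    annual_power: float          # OPEX items that are often left untracked
    annual_cooling: float
    annual_space: float

    def monthly_cost(self) -> float:
        # Spread CAPEX evenly over the hardware lifetime, then add yearly OPEX.
        amortized_capex = self.capex / self.lifetime_years
        annual_opex = self.annual_power + self.annual_cooling + self.annual_space
        return (amortized_capex + annual_opex) / 12


def csp_monthly_cost(stored_gb: float, price_per_gb_month: float) -> float:
    """CSP cost is visible directly on the bill: volume times unit price."""
    return stored_gb * price_per_gb_month


# Made-up example values, for illustration only.
on_prem = OnPremBackupCosts(
    capex=60_000, lifetime_years=5,
    annual_power=2_400, annual_cooling=1_800, annual_space=9_600,
)
print("on-premises per month:", on_prem.monthly_cost())
print("CSP per month:", csp_monthly_cost(stored_gb=50_000, price_per_gb_month=0.04))
```

The point of the sketch is the shape of the calculation: the on-premises figure only becomes comparable once the hidden OPEX terms are included.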
The Cloud Backup Solution: All Responsibilities Are on the End-User
The role of a cloud services provider is to provide a sturdy platform for cloud infrastructure. It is the sole responsibility of the end-user to manage backups. Also, backup services offered by CSPs could themselves be negatively impacted by a CSP-wide outage. As mentioned earlier, such an outage is very rare but highly damaging when it occurs.
Your Disaster Recovery Goals
When designing a backup policy for cloud providers such as the Azure backup service, an organization should thoroughly evaluate its data recovery requirements, especially its recovery time and recovery point objectives.
Disaster recovery goals go hand-in-hand with backup policy goals. For this reason, developing an IaaS backup program should follow an organization’s goals for that backup policy. Your disaster backup and recovery program should be defined by two critical metrics: the recovery time objective (RTO) and the recovery point objective (RPO).
- RTO: The maximum acceptable time to recover the system and return it to service
- RPO: The maximum acceptable data loss, expressed as a point in time to which a system can be rolled back
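As a minimal sketch of how these two metrics constrain a backup schedule, the check below compares a backup interval and an estimated restore time against the objectives. The function name and the example durations are assumptions for illustration; it also assumes the worst case, where a failure occurs just before the next backup would have run.

```python
from datetime import timedelta
from typing import Dict


def meets_objectives(
    backup_interval: timedelta,
    estimated_restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> Dict[str, bool]:
    """Check a backup schedule against RPO and RTO.

    Worst-case data loss equals the backup interval, because a failure
    just before the next backup loses everything since the last one.
    """
    return {
        "rpo_met": backup_interval <= rpo,
        "rto_met": estimated_restore_time <= rto,
    }


# Nightly backups with a 3-hour restore, checked against a 4-hour RPO
# and an 8-hour RTO (all values hypothetical).
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=3),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(result)  # {'rpo_met': False, 'rto_met': True}
```

In this example the restore is fast enough, but nightly backups cannot satisfy a 4-hour RPO, so the backup frequency would need to change.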
Disaster recovery is a complex subject of its own. Some factors outside of backup strategies affect RTO and RPO, but they will not be discussed in this article. One thing you should not do, however, is transfer on-premises RTO and RPO unchanged to a cloud-based backup solution such as Microsoft Azure Recovery Services or Google Cloud. Once the assets have been migrated to the cloud and the system is deployed to your CSP, the disaster recovery objectives, i.e., the RPO and RTO, should change.
To understand why, let’s consider the three primary reasons to back up data. On-premises solutions entail far less hardware redundancy than a CSP-based strategy. For an IaaS instance running at a CSP, a failure would require a much higher number of simultaneous hardware failures. This difference affects both disaster recovery and individual file recovery, as most OS and filesystem corruption issues are rooted in hardware problems. That risk, too, is significantly reduced when using a CSP.
In-Band Versus Out-of-Band Backups
The same software used for on-premises backups can be used for in-band backups. Organizations can minimize bandwidth costs by creating an IaaS instance that runs the backup server in the same region as the instances being backed up. Through this approach, CSP backups and on-premises backups use the same solution. This approach works well for individual file restoration in an otherwise functional system.
Out-of-band backups in a CSP can be performed using volume snapshot backup capabilities. Azure backups can be done as both full and incremental volume snapshots. These snapshots tend to come at a lower cost than running a separate IaaS instance for backup software.
The drawback is that it becomes much more challenging to recover individual files from a volume, which usually involves moving the data manually from a backup volume mounted to an IaaS instance. If individual file recovery needs to be performed regularly, personnel costs will quickly surpass those of additional infrastructure.
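To illustrate how full and incremental volume snapshots relate during a restore, here is a minimal sketch. It assumes a simple model in which each incremental depends on every snapshot back to the most recent full one; real CSP snapshot services may manage these dependencies internally, and the `Snapshot` type and `restore_chain` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(snapshots: List[Snapshot], target: datetime) -> List[Snapshot]:
    """Return the snapshots needed to restore a volume as of `target`.

    The chain starts at the latest full snapshot taken at or before the
    target and includes every later snapshot up to the target.
    """
    candidates = sorted(
        (s for s in snapshots if s.taken_at <= target),
        key=lambda s: s.taken_at,
    )
    last_full = None
    for index, snap in enumerate(candidates):
        if snap.kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("no full snapshot at or before the target time")
    return candidates[last_full:]


history = [
    Snapshot(datetime(2024, 1, 1), "full"),
    Snapshot(datetime(2024, 1, 2), "incremental"),
    Snapshot(datetime(2024, 1, 3), "incremental"),
    Snapshot(datetime(2024, 1, 8), "full"),
]
for snap in restore_chain(history, datetime(2024, 1, 3, 12)):
    print(snap.taken_at.date(), snap.kind)
```

Note that the result is a whole volume state, not a single file, which is why individual file recovery from snapshots takes the extra manual steps described above.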
A compromise between these strategies can be achieved with a blended option: running backup software in an IaaS instance and backing up all instances in the region. An organization using this strategy would then take block storage snapshots of the backup server IaaS instance and replicate those to another CSP region. This strategy results in a good blend of in-band and out-of-band backup features and offers maximum resiliency against catastrophic CSP failures.
Those catastrophic “hardware” failures of a CSP can and do happen. One example is the infamous widespread outage across an entire Amazon region caused by errant commands run by a systems engineer. In this instance, the “hardware” was the group of storage volumes underpinning the volumes used by all IaaS instances deployed in that AWS region.
Most CSPs offer tiers of block storage that vary in cost depending on the length of storage and retrieval requirements. Because many IaaS backups will be stored for a relatively long period and will be accessed infrequently (if at all), cheaper tiers of block storage may be appropriate for IaaS backups. In almost all cases where infrequent access tier storage is used for IaaS backups, the backups will be performed at a volume level, making individual file restoration more complicated.
The cost of backups of IaaS instances is very much dependent on where those backups live. Most CSPs do not charge for bandwidth consumed within a region. They do, however, charge for data transfer outside a region. So, the most cost-efficient backup solution entails storing backups only in the region where they will be restored, i.e. the region they are taken from.
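The pricing rule described above can be sketched as a small function. The per-GB rate is a made-up placeholder, and the assumption that in-region transfer is always free reflects the general pattern described here, not any specific provider's price list.

```python
def transfer_cost(
    size_gb: float,
    source_region: str,
    destination_region: str,
    inter_region_price_per_gb: float,
) -> float:
    """Estimate the data transfer charge for moving a backup.

    Assumes transfer within a region is free and transfer between
    regions is billed at a flat per-GB rate (hypothetical pricing model).
    """
    if source_region == destination_region:
        return 0.0
    return size_gb * inter_region_price_per_gb


# Restoring a 500 GB backup in place versus from another region,
# using an illustrative rate of 0.02 per GB.
print(transfer_cost(500, "us-west-1", "us-west-1", 0.02))
print(transfer_cost(500, "us-west-1", "us-east-1", 0.02))
```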
This strategy doesn’t protect against catastrophic failure in which all block storage in a region has been rendered unrecoverable. Such scenarios are, admittedly, rare. But since such a scenario is precisely when organizations most need backups, they should make sure to risk-model such a possibility.
If an organization wants to replicate backups to a second region, it should choose a physically distant region. For example, if primary backups are in us-west-1, then backups should be replicated to us-east-1, not us-west-2. One thing to note is that not much is known about how various CSPs manage security between regions, as that kind of information falls into the realm of trade secrets. This obscurity is therefore unlikely to change.
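The replication guidance can be expressed as a simple selection rule. This is only a sketch: the `REGION_AREA` grouping below is a hand-written assumption for illustration, not data published by a CSP, and an organization would need to maintain its own mapping of regions to physical areas.

```python
from typing import Dict, List, Optional

# Hypothetical grouping of regions by broad geographic area.
REGION_AREA: Dict[str, str] = {
    "us-west-1": "us-west",
    "us-west-2": "us-west",
    "us-east-1": "us-east",
    "us-east-2": "us-east",
}


def pick_replica_region(primary: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate in a different geographic area than `primary`.

    Candidates missing from REGION_AREA are skipped, since their
    physical separation from the primary cannot be judged.
    """
    primary_area = REGION_AREA[primary]
    for region in candidates:
        area = REGION_AREA.get(region)
        if area is not None and area != primary_area:
            return region
    return None


print(pick_replica_region("us-west-1", ["us-west-2", "us-east-1"]))  # us-east-1
```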