Cloud Database Snapshots: Know the Real Costs

A recent trend among enterprises is to migrate on-premises databases to cloud PaaS database services such as Azure SQL or Cosmos DB, AWS RDS, Aurora, or DynamoDB, and GCP Cloud SQL.

However, in many instances, enterprises migrate their on-premises databases to cloud IaaS VM instances and run and manage them themselves. The reasons vary: PaaS database size constraints, OS and patch supportability, granular recovery needs, or simply a wish to avoid cloud lock-in with PaaS.

For these databases running in cloud VMs, DBAs adopt a mixture of legacy practices and cloud snapshots. Let’s explore the challenges this approach creates and the key capabilities needed to solve them.

Legacy Practices

In most enterprises, database administrators (DBAs) follow a two-step process: 1) dump database backups to disk, and 2) configure a backup product to sweep those dumps to tape or to cloud object storage such as AWS S3, Azure Blob, GCP Nearline, or IBM COS.

DBAs like this process because it puts them in control of their own destiny: they can run self-service recoveries instead of relying on a backup admin to recover data from tapes or dedup appliances.

Inheriting Traditional Practices in Cloud

When these enterprises move their production workloads to the cloud (I will use AWS as an example, but the concept applies to Azure, GCP, and IBM as well), the DBAs continue with the same process.

The default design (Fig 1 below) in the public cloud is to back up databases to AWS EBS storage and store EBS snapshots of those backups in AWS S3 storage.

They also replicate snapshots to a remote cloud region for DR purposes.
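As a rough illustration of that default design, the snapshot-and-replicate step could look like the sketch below. It assumes the two client arguments are boto3 EC2 clients (one per region); the function name, volume ID, and region names are hypothetical placeholders, not part of any product.

```python
def snapshot_and_replicate(src_ec2, dst_ec2, volume_id, src_region,
                           description="database dump volume"):
    """Snapshot the EBS volume that holds the dumps, then copy it to the DR region.

    src_ec2 and dst_ec2 are assumed to be boto3 EC2 clients, for example
    boto3.client("ec2", region_name="us-east-1") for the source region and
    boto3.client("ec2", region_name="us-west-2") for the DR region.
    """
    # Step 1: take a snapshot of the backup volume in the source region.
    snap = src_ec2.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = snap["SnapshotId"]

    # Wait until the snapshot is complete before starting the copy.
    src_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Step 2: the copy is requested from the destination region's client.
    copy = dst_ec2.copy_snapshot(
        SourceRegion=src_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id}",
    )
    return snapshot_id, copy["SnapshotId"]
```

Each call here creates something that is billed: the snapshot in the source region, the data transfer between regions, and the second snapshot in the DR region.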

However, applying traditional practices as-is in the cloud can lead to substantial infrastructure costs and other challenges.

Challenges with Database Dumps + Cloud Snapshots

Using database dumps, in combination with cloud snapshots, creates three challenges:

  1. Substantial Infrastructure Costs: Refer to Figure 1 above.
    1. High Block Storage Costs: Because these are recurring full database backups, you need to keep at least two weeks’ worth of backups in block storage such as AWS EBS, Azure Disk, or GCP Persistent Disk.
    2. Cloud Snapshot Costs: To control block storage costs, system administrators keep cloud snapshots in object storage for long-term retention, such as four weeks, six months, or three years.
    3. Remote AWS Region Costs: Replicating snapshots to a second cloud region for DR incurs inter-region data transfer costs plus snapshot costs in the second region.
  2. Long Recovery Time (RTO): Recovering databases from database dumps is a two-step process and leads to long restore times, as illustrated in Figure 2. We use AWS as an example, but the concept applies to other cloud vendors as well.

The first step is to mount the cloud snapshot. This process copies the database dump from object storage to block storage and takes time proportional to the size of the database backup.

The second step is to restore from the database dump in block storage to the database. For example, use an RMAN restore to recover an Oracle database from its RMAN backup.

This two-step process results in a very long recovery time (RTO) for mission-critical databases.
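To see why the two steps add up, here is a back-of-the-envelope sketch. The function names and the throughput figures are assumptions for illustration only; real rates depend on instance type, volume type, and database engine.

```python
def transfer_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate in MB/s."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_s / 3600


def two_step_restore_hours(dump_tb: float, hydrate_mb_s: float,
                           restore_mb_s: float) -> float:
    """Total time for the two sequential steps described above."""
    # Step 1: copy the dump from object storage back onto block storage.
    hydrate = transfer_hours(dump_tb, hydrate_mb_s)
    # Step 2: restore the database from the dump (for example, an RMAN restore).
    restore = transfer_hours(dump_tb, restore_mb_s)
    return hydrate + restore


if __name__ == "__main__":
    # Hypothetical example: 10 TB dump, 500 MB/s for each step.
    print(f"{two_step_restore_hours(10, 500, 500):.1f} hours")
```

With these assumed numbers, a 10 TB dump takes about 11 hours to come back, and the two steps cannot overlap because the restore needs the dump on block storage first.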

  3. Long Data Provisioning Time for DevOps: Dev, QA, UAT, security, and analytics teams need copies of production application data. Creating physical copies in these various test environments consumes a lot of cloud block storage and time.

For example, five copies of a 10 TB database will consume 50 TB of expensive block storage such as AWS EBS, Azure Disk, or GCP Persistent Disk.

It also consumes valuable DBA time, which could instead be spent on more important production database work.
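The arithmetic behind that example is simple enough to script for your own environment. This is a minimal sketch; the function names are made up for illustration, and the price per GB-month is an assumed placeholder, so substitute your provider's current rate.

```python
def physical_clone_storage_tb(db_size_tb: float, copies: int) -> float:
    """Block storage consumed when every environment gets a full physical copy."""
    return db_size_tb * copies


def monthly_cost_usd(storage_tb: float, price_per_gb_month: float) -> float:
    """Monthly storage bill, using decimal units (1 TB = 1,000 GB)."""
    return storage_tb * 1000 * price_per_gb_month


if __name__ == "__main__":
    total_tb = physical_clone_storage_tb(10, 5)  # the example above
    # 0.10 USD per GB-month is an assumed block storage price, not a quote.
    print(total_tb, "TB,", monthly_cost_usd(total_tb, 0.10), "USD per month")
```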

Critical Capabilities to Solve these Challenges

The key outcomes that enterprises want are:

  1. Keep cloud infrastructure costs as small as possible.
  2. Self-service capabilities for DBAs to back up, recover, and clone.
  3. Low RTO, RPO, and cloning time for databases.

Following are some of the critical capabilities needed to deliver these outcomes:

  1. Eliminate recurring full backups and deliver application-consistent, incremental-forever backups. This also delivers low RPO.
  2. Minimize the usage of cloud block storage and maximize the utilization of cloud object storage.
  3. Deliver instant recovery of even 50+ TB databases within minutes to achieve low RTO.
  4. Reuse the backups to provision rapid database clones for test/dev, DevOps, and analytics.
  5. Reduce cloud infrastructure costs but still deliver low RTO, RPO, and cloning time by using cloud object storage instead of cloud block storage. Cloud block storage, such as AWS EBS, is roughly 10x more expensive than object storage, such as AWS S3 Standard-IA.
  6. And lastly, make sure these critical capabilities work not just for one or two database types but for all of them, such as Oracle, Microsoft SQL Server, SAP HANA, SAP Sybase, SAP MaxDB, MySQL, PostgreSQL, Db2, etc.
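To get a feel for what the first capability saves, the sketch below compares the storage footprint of recurring full backups with an incremental-forever scheme. It assumes daily fulls, a constant daily change rate, and no compression or deduplication; the function names and the 3% change rate are illustrative assumptions, not measurements.

```python
def recurring_full_tb(db_size_tb: float, fulls_retained: int) -> float:
    """Storage needed when every retained backup is a full copy."""
    return db_size_tb * fulls_retained


def incremental_forever_tb(db_size_tb: float, daily_change_rate: float,
                           days_retained: int) -> float:
    """One initial full plus one incremental per retained day."""
    return db_size_tb * (1 + daily_change_rate * days_retained)


if __name__ == "__main__":
    # Hypothetical example: 10 TB database, two weeks of retention, 3% daily change.
    full = recurring_full_tb(10, 14)
    incr = incremental_forever_tb(10, 0.03, 14)
    print(f"recurring fulls: {full:.1f} TB, incremental forever: {incr:.1f} TB")
```

Under these assumptions, two weeks of daily fulls need 140 TB while incremental forever needs about 14 TB, before even considering that the latter can sit in cheaper object storage.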

I hope you found this helpful.
