
This is Part 1 of a 3-part series on incremental-forever backups. In this first part, I’ll describe the challenges with traditional backups and how incremental-forever backups address them. In parts 2 and 3 we’ll dive into the details of how Actifio provides incremental-forever backups for Oracle, SQL Server, and SAP HANA databases.
We’re all familiar with the fable of the Three Little Pigs, whose moral is that it pays to invest in the right design even if it takes hard work. There were certainly no computers when that story evolved, but the moral holds just as true in our modern world. At Actifio, our goal from day one was to build a platform that could handle very large data sets – for backup, recovery, and virtual cloning. The only way to do that is to design an architecture that moves data as little as possible. When you are dealing with 50 GB of data, you can easily copy it over and over. But when dealing with a 50 TB database, a copy can take multiple days. One of the innovations that Actifio brought to the market was to make sure that, after the first full backup of such a database, we only have to back up changes – the backups are incremental-forever.
Traditional backups also have a concept of an “incremental backup”, so how is Actifio different? In a traditional scheme, a weekly full backup is taken, and then the database’s changes are saved from time to time – most often daily – as incremental backups. There are many challenges with this approach:
- When recovering the data, you have to restore the last full backup and then replay the incremental backups on top of it. This is very time-consuming and leads to long RTOs (see the sketch after this list).
- You need to rebuild the database from its full backup and incrementals, which means all of that data has to be copied somewhere first. That requires storage capacity and, even worse, means you cannot access the data until it is fully copied, which can take days.
- Because the recovery time gets longer the further away you get from the last full backup, most enterprises settle on a weekly full backup. This means that you have to size your environment – hosts, storage, network, etc. – to be able to complete this full data copy in 24 hours. With the size of today’s databases that is often not feasible and certainly not economical.
- Finally, the worst part is that the full backup is long and I/O-intensive, and therefore negatively impacts your production database’s performance. Why not save all that I/O bandwidth for real production use instead of a full backup?
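To make the RTO and sizing arguments concrete, here is a rough back-of-the-envelope sketch. The sizes and throughputs below are purely illustrative assumptions, not measurements from any particular environment; the point is how the replay term grows with every day that passes since the last full backup:

```python
# Hypothetical figures for illustration only.
FULL_SIZE_TB = 50        # size of the database / full backup
RESTORE_TB_PER_HOUR = 2  # assumed throughput for restoring the full backup
DAILY_CHANGE_TB = 1      # assumed size of each daily incremental
REPLAY_TB_PER_HOUR = 0.5 # replaying incrementals is usually slower than a bulk restore

def traditional_rto_hours(days_since_full: int) -> float:
    """Restore the last full backup, then replay every incremental taken since."""
    restore_time = FULL_SIZE_TB / RESTORE_TB_PER_HOUR
    replay_time = (days_since_full * DAILY_CHANGE_TB) / REPLAY_TB_PER_HOUR
    return restore_time + replay_time

for day in range(7):
    print(f"{day} day(s) after the full: ~{traditional_rto_hours(day):.0f} hours to recover")
```

With weekly fulls, the worst case lands at the end of the week; shrinking it means either more frequent full backups (and more load on production) or a fundamentally different architecture.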
So, what is a better architecture? One that captures the incremental changes and makes them available immediately at any point in time, without having to “replay” anything. And, ideally, one that lets you access that data immediately, without having to copy it somewhere else first. The answer is quite simple and has been around for some time. Most transactional databases today sit on block storage devices, and we have long known how to snapshot those devices – whether on the host with volume managers such as LVM, or on enterprise storage arrays. So the most efficient way to deal with large databases is to maintain snapshots over time, tracking the changed blocks between points in time.
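Here is a minimal sketch of the changed-block idea, purely as an illustration (it is not Actifio’s implementation, and the function names are hypothetical): compare two point-in-time images block by block and keep only the blocks that differ, so every capture after the first is just the delta.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KiB blocks; real systems tune this

def block_hashes(image_path: str) -> list[bytes]:
    """Hash every fixed-size block of a point-in-time image."""
    hashes = []
    with open(image_path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            hashes.append(hashlib.sha256(block).digest())
    return hashes

def changed_blocks(old_image: str, new_image: str) -> dict[int, bytes]:
    """Return {block_index: new_data} for blocks that differ between two snapshots."""
    old = block_hashes(old_image)
    deltas = {}
    with open(new_image, "rb") as f:
        index = 0
        while block := f.read(BLOCK_SIZE):
            if index >= len(old) or hashlib.sha256(block).digest() != old[index]:
                deltas[index] = block  # only the changed blocks are stored or shipped
            index += 1
    return deltas
```

In practice the full scan is unnecessary: the volume manager or storage array already knows which blocks changed since the last snapshot, which is what makes incremental-forever capture cheap even for very large databases.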
Now you may ask: if it’s so easy, why doesn’t everyone offer it? There are multiple reasons. First, it’s not enough to maintain snapshots on the production host or storage array – for a true backup you need a copy that resides on separate infrastructure, and moving these snapshots elsewhere is not always straightforward. Second, those snapshots are very specific to the storage array – what happens when you want to upgrade or move to another vendor? You’re stuck. What’s needed is a mechanism to identify changed blocks between snapshots in a way that is portable – where point-in-time copies can be moved around to wherever you need them: off-site for DR, in the cloud for long-term retention, in another cloud for resiliency. That requires some real innovation and is exactly what Actifio’s Virtual Data Pipeline™ (VDP) provides.
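Continuing the hypothetical sketch above (again, this illustrates the portability argument only and is not a description of VDP internals): once changed blocks are captured as plain data, any point in time can be rebuilt anywhere – another data center, another cloud – from the first full copy plus the deltas recorded up to that moment.

```python
import shutil

BLOCK_SIZE = 64 * 1024  # must match the block size used at capture time

def materialize(base_image: str, deltas: list[dict[int, bytes]], out_path: str) -> None:
    """Rebuild a point-in-time image from the base copy plus changed-block maps.

    Passing only the deltas captured up to a given point in time reconstructs
    that point in time; the base and deltas are ordinary data, so they can be
    stored and applied independently of the original storage array.
    """
    shutil.copyfile(base_image, out_path)        # start from the first full copy
    with open(out_path, "r+b") as dst:
        for delta in deltas:                     # apply oldest to newest
            for index, block in delta.items():
                dst.seek(index * BLOCK_SIZE)
                dst.write(block)
```

The key property is that nothing in this format depends on the array that originally took the snapshot, which is what lets the copies travel.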
In parts 2 and 3 of this blog series we’ll talk about the beginning of this pipeline, which deals with capturing the data in an incremental-forever fashion. We’ll discuss why it’s easier said than done to achieve this, what’s required, and how Actifio does it for various databases.
__
Micah Waldman is the VP of Product Management at Actifio and brings more than 25 years of experience developing enterprise software solutions and taking them to market.