For anyone who’s been in the backup / recovery space for any length of time, it’s very likely you’ve heard one or more of the following phrases:
- Tape is dead
- Dedup is the future of backup
- We’re out of space in our dedup
- [Cloud] Object storage is the future of backup
- Dedup on object storage is the future of backup
- Recoveries are too slow from [tape | dedup | object]
This is the road we’ve all traveled over the last 15 years or so. But where does it end? Where is our pot of gold at the end of the rainbow? Why can’t we have backups that are cost effective, scalable, resilient, and, most importantly, able to meet our recovery SLAs?
When Actifio set out to start using object storage, our intention was to use it as an alternative to tape, especially for long-term retention. Instead of falling into the same trap that seems to have snared everyone else and focusing just on backups, we decided to keep a focus on recovery times. We realized that the way we store data would allow us to use our near-instant mount feature, even if the data was coming from object storage.
When this feature was released it was an immediate attention grabber. People loved the fact they could access backups, no matter how old, from anywhere with access to the object storage (which in the world of cloud providers means they could access it from anywhere). Not only could they access it from anywhere, but they could bring large databases online without the need to copy all the data first. They could even bring up a database stored in the cloud on a local database server, with almost no storage consumption locally!
The realization that we had something special in our hands led us to double down on this technology, and we subsequently introduced incremental-forever backups to object storage, allowing it to become a primary backup medium instead of just a long-term monthly copy.
But this still doesn’t answer the question “How do you do it and still get performance?” To understand the answer, you first have to understand a little bit about how Actifio stores data. As a guiding principle, we decided long ago to store the data we capture in a “usable format” instead of a backup format. What does this mean? It means our backups typically look like the source. For a SQL Server database, this means we store virtual disks, formatted with NTFS, and holding .mdf, .ldf, and .ndf files. For Oracle we have disks that are formatted to be part of ASM disk groups, or have filesystems with Oracle data files on them. For VMware, we have disks that contain the contents of VMDK files.
Why does the format of our data matter? Because it enables us to present a disk to a target server that is ready to be used without any sort of extra virtualization layer, transformation, or data copy / restore. It means that the user can read and write data on their recovery server exactly as they would if we had restored the data to that server, but without actually copying any data first.
Once this is understood, then we can talk about storing data in object storage. Because the data inside an Actifio appliance is a set of virtual disks that contain the data, storing them in object storage is as simple as breaking up the disk into fixed-size blocks and putting each one into an object. If a virtual disk is 100GB and we have selected 1MB blocks, then we will store that data in the object storage as a set of 102,400 objects (each one containing 1MB of the underlying disk). We don’t care if the contents of the disk were 1 billion little files, or one file of 100GB.
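To make the arithmetic above concrete, here is a minimal Python sketch of cutting a virtual disk into fixed-size objects. The key layout (`disk_id/block_index`) and the function names are assumptions made for illustration, not Actifio's actual naming scheme.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def object_count(disk_bytes, block_bytes=MIB):
    """Number of fixed-size objects needed to hold the disk (ceiling division)."""
    return -(-disk_bytes // block_bytes)


def object_key(disk_id, block_index):
    """Hypothetical key layout: one object per block, named by its index."""
    return f"{disk_id}/{block_index:010d}"


def split_into_objects(disk_id, data, block_bytes):
    """Yield (key, payload) pairs; the contents of the disk are never inspected."""
    for start in range(0, len(data), block_bytes):
        yield object_key(disk_id, start // block_bytes), data[start:start + block_bytes]


# A 100GB disk stored with 1MB blocks
print(object_count(100 * GIB))  # 102400
```

Notice that nothing in the sketch knows about files or filesystems: the split depends only on the disk size and the block size.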
Performing that near-instant mount operation now is just a matter of presenting a 100GB phantom disk to a server, and intercepting each I/O request so that we can do the right thing with it. If a server tries to read a 4KB block, we can translate that into the object containing the 4KB and retrieve it. If the server tries to write data, we intercept that and store it in some block storage locally. We also offer options for caching reads into the block storage for enhanced performance on subsequent reads of the same data.
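The I/O interception described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the appliance's implementation: the object store is a plain dictionary, local block storage is an in-memory dictionary, the `MountedDisk` class and its key layout are hypothetical, and writes are tracked at whole-block granularity for simplicity.

```python
class MountedDisk:
    """Toy phantom disk: reads come from objects, writes stay in local storage."""

    def __init__(self, store, disk_id, block_bytes, cache_reads=True):
        self.store = store              # dict-like object storage: key -> bytes
        self.disk_id = disk_id
        self.block_bytes = block_bytes
        self.cache_reads = cache_reads  # keep blocks we read in local storage?
        self.local = {}                 # local block storage: index -> bytearray

    def _fetch(self, index):
        # Hypothetical key layout: one object per fixed-size block.
        key = f"{self.disk_id}/{index:010d}"
        return bytearray(self.store.get(key, b"")).ljust(self.block_bytes, b"\0")

    def _block(self, index, keep):
        block = self.local.get(index)
        if block is None:               # not held locally: go to object storage
            block = self._fetch(index)
            if keep:
                self.local[index] = block
        return block

    def _spans(self, offset, length):
        # Translate a byte range into (block index, offset within block, size).
        end = offset + length
        while offset < end:
            index, within = divmod(offset, self.block_bytes)
            n = min(self.block_bytes - within, end - offset)
            yield index, within, n
            offset += n

    def read(self, offset, length):
        out = bytearray()
        for index, within, n in self._spans(offset, length):
            out += self._block(index, keep=self.cache_reads)[within:within + n]
        return bytes(out)

    def write(self, offset, data):
        pos = 0
        for index, within, n in self._spans(offset, len(data)):
            block = self._block(index, keep=True)  # written blocks always stay local
            block[within:within + n] = data[pos:pos + n]
            pos += n


MIB = 1024 * 1024
disk = MountedDisk({}, "db01", MIB)
disk.write(5 * MIB + 4096, b"changed")
print(sorted(disk.local))  # [5]
```

In this sketch a 4KB read at any offset touches only the object that holds it, and the objects themselves are never modified, so the backup image stays intact no matter what the recovery server writes.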
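The data flow looks something like this:
[Diagram: server I/O passing through the Actifio appliance to the local block storage cache and to object storage]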
The options for managing the block storage cache are as follows:
- Storage Optimized
- Balanced (default)
- Performance Optimized
- Maximum Performance
If the user selects Storage Optimized mode, only changed data written by the server will be stored in the block storage cache. In the Balanced mode, data read from object storage and data written by the server are stored, speeding subsequent retrieval of that data. In the Performance Optimized mode, a background worker thread is added to eventually read all data from object storage and save it into the block storage cache. Maximum Performance mode will copy all data to the block storage cache first (before the mount) and is only used if every I/O must be at maximum performance.
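One way to picture the four modes is as a small table of policy flags. The Python sketch below is illustrative only: the enum, the `CachePolicy` fields, and the helper function are hypothetical names, and it assumes that Performance Optimized keeps the read caching of Balanced in addition to its background worker.

```python
from dataclasses import dataclass
from enum import Enum


class CacheMode(Enum):
    STORAGE_OPTIMIZED = "Storage Optimized"
    BALANCED = "Balanced"
    PERFORMANCE_OPTIMIZED = "Performance Optimized"
    MAXIMUM_PERFORMANCE = "Maximum Performance"


@dataclass(frozen=True)
class CachePolicy:
    cache_reads: bool        # keep data read from object storage in block storage
    background_fill: bool    # a worker eventually copies every block after the mount
    fill_before_mount: bool  # copy everything before the disk is presented


# Writes from the server land in the block storage cache in every mode.
POLICIES = {
    CacheMode.STORAGE_OPTIMIZED: CachePolicy(False, False, False),
    CacheMode.BALANCED: CachePolicy(True, False, False),
    CacheMode.PERFORMANCE_OPTIMIZED: CachePolicy(True, True, False),  # assumption: reads still cached
    CacheMode.MAXIMUM_PERFORMANCE: CachePolicy(True, False, True),
}


def blocks_to_copy_before_mount(mode, total_blocks):
    """Only Maximum Performance delays the mount to copy data first."""
    return range(total_blocks) if POLICIES[mode].fill_before_mount else range(0)


print(len(blocks_to_copy_before_mount(CacheMode.BALANCED, 102_400)))             # 0
print(len(blocks_to_copy_before_mount(CacheMode.MAXIMUM_PERFORMANCE, 102_400)))  # 102400
```

Read this way, the modes trade local storage consumption and time-to-mount against the performance of later I/O.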
Storing data in virtual disk format and treating object storage like block storage with a cache together allow quick access to data in object storage, with enough performance to meet the needs of most tasks.
So go ahead. Send your 20TB database to object storage with Actifio, and keep a very short retention on-site. You’ll still be able to access that database any time you want, even years later, without provisioning 20TB to hold it, and without waiting to copy that data back to your datacenter (or even into a cloud instance if that’s where your DB server resides). Enjoy the pot of gold!