Providing Masked Data for DevOps


Time and storage always seem to be in short supply when your business relies on multi-terabyte databases. An outage can put your business on hold for days, and provisioning up-to-date copies for development and test can be an enormous challenge. Actifio solves these challenges in spectacular fashion. Even huge databases can be online in minutes, whether in production or in multiple development and test environments.

In some cases, the data contains sensitive information that is inappropriate for your developers to see, such as personally identifiable information, health records, or salary information. In those cases it may be necessary to “mask” or “obfuscate” the sensitive data before providing a copy to the developers. The masking process obscures sensitive fields (such as names, addresses, and phone numbers) by replacing them with fake or random values, while maintaining referential integrity and overall functionality. Developers and testers can then work with the masked data without ever seeing the “real” data.
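To make "maintaining referential integrity" concrete, here is a minimal Python sketch of deterministic masking. It is an illustration only, not how Actifio or any particular masking product works. The table layout, the `mask_customer_id` and `mask_name` helpers, the key, and the fake-name lists are all hypothetical. The idea is that the same input always maps to the same masked value, so a customer ID masked in one table still matches the same ID masked in another.

```python
import hashlib
import hmac

# Hypothetical demo key; a real key would be stored outside source control.
SECRET_KEY = b"demo-only-key"

# Hypothetical pools of replacement values.
FAKE_FIRST = ["Alex", "Sam", "Jordan", "Casey", "Riley", "Morgan"]
FAKE_LAST = ["Reed", "Lane", "Hart", "Cole", "Page", "West"]


def _digest(value: str) -> int:
    """Keyed hash of the input, so the mapping is repeatable but not guessable."""
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def mask_name(name: str) -> str:
    """Replace a real name with a fake one chosen deterministically."""
    n = _digest(name)
    first = FAKE_FIRST[n % len(FAKE_FIRST)]
    last = FAKE_LAST[(n // len(FAKE_FIRST)) % len(FAKE_LAST)]
    return f"{first} {last}"


def mask_customer_id(customer_id: str) -> str:
    """Replace a real ID with a fake one; the same input gives the same output."""
    return f"C{_digest(customer_id) % 10**8:08d}"


# Two hypothetical tables that share a key column.
customers = [
    {"customer_id": "1001", "name": "Jane Doe"},
    {"customer_id": "1002", "name": "John Smith"},
]
orders = [
    {"order_id": 1, "customer_id": "1001"},
    {"order_id": 2, "customer_id": "1002"},
    {"order_id": 3, "customer_id": "1001"},
]

masked_customers = [
    {"customer_id": mask_customer_id(c["customer_id"]), "name": mask_name(c["name"])}
    for c in customers
]
masked_orders = [
    {**o, "customer_id": mask_customer_id(o["customer_id"])} for o in orders
]
```

Every row in `masked_orders` still joins to a row in `masked_customers`, yet neither table contains an original ID or name. A production masking tool would also need to guarantee that two different inputs never collide on the same masked value, which this sketch does not attempt.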

The traditional approach to data masking involves getting a full copy of production data, via a slow backup/restore or export/import process, and presenting the data to a database server where the masking process runs. Once the data is masked, copies can be distributed to the various development and test environments, with each copy taking hours and consuming a lot of disk space. By the time you’ve managed to get this masked data to those who need it, it’s probably days old. Refreshing the data once a week is probably the best you can achieve.

These challenges can be solved using an automated workflow. A masked copy of the production data is updated from production in an incremental-forever fashion, on demand or on a schedule, and is re-masked after each update. Developers then access “virtual” (thin-provisioned) copies of this masked data, which can be refreshed in minutes and use almost no disk space. The virtual copies remain usable even while the data is being re-masked, so a refresh can proceed during the day with no downtime for the developers, making daily refreshes practical.
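Why a virtual copy needs almost no disk space, and why it stays usable during a re-mask, can be shown with a toy copy-on-write model in Python. This is a conceptual sketch under simplifying assumptions, not Actifio's implementation. The `MaskedImage` and `VirtualCopy` classes and the block contents are hypothetical, and the toy copies a small dictionary where a real system would share unchanged blocks on disk.

```python
class MaskedImage:
    """An immutable, fully masked point-in-time image (block number -> data)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)


class VirtualCopy:
    """A thin copy: reads fall through to the shared image, writes stay private."""

    def __init__(self, base):
        self.base = base
        self.delta = {}  # only blocks this developer has changed

    def read(self, block_no):
        return self.delta.get(block_no, self.base.blocks[block_no])

    def write(self, block_no, data):
        self.delta[block_no] = data  # the shared image is never modified

    def refresh(self, new_base):
        """Point at a newer masked image and discard private changes."""
        self.base = new_base
        self.delta.clear()

    def private_blocks(self):
        return len(self.delta)


# Monday's masked image, shared by every developer.
monday = MaskedImage({0: "a0", 1: "b0", 2: "c0"})
dev = VirtualCopy(monday)
dev.write(1, "dev-edit")  # consumes one private block, not a full copy

# An incremental update changes block 2; the result is re-masked as a new image.
tuesday = MaskedImage({**monday.blocks, 2: "c1"})

# The developer's copy is unaffected until it is explicitly refreshed.
before_refresh = (dev.read(1), dev.read(2), dev.private_blocks())

dev.refresh(tuesday)
after_refresh = (dev.read(1), dev.read(2), dev.private_blocks())
```

Because the new masked image is built alongside the old one, developers keep working against Monday's image until they choose to refresh, and the refresh itself only re-points the copy at the newer image.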

Time to refresh is not the only time saved. With on-demand access to a fresh, virtual copy of the full data set at all stages of your deployment pipeline, the time to detect code defects is also greatly reduced, enabling your organization to build higher quality applications faster.
