Every enterprise. Many internal teams.
Who benefits from compute virtualization? The answer is simple! Everybody. Application developers, QA testers, operations, UAT, production support, data analytics teams … it’s a long list, and the same applies to data-as-a-service.
Consider a mission-critical application that uses an enterprise database such as Oracle, MS SQL, or DB2. Who would want a copy of that production database? Over the years, through my interactions with hundreds of enterprise customers, I have compiled the following list of internal teams that demand copies of production data:
- DBAs to test database schema changes and run performance analysis
- Application developers for unit testing
- Build team for integration testing
- Automation QA team for automated testing
- QA engineers for manual QA testing
- UAT team for performance testing
- Pre-production / staging team for ad-hoc security, app, and OS patch testing
- Security team to test for and prevent security vulnerabilities
- Production support team to perform root cause analysis and develop fixes and patches
- Data warehousing and Analytics team to analyze TBs of data
- Financial analysts who need to crunch numbers at the end of each quarter
- Backup / DR team to perform instant recoveries from backups
- Compliance team to ensure data is recoverable from long term data retention vaults
- Training team to demo their products to internal / external customers
The common denominator is that they all need access to this valuable production data. However, do they all need it at the same time? Most likely not. But do they all need it at some point within the same month or quarter? Yes.
Traditionally, these functional teams got access to data through long, cumbersome manual processes: opening a ticket, followed by someone provisioning compute and storage, and then a DBA cloning and masking the database. Such manual processes and physical cloning inevitably slow down data access.
What’s needed is instant, self-service access to the requisite data to keep the process moving forward. Data virtualization is a strategy that can deliver data-as-a-service to these users with a simplicity similar to streaming a movie on Netflix. That’s right. It should be as simple as an end user (who has been granted access through role-based access control) pushing a button and instantly provisioning a masked copy of the production database on their test machine.
But how is it possible to provision a copy of a 10 TB or 50 TB database “instantly”? It’s possible only if you provision a “virtual” copy instead of a physical one. A virtual copy gives each user the illusion of having their own private copy of a 10 TB or 50 TB database. Reads from all virtual copies come off a single golden master copy of the production data set (see Fig 1), so no data has to be physically copied when a new virtual copy is provisioned.
This illustrates how data virtualization works; it enables users to provision data-as-a-service on any storage and any hypervisor. Data can also be replicated to the cloud so that users can leverage on-demand compute and data-as-a-service in the public cloud, as shown in Fig 2.
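To make the idea of a virtual copy more concrete, here is a minimal, hypothetical Python sketch of block-level copy-on-write, one common way to implement the technique. The class names `GoldenCopy` and `VirtualCopy`, the `provision` helper, and the in-memory dict of blocks are illustrative assumptions, not the API of any particular product. Real data virtualization platforms do this inside the storage layer, with snapshots, masking, and persistence that this sketch omits. Each virtual copy shares the read-only golden blocks and keeps only the blocks it has changed in a private overlay.

```python
from __future__ import annotations

from types import MappingProxyType


class GoldenCopy:
    """Hypothetical read-only master image of the production data set, stored as blocks."""

    def __init__(self, blocks: dict[int, bytes]):
        # Wrap the blocks in a read-only view so no virtual copy can modify them.
        self.blocks = MappingProxyType(dict(blocks))


class VirtualCopy:
    """Hypothetical private, writable view of a GoldenCopy using copy-on-write."""

    def __init__(self, golden: GoldenCopy):
        self.golden = golden                  # shared by every copy, never modified
        self.overlay: dict[int, bytes] = {}   # only the blocks this user has changed

    def read(self, block_id: int) -> bytes:
        # Serve this copy's own modified block if it exists, else the shared golden block.
        if block_id in self.overlay:
            return self.overlay[block_id]
        return self.golden.blocks[block_id]

    def write(self, block_id: int, data: bytes) -> None:
        # Writes never touch the golden copy; they land in this copy's private overlay.
        self.overlay[block_id] = data

    def private_bytes(self) -> int:
        # Extra storage consumed by this copy beyond the shared golden image.
        return sum(len(b) for b in self.overlay.values())


def provision(golden: GoldenCopy) -> VirtualCopy:
    # No data is copied here, so provisioning cost does not depend on the image size.
    return VirtualCopy(golden)


if __name__ == "__main__":
    golden = GoldenCopy({0: b"alice", 1: b"bob", 2: b"carol"})
    dev = provision(golden)  # e.g. an application developer's copy
    qa = provision(golden)   # e.g. a QA engineer's copy

    dev.write(1, b"XXX")
    print(dev.read(1))                              # b'XXX'
    print(qa.read(1))                               # b'bob'
    print(dev.private_bytes(), qa.private_bytes())  # 3 0
```

The key design choice in this sketch is that writes never touch the shared image: storage consumed by each virtual copy grows only with the blocks that user changes, not with the size of the database, which is what makes handing many teams their "own" copy affordable.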
In an upcoming follow-up blog post, I will share reference architectures showing how the functional groups listed above can access data-as-a-service.