This post was originally written for Datanami.
No matter what kind of business you’re in, whether you’re an IT solutions provider hoping to better understand how end users are (or aren’t) using your software, or an advertising agency trying to measure campaign ROI for a client, the success of the modern organization is increasingly fueled by insight gained from data. Acquiring that data stopped being a problem long ago, and the meteoric rise of computer, mobile and Internet usage means it won’t become one any time soon.
This deluge of data, driven by ever-greater connectivity, poses a serious challenge in its own right, though. The sheer quantity of data being created is causing the complexity and cost of data management to skyrocket. 2016 has frequently been cited as the year in which the world would produce more digital information than it can actually store. By 2020, IDC predicts, 1.7 megabytes of new information will be created for every human being on the planet, every second. Making sense of that data is going to be a huge challenge for every organization.
While part of this growth comes from the increasing number of touchpoints generating data, the bigger reason for the overwhelming volume is the proliferation of physical data copies. IDC estimates that 60 percent of what is stored in data centers is actually copy data: multiple copies of the same thing, or outdated versions, costing companies worldwide as much as $44 billion to manage.
Cost aside, copy data also creates a serious security threat. Most stored data consists of extra copies of production data, created by the disparate data protection and management tools used for backup, disaster recovery, development, testing and analytics. A recent IDC study found that the typical organization holds as many as 375 data copies. Each additional copy increases an organization’s “surface area of attack” and gives hackers looking for important information more source material to work with.
Copy data virtualization (CDV), the process of freeing an organization’s data from its legacy physical infrastructure, is increasingly how forward-thinking companies are tackling this problem. By eliminating redundant physical copies and maintaining a single ‘golden master’, CDV makes virtual copies of ‘production quality’ data immediately available to everyone in the organization who needs them, streamlining workflows and cutting storage and management costs. In many ways, it’s the natural next step after server and network virtualization.
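The ‘golden master’ idea can be illustrated with a toy model. One common way to build virtual copies is copy-on-write: every copy shares the master’s unchanged data and stores only the blocks it modifies. The Python sketch below assumes that scheme at block granularity; `GoldenMaster`, `VirtualCopy` and `physical_bytes` are hypothetical names, not any CDV product’s API, and real platforms typically work with storage-level snapshots rather than in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenMaster:
    """The single physical copy of production data, modeled as block id -> block bytes."""
    blocks: dict[int, bytes]


@dataclass
class VirtualCopy:
    """A virtual copy: unchanged blocks are shared with the master, changed blocks are private."""
    master: GoldenMaster
    delta: dict[int, bytes] = field(default_factory=dict)

    def read(self, block_id: int) -> bytes:
        # Return this copy's own version of the block if it has written one,
        # otherwise fall through to the shared golden master.
        if block_id in self.delta:
            return self.delta[block_id]
        return self.master.blocks[block_id]

    def write(self, block_id: int, data: bytes) -> None:
        # Copy-on-write: the golden master is never modified; only this copy's delta grows.
        self.delta[block_id] = data


def physical_bytes(master: GoldenMaster, copies: list[VirtualCopy]) -> int:
    """Bytes actually stored: the master once, plus each copy's changed blocks."""
    stored = sum(len(b) for b in master.blocks.values())
    stored += sum(len(b) for c in copies for b in c.delta.values())
    return stored


# Hypothetical example: 100 blocks of 4 KiB production data and ten virtual copies
# (dev, test, analytics, ...), one of which modifies a single block.
BLOCK = 4096
master = GoldenMaster({i: b"x" * BLOCK for i in range(100)})
copies = [VirtualCopy(master) for _ in range(10)]
copies[0].write(5, b"y" * BLOCK)

print("stored with virtual copies:", physical_bytes(master, copies))
print("stored with ten full physical copies:", 100 * BLOCK * (1 + len(copies)))
```

In this toy run, the ten virtual copies cost 413,696 bytes in total (the master plus one changed block), versus 4,505,600 bytes if each consumer kept its own full physical copy alongside the master. Every copy still sees ‘production quality’ data, and a change made in one copy never leaks into the master or the other copies.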
If you’re an IT manager, copy data virtualization may solve many of the challenges that growing data volumes pose for you. Like any transformative solution, though, implementing it within your organization requires some planning and strategic thinking.
So what are some of the main considerations you should take into account before implementing CDV?
1. Choose Your Platform
Each organization has its own unique challenges, and these will influence which platform best suits its needs, but there is a common set of criteria that most will need to consider. The typical enterprise has workloads spread across different systems, e.g. virtual machines on VMware, physical machines running Windows, and so on. For CDV to be effective, you need a platform that supports all of these systems and the full range of applications and databases that run on them. It should also be infrastructure-independent, allowing you to choose different infrastructure for different use cases: production support running on a storage platform from vendor A, testing and development on storage from vendor B, and so on. You’ll also want to manage everything from a single location so that it’s simple to control. Finally, a platform with hybrid cloud support gives you the choice to spin off different applications into different datacenters.
2. Choose an Initial Use Case
A successful CDV implementation will eventually allow you to replace many of the disparate tools used to manage data with your platform of choice. But that doesn’t happen overnight, nor should it if you want to manage the rollout effectively. Identify one use case to start with and roll CDV out there first. That way, you can identify potential issues and iron out any kinks before rolling out more widely. For example, you might start with testing and development before moving on to production, analytics, and so on.
3. Scope Your Needs
You’ve chosen your platform and identified your initial use case; the next step is to scope out your specific needs so you can design the underlying infrastructure to support them. Important questions to ask at this stage include: At what rate is the production data changing? If you are virtualizing for backup, what retention period do you need? How many virtual copies will you need simultaneously? What kind of testing will be done with that data (performance, functionality, scaling, etc.)? How much bandwidth will you need (particularly important if you’re working with multiple datacenters in different locations)? You should also think about security and controls: how is data being replicated and encrypted? Knowing the answers to these questions before you start investing in infrastructure can save you a lot of time and money.
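Several of these questions turn into simple arithmetic once you have rough numbers. The Python sketch below is a back-of-the-envelope sizing estimate under stated assumptions: all field names and input values are hypothetical, backup retention is modeled as one day’s changed data kept per retention day, and each virtual copy stores only the blocks it overwrites. Real platforms add compression, deduplication and metadata overhead, so treat the results as a starting point for planning, not as a vendor sizing guide.

```python
from dataclasses import dataclass


@dataclass
class SizingInputs:
    production_gb: float             # size of the golden master
    daily_change_rate: float         # fraction of production data changed per day (0.02 = 2%)
    retention_days: int              # how long backups must be kept
    virtual_copies: int              # virtual copies needed at the same time
    copy_change_fraction: float      # fraction of its data each virtual copy overwrites
    replication_window_hours: float  # time allowed to ship one day's changes to another site


def estimate(inp: SizingInputs) -> dict[str, float]:
    daily_change_gb = inp.production_gb * inp.daily_change_rate
    # Worst case: every day's changes hit distinct blocks; ignores compression and dedup.
    retention_gb = daily_change_gb * inp.retention_days
    # Each virtual copy stores only the blocks it overwrites (copy-on-write assumption).
    virtual_copies_gb = inp.virtual_copies * inp.production_gb * inp.copy_change_fraction
    total_gb = inp.production_gb + retention_gb + virtual_copies_gb
    # Decimal units: 1 GB = 8,000 megabits.
    replication_mbps = daily_change_gb * 8000 / (inp.replication_window_hours * 3600)
    return {
        "daily_change_gb": daily_change_gb,
        "retention_gb": retention_gb,
        "virtual_copies_gb": virtual_copies_gb,
        "total_gb": total_gb,
        "replication_mbps": replication_mbps,
    }


# Made-up inputs for illustration only.
inputs = SizingInputs(
    production_gb=1000,
    daily_change_rate=0.02,
    retention_days=30,
    virtual_copies=10,
    copy_change_fraction=0.05,
    replication_window_hours=8,
)
for name, value in estimate(inputs).items():
    print(f"{name}: {value:,.1f}")
```

With these made-up inputs, the sketch estimates about 2,100 GB of capacity (a 1,000 GB master, 600 GB of retained changes and 500 GB across the virtual copies) and roughly 5.6 Mbps of sustained bandwidth to replicate each day’s changes within an eight-hour window. Changing one input at a time shows which of the questions above dominates your infrastructure budget.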
4. Leverage the Hybrid Cloud
Many organizations have started to harness both private and public cloud offerings to create a hybrid cloud infrastructure. A hybrid cloud combines the control and security of a private cloud with the flexibility and low cost of public cloud offerings. Together, they can give you a powerful way to meet the growing demands the rest of the organization places on IT. One of the biggest benefits of this approach is greater organizational agility: shifting load to the public cloud, particularly during periods of heavy usage, can mean fewer outages and less downtime. Testing and development of new applications is a good example use case for the public cloud, as it gives you time to decide where to host those applications more permanently if or when they go into production. A hybrid approach also lets you use the same infrastructure for multiple purposes, for example data recovery and test and development simultaneously, which helps cut costs and complexity.
Successfully implementing copy data virtualization can dramatically change the way an organization works. The sooner it can reduce the creation of physical copies, the less it will have to spend on storage and the sooner it can get to analysis. The result is less data moving across networks, less data to store, greater efficiency in long-term retention, substantially reduced storage costs, and the elimination of costly operational complexity. In short, copy data virtualization gives an organization virtual sanity, and we believe it will be around for some time to come.