When you explain the tenets of Copy Data Management (CDM) to someone in IT – particularly someone in the backup and recovery space – there is a common reaction: “I already have CDM since I use dedupe.” This reaction isn’t surprising, since the dedupe appliance market leader, Dell EMC, likes to co-opt industry terms and bend them into a shape resembling its own offerings. The same perspective – incorrect, mind you – is propagated by other vendors with deduplication built into their appliances, software, storage arrays, and so on.
To be clear, deduplication is a useful technology. It was an appropriate answer to backup storage volumes growing out of control as full backups piled up and were retained for long periods. When you throw massive full backups at a deduplication appliance – with 90-95% of the data unchanged every time – you achieve impressive deduplication ratios. (We’ll conveniently ignore encrypted data sets and high-change-rate applications.) But ask yourself: why on earth would you move 90-95% of the same data week after week, month after month, ad infinitum? Your only objective when backing up data is to ensure a successful recovery. Ideally, that recovery is timely. Going further toward the ideal, you would want equally efficient and reliable recoveries for data that has been stored for extended periods of time (months, years, even decades). If you can ensure recovery without having to move all this data, wouldn’t that be a better solution?
And that’s why deduplication isn’t Copy Data Management.
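To put rough numbers on the repeated-full-backup pattern described above, the short Python sketch below computes a dedupe ratio under a deliberately simplified model, which is an assumption for illustration and not how any particular appliance reports its figures: the first full backup is entirely unique, each later full adds only a fixed fraction of new data, and compression is ignored. The function name `dedupe_ratio` is hypothetical.

```python
def dedupe_ratio(num_fulls: int, change_rate: float) -> float:
    """Logical bytes sent divided by unique bytes stored, in units of one full backup.

    Simplified model (assumption): the first full is entirely unique, each later
    full contributes only change_rate of new unique data, and compression is ignored.
    """
    logical = num_fulls                         # every full is sent in its entirety
    unique = 1 + (num_fulls - 1) * change_rate  # first full + new data in each later full
    return logical / unique


if __name__ == "__main__":
    for rate in (0.05, 0.10):
        print(f"52 weekly fulls at {rate:.0%} change: {dedupe_ratio(52, rate):.1f}:1")
```

Under these assumptions, a year of weekly fulls works out to roughly 14.6:1 at a 5% change rate and 8.5:1 at 10%. The impressive ratio comes almost entirely from re-sending data that has not changed, which is the very traffic the question above challenges.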
Deduplication is not concerned with the data type, the application that created the data, or even the workload associated with the data (production, analytics, dev/test, etc.). Deduplication is merely a dumb repository into which we dump the same data over and over. Copy Data Management, by contrast, is application-aware and natively integrated with applications to capture their data as efficiently as possible. CDM addresses the upstream problem (“there are too many redundant copies of production data, so let’s eliminate them”), whereas deduplication thrives on excess copies of the same information. CDM maintains data in its native format so you can instantly access and use it (after all, isn’t that the most important thing?). Deduplication transforms the data and compresses it into an altered state, requiring rehydration (reassembly) before it can be used.
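To make the “rehydration” step concrete, here is a toy content-addressed store in Python. It is a minimal sketch under stated assumptions, not how Dell EMC or any other vendor implements deduplication: it splits each backup image into fixed-size chunks (commercial appliances typically use variable-length, content-defined chunking plus compression), keys each chunk by its SHA-256 digest, and keeps an ordered list of digests per backup. `DedupeStore`, `ingest`, and `rehydrate` are hypothetical names. Nothing in it knows which application produced the bytes, and getting usable data back always requires reassembling the chunks.

```python
import hashlib
import os


class DedupeStore:
    """Toy content-addressed dedupe store (hypothetical sketch, fixed-size chunks)."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}       # SHA-256 digest -> unique chunk bytes
        self.recipes: dict[str, list[str]] = {}  # backup name -> ordered chunk digests

    def ingest(self, name: str, data: bytes) -> int:
        """Store a backup image; return how many bytes of new unique chunks it added."""
        recipe: list[str] = []
        new_bytes = 0
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only never-seen chunks consume space
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        self.recipes[name] = recipe
        return new_bytes

    def rehydrate(self, name: str) -> bytes:
        """Reassemble the original image from its chunk recipe before it can be used."""
        return b"".join(self.chunks[d] for d in self.recipes[name])


if __name__ == "__main__":
    week1 = os.urandom(64 * 4096)    # stand-in for a 256 KiB full backup
    week2 = bytearray(week1)
    week2[:4096] = os.urandom(4096)  # change one chunk's worth of data
    store = DedupeStore()
    print("week1 new bytes:", store.ingest("week1", week1))
    print("week2 new bytes:", store.ingest("week2", bytes(week2)))
    assert store.rehydrate("week2") == bytes(week2)
```

Running it should report 262144 new bytes for the first image and 4096 for the second: the second “full” backup still moves the entire image, yet only one new chunk lands in the store, and the data is usable only after `rehydrate` stitches it back together.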
Most deduplication solutions are tied to a specific hardware platform and therefore introduce a degree of vendor lock-in that is not ideal. True Copy Data Management solutions, such as those from Actifio (which pioneered the space in 2009), are infrastructure-agnostic. Actifio’s virtual copies can be accessed and used from any storage, on-premises or cloud-based, and that storage can be block-based and/or object-based. Actifio can also write to a deduplicated and compressed pool for retention purposes, but an instant-access copy is always maintained.
Deduplication is a useful point technology that addresses the issue of space efficiency, particularly as it relates to long(er)-term retention of data. It is absolutely not to be mistaken for a Copy Data Management solution.
To read more about Copy Data Management, see the chapters below:
Copy Data Corner Chapter 1: The History of Copy Data Management
Copy Data Corner Chapter 2: Data On-Demand