
Big data’s hidden problem

By Ash Ashutosh, Founder & CEO – Why ‘big data’ is smaller than ‘copy data’, and what it means for your business.

Big data and its challenges have inspired much thought and debate. So what does it actually mean for your business?

To start with, the term itself is not always well-defined. Technology historian George Dyson put it bluntly: “Big data is what happened when the cost of keeping information became less than the cost of throwing it away.” But recent research by 451 Research among storage professionals shows that big data accounts for only 3% of the total data storage footprint.

‘Big data’ has been used to describe the analysis of large volumes of many different types of data. It also names a trend covering several new approaches and technologies for storing, processing and analysing data. Such analysis can help businesses understand what people are buying, and when, where and how.

If only 3% of data stored is ‘big’, what makes up the rest of it?

It turns out that the real problem is data proliferation.

We all see this in our home lives. You take a photo, save it to your computer, edit it, post it on Facebook, tweet it, email it to a friend and back it up when you upgrade your computer. You’ve now made several copies of the same photo, saved in different places. At work, when you email a PowerPoint attachment to ten colleagues, the email system saves a copy, and your colleagues may save it to their computers too.

At work you create new data every time you send or receive an email. Software engineers can make tens or hundreds of database copies to accelerate new application development. A single email shouldn’t gobble up much storage space, but copies of large datasets quickly add up to petabytes inside the modern enterprise. IDC estimates that 60% of what is stored in data centres is actually copy data: multiple copies of the same thing, or outdated versions. Most of this copy data consists of extra copies of production data created by disparate data protection and management tools for backup, disaster recovery, development and testing, and analytics. According to IDC, global businesses will spend $46 billion to store extra copies of their data in 2014. This ‘copy data’ glut in data centres costs businesses money, as they store and protect redundant copies of an original.

While many IT providers are focused on how to deal with the mountains of data produced by this intentional and unintentional copying, far fewer are addressing its root cause. In the same way that prevention is better than cure, reducing this weed-like data proliferation should be a priority for businesses. Actifio’s recent successful $100m+ funding round is a sign that some of the sharpest minds in finance recognise this priority.

Enterprise IT heads tend to share the same key strategic priorities: improving resiliency, increasing agility, and moving toward the Cloud to make their systems more distributed and scalable. Often they are held back by legacy software and hardware. Copy data virtualisation, which frees organisations’ data from their legacy physical infrastructure just as virtualisation did for servers a decade ago, is likely to be the way forward. If business divisions work from a single physical ‘golden’ copy that can spawn any number of virtual copies, those copies won’t each take up their own storage space.
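The ‘golden copy’ idea resembles copy-on-write, a general technique used by storage snapshots and virtual copies. The Python below is a minimal sketch under stated assumptions: the dataset is modelled as a list of fixed-size blocks, and the names `GoldenCopy`, `VirtualCopy` and `physical_blocks` are hypothetical illustrations, not Actifio’s product or any real storage API. Each virtual copy reads unchanged blocks from the single physical copy and stores only the blocks it overwrites.

```python
class GoldenCopy:
    """Hypothetical single physical copy of a dataset, held as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class VirtualCopy:
    """Copy-on-write view: reads fall through to the golden copy,
    writes are kept locally so the golden copy is never modified."""

    def __init__(self, golden):
        self.golden = golden
        self.changed = {}  # block index -> this copy's new contents

    def _check(self, i):
        if not 0 <= i < len(self.golden.blocks):
            raise IndexError(f"block {i} out of range")

    def read(self, i):
        self._check(i)
        # Return this copy's own version if it wrote one, else the shared block.
        return self.changed.get(i, self.golden.blocks[i])

    def write(self, i, data):
        self._check(i)
        self.changed[i] = data  # only changed blocks consume extra storage

    def extra_blocks(self):
        return len(self.changed)


def physical_blocks(golden, copies):
    """Blocks actually stored: the golden copy plus each copy's changed blocks."""
    return len(golden.blocks) + sum(c.extra_blocks() for c in copies)


if __name__ == "__main__":
    golden = GoldenCopy([f"block-{n}" for n in range(1000)])
    copies = [VirtualCopy(golden) for _ in range(100)]
    for copy in copies:
        for i in range(5):
            copy.write(i, "test data")  # each copy modifies 5 of 1,000 blocks
    print(physical_blocks(golden, copies))         # virtual copies
    print(len(golden.blocks) * (1 + len(copies)))  # full physical duplicates
```

With 100 virtual copies that each change 5 of 1,000 blocks, the sketch stores 1,500 physical blocks, against 101,000 if every copy were a full physical duplicate. A production system would also need change tracking on disk, deduplication and cleanup of unused copies, none of which this sketch attempts.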

So despite all the noise about big data, it is not going to pose a threat to you just yet; it’s copy data you need to watch out for. The sooner companies reduce the creation of physical copies, the less they will have to spend on storage.


This post originally appeared on NewBusiness.co.uk.