In software development, is testing faster more important than coding faster? It seems like that’s true for Uber. Yesterday, CNBC posted an article on how Uber is not finding enough cities in which to test their self-driving cars.
The cool software and algorithms are ready… But they can’t be used because Uber can’t test the software in “real-world” scenarios. Completing all of that “real-world” testing might take many months or even years. Until then, none of us can benefit from this technology, and neither can Uber.
The inability to deliver “real-world” data, i.e. production data, to Dev & QA testers is, in my opinion, the #1 problem facing most enterprises engaged in software development. Let me share a personal example from my past life, when I was running software engineering. Back in 2007, my development team came up with cool inline variable-block deduplication software. The biggest challenge was feeding gigabytes (GB) and terabytes (TB) of data to this software. And the data couldn’t be fake data generated by copying and pasting the same information over and over again, because the deduplication algorithm would then report massive deduplication ratios that would be meaningless. Suffice it to say, we never had enough ‘real world’ data to test against during our initial phases of development and testing.
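To see why copy-and-paste test data is misleading, here is a minimal sketch. It is not the software described above: it assumes simple fixed-size chunks and SHA-256 fingerprints rather than variable-block chunking, and the `dedup_ratio` helper is a name made up for this illustration. Random bytes stand in for data with no redundancy at all; real production data would land somewhere between the two extremes.

```python
import hashlib
import os


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return logical size divided by the total size of unique chunks.

    Illustrative only: fixed-size chunking, not variable-block.
    """
    seen = set()
    unique_bytes = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in seen:
            # First time this chunk content appears, so it must be stored.
            seen.add(fingerprint)
            unique_bytes += len(chunk)
    return len(data) / unique_bytes if unique_bytes else 1.0


# "Fake" test data: one 4 KB block copied and pasted 1000 times.
block = os.urandom(4096)
fake_data = block * 1000

# Stand-in for data with no repeated chunks: 1000 independent random blocks.
varied_data = os.urandom(4096 * 1000)

fake_ratio = dedup_ratio(fake_data)
varied_ratio = dedup_ratio(varied_data)
print(f"copy-paste data: {fake_ratio:.0f}:1")
print(f"non-repeating data: {varied_ratio:.2f}:1")
```

The copied data deduplicates down to a single stored chunk, so the reported ratio says nothing about how the algorithm would behave on production data.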
Eventually, though, I managed to convince our internal IT team to let us back up some of their low-tier application servers using our software once we had a pre-beta version. However, we very quickly found that it was almost impossible to do daily backups, debug the problems, provide a fix, and then test the fix, all in the same environment. The window to analyze a problem, fix it, and test the fix was typically just 6 hours. We found far too many issues during late-stage, pre-beta testing with ‘real world’ data. We also found some performance issues and had to go back, tweak the design, and refactor quite a bit of code. All this led to a delayed release in an increasingly competitive software deduplication market in 2007.
Back then, I remember thinking, “I wish there was a way to get ‘real world’ data early in unit testing cycles, so we could have found and fixed all of these issues earlier.” I wanted a way to create multiple copies of these deduplicated backup images 1) instantly, 2) without consuming any extra storage, and 3) so that multiple people could debug, fix, and test simultaneously. Unfortunately, at the time, there was no concept of data virtualization, which would have made it possible to provision ‘virtual’ copies of such multi-TB datasets instantly.
Today, of course, it’s possible with data virtualization technology. Enterprise Dev & QA teams can really “ACCELERATE” their test cycles using data virtualization technology. It allows them to create a copy of, for example, a 10 TB production dataset very efficiently and deliver data as-a-service to developers, QA, UAT, Analytics, Support, and Security teams instantly without consuming any extra storage.
Software engineers design for parallelism. They identify tasks that can run in parallel, and spawn multiple threads or daemons to run those tasks and cut down the time it takes to finish the job. This is precisely what’s needed in end-to-end testing: an approach where a massive number of test cases can be broken down for parallel execution on multiple test machines. And such effective parallel testing is possible ONLY if virtual copies of ‘real world’ production datasets can be delivered quickly to all those test environments.
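The idea of splitting a test suite across workers, each with its own copy of the data, can be sketched in a few lines. This is only an illustration under stated assumptions: `provision_copy` is a hypothetical placeholder that returns a tiny in-memory dataset (a real setup would request a virtual copy from whatever data virtualization layer is in use), and threads on one machine stand in for separate test machines.

```python
from concurrent.futures import ThreadPoolExecutor


def provision_copy(worker_id):
    # Hypothetical placeholder: pretend each worker receives its own
    # independent copy of the production dataset.
    return {"worker": worker_id, "rows": list(range(10))}


def run_shard(worker_id, test_cases):
    # Each worker runs its share of the tests against its own copy.
    dataset = provision_copy(worker_id)
    return [(name, check(dataset)) for name, check in test_cases]


def run_in_parallel(test_cases, workers=4):
    # Deal the test cases out round-robin into one shard per worker.
    shards = [test_cases[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        shard_results = pool.map(run_shard, range(workers), shards)
    # Merge per-shard results into one {test name: passed} mapping.
    return dict(pair for shard in shard_results for pair in shard)


# Twelve toy test cases: test_n passes if row n exists in the dataset.
tests = [(f"test_{n}", lambda ds, n=n: n in ds["rows"]) for n in range(12)]
results = run_in_parallel(tests)
failed = sorted(name for name, passed in results.items() if not passed)
print(f"{len(results)} tests run, failed: {failed}")
```

Because every worker provisions its own copy, no test can corrupt the data another test depends on, which is what makes this kind of sharding safe.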
Want to know how this can be accomplished using Actifio? Send us an email at email@example.com and our solution architects will show you a demo of these capabilities in less than 15 minutes.