Data management means many different things to many different people. On a personal level, most of us have a mix of storage media across various devices, potentially multiple cloud storage accounts, and perhaps a handful of old portable hard drives or USB sticks. If I asked you to access a certain part of that data, how confident would you be that you could find it? And if you did find its physical location, what state would that data be in?
If we expand this line of thinking to organisations, it might be easy to assume they have a much better grasp of their historical data and would be able to easily access old files if required. The reality is that many do not. I’d like to share a couple of stories of data loss that I think we can all learn from.
You might think that storing something on a hard drive or in the cloud makes it safe. It doesn't, at least not on its own. Buckle up…
Pixar deletes Toy Story 2…Twice
A well-known tale in the data management world is how most of Toy Story 2 was deleted by accident. The Next Web covers the story in more detail, but in short, 90% of the movie (we're talking months of work and hours of footage) was deleted from the company's servers due to human error.
Things were compounded when it was discovered that their backup tapes were only 4GB in size and had filled up, so data pushed to them was being silently lost. Unfortunately, the error log that would have alerted the team to the issue was also located on this full tape, so the failure went unnoticed.
In the end they were saved by a team member who had been working remotely and had a 2-week-old backup on a workstation at her house. After a nervous trip to collect it, the files (minus the work conducted in the intervening period) were recovered.
Ironically, due to initial feedback months later, much of the movie was reworked from the ground up regardless (that would be the "twice" in this story's title), but what this tale does illustrate is a critical point about long-term data management.
While having a secure backup (and not one which is easy to delete) is important, it is also crucial to check it continually, to ensure it is still working as intended and that the data remains intact and usable.
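As a minimal sketch of what "continually checked" can mean in practice, the snippet below records a checksum for every file in a backup and later verifies that nothing is missing or corrupted. The function names and the manifest format are my own illustrative choices, not any particular backup tool's API:

```python
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under the backup root."""
    return {str(p.relative_to(root)): checksum(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose contents have changed."""
    current = build_manifest(root)
    return [name for name, digest in manifest.items()
            if current.get(name) != digest]
```

Run `verify` on a schedule and alert on a non-empty result; the key lesson from the Pixar story is that the alert must go somewhere other than the backup itself.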
The BBC exterminates Doctor Who

Forgive my poor attempt at a pun, but another story from the entertainment industry is how 97 episodes of Doctor Who (and a number of other shows) are no longer held by the BBC. Surprisingly, this wasn't actually an error: the BBC routinely deleted archived programmes for a number of mostly practical reasons (insufficient storage space, scarcity of materials, a lack of rebroadcast rights). It's probably also worth mentioning that the BBC was not alone in this.
At the time of writing, there has been much effort to restore these lost episodes, including the discovery that some ardent fans had made audio recordings of the shows when they were broadcast.
So, what does this teach us about data management? It may seem like a simple point, particularly in the context of deleting whole TV shows, but organisations must think very carefully about the importance of correctly archiving and storing their data. Even if the data does not have much value at a given time, in an ever-changing business environment things can shift very quickly to a position where that data becomes a competitive differentiator.
This is part of a wider shift in mindset: archived data is no longer something to simply store, but something to actively leverage for organisational benefit.
The cost of space travel
It was recently reported that NASA have made a costly blunder in their move to Amazon Web Services (AWS). In planning for the future, the space agency has identified that by 2025 it will need some 215 petabytes of data storage (up from the roughly 32 petabytes it currently needs). And if you're wondering how much that is: roughly the space you would need to download 50 million HD movies from Apple TV.
Again, I'll leave the full details to the story linked above, but in short, in deciding to move to cloud storage NASA did not take into account the egress costs incurred when users access the data. The recent audit which flagged this issue concluded that "Collectively, this presents potential risks that scientific data may become less available to end users if NASA imposes limitations on the amount of data egress for cost control reasons."
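To make the scale of the problem concrete, here is a back-of-the-envelope sketch. The per-gigabyte rate is an illustrative assumption of my own, not actual AWS pricing, which is tiered and changes over time:

```python
# 1 PB = 1,000,000 GB (decimal units, as cloud providers bill)
PETABYTE_GB = 1_000_000

def monthly_egress_cost(egressed_pb: float, rate_per_gb: float = 0.09) -> float:
    """Estimate the monthly cost of reading data back out of the cloud.

    The default $0.09/GB rate is an assumption for illustration only.
    """
    return egressed_pb * PETABYTE_GB * rate_per_gb

# If end users pulled even 1% of a 215 PB archive in a month:
cost = monthly_egress_cost(215 * 0.01)  # ≈ $193,500 at the assumed rate
```

The point is not the exact figure but the shape of the cost: storing data in the cloud is a one-off decision, while paying to read it back out is an ongoing one that scales with how useful the data actually is.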
This raises a number of interesting points: while NASA had taken the initiative to actively plan for their long-term data management, they did not fully scope or consult on their full requirements and the solutions available to them. I'm not going to start a debate on the pros and cons of cloud vs. on-premises solutions, but it illustrates how every organisation needs a robust and well-thought-out plan when it comes to its data management.
Although NASA is fairly unusual in the type of information it collects, the second point this story illustrates is the exponential growth many organisations face in the amount of data they capture. Big data has been a concept for many moons now, but it is in the coming years that organisations will really start to realise the size of the data challenges they face.
These stories all feature organisations that subsequently regretted not having a more considered approach to their long-term data management. They demonstrate how, while it may not seem an urgent issue at the time, a proactive and considered approach will pay off further down the line, whether that means recovering from a tough situation or taking advantage of a new business opportunity.