I recently spoke on an Arkivum webinar about how to build a plan for successful long-term data management. During the session I covered a range of general themes to consider while planning, which can form the building blocks for any data management plan. (I’ll add the caveat now: what I did not cover in any detail is a formal Data Management Plan, or DMP, as seen in some Higher Education or Heritage organisations.)
Every organisation is different, has different objectives and faces different challenges, so instead of providing a template for success (which I feel would be misleading), I wanted to share and explore the questions and considerations that need to factor into every long-term data management plan.
Some of the areas or questions included won’t apply to you, but that shouldn’t be a concern. The important thing is that you at least ask the questions of yourself, your team or your organisation to ensure that you have covered the important bases.
Sadly, I don’t have the space to detail everything that was in the webinar, so I have only covered some of the key areas in this post. If you’d like to watch the full recording of the session, you can do so here.
Preservation & Accessibility Requirements
One of the most important elements of data management planning is ensuring that you fully understand your requirements. It sounds like a very simple step, but it is one that we find many prospective customers have not fully worked through.
Storing any data has a cost associated with it, not just in monetary terms but also in the resources required to maintain it and its impact on the environment. The first step in any planning is to ask the question:
“Do I need to store this data?”
Or even, “Should I be storing this data?”, given that GDPR states data should not be retained beyond its need.
In many cases the answer will be yes, but as organisations generate more and more data, we all must become smarter about how we manage and preserve it for the long-term. It must also be a question that is revisited on an ongoing basis, to ensure that data is not being needlessly stored.
Once this has been considered, we can start to explore the preservation requirements in more detail. The questions you need to ask include:
- How long does the data need to be stored for?
- Who needs access to the data?
- How quickly do they need to access it?
- How frequently does it need to be accessed?
- How safe or secure does the data need to be?
The answers to these questions can start to shape both the tools and processes your organisation needs to build an effective long-term data management approach.
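As a sketch of how those answers might feed into tooling, the questions above could be captured as a simple per-dataset profile. This is purely illustrative: the class, field names and example datasets below are my own, not part of any Arkivum product or API.

```python
from dataclasses import dataclass

@dataclass
class RetentionProfile:
    """One set of answers to the planning questions above (all names illustrative)."""
    dataset: str
    retention_years: int       # how long does the data need to be stored?
    access_roles: list[str]    # who needs access to it?
    max_retrieval_time: str    # how quickly do they need it? e.g. "minutes", "days"
    access_frequency: str      # how often is it accessed? e.g. "weekly", "rarely"
    security_level: str        # how safe or secure does it need to be?

# Two datasets with the same retention period can still demand very
# different tools and processes once the other answers are filled in.
active = RetentionProfile("clinical-trial-A", 25, ["QA", "Regulatory"],
                          "minutes", "weekly", "encrypted, access-logged")
dormant = RetentionProfile("legacy-archive-B", 25, ["Legal"],
                           "days", "rarely", "encrypted")
```

Writing the answers down in a structured form like this makes it easier to spot which datasets genuinely share an approach and which only appear to.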
To give a quick example, one set of complex data may need to be frequently accessible for the next decade. This would require advanced preservation techniques, such as normalisation of file formats to keep them up to date, ongoing fixity checks to ensure data has not been corrupted or lost, and rich metadata to ensure it can be easily found. Another dataset that also needs to be stored for a long period, but which has a very low likelihood of ever being accessed, would call for a very different approach.
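One of those techniques, the ongoing fixity check, is simple to sketch: record a checksum when content enters the archive, then periodically recompute and compare it to detect silent corruption. The snippet below is a minimal illustration of the idea; the function names and manifest format are my own, not those of any specific preservation tool.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archival objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(manifest: dict[str, str]) -> list[str]:
    """Return the paths whose current checksum no longer matches the recorded one."""
    return [path for path, expected in manifest.items()
            if sha256_of(Path(path)) != expected]
```

In practice a scheduled job would run `verify` over the whole manifest and flag any mismatches for repair from a second copy, which is exactly the kind of routine safeguard that an infrequently accessed archive may not justify.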
No one approach is right or wrong (hence the lack of a template for success), but it is important that organisations understand their requirements and plan accordingly.
Building in Continual Improvement
Data and data management are both fluid and this needs to factor into your planning. Digital preservation and electronic archiving need to be considered as a process of continual improvement that aims to ensure the measures used in the archive remain appropriate and proportionate for the digital content being held, and the purpose for which it is being retained.
Good and best practice approaches exist to aid in building your processes and offer a reliable indication of what good looks like. Popular approaches include both the FAIR data management principles and ALCOA+.
There is also a range of different maturity models and self-assessment tools that organisations can use (and build into their planning) which help assess where an organisation currently lies, and where it can improve.
It is important to view maturity models not as a means to achieve a high score, but as an effective tool for building an appropriate suite of processes that meet your business requirements. In previous roles I have often seen organisations make the mistake of using (and in some cases gaming) a maturity model to celebrate a high level of internal achievement. While these tools can help champion progress within an organisation, it’s important to use them properly and to identify opportunities for improvement as well as for celebration.
Our CTO and co-founder Matthew Addis recently provided a good overview of many popular approaches in another blog post which I would encourage you to read if you’re interested in some further reading around maturity models.
Defining the Value of the Initiative
Accurately defining, communicating and predicting the value delivered by a long-term data management initiative is difficult. Often the true value is only realised when appropriate investment hasn’t been made and something goes wrong (e.g. files become corrupted or lost, or an outdated format can no longer be accessed).
Yet any plan must factor in an expected level of value to be realised from the initiative otherwise it will struggle to gain internal support and approval.
At Arkivum we’d strongly encourage anyone building a plan to strike the right balance between highlighting the risks of not investing appropriately and identifying how effectively managing data for the long-term can add value to the organisation. This is about shifting the conversation from overcoming the challenge of what to do with stored data, to how to bring your archive alive and use it to gain organisational value.
A good example comes from our life science customers, who have a regulatory requirement to store clinical trial data for up to, and sometimes beyond, 25 years. There is greater value in effective archival of this data than simple regulatory compliance: examples include gaining approvals for patent extensions after marketing approval, or supporting mergers and acquisitions.
Shifting this conversation from risk to reward helps to build the case for an appropriate investment.
There is also a range of tools available to help define value more effectively. Best-practice approaches to value and benefits management can help build a robust definition of how value will be obtained and measured throughout any data management initiative.
This post is already much longer than I planned to make it, yet I feel that I have barely scratched the surface.
This post has covered three of the areas I consider most important when planning an effective approach to long-term data management:
- Understand your preservation requirements and ensure you are constantly asking the right questions throughout the initiative
- Build in continual improvement to your plan and leverage available best practice and maturity models
- Clearly define the expected value from the initiative and shift the conversation from mitigating risk to unlocking organisational value.
I hope this post has been helpful; please feel free to comment or ask any questions you may have. I’d also be delighted to hear from readers about other topics that you would like to see us cover.
And finally, as I mentioned at the beginning of the post, if you would like to watch the recording of the webinar where I cover these topics in more detail you can catch the recording here.