Matthew Addis, Arkivum CTO, presented at the Preservation and Archiving Special Interest Group (PASIG) from 26th to 28th October 2016, an event kindly hosted and co-organised by The Museum of Modern Art in New York.

Arkivum has approximately 30 universities in the UK using our data archiving and digital preservation solutions, many of them for research data storage as part of Research Data Management. We’re also working with Artefactual and Jisc to deliver an open digital preservation solution for use in Jisc’s forthcoming national Research Data Shared Service.

What are the drivers?

Matthew’s presentation was part of a session on the reproducibility of research. A range of drivers are pushing us towards research that is done in an open, transparent, reproducible and verifiable way. These drivers come from the funding bodies who pay for research, the researchers who carry it out, and both industry and the public who make use of its results.

Digital preservation helps achieve reproducibility by helping to ensure that data remains usable over time and has understood authenticity and integrity, i.e. people can not only open and use the data but also trust it and know where it came from. But, in many cases, we’re still a way off being able to apply and reap the rewards of all the good stuff that digital preservation can potentially offer.

Matthew takes a step back to look at some ways to get the basics in place so that we are in a better position to apply digital preservation in practice.

Total Estimated Research Data Volume

Several universities in the UK have audited their research data holdings using the Data Asset Framework, or DAF for short (http://www.data-audit.eu/). We took the results of about a dozen DAF surveys and estimated the volume of research data held by UK Higher Education Institutions (HEIs). Our estimate is 450PB of data generated by 91,000 researchers across 156 institutions, and that’s just England. Basically, there’s a lot of research data out there and it’s only getting bigger!
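Dividing the headline estimate through gives a rough sense of scale. These are simple averages, assuming decimal units (1 PB = 1,000 TB); they are not figures reported by the DAF surveys themselves, and actual holdings will vary between researchers and institutions.

```latex
\frac{450\ \text{PB}}{91{,}000\ \text{researchers}}
  = \frac{450{,}000\ \text{TB}}{91{,}000} \approx 4.9\ \text{TB per researcher},
\qquad
\frac{450\ \text{PB}}{156\ \text{institutions}} \approx 2.9\ \text{PB per institution}
```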


In this presentation, Matthew shows several practical steps that can be taken towards preserving research data. The approach is to make it as easy as possible for researchers to contribute to safeguarding the research data they generate and keeping it accessible. This is done by getting the data into safe storage as soon as possible and generating metadata so it’s clearer what the data is and how to use it. Some of this metadata is descriptive and comes from the researcher; some is technical and comes from automated tools, e.g. Archivematica.
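To make the split between descriptive and technical metadata concrete, here is a minimal sketch using only the Python standard library. It is not how Archivematica works internally; the record layout and the function names (sha256_of, build_record, verify) are hypothetical. The researcher supplies the descriptive fields, while the technical fields (size, guessed format, SHA-256 checksum, timestamp) are generated automatically. The stored checksum also supports the integrity point made earlier: recomputing it later shows whether the file has changed.

```python
import hashlib
import json
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Compute a SHA-256 checksum in chunks so large research files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(path, descriptive):
    """Hypothetical record: researcher-supplied descriptive metadata
    plus automatically generated technical metadata."""
    mime, _ = mimetypes.guess_type(path)  # guessed from the file extension only
    technical = {
        "filename": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "mime_type": mime or "application/octet-stream",
        "sha256": sha256_of(path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return {"descriptive": dict(descriptive), "technical": technical}


def verify(path, record):
    """Fixity check: recompute the checksum and compare it with the stored value."""
    return sha256_of(path) == record["technical"]["sha256"]


if __name__ == "__main__":
    # Write a small example file, describe it, then check its fixity.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        tmp.write("sample,value\nA,1\n")
    record = build_record(tmp.name, {"title": "Example dataset", "creator": "A. Researcher"})
    print(json.dumps(record, indent=2))
    print("fixity ok:", verify(tmp.name, record))
    os.remove(tmp.name)
```

A real preservation workflow would use a richer, standard metadata schema and proper format identification rather than extension-based guessing; the point here is only that much of the technical metadata can be captured without asking the researcher for anything.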

The result is that we move from ‘unknown unknowns’ to a position of ‘knowing what we have’, which is the starting point for making informed and reasoned decisions on how to keep this valuable data ‘alive’ so it remains accessible and usable into the future.

The full presentation is available in our Resources Library.