When we work so closely with something, it is easy to overlook the simple things. This is why we’ve decided to create a two-part blog series on what digital archiving is and what digital preservation is.
Many definitions exist for both, but our aim with this series is to explain how we define them from an Arkivum perspective. In this first part we delve into digital archiving; in the second, we will introduce digital preservation and explain how it differs from archiving.
So, digital archiving…what is it?
Storing this data in a digital archive ensures it remains searchable, accessible and secure for as long as it is needed. It is a separately managed repository with the sole purpose of meeting the long-term safeguarding and access requirements of the organisations mentioned above.
Benefits of a digital archive
It is a common misconception that an archive and a backup offer similar things, if not the same thing. Whilst they can complement each other, it’s important to understand that a backup does not offer the same long-term security, accessibility or capability that an archive can provide.
Digitally archiving your assets involves much more than making a backup copy or simply placing them in storage. Whilst a backup makes a copy of your data on a regular basis, it does not provide any additional benefits, nor does it future-proof the data’s usability.
A digital archive goes a step further and stores digital objects in such a manner that they remain searchable, accessible and secure.
There is little point in storing data if it cannot be found. A digital archive ensures that data is stored in such a way that, regardless of how many assets are saved, you can find the file containing the information you are looking for without undue effort.
It achieves this through indexing each asset with metadata, essentially additional data about the data. Whilst there are different forms of metadata, it’s worth focusing on two areas in particular to equip your organisation and stakeholders with easily identifiable items:
- Descriptive metadata (title, author, date of publication, description etc.)
- Technical metadata which describes technical properties of a digital file or the particular hardware and software (where the data resides, the structure of the data etc.)
Let’s take a photograph as an example. An institution can store a digital copy of a photo and have accompanying details to search by, including: when it was taken, who by, where it was captured and perhaps even some details about what it is of. Years into the future, someone can easily find that photo for a number of different reasons. This could include:
- Photos by a particular photographer
- Pictures of certain buildings from a certain time
- Or indeed just looking for that specific photo.
Even with limited knowledge of what they are looking for, they’ll be able to search a large number of files against these criteria and get back the matching photos…all because of metadata.
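To make the photograph example a little more concrete, here is a minimal sketch in Python of how descriptive metadata can drive a search. It is an illustration only, not how any particular archive product works: the `PhotoRecord` fields, the `search` function and the sample catalogue entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PhotoRecord:
    """Descriptive metadata held alongside one archived photo (hypothetical schema)."""
    file_id: str
    photographer: str
    taken_on: date
    location: str
    subjects: tuple[str, ...]


def search(records, photographer=None, subject=None, taken_between=None):
    """Return the records that match every criterion that was supplied."""
    results = []
    for record in records:
        if photographer and record.photographer.lower() != photographer.lower():
            continue
        if subject and subject.lower() not in (s.lower() for s in record.subjects):
            continue
        if taken_between and not (taken_between[0] <= record.taken_on <= taken_between[1]):
            continue
        results.append(record)
    return results


# Made-up catalogue entries, purely for illustration.
catalogue = [
    PhotoRecord("img-001", "A. Example", date(1962, 5, 1), "Reading", ("town hall", "street")),
    PhotoRecord("img-002", "B. Sample", date(1988, 9, 14), "Reading", ("railway station",)),
    PhotoRecord("img-003", "A. Example", date(1971, 3, 30), "Oxford", ("library",)),
]

# Photos by a particular photographer.
by_photographer = search(catalogue, photographer="A. Example")

# Pictures of a certain building from a certain time.
buildings_in_sixties = search(
    catalogue,
    subject="town hall",
    taken_between=(date(1960, 1, 1), date(1969, 12, 31)),
)
```

The searcher never needs to know a file name or where the file is stored; the metadata does the work. A real archive would typically use an indexed database or search engine rather than a loop over a list.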
As discussed in the previous section, an archive should ensure that the right files and assets can be found. But what happens when you try to access that file?
Storing data within a central repository like a digital archive enables your organisation and team to better manage these assets. Giving your team the ability to search for and locate assets within a single database saves time, cuts administration effort and reduces potential security concerns.
The accessibility element of a digital archive is about ensuring the right people have access to the data and digital assets they need. Imagine a university which uploads its research data into a campus-wide repository so that all stakeholders can access it, yet a certain number of projects may only be accessed by those who are directly involved. A digital archive can ensure that those particular stakeholders have the correct user permissions to access the associated files, while all other general stakeholders (who have access to the campus repository) won’t be able to access them.
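The university scenario can be sketched as a simple permission check. This is a rough illustration under assumed names: the project identifiers, user names and the `can_access` function are hypothetical, and a real archive would normally tie into an institutional identity system rather than a hard-coded table.

```python
CAMPUS_WIDE = "campus"  # marker meaning "any campus repository user may access"

# Hypothetical access rules: each project is either campus-wide
# or limited to a named set of directly involved users.
project_access = {
    "open-survey-data": CAMPUS_WIDE,
    "restricted-trial": {"dr.ahmed", "j.li"},
}


def can_access(user, project, access_rules):
    """Return True if this campus user may open files in the given project."""
    rule = access_rules.get(project)
    if rule is None:
        return False  # unknown project: deny by default
    if rule == CAMPUS_WIDE:
        return True
    return user in rule


general_user_allowed = can_access("s.student", "restricted-trial", project_access)
project_member_allowed = can_access("j.li", "restricted-trial", project_access)
```

Denying access by default when no rule exists is a deliberate choice in this sketch: it is safer for a restricted project to be briefly unreachable than accidentally open.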
When data is stored for anything more than a couple of years, it is at serious risk of degrading, becoming corrupted or lost. It is a common misconception that backing up data or storing it either ‘in-house’ or in a cloud environment (such as AWS, GCP or Azure) will ensure that it is safe, accessible and readable for as long as it is stored there.
These approaches do not provide any capability to mitigate against:
- Data loss
- Data corruption
- Software applications and operating systems becoming obsolete
- Hardware failure
A digital archive safeguards digital assets and provides peace of mind that your important collections and data remain safe for as long as they’re in the archive. This is achieved by a series of processes which are in place to protect against this data loss or corruption, including:
- Regular automated checks for integrity, to quickly highlight and rectify any issues with data.
- Storing the data within multiple separate geographic locations.
- If using a third-party archiving supplier, keeping a copy in escrow to ensure no vendor lock-in.
- If required, audit trails of who has accessed or viewed the data.
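As a rough sketch of how the first two safeguards can work together, the Python below records a SHA-256 checksum when a file enters the archive, re-checks every copy against it, and repairs a damaged copy from an intact one. The site names and the `check_and_repair` function are assumptions made for illustration; they do not describe any specific product.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def check_and_repair(copies: dict[str, bytes], expected: str) -> list[str]:
    """Verify each copy against the checksum recorded at ingest.

    Damaged copies are overwritten with an intact one.
    Returns the names of the sites that had to be repaired.
    """
    good = [site for site, data in copies.items() if checksum(data) == expected]
    damaged = [site for site in copies if site not in good]
    if not good:
        raise RuntimeError("no intact copy left to repair from")
    for site in damaged:
        copies[site] = copies[good[0]]
    return damaged


original = b"research dataset, version 1"
expected = checksum(original)  # recorded once, at ingest

# Three copies in separate (hypothetical) locations; one has silently degraded.
copies = {"site-a": original, "site-b": original, "site-c": b"research dataXet, version 1"}

repaired = check_and_repair(copies, expected)
```

Run on a schedule, a check like this is what allows corruption to be caught and fixed while a good copy still exists, which is why holding several copies in separate places matters.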
Another element which ensures safeguarding is data immutability whereby, once assets have been uploaded into the archive, they cannot be amended or tampered with…by anyone. Permitted users can access and download them, but they cannot alter them.
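The idea of immutability can be illustrated with a small write-once store: an asset can be added and read back, but never replaced. This is a toy in-memory sketch with a hypothetical `WriteOnceStore` class; real archives enforce the same rule at the storage layer.

```python
class WriteOnceStore:
    """Toy in-memory store where each key can be written exactly once."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise PermissionError(f"{key!r} is already archived and cannot be altered")
        self._objects[key] = bytes(data)  # keep an immutable copy

    def get(self, key: str) -> bytes:
        # bytes objects are immutable, so callers cannot change the stored asset
        return self._objects[key]


store = WriteOnceStore()
store.put("report.pdf", b"original contents")

try:
    store.put("report.pdf", b"tampered contents")
    overwrite_blocked = False
except PermissionError:
    overwrite_blocked = True
```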
To archive or not to archive?
I hope we’ve made it clear what we mean by digital archiving (and by extension what a digital archive is) and how it differs from a backup or other storage solution. An archive’s primary benefits are that it can:
- Guarantee the long-term safety and security of the data.
- Ensure that those who require access to the data in the future will be able not only to find it but also to access it (as long as they have permission).
As the volume of data being generated continues to increase, long-term data management must become a key focus for organisations around the world and across industries. Not only will this ensure that assets do not become lost, but it will also unlock additional value from this data long into the future.
In our next piece, we’ll delve into digital preservation and outline how this goes a step further and protects the future readability and usability of your data.
15 Jul, 2021