What is digital archiving? A guide to data archiving and best practices

By Tom Lynam

When we work so closely with something, it is easy to overlook the simple things. This is why we’ve decided to create a two-part blog series explaining what digital archiving is and what digital preservation is.

Many definitions exist for both, but our aim with this series is to set out how we define them from an Arkivum perspective. In this first part, we will delve into digital data archiving, including what it is, the best practices, and the benefits of having a digital archive. In our second blog, we’ll provide an introduction to what digital preservation is and how it differs from digital archiving.

So, digital archiving…what is it?

 

What is digital archiving?

Digital or data archiving is the process of moving inactive or currently unneeded digital content into long-term archival digital storage. The content can be any digital file, from digitised physical documents and complex databases to images, videos and audio files. As the terms are synonymous, the answer to “what is data archiving?” is the same.

Archiving content ensures it remains searchable, accessible and secure for as long as it is needed, prevents it from being lost or damaged, and frees up storage capacity. A digital archive is a separately managed repository with the sole purpose of meeting the long-term safeguarding and access requirements of your organisation.


What is the definition of archival data?

Archival data is any data that has been stored and preserved for long-term retention (“archiving”) rather than immediate use. It is often important data that is archived rather than stored as a backup, and it is stored on a dedicated archival storage system rather than an organisation’s primary storage.

 

What is the difference between a digital archive and a backup?

There is a common misconception that an archive and a backup offer similar things, if not the same thing. Whilst they can complement each other, it’s important to understand that a backup does not offer the same long-term security, accessibility or capability that an archive can provide.

Digitally archiving your content involves much more than making a backup copy or simply placing it in storage. Whilst a backup makes a copy of your data on a regular basis, it does not provide any additional benefits, nor does it future-proof the data’s usability.

A digital archive goes a step further and stores digital objects in such a manner that they remain retrievable, accessible and protected.


 

How to archive data: Best practices

To move beyond simple backups, you must implement data and database archiving best practices. The best way to digitally archive data long-term is to follow a framework prioritising searchability, accessibility, and safeguarding.

1) Searchability through indexing with metadata

There is little point in archiving content if it cannot be found. A digital archive stores data in such a way that, regardless of how many assets are saved, you can readily find the file containing the information you are looking for.

It achieves this by indexing each asset with metadata, which is essentially additional data about the data. Whilst there are different forms of metadata, it’s worth focusing on two areas in particular so that your organisation and stakeholders can easily identify items:

  • Descriptive metadata (title, author, date of publication, description etc.)
  • Technical metadata which describes technical properties of a digital file or the particular hardware and software (where the data resides, the structure of the data etc.)

Let’s take a photograph as an example. An institution can store a digital copy of a photo along with accompanying details to search by, including when it was taken, who took it, where it was captured and perhaps even some details about what it shows. Years into the future, someone can easily find that photo in a number of different ways. They might be looking for:

  • Photos by a particular photographer
  • Pictures of certain buildings from a certain time
  • That specific photo.

Even with limited knowledge, they’ll be able to search a large number of files and be met with relevant photos that match their search criteria…all because of metadata.
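The photograph example can be sketched in a few lines of Python. This is only an illustration of searching by descriptive metadata, not a description of how any particular archive product works: the `PhotoRecord` fields, the sample records and the `search` helper are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PhotoRecord:
    """Descriptive metadata for one archived photo (hypothetical fields)."""
    file_id: str
    photographer: str
    taken_on: date
    location: str
    subject: str


def search(records, photographer=None, subject_contains=None,
           taken_from=None, taken_to=None):
    """Return the records that match every criterion that was supplied."""
    results = []
    for rec in records:
        if photographer and rec.photographer != photographer:
            continue
        if subject_contains and subject_contains.lower() not in rec.subject.lower():
            continue
        if taken_from and rec.taken_on < taken_from:
            continue
        if taken_to and rec.taken_on > taken_to:
            continue
        results.append(rec)
    return results


# Invented sample data, for illustration only.
ARCHIVE = [
    PhotoRecord("img-001", "A. Example", date(1962, 5, 1), "Town square", "Town hall exterior"),
    PhotoRecord("img-002", "B. Sample", date(1975, 8, 14), "Harbour", "Fishing boats"),
    PhotoRecord("img-003", "A. Example", date(1981, 3, 9), "High street", "Library building"),
]

# Photos by a particular photographer.
by_photographer = search(ARCHIVE, photographer="A. Example")

# Pictures of a certain building from a certain time.
halls_before_1970 = search(ARCHIVE, subject_contains="hall", taken_to=date(1969, 12, 31))
```

The searcher never needs to know a file name or storage location. The metadata attached at the point of archiving is what makes each photo findable.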

2) Accessibility and user permissions

As discussed in the previous section, an archive should ensure that the right files and assets can be found. But what happens when you try to access that file?

Storing data within a central repository (such as digital archival storage) enables your organisation and team to better manage these assets. Equipping your team with the ability to search for and locate assets within a single database will save time and administrative effort, and reduce potential security concerns.

The accessibility element of a digital data archive is about ensuring the right people have access to the data and digital assets they need. Imagine a university which uploads its research data into a campus-wide repository so that all stakeholders can access it, yet a certain number of projects can only be accessed by those who are directly involved. Secure data archiving can ensure that those particular stakeholders have the correct user permissions to access the associated files, while all other general stakeholders (who have access to the campus repository) won’t be able to access them.
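The university scenario can be expressed as a simple permission check. The sketch below is a minimal illustration under assumed names: the `Project` class, the `can_access` function and the user names are hypothetical, and a real archive would typically tie permissions to an institutional identity system.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """An archived research project (hypothetical model)."""
    name: str
    restricted: bool = False
    members: set = field(default_factory=set)  # people directly involved


def can_access(user, campus_users, project):
    """Campus users may read open projects; restricted ones need membership."""
    if user not in campus_users:
        return False  # no access to the campus repository at all
    if not project.restricted:
        return True   # open to every campus stakeholder
    return user in project.members


# Invented users and projects, for illustration only.
CAMPUS_USERS = {"alice", "bob", "carol"}
open_project = Project("Open survey data")
trial_project = Project("Clinical trial", restricted=True, members={"alice"})

bob_sees_open = can_access("bob", CAMPUS_USERS, open_project)
bob_sees_trial = can_access("bob", CAMPUS_USERS, trial_project)
alice_sees_trial = can_access("alice", CAMPUS_USERS, trial_project)
```

Here a general stakeholder can open the campus-wide material but not the restricted project, while a project member can open both.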

3) Safeguarding against corruption or hardware failure

When archival data is stored for anything more than a couple of years, it is at serious risk of degrading, becoming corrupted or being lost. It is a common misconception that backing up data or storing it either ‘in-house’ or in a cloud environment (such as AWS, GCP or Azure) will ensure that it is safe, accessible and readable for as long as it is stored there.

On their own, these approaches do not protect against:

  • Data loss
  • Data corruption
  • Software applications and operating systems becoming obsolete
  • Hardware failure

Secure data archiving safeguards digital assets and provides peace of mind that your important collections and archived data remain safe for as long as they’re in the archive. This is achieved by a series of processes which are in place to protect against this data loss or corruption, including:

  • Regular automated checks for integrity, to quickly highlight and rectify any issues with data.
  • Archiving the content within multiple separate geographic locations.
  • If using a third-party archiving supplier, storing a copy in escrow to avoid vendor lock-in.
  • If required, audit trails of who has accessed or viewed the data.
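The first two processes in this list, integrity checks and multiple copies, work together, and a short Python sketch can show how. It is a simplified illustration rather than any vendor’s implementation: the site names and the `check_and_repair` helper are hypothetical, and in-memory bytes stand in for files held at separate locations. A checksum is recorded when the content is archived, each copy is later compared against it, and a damaged copy is replaced from an intact one.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of some content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def check_and_repair(replicas: dict, expected_digest: str) -> list:
    """Compare every replica with the recorded checksum and repair bad ones.

    `replicas` maps a location name to the bytes stored there.
    Returns the names of the locations that were found to be damaged.
    """
    good = [name for name, data in replicas.items()
            if sha256_of(data) == expected_digest]
    damaged = [name for name in replicas if name not in good]
    if not good:
        raise RuntimeError("no intact copy left to repair from")
    for name in damaged:
        replicas[name] = replicas[good[0]]  # restore from an intact copy
    return damaged


# Invented content and locations, for illustration only.
original = b"archived report, 1998"
digest = sha256_of(original)  # recorded when the content is archived

replicas = {
    "site-a": original,
    "site-b": original,
    "site-c": b"archived rep0rt, 1998",  # simulated corruption
}

damaged = check_and_repair(replicas, digest)
```

A check like this only helps if at least one intact copy survives, which is why it is paired with storage in separate geographic locations.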

Another element which ensures safeguarding is data immutability whereby, once assets have been uploaded into the archival digital storage, they cannot be amended or tampered with…by anyone. Permitted users can access and download them, but they cannot alter them.
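The idea of immutability can be shown with a small write-once store. This is a toy sketch with hypothetical names (`WriteOnceStore`, `put`, `get`); real archives generally enforce immutability in the storage layer itself rather than in application code like this.

```python
class WriteOnceStore:
    """In-memory store that accepts each key once and never changes it."""

    def __init__(self):
        self._items = {}

    def put(self, key: str, data: bytes) -> None:
        """Archive new content; refuse to replace anything already stored."""
        if key in self._items:
            raise PermissionError(f"{key!r} is already archived and cannot be changed")
        self._items[key] = bytes(data)  # keep an immutable copy

    def get(self, key: str) -> bytes:
        """Return the content; bytes objects cannot be modified in place."""
        return self._items[key]


store = WriteOnceStore()
store.put("report.pdf", b"original content")

try:
    store.put("report.pdf", b"tampered content")
    overwrite_blocked = False
except PermissionError:
    overwrite_blocked = True
```

Permitted users can still read and download `report.pdf`, but the attempt to replace it is rejected and the original content is unchanged.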

 

Benefits of a digital archive

The primary benefits of using digital storage to archive data are that it can:

  • Guarantee the long-term safety and security of the data.
  • Ensure that those who will require access to the data in the future will be able not only to find it, but also to access it (as long as they have permission).
  • Evidence compliance with data retention regulations.
  • Make historic research and work accessible to current and future generations.
  • Leverage well-structured historic data to train AI systems.

 

Conclusion: To archive or not to archive?

I hope we’ve made it clear what we mean by digital archiving (and by extension what a digital archive is) and how it differs from a backup or other storage solution.

As the generation of data continues to increase, long-term secure data archiving and management must become a key focus for organisations around the world and across industries. Not only will this ensure that assets are not lost, but it will also unlock additional value from this data long into the future.

In our next piece, we’ll delve into digital preservation and outline how this goes a step further and protects the future readability and usability of your archived data.


Tom Lynam

Tom is the Marketing Director at Arkivum. He joined the business in January 2020 tasked with driving new business growth and building the brand into new sectors such as Pharmaceutical and Life Sciences. He has over 12 years’ experience in several diverse marketing leadership roles across technology and professional services organisations.
