Metadata is defined as a set of data that describes and gives information about other data. Without it, you have no view of what is going on with your unstructured data and no context as to what the data is or what it is for: who created the file and when, what the file relates to, and whether it has a retention period.
Metadata commonly refers to structured data held in databases of the kind previously referred to as:
- Data Warehouses
- Integration Engines
Typically, these databases only connect structured data together, leaving unstructured data unmanaged and, in many cases, unusable. Unstructured files cannot traditionally be dealt with in the same way.
Therein lies the problem: organisations typically cannot associate files and their contents with each other when dealing with this type of data. Traditional methods such as the data warehouse or integration engine cannot solve the problem on their own.
This challenge presents itself most prominently in regulated markets, such as pharmaceuticals, medical devices or financial services, where it is imperative to be able to cross-reference data, or where data or data subjects have a specified retention period.
How do you ensure that you are dealing with data in line with regulations?
The Amazon package analogy
To give a very simple description of what metadata could be, I like to use the analogy of an Amazon package:
You've received a box with your purchased item. Without the metadata, you would not know much about the package until you open it up; you need more information to identify the contents. The same is true of a file held in a repository in your organisation.
The metadata that you might receive for this Amazon package which allows you to find out more about it could be:
- Where the package has come from
- The weight of the package
- Who the package is for
- The name of the item within the package
If we imagine that the Amazon package is a piece of unstructured data, we can begin to unwrap the complexity.
When it comes to this unstructured data, there might be some metadata that comes packaged with the file such as the name of the file, how big the file is, and potentially what type of file it is.
However, this is not always the case, and it is not typically enough when it comes to managing regulations, retention periods, subject access requests and so on. You still need to be able to reference this metadata and link it to other pieces of data.
There is no history, provenance, description of the file, compliance information or indication of whether the data is sensitive. You may struggle to do anything useful with the data because you do not know what the data is.
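To make the gap concrete, here is a minimal Python sketch of the metadata that typically comes packaged with a file. The function name `basic_file_metadata` and the dictionary keys are illustrative assumptions, not part of any particular product; only standard-library calls are used.

```python
import mimetypes
import os
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return the metadata a file system typically provides for a file.

    Hypothetical helper for illustration only.
    """
    info = os.stat(path)
    # The type is guessed from the file extension, not from the contents.
    guessed_type, _ = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "type": guessed_type,  # None when the extension is not recognised
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
```

Nothing in the returned dictionary says who the data is about, how long it must be kept or whether it is sensitive. That information has to be captured separately.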
Why is metadata the forgotten hero?
Sinead McKeown explains the importance of metadata in this short video:
Metadata allows organisations to capture contextual information about a file or piece of data, including governance, compliance and descriptive information, so you know what the file contains, how long it should be kept and how it relates to other files in your possession. This allows you to govern, search and put context to your data, which adds a huge amount of value to it.
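As a rough illustration of what such contextual metadata could look like, the Python sketch below models a single record with a description, a retention period and links to related files. The class `MetadataRecord`, its fields and the helper `related_to` are hypothetical names chosen for this example and do not describe any specific product's data model.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class MetadataRecord:
    """Hypothetical contextual metadata held alongside a file."""
    file_name: str
    description: str
    created_by: str
    created_on: date
    retention_days: int  # assumed unit; real policies may use years
    sensitive: bool = False
    related_files: list[str] = field(default_factory=list)

    def retention_ends(self) -> date:
        """Date on which the retention period finishes."""
        return self.created_on + timedelta(days=self.retention_days)

    def past_retention(self, today: date) -> bool:
        """True once the retention period has fully elapsed."""
        return today > self.retention_ends()


def related_to(records, file_name):
    """Return the records that list file_name among their related files."""
    return [r for r in records if file_name in r.related_files]
```

With records like these, questions such as "which files relate to this one?" or "which files are past their retention period?" become simple lookups rather than manual investigations.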
Metadata is at the heart of what we do. By using Arkivum's long-term data lifecycle management solution, organisations are able to make sense of their structured and unstructured data, whilst meeting compliance requirements and ensuring that their data is usable, searchable and preserved into the long term.
If you would like further information about our digital archiving and preservation solution, please contact us.