Machine Learning: The next frontier for eTMF management? - Arkivum

Archiving & Preservation / 12 Aug, 2021

Before we explore the impact of Machine Learning upon the management of the eTMF, I think it’s important we outline what it is, what it is not and how it works.

Machine learning (ML) has been defined as the process by which machines learn to adapt through experience and can be taught to learn by themselves. Likened to feedback loops, applied and general ML have been employed by data scientists for decades to improve the utility and functionality of data systems. Applied ML is more common and is focused on a specific task: programs that run autonomous vehicles or trade stocks based on market trends, for example. General ML, on the other hand, can in theory learn to handle any task.

Machine learning should not be confused with its data science cousin, Artificial Intelligence (AI). Artificial intelligence applies ML to support the deeper artificial understanding needed to deliver the kind of problem solving that the human brain performs.

But that doesn’t diminish ML’s potential power. For much of the digitised world it is coming in quite handy. One of the best examples is Otter – an online application that uses ML-enabled natural language processing to create highly accurate transcripts from meeting recordings. Digital image-based quality control, face recognition on your iPhone, and music apps that suggest new artists and sort your selections based on the trends your preferences reveal are similarly common ML applications. Because ML recognises patterns and can sort and parse subtle nuances in the data, it can automatically learn how to improve the desired outcome.

As a branch of data science, ML has the potential to enhance and improve the quality of data acquisition whilst also improving the processes required to manage and utilise large quantities of data. For example, applied ML has the potential to help drug sponsors better understand large volumes of data stored within documents, optimise processes, and ultimately manage potential risk more effectively.

How does Machine Learning work?

Very broadly, an ML application learns by processing examples of what it is being trained to analyse without relying on simple if/then assumptions typical of traditional programs (i.e. if x exists, then conclude y). For example, a network can learn to identify specific regulatory documents based on unique features such as a signature box position in combination with, or lack of, other identifying characteristics. It can then classify and index the document accordingly.
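To make the idea of learning from examples concrete, here is a minimal sketch in Python. It is not how any particular eTMF product works: the feature names, the two document types and the training examples are all hypothetical, and a simple nearest-centroid classifier stands in for the neural network mentioned above. The point is only that the decision comes from labelled examples rather than from hand-written if/then rules.

```python
from collections import defaultdict

# Hypothetical binary features extracted from a scanned page:
# (has_signature_box, signature_box_in_lower_half, has_table, has_letterhead)
TRAINING_EXAMPLES = [
    ((1, 1, 0, 0), "consent_form"),
    ((1, 1, 0, 1), "consent_form"),
    ((1, 1, 0, 0), "consent_form"),
    ((0, 0, 1, 1), "site_visit_log"),
    ((1, 0, 1, 1), "site_visit_log"),
    ((0, 0, 1, 0), "site_visit_log"),
]


def train(examples):
    """Learn one average feature vector (centroid) per document type."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Return the document type whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = train(TRAINING_EXAMPLES)
print(classify(centroids, (1, 1, 0, 0)))  # consent_form
print(classify(centroids, (1, 0, 1, 1)))  # site_visit_log
```

The second call shows the "in combination with" point: a signature box is present, but because it appears alongside a table and away from the lower half of the page, the document is still filed as a site visit log.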

The promise of Machine Learning

The scope of clinical studies is expanding as the globalisation of pharmaceutical healthcare grows. Documentation can now come from numerous clinical sites in different countries, and even though English is standard, where it is a second language nuances can be introduced that impair reporting and metadata quality. ML could help interpret a site’s identifying metadata, which may or may not be present, and scrutinise scanned documents, which often come with handwritten notes and signatures.

For the sponsor, this creates a large volume of documentation that must be identified and tagged with metadata to very specific standards for presentation to auditors during an agency inspection.

ML has great potential to improve clinical trial data and documentation processing and accelerate eTMF filing processes by reducing manual input and learning more accurate ways of automating document classification and indexing for filing. In time, verification and automated archival of the bulk of the documentation collated would ‘round-off’ the application for ML in this scenario.

Relative to current state-of-the-art eTMF data management and archiving solutions, ML has the potential to enhance certain functional utility, as well as the ability to support critical elements of trial data integrity and compliance including:

1 – Classification

Electronic Content Management (ECM) based eTMF repositories provide automated methods and workflows to collect, classify, index and report on content. The core of any ECM system is a schema or classification system (such as document tagging terms or metadata) and a relational database that retains eTMF content for search, reporting and other management tasks.

All organisations involved in clinical trials maintain a TMF comprising thousands of pages of documents required for regulatory and evidential purposes. The archive has to maintain these documents in a manner that ensures ongoing compliance with these regulatory requirements for as long as they are needed.

In this instance, ML technologies could potentially be applied to eTMF management systems to improve and accelerate processes associated with classification, for example by acquiring critical metadata embedded in legacy compliance documentation.

2 – Anonymisation

The gold standard in clinical study remains the blinded randomised controlled trial. To maintain this protocol, all documentation stored within the eTMF must be redacted to hide all personally identifiable information, such as emails, bank account numbers, social security numbers or dates of birth.

In the EU, there is also a requirement to meet robust privacy regulations like GDPR. Potentially, an ML metadata extraction solution could form a neural network that would train itself to recognise an extracted attribute – such as the name of a redacted study – and cross-reference it to an index that links the document containing it to the correct study.
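As a point of comparison for the redaction described in this section, the sketch below shows a purely rule-based baseline in Python. It is not ML and not a compliant anonymisation tool: the three patterns and the placeholder labels are illustrative assumptions, and real identifiers vary far more by country and format than these expressions allow. An ML approach would aim to catch the cases that fixed patterns like these miss.

```python
import re

# Illustrative patterns only (assumptions, not a complete PII definition).
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),   # US-style format
    ("[DATE]", re.compile(r"\b\d{2}/\d{2}/\d{4}\b")),  # e.g. a date of birth
]


def redact(text):
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact("Contact jane.doe@example.org, SSN 123-45-6789, DOB 04/07/1985."))
# Contact [EMAIL], SSN [SSN], DOB [DATE].
```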

3 – Quality

Automated indexing is another way in which ML could boost efficiency and save time, by providing a higher level of repeatable quality that supports inspection readiness.

It is currently being used to promote metadata standardisation by reducing errors and increasing consistency. Resulting quality assurance improvements in eTMF content are likely to lead to fewer corrections and less rework.

Algorithms and rules can be created to confirm that the correct information is present in full, while ML can automatically find anomalies and quality issues.
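A rule of the first kind, confirming that information is present and in full, might look like the following Python sketch. The required field names and the ISO date format are assumptions made for illustration and are not taken from any eTMF standard or product.

```python
from datetime import date

# Hypothetical metadata fields every filed document is expected to carry.
REQUIRED_FIELDS = ("study_id", "site_id", "document_type", "document_date")


def find_metadata_issues(record):
    """Return a list of human-readable problems found in one metadata record."""
    issues = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            issues.append(f"missing {field}")

    # Assumed convention: dates are stored as ISO strings (YYYY-MM-DD).
    raw_date = str(record.get("document_date") or "").strip()
    if raw_date:
        try:
            date.fromisoformat(raw_date)
        except ValueError:
            issues.append(f"invalid document_date: {raw_date}")
    return issues


record = {
    "study_id": "ST-001",
    "site_id": "",
    "document_type": "consent_form",
    "document_date": "2021-13-01",
}
print(find_metadata_issues(record))
# ['missing site_id', 'invalid document_date: 2021-13-01']
```

Checks like these are deterministic and easy to audit; the ML contribution would sit alongside them, flagging records that pass every rule but still look unusual.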

4 – Archival

One of the most important measurements in managing trial master files is completeness. Before an eTMF can be archived it is essential that a review is undertaken. Furthermore, the guidance states that trial stakeholders must have a quality management system (QMS) in place with procedures and processes established for TMF management which ensure its completeness, quality and accuracy.

Given the volumes of information collated across a trial, it is potentially very time consuming to do this after it has closed out, so a continuous approach with regular reviews is considered best practice. Much of what is necessary in these reviews – correct indexing, the presence of certain documentation, correct metadata and anonymisation of patient information – might be delivered through ML technology.
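One part of such a regular review, checking for the presence of certain documentation, can be sketched in a few lines of Python. The list of expected document types and the site names are hypothetical; a real TMF would take its expected content from its own index or reference model.

```python
# Hypothetical set of document types expected for every site.
EXPECTED_DOCUMENTS = {
    "protocol",
    "investigator_cv",
    "ethics_approval",
    "consent_form",
}


def completeness_report(filed_by_site):
    """Report, per site, the share of expected documents filed and what is missing."""
    report = {}
    for site, filed in filed_by_site.items():
        missing = sorted(EXPECTED_DOCUMENTS - set(filed))
        present = len(EXPECTED_DOCUMENTS) - len(missing)
        report[site] = {
            "percent_complete": round(100 * present / len(EXPECTED_DOCUMENTS)),
            "missing": missing,
        }
    return report


filed = {
    "site_01": ["protocol", "investigator_cv", "ethics_approval", "consent_form"],
    "site_02": ["protocol", "consent_form"],
}
for site, result in completeness_report(filed).items():
    print(site, result)
# site_01 {'percent_complete': 100, 'missing': []}
# site_02 {'percent_complete': 50, 'missing': ['ethics_approval', 'investigator_cv']}
```

Run on a schedule during the trial rather than once at close-out, a check of this kind supports the continuous approach described above.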

Current solutions providing high performance eTMF management

Certainly, data science is buzzing about the potential ML and AI have to enhance data retrieval and content/document management. Given the complexities of eTMF management, it is likely that next-gen solutions will begin to incorporate their principles in meaningful ways. Arkivum’s engineers are at the forefront in exploring the possibilities and potential of ML in future Arkivum eTMF offerings.

There is potential for ML to improve eTMF management’s future, but drug developers are busy now and need solutions to answer today’s eTMF control and compliance complexities. Arkivum provides a fully validated digital archiving and preservation solution. Easy to set up, the platform provides the confidence and protection that pharma development’s regulated commercial environment requires, regardless of where drug developers are in the clinical trial process.

To further support compliance, Arkivum regularly delivers validated (CSV) releases of the software with a special focus on QMS and related SOPs. We are confident in our quality processes and fully welcome customer audits in support of their QMS audits and reviews.

Arkivum’s proven eTMF applications ensure eTMF content is preserved, accessible and usable for the whole retention period in support of compliance.


Arkivum can help clients bring order, stability, and control to trial eTMFs while attaining the highest standards of completeness, timeliness and quality.

Arkivum provides validated digital archiving support and software that keep archived trial data safely alive in an immutable, trustworthy format that ensures prompt, painless access and reference. Our experienced team is ready to protect your critical data and guarantee that it is accessible, secure, reliable, and immutable – now and in the future.

To learn more about how we deliver complete data integrity and data preservation for the long term, book a demo today.

Whitney Armstrong

Whitney is the Marketing Executive at Arkivum. She joined the business in 2020 and is responsible for supporting marketing campaigns and activities targeting key sectors. Whitney has over 5 years' experience of delivering and supporting marketing strategy for technology brands.
