DNA digital data storage: an efficient and sustainable solution for long-term data preservation?
From floppy disks to USB sticks, data storage has evolved over the years to keep pace with rapid technological advancements. However, in the face of climate change, we must turn to more environmentally sustainable solutions to meet growing data storage demands. And what better way to adapt to these needs than to use a resource that has existed for billions of years? It seems counterintuitive, but this ancient resource could propel us into the future of data storage and preservation. That resource is DNA.
What is DNA data storage?
DNA digital data storage is the process of encoding digital data into DNA strands. It is a recent scientific development, and potentially a significant breakthrough for the data preservation world.
To get more technical, the data is encoded into sequences of DNA nucleotides (think A, C, G and T instead of 0s and 1s), and these sequences are synthesised as physical DNA strands. The strands are then stored in a temperature-controlled environment to ensure their preservation and safekeeping. When the data needs to be retrieved, the DNA strands are placed into a machine that sequences them, and the resulting sequences are decoded back into the original data.
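To make the idea of nucleotides standing in for 0s and 1s concrete, here is a minimal Python sketch. It assumes the simplest possible mapping, two bits per base (00→A, 01→C, 10→G, 11→T); the names `encode` and `decode` are hypothetical, and real DNA storage schemes add error correction, addressing and constraints on base patterns that are not shown here.

```python
# Hypothetical, simplified mapping: every 2 bits become one nucleotide base.
# Real schemes also add error correction and avoid long runs of one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a DNA sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Reverse the mapping: every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # CAGACGGC
    assert decode(strand) == b"Hi"
```

Round-tripping through `decode` returns the original bytes, mirroring the encode, store, sequence and decode cycle described above, minus the chemistry.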
The outcome? Large quantities of data can be condensed into small volumes of physical DNA material. To put this in perspective, if we were to take all the movies ever made and synthesize them into DNA strands, they would “fit inside a volume smaller than a sugar cube”. Even more surprisingly, if we were to take all the digital data in existence today and encode it into DNA strands, it would fill a coffee mug.
Today, there are over 10 trillion gigabytes of digital data in existence, and this figure grows by around 2.5 million gigabytes per day. These vast amounts of data are hosted in large data centres that are costly to run and can have a high carbon footprint for what is purely a storage task. It is therefore important that we explore more sustainable approaches. That’s where DNA-based storage steps in: the energy needed to maintain its storage environment is low, making it a much more energy-efficient solution for data that doesn’t need to be accessed frequently, which is often the case for archive data.
DNA could be hugely beneficial for long-term digital storage, as data can survive for thousands of years if kept at the correct temperature. Once the DNA has been synthesised and stored, the cost of keeping it (without accessing it) is low, because a medium that lasts for hundreds of years rarely needs to be migrated. Over the long term, this makes DNA a cost-efficient digital storage solution.
In terms of security, DNA is an extremely secure data storage medium, provided that the DNA itself is physically secure. If it is protected against DNA’s three enemies (excessive light, extreme temperatures and high oxygen levels), it can be preserved for many years.
Another significant benefit of this method is that, once created, the encoded DNA can easily be copied and replicated. Making these additional copies is very cheap, which helps with both verification and preservation.
So, with all this in mind, one question remains:
How can this impact your current data storage and preservation solutions?
Well, we are still far from this being a widespread and accessible solution. The current methods of synthesising DNA are expensive, costing around $3,500 USD per megabyte of information. Reading the data back adds a further cost of around $1,000 USD per megabyte. However, as mentioned previously, the cost of storage is next to zero after synthesis, and efforts are being made to reduce processing costs. The DNA Alliance and IARPA MIST projections, for example, target costs of $1,000 per terabyte by 2024 and $1 per terabyte by 2030.
Aside from cost, DNA storage also poses accessibility challenges. Encoding and decoding the data is a long and complex process, which makes retrieving specific data difficult. This method may therefore be inefficient for organisations that need regular access to their data archives.
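The per-megabyte figures above add up quickly. Below is a back-of-the-envelope Python sketch, assuming decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB) and treating the quoted prices as flat per-megabyte and per-terabyte rates; the constant and function names are illustrative only.

```python
# Approximate prices quoted in the text (USD per megabyte).
WRITE_USD_PER_MB = 3_500   # synthesis
READ_USD_PER_MB = 1_000    # sequencing / reading

MB_PER_TB = 1_000_000      # decimal units assumed


def current_cost_usd(megabytes: float, reads: int = 1) -> float:
    """Cost to write the data once and read it back `reads` times at today's rates."""
    return megabytes * WRITE_USD_PER_MB + reads * megabytes * READ_USD_PER_MB


def projected_cost_usd(megabytes: float, usd_per_tb: float) -> float:
    """Cost at a projected flat price per terabyte."""
    return megabytes * usd_per_tb / MB_PER_TB


if __name__ == "__main__":
    one_gb = 1_000  # megabytes
    print(current_cost_usd(one_gb))            # 4500000
    print(projected_cost_usd(one_gb, 1_000))   # 1.0   (2024 target)
    print(projected_cost_usd(one_gb, 1))       # 0.001 (2030 target)
```

In other words, writing a single gigabyte and reading it back once would cost roughly $4.5 million at the quoted rates, compared with about $1 per gigabyte at the 2024 target.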
What about for preservation?
In terms of preservation potential, DNA is estimated to preserve information for far longer than any method currently available on the market. To put this into perspective, I’ve created a table outlining current preservation methods and their timelines.
Conversely, DNA as a long-term digital preservation tool may pose difficulties for long-term reuse and readability. There is little point in being able to store data for 20,000 years if file formats become obsolete, and therefore unusable, after only a fraction of that time.
Successful long-term digital preservation requires active and ongoing management of the data; you cannot simply store and forget. Maintaining files in preservation formats and running regular data integrity checks are just two examples of processes that ensure long-term access to and use of the data. But how will this work for DNA?
Perhaps similar techniques will need to be implemented so that data stored in DNA can be managed more easily than the current process allows. We cannot know for certain where the technology will go, but to prevent these files from becoming inaccessible or unusable, there must be a strategy that gives DNA storage effective preservation capabilities.
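One familiar integrity check, fixity checking, could in principle carry over to DNA: record a checksum before the data is encoded, then recompute it after sequencing and decoding. Below is a minimal Python sketch using SHA-256 from the standard library; the function names are hypothetical, and how checksums would be stored alongside a DNA archive is an open assumption.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 checksum to record at ingest, before DNA synthesis."""
    return hashlib.sha256(data).hexdigest()


def verify(decoded: bytes, recorded_checksum: str) -> bool:
    """After sequencing and decoding, confirm the bytes match the recorded checksum."""
    return fixity(decoded) == recorded_checksum


if __name__ == "__main__":
    original = b"archived record"
    checksum = fixity(original)
    print(verify(original, checksum))             # True
    print(verify(b"archived recorD", checksum))   # False
```

A mismatch would not say what went wrong, only that the decoded copy differs from what was stored, which is the signal a preservation workflow needs before deciding to fall back on another copy.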
While this new data storage solution is potentially highly advantageous for long-term preservation, its cost and slow retrieval make it an inefficient method for data that needs to be accessed often.
However, these methods are still in the early phases of development and hold major potential for data storage and preservation.
Here at Arkivum, we will be posting updates relating to the topic of DNA digital storage and preservation. To keep up to date with our latest news and blogs, sign up to our newsletter here.
Alternatively, if you’ve got important data that requires regular easy access, contact us to discuss your specific needs for an effective digital storage solution.
23 Mar, 2022