DNA hard drives will upend how people think about data storage. In today’s storage world, a hard disk must have the shape of a hard disk, a tape the shape of a tape, and an optical disc the shape of a disc. A DNA hard drive is not bound to any shape.
Original title: Everything can be a hard drive! A “rabbit” demonstrates that DNA storage can be everywhere; China has listed it as a key special project
Source: DeepTech Deep Technology
To test this claim, scientists 3D-printed a rabbit whose own 3D structure data is embedded in the printing material in the form of double-stranded DNA. In other words, through encoding and decoding, the rabbit model stores and passes on its own data in DNA. By extension, DNA storage becomes possible for any object in the world.
Simply put, writing data is DNA synthesis, reading data is DNA sequencing, and copying data is DNA replication.
Animated figure: DNA data storage expands the possibilities of embedding information directly into everyday items. (Source: Wired)
The study was published today (December 10, Beijing time) in the journal Nature Biotechnology. It was led by Yaniv Erlich, lead scientist at MyHeritage and an associate professor at Columbia University, and Robert Grass, a professor at the Functional Materials Laboratory of the Federal Institute of Technology in Zurich.
Yaniv Erlich will attend the 3rd EmTech China Global Emerging Technology Summit in Beijing on December 13-14 as a guest speaker, where he will share MyHeritage’s latest developments in the life sciences. He said he would show the rabbit to the audience during his talk in Beijing.
The first rabbit to store data in DNA
Figure: The Stanford Bunny, which contains DNA data. (Source: Federal Institute of Technology Zurich)
One concept needs explaining here: the Stanford Bunny. It is not an artist’s whimsical creation but a 3D test model widely used in computer graphics, created at Stanford University in 1994.
The researchers converted the Stanford Bunny’s binary data (0s and 1s) into sequences of the four DNA bases (A, T, C, G), then encapsulated the DNA fragments in silica (silicon dioxide) beads about 160 nanometers in size, which were embedded in a biodegradable thermoplastic polyester. Finally, the resulting polyester was used to 3D-print the rabbit.
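The step from bits to bases can be illustrated with a minimal Python sketch. It assumes the simplest possible mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T); the mapping table and function names are illustrative assumptions, not the scheme reported in the study, which also involves compression and error correction.

```python
# Illustrative 2-bits-per-base mapping (an assumption, not the study's exact scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into four bases, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(sequence: str) -> bytes:
    """Invert bytes_to_dna: every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


encoded = bytes_to_dna(b"bunny")
print(encoded)                 # a string over the alphabet A, C, G, T
print(dna_to_bytes(encoded))   # the original bytes again
```

With this mapping every byte becomes exactly four bases, so a file of n bytes needs at least 4n nucleotides before any redundancy is added.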
This DNA storage encoding process consists of three parts: compression, error correction, and conversion. Before conversion to DNA, the Stanford Bunny’s binary stereolithography file was 100 KB; it was compressed to 45 KB to eliminate redundant information and make the most of the DNA storage space.
Each oligonucleotide is 145 nucleotides long, consisting of a 104-nucleotide payload and 41 nucleotides of polymerase chain reaction (PCR) annealing sites. The researchers used DNA Fountain coding to convert the digital information into DNA sequences, 12,000 oligonucleotides in all, and then encapsulated the PCR-amplified oligonucleotides in silica beads, each containing dozens of synthetic DNA molecules. The DNA is synthesized in vitro, which sidesteps concerns about cell viability and biological activity. Encapsulating the DNA strands in silica beads protects them from degradation.
Figure: Schematic of the Stanford Bunny’s 3D printing and decoding, covering conversion of binary data into DNA data, DNA encapsulation, embedding in thermoplastic polyester, printing, and DNA decoding. (Source: Nature-Biotech)
Figure: The restored structure of the Stanford Rabbit. (Source: Nature-Biotech)
So how is the data decoded? In principle, PCR is used to copy and amplify the stored DNA fragments, the amplified fragments are sequenced, and the resulting base sequences are then error-corrected, de-duplicated, and decoded to recover the original information.
In the study, the researchers used the DNA stored in the rabbit to replicate the rabbit’s data. Specifically, they cut 10 milligrams of printed material from the rabbit’s ear, about 0.3 percent of its total weight of 3.2 grams, extracted the DNA (which took four hours), and then amplified and sequenced it (17 hours). Although 5.9 percent of the original oligonucleotides were lost and sequencing errors were present, the DNA Fountain decoder recovered the Stanford Bunny’s data perfectly. Decoding takes only a few minutes on an ordinary laptop.
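The figures in this experiment can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 5.9 percent loss applies to the 12,000 oligonucleotides mentioned earlier, and the variable names are illustrative.

```python
# Figures quoted in the article.
sample_mg = 10          # printed material cut from the ear
rabbit_g = 3.2          # total weight of the printed rabbit
total_oligos = 12_000   # oligonucleotides in the encoded library
lost_fraction = 0.059   # share of oligonucleotides missing after sequencing

sample_share = sample_mg / (rabbit_g * 1000)          # fraction of the rabbit sampled
lost_oligos = round(total_oligos * lost_fraction)     # assumed to apply to the full library
surviving_oligos = total_oligos - lost_oligos

print(f"sample share: {sample_share:.4f}")
print(f"lost oligos: {lost_oligos}, surviving: {surviving_oligos}")
```

Under these assumptions, roughly 700 oligonucleotides were missing, and the decoder still recovered the file, which is the kind of loss a fountain code is designed to tolerate.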
Repeating this cycle, with the amplified DNA from each generation encapsulated into the next, the researchers created five successive generations of rabbits without any loss of information. Even with a nine-month gap between the fourth and fifth generations, the DNA information remained stable and high in fidelity.
To extend the study, the researchers also encoded a video about the Warsaw Ghetto archive into the plexiglass lenses of a pair of glasses. The hidden information can be recovered from a small piece of the plexiglass.
Yaniv Erlich told DeepTech that the biggest breakthrough in the study was the idea that any object can serve as DNA storage, with no constraints on shape.
1 gram of DNA can store 220 million movies
The density of DNA data storage is remarkable. It has been reported that 1 gram of DNA can store 215 PB of information, while a hard drive holds only a few TB. Since 1 PB is 1,024 TB and 1 TB is 1,024 GB, 215 PB is about 225 million GB: enough for roughly 220 million movies at about 1 GB each, or about 22 million HD movies at 10 GB each.
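The arithmetic behind the headline figure can be checked with a short sketch. Using the binary unit conversions quoted above, 215 PB comes to about 225 million GB, so a count of roughly 220 million movies corresponds to about 1 GB per movie, while 10 GB per movie gives about 22.5 million. The function name is illustrative.

```python
PB_IN_TB = 1024
TB_IN_GB = 1024
DNA_PB_PER_GRAM = 215   # reported capacity of 1 gram of DNA


def movies_per_gram(movie_size_gb: float) -> float:
    """Number of movies of the given size that fit in the reported 215 PB."""
    total_gb = DNA_PB_PER_GRAM * PB_IN_TB * TB_IN_GB
    return total_gb / movie_size_gb


print(f"{movies_per_gram(1):,.0f} movies at 1 GB each")
print(f"{movies_per_gram(10):,.0f} movies at 10 GB each")
```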
Reading DNA information raises no compatibility issues, and DNA is a biodegradable material that is more environmentally friendly than other storage media. In addition, DNA is highly resistant to external disturbances such as high temperature and shock.
Given these characteristics, and because DNA stores information as an ordered sequence, stored segments have defined start and end points, and error-correcting codes can be introduced to ensure integrity, DNA has become a focus of data storage research. It is especially well suited to cold data that is rarely accessed but must be preserved for a long time, such as government documents and historical archives.
Ever since the double helix structure of DNA was discovered in the 1950s, scientists have entertained the idea of storing data in the four bases of DNA. In 2012, Professor George Church’s team at Harvard University stored the data (659 kB) of one of Church’s books in DNA, using a two-to-one mapping in which binary “0” is represented by adenine or cytosine and binary “1” by guanine or thymine.
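The two-to-one mapping described above (one bit per base) can be sketched in a few lines of Python. The mapping itself follows the text; the rule for choosing between the two candidate bases, picking whichever differs from the previous base so that no base repeats, is an assumption made for illustration.

```python
ZERO_BASES = "AC"   # binary 0 -> adenine or cytosine
ONE_BASES = "GT"    # binary 1 -> guanine or thymine


def encode_bits(bits: str) -> str:
    """Encode a string of '0'/'1' characters, one base per bit."""
    sequence = []
    for bit in bits:
        options = ZERO_BASES if bit == "0" else ONE_BASES
        # Assumed tie-break: avoid repeating the previous base.
        if sequence and sequence[-1] == options[0]:
            sequence.append(options[1])
        else:
            sequence.append(options[0])
    return "".join(sequence)


def decode_bases(sequence: str) -> str:
    """Recover the bits: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in sequence)


dna = encode_bits("00001111")
print(dna, decode_bases(dna))
```

Because each bit has two possible bases, the encoder has room to avoid awkward sequences, at the price of storing only one bit per base instead of two.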
In 2017, Yaniv Erlich et al. reported in the journal Science that they had stored six files in DNA, including a complete computer operating system, a computer virus, a French film, and a 1948 paper by Claude Shannon, the American mathematician and founder of information theory.
In that Science study, Yaniv Erlich used the same DNA Fountain coding technique later applied to the Stanford Bunny: the data is randomly packaged into “droplets” for storage, each with an added label that allows the data to be reassembled later. The droplets are generated independently and at random, encoding and decoding have low complexity, and a built-in error-correction mechanism makes it possible to recover the stored information with high probability.
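The droplet idea can be illustrated with a toy fountain code in Python. This is a simplified sketch, not the DNA Fountain implementation: the degree distribution, segment size, and function names are assumptions, and the real method also converts droplets to bases and adds further error correction. Each droplet carries a seed (the label) and the XOR of a pseudo-random subset of data segments; the decoder regenerates the subset from the seed and peels off segments one by one.

```python
import random

DEGREES = [1, 1, 2, 2, 3, 4]   # toy degree distribution (assumption)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def droplet_indices(seed: int, n_segments: int) -> set:
    """Re-derive, from the seed alone, which segments a droplet combines."""
    rng = random.Random(seed)
    degree = min(rng.choice(DEGREES), n_segments)
    return set(rng.sample(range(n_segments), degree))


def make_droplet(segments: list, seed: int) -> tuple:
    """A droplet is (seed, XOR of the pseudo-randomly chosen segments)."""
    payload = bytes(len(segments[0]))
    for i in droplet_indices(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def peel_decode(droplets: list, n_segments: int):
    """Peeling decoder: returns the segments, or None if more droplets are needed."""
    pending = [(droplet_indices(s, n_segments), p) for s, p in droplets]
    solved = {}
    progress = True
    while progress and len(solved) < n_segments:
        progress = False
        still_pending = []
        for indices, payload in pending:
            known = [i for i in indices if i in solved]
            for i in known:                      # strip segments already recovered
                payload = xor_bytes(payload, solved[i])
            remaining = indices - set(known)
            if len(remaining) == 1:              # droplet now reveals one segment
                solved[remaining.pop()] = payload
                progress = True
            elif remaining:
                still_pending.append((remaining, payload))
        pending = still_pending
    if len(solved) < n_segments:
        return None
    return [solved[i] for i in range(n_segments)]


data = b"DNA-of-things demo!!"                   # 20 bytes -> five 4-byte segments
segments = [data[i:i + 4] for i in range(0, len(data), 4)]

received, decoded = [], None
for seed in range(1000):
    if seed % 5 == 0:
        continue                                 # simulate droplets lost in storage
    received.append(make_droplet(segments, seed))
    decoded = peel_decode(received, len(segments))
    if decoded is not None:
        break

recovered = b"".join(decoded) if decoded is not None else None
print(recovered, "recovered from", len(received), "droplets")
```

The point of the design is that no particular droplet is essential: the decoder simply keeps collecting droplets until enough have arrived, which is why losing a share of the oligonucleotides need not lose any data.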
Barriers to “DNA of Everything”: high cost and low time efficiency
The researchers call the approach behind the rabbit hard drive the “DNA-of-things” (DoT) storage architecture, which produces materials with immutable memory.
So what can this “DNA of everything” storage architecture be used for? Yaniv Erlich and colleagues cite 3D-printed medical or dental implants as an example: each one is unique, customized to the patient’s precise anatomy. Because the silica beads are non-toxic, the implant’s design and other medical information can be stored inside it, providing a long-term backup of electronic medical records, which are usually retained for only 5 to 10 years.
In addition, the authors believe the technology can also be used to store cold data in areas such as construction, pharmaceuticals, and electronic components.
Another application of “DNA of everything” is information hiding (steganography). Because everyday items can carry secret data, data thieves face several barriers. First, because the silica beads do not change the properties of the host material, an attacker must test many items to find the one that holds the data. Second, because the DNA is sealed inside silica beads, common DNA detection techniques such as ultraviolet light cannot detect it. Third, even if the attacker recovers the DNA library, they still need to find the correct annealing sites to amplify the information by PCR.
The authors also see promise in the technology’s strong capacity for self-replication, and it offers a useful approach to localized and offline data storage.
The technology is already covered by patents. Yaniv Erlich holds patents in the field of DNA storage; the Federal Institute of Technology in Zurich holds patents on DNA encapsulation; and Yaniv Erlich and Robert Grass are the inventors on the DoT patent application.
The rabbit was expensive: completing its DNA data storage cost about $14,000. On cost, the authors argue that while DNA synthesis is still expensive for one-off custom items, the cost of the DNA library becomes insignificant in mass production.
However, cost remains an obstacle. The cost of DNA synthesis and sequencing has been falling exponentially year after year, reportedly from $218,750 per megabit in 2002 to $4.41 per megabit in 2016, but it is still expensive compared with ordinary hard disk storage.
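To put the quoted prices in perspective, the following sketch computes the overall drop and the implied average yearly improvement. It assumes a steady exponential decline between the two data points, which is a simplification; the variable names are illustrative.

```python
cost_2002 = 218_750.0   # USD per megabit, as quoted
cost_2016 = 4.41        # USD per megabit, as quoted
years = 2016 - 2002

fold_drop = cost_2002 / cost_2016
annual_factor = fold_drop ** (1 / years)   # assumes a constant yearly rate

print(f"{fold_drop:,.0f}-fold drop over {years} years")
print(f"about {annual_factor:.2f}x cheaper per year on average")
```

Under that assumption the price fell nearly 50,000-fold, a little more than halving every year.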
In fact, the bigger reason DNA storage remains far from consumer-grade applications is how slow it is. “To reach general consumers, you also need to make the sequencer portable and sample processing very efficient,” Yaniv Erlich said.
As the Stanford Bunny’s encoding and decoding show, the whole process takes dozens of hours. Expanding the range of DNA storage applications therefore requires not only lower costs but also the ability to write and read information anytime and anywhere, as with hard drives and tapes.
Zhang Cheng, an associate researcher at Peking University’s School of Information, has long worked on DNA molecular computing and nanointelligence, including DNA computing and storage, molecular circuits, self-assembled nanoporous devices, and nano-intelligent robots. He told DeepTech that the biggest obstacle on the path of DNA storage development is the efficiency of writing and reading, and that the time cost is a very big problem. “I can spend a week in the lab decoding, but no average consumer wants to wait that long, and once DNA amplification is involved, the time requirement is unavoidable.”
This means DNA encoding, storage, and decoding all need to become portable. With the further development of portable DNA sequencers, DNA sequencing may become possible at any time. However, that covers only the decoding step.
China has already laid out key projects
DNA storage is still an emerging field in China, and a revolutionary breakthrough will require the joint efforts of scientists across the field. Zhang Cheng said that DNA storage research really began to accelerate after 2016, and that domestic progress has not been fast enough for two reasons. First, the barrier to entry is very high, requiring collaboration across computer science, biology, chemistry, and other fields. Second, although DNA storage technology has a wide range of applications, it still faces huge time and cost challenges: “When it can truly enter the commercial market also depends on the development of related cutting-edge DNA nanotechnology.”
There are already big companies eyeing DNA storage. Since 2015, Microsoft Research has been working with researchers at the University of Washington to conduct DNA data storage research, hoping to turn synthetic DNA into a durable, easy-to-operate, high-density information storage medium.
In 2016, the team successfully stored four image files in synthetic DNA and retrieved them intact.
In March 2019, they demonstrated the first fully automated DNA data storage and retrieval. In this experiment, the team developed a fully automated end-to-end system that writes the word “hello” into synthetic DNA fragments and converts the data from DNA back into ordinary digital information. Microsoft says the automation is an important milestone in moving DNA data storage out of the lab and into commercial data centers.
In fact, the Chinese government is also increasing its support in this area. Under the framework agreement between the Ministry of Science and Technology and the Shenzhen Municipal People’s Government on jointly organizing and implementing the “Synthetic Biology” key special project of the National Key Research and Development Program, the central government and Shenzhen jointly fund the implementation of the “Synthetic Biology” key special project.
The 2018 project application guide for the “Synthetic Biology” key special project describes the project “Technology research and development for data storage using synthetic DNA” as follows:
Research content: develop new methods for efficient, fast, high-density encrypted encoding and transcoding, random access, and non-destructive reading of data in synthetic DNA; develop DNA media for storing multiple data types; and develop an integrated software system for rapid encoding, storage, and reading of data through synthetic DNA.
Assessment indicators: develop one DNA data encoding algorithm that achieves high-density storage of data in DNA code (coding efficiency ≥ 1.6 bits/base);
According to information released by the Southern University of Science and Technology in July this year, the Ministry of Science and Technology announced the 2018 project list for the “Synthetic Biology” key special project of the National Key Research and Development Program. The project “Technology research and development for data storage using synthetic DNA,” led by Professor Jiang Xingyu of the university’s Department of Biomedical Engineering, was selected, with total funding of 22.03 million yuan.
The project is led by the Southern University of Science and Technology and was jointly applied for with Shanghai Jiao Tong University, the Chinese Academy of Sciences, the Institute of Applied Chemistry, Fuzhou University, and Tongji University. By developing new storage technologies to cope with the explosive growth of big data, it aims to resolve the tension between rapid data growth and effective data storage and use, and to promote China’s original innovation and scientific breakthroughs in basic research on DNA data storage.
“Because DNA storage technology is interdisciplinary by nature, we must rely on the coordinated development of computer science, biology, chemistry, mathematics, and many other related disciplines so that our country can take the lead in the international competition over DNA storage,” Zhang pointed out.
Interdisciplinary work and international collaboration are essential to seizing the historic opportunity presented by the disruptive technology of DNA storage. For example, Yaniv Erlich, one of the paper’s authors, designed the DNA Fountain algorithm in a 2017 Science article published while he was at Columbia University’s School of Computer Science, while Robert Grass, another author, is a titular professor at the Functional Materials Laboratory of the Federal Institute of Technology in Zurich and was responsible for encapsulating the DNA in glass.
“As domestic researchers, we also attach great importance to interdisciplinary work,” Zhang said. “The computer science students in the joint research group of Xu Jin and Zhang Cheng at Peking University are not only familiar with computer techniques such as programming but can also walk into a biochemistry laboratory and carry out sophisticated DNA nanotechnology experiments. For example, the joint team’s 2019 work on a recyclable DNA circuit was published in the Journal of the American Chemical Society (JACS), the first interdisciplinary JACS paper in the history of Peking University’s Software Institute.”
Zhang Cheng acknowledged that current DNA storage research is still at the stage of laboratory exploration and basic research. The guidance and support of national policy is therefore essential to the development of DNA storage in China.