The genome is the genetic blueprint that determines an organism's characteristics. DNA and RNA are the building blocks of genome sequences, including those of viruses, and directly manipulating these nucleic acids can substantially change an organism. Much of genetic engineering therefore depends on our ability to manipulate genome sequences, but this is a difficult task. One example is the toehold switch, a type of engineered RNA molecule: precise control of toehold switches could provide important insight into the cellular environment and potential diseases.
However, previous experiments have shown that toehold switches are hard to control. In many cases they do not respond as intended, even though they were designed according to known RNA folding rules to produce the desired output for a given input.
Against this backdrop, two research teams from Harvard University’s Wyss Institute and the Massachusetts Institute of Technology have developed a set of machine learning algorithms that can improve the process. They used deep learning to analyze a large number of toehold switch sequences and accurately predict which toeholds reliably perform their intended tasks, allowing researchers to identify high-quality toeholds for their experiments. Their findings are published today in two separate papers in the journal Nature Communications.
The first step in solving any machine learning problem is to collect domain-specific data to train the model. The researchers assembled a large dataset of toehold switch sequences. As co-first author Alex Garruss, a graduate student at the Wyss Institute, said:
“We designed and synthesized a huge library of nearly 100,000 toehold switches by systematically sampling short trigger regions along the entire genomes of 23 viruses and 906 human transcription factors.”
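The systematic sampling described in the quote can be pictured as a sliding window over a genome sequence. The following is a minimal sketch, not the researchers' actual pipeline: the function name `sample_triggers`, the 30-nucleotide window, the step size, and the toy sequence are all assumptions made for illustration.

```python
from typing import List


def sample_triggers(genome: str, window: int = 30, step: int = 1) -> List[str]:
    """Return every window-length subsequence, advancing step bases at a time.

    The window length and step are illustrative assumptions, not values
    taken from the papers.
    """
    genome = genome.upper()
    # A genome shorter than the window yields an empty list.
    return [genome[i:i + window] for i in range(0, len(genome) - window + 1, step)]


# Toy sequence for illustration only; this is not a real viral genome.
toy_genome = "AUGGCUAGCUAGGAUCCGAUCGGAUUACGCUAGCAUCGAUCG"
triggers = sample_triggers(toy_genome, window=30, step=4)
print(len(triggers), "candidate trigger regions")
print(triggers[0])
```

Each returned region would then serve as the target for which a candidate switch is designed; a smaller step gives denser coverage at the cost of a larger library.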
Working as two separate teams, the researchers tried two different approaches to the problem. The authors of the first paper chose to analyze each toehold switch not as a sequence of bases but as a two-dimensional image of possible base pairings. The method, called Visualizing Secondary Structure Saliency Maps, or VIS4Map, successfully identified the physical elements that affect toehold switch performance, providing insight into RNA folding mechanisms that traditional analytical techniques had not revealed.
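To make the idea of a two-dimensional image of possible base pairings concrete, here is a simplified sketch. It is not the VIS4Map implementation: it marks only whether two positions could pair under Watson-Crick or G-U wobble rules, and the names `PAIRS` and `pairing_matrix`, the minimum loop size, and the example sequence are assumptions for illustration.

```python
from typing import List

# Canonical Watson-Crick pairs plus the G-U wobble pair, in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq: str, min_loop: int = 3) -> List[List[int]]:
    """Build an n x n grid where cell (i, j) is 1 if bases i and j could pair.

    Positions closer than min_loop + 1 apart are set to 0, because a hairpin
    needs a few unpaired bases in its loop. This is a binary simplification,
    not a thermodynamic pairing probability.
    """
    seq = seq.upper()
    n = len(seq)
    return [
        [1 if (seq[i], seq[j]) in PAIRS and abs(i - j) > min_loop else 0
         for j in range(n)]
        for i in range(n)
    ]


# Toy hairpin-like sequence; printed as a small character image.
for row in pairing_matrix("GGGAAAACCC"):
    print("".join("#" if cell else "." for cell in row))
```

A grid like this can be fed to an image model in the same way as a grayscale picture, which is what lets computer vision techniques highlight the regions of a switch that matter most.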
After generating a dataset of thousands of toehold switches, one team used computer vision-based algorithms to analyze the switch sequences as two-dimensional images, while the other used natural language processing to interpret the sequences as “text” written in the “language” of RNA.
The authors of the second paper created two different deep learning architectures that use orthogonal techniques to address the challenge of identifying well-performing toehold switches. The first model is based on a convolutional neural network (CNN) and a multilayer perceptron (MLP) and treats toehold sequences as 1D images, or lines of nucleotide bases. Through an optimization method called the Sequence-based Toehold Optimization and Redesign Model (STORM), it identifies patterns of bases and potential interactions between those bases to flag toeholds of interest.
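Treating a sequence as a 1D image usually means turning each base into a small vector of channels. The sketch below shows one common way to do this, one-hot encoding; it is an assumption about the input format and not code from STORM, and the names `ALPHABET` and `one_hot` are hypothetical.

```python
from typing import List

ALPHABET = "ACGU"  # channel order is an arbitrary choice for this sketch


def one_hot(seq: str) -> List[List[int]]:
    """Encode an RNA sequence as a list of 4-element one-hot rows.

    DNA-style T is mapped to U. Any other unknown symbol (for example N)
    becomes an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


encoded = one_hot("AUGC")
for base, row in zip("AUGC", encoded):
    print(base, row)
```

A convolutional layer sliding along this length-by-4 array can then pick up short motifs, while the later dense layers combine them into a single performance prediction.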
The second architecture frames the problem as one of natural language processing (NLP), treating each toehold sequence as a phrase made up of patterns of words. The task is then to train a model to combine these words, or nucleotide bases, into a coherent phrase. Combining this model with the CNN-based model yields NuSpeak, an optimization technique that redesigns the last nine nucleotides of a given toehold switch while keeping the remaining 21 nucleotides unchanged. This allows the creation of specialized toehold switches that detect specific pathogen RNA sequences and could be used to develop new diagnostic tests.
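Two ideas from this paragraph can be illustrated in a few lines: splitting a sequence into overlapping "words", and redesigning only the last nine nucleotides while the first 21 stay fixed. This is a sketch under stated assumptions, not NuSpeak itself. The search used here is plain random hill climbing, and `toy_score` (GC fraction) is a placeholder standing in for a trained model; it is not a real predictor of switch quality. All function names and the example sequence are hypothetical.

```python
import random
from typing import Callable, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-letter 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def redesign_tail(seq: str, score: Callable[[str], float],
                  tail: int = 9, rounds: int = 200, seed: int = 0) -> str:
    """Randomly mutate only the last `tail` bases, keeping changes that score higher."""
    if len(seq) < tail:
        raise ValueError("sequence is shorter than the redesigned tail")
    rng = random.Random(seed)
    fixed = len(seq) - tail  # positions before this index are never changed
    best, best_score = seq, score(seq)
    for _ in range(rounds):
        pos = rng.randrange(fixed, len(seq))
        candidate = best[:pos] + rng.choice("ACGU") + best[pos + 1:]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


def toy_score(seq: str) -> float:
    """Placeholder objective (GC fraction); a trained model would go here."""
    return (seq.count("G") + seq.count("C")) / len(seq)


original = "AUGGCUAGCUAGGAUCCGAUCAAUUAUAAU"  # 30 nt toy sequence
improved = redesign_tail(original, toy_score)
print(kmer_tokens(original)[:4])
print(original[:21] == improved[:21], toy_score(original) <= toy_score(improved))
```

Restricting the search to the tail is what keeps the redesigned switch matched to its original target, since the fixed portion of the sequence is never altered.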
By using the two models together, the researchers were able to predict which toehold sequences would produce high-quality sensors.
To test both models, the researchers used their optimized toehold switches to sense fragments of the genome of SARS-CoV-2, the virus that causes COVID-19. NuSpeak improved sensor performance by an average of 160%. STORM, meanwhile, produced better versions of four SARS-CoV-2 viral RNA sensors, improving their performance 28-fold. Commenting on these results, co-lead author Katie Collins of the Wyss Institute said:
“One of the real benefits of the STORM and NuSpeak platforms is that they enable you to quickly design and optimize synthetic biology components, as shown by the toehold sensors we developed for COVID-19 diagnostics.”