In 2014, Google introduced the sequence-to-sequence (Seq2Seq) model, which encodes the input text into a vector and decodes it into an output sequence, allowing the lengths of input and output to differ. Seq2Seq is commonly used in text generation tasks in the NLP domain, such as summarization, grammatical error correction, sentence fusion, and so on.
Although recent research has made end-to-end approaches to text generation more capable than ever, their lack of interpretability means such models require large amounts of training data to reach acceptable performance levels; moreover, they typically generate output one word at a time, which makes inference inherently slow.
Recently, Google’s research team open-sourced a text-editing model, LaserTagger, which predicts a sequence of edit operations that transform the source text into the target text. The researchers assert that, compared with general-purpose text generation, LaserTagger’s approach is less error-prone and easier to train and run.
Before that, Google had released Meena, a 2.6-billion-parameter neural network that can handle multi-turn conversation. In early January, Google also presented in a paper the Reformer model, which can process texts as long as entire novels.
GitHub Link: https://github.com/google-research/lasertagger
LaserTagger Design and Features
For many text generation tasks, there is a high degree of overlap between input and output, and LaserTagger takes advantage of that. For example, when detecting and correcting grammatical errors or fusing multiple sentences, most of the input text can remain unchanged, with only a small number of words modified. LaserTagger therefore generates a sequence of edit operations instead of actual words.
Four editing operations are currently supported:
Keep (copy the word to the output)
Delete (remove the word)
Keep-AddX (add phrase X before the word, then keep the word)
Delete-AddX (add phrase X before the word, then delete the word)
The following figure illustrates the application of LaserTagger to sentence fusion.
Note: among LaserTagger’s predicted edit operations, the repeated “Turing” is deleted and “and he” is added. Note the high overlap between input and output text.
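To make the tagging scheme concrete, here is a minimal sketch of how such edit tags could reconstruct the target text from the source. The tag representation (a KEEP/DELETE action plus an optional phrase inserted before the word) and the `apply_tags` helper are illustrative assumptions, not Google’s actual implementation.

```python
def apply_tags(tokens, tags):
    """Apply (action, added_phrase) edit tags to source tokens."""
    out = []
    for token, (action, phrase) in zip(tokens, tags):
        if phrase:                # Keep-AddX / Delete-AddX: insert phrase first
            out.append(phrase)
        if action == "KEEP":      # copy the source word to the output
            out.append(token)
    return " ".join(out)

# Sentence fusion example from the figure: delete the period and the
# repeated "Turing", and add "and he" in their place.
source = "Turing was born in 1912 . Turing died in 1954 .".split()
tags = (
    [("KEEP", None)] * 5            # Turing was born in 1912
    + [("DELETE", None)]            # drop the period
    + [("DELETE", "and he")]        # drop "Turing", insert "and he"
    + [("KEEP", None)] * 4          # died in 1954 .
)

print(apply_tags(source, tags))
# -> Turing was born in 1912 and he died in 1954 .
```

Because each token gets exactly one tag, the output is fully determined by the per-token tag decisions, which is what makes parallel prediction possible.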
All added phrases come from a restricted vocabulary. Building this vocabulary is an optimization with two objectives: (1) minimizing the size of the vocabulary and (2) maximizing the number of training samples whose target text can be produced using only phrases from the vocabulary. Restricting the phrase vocabulary shrinks the output decision space and prevents the model from adding arbitrary words, thus alleviating “hallucination” (note: the model producing text that is not supported by the input).
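The vocabulary-selection idea can be sketched as follows: for each training pair, collect the phrases that would have to be inserted to reach the target, then keep only the most frequent ones, so the maximum number of examples remain realizable with a small vocabulary. This is a hedged simplification; LaserTagger’s actual alignment of sources to targets is more involved, and `added_phrases` per example is assumed to be precomputed here.

```python
from collections import Counter

def build_phrase_vocabulary(added_phrases_per_example, max_size):
    """Keep the added phrases that cover the most training examples."""
    counts = Counter()
    for phrases in added_phrases_per_example:
        counts.update(set(phrases))   # count each phrase once per example
    return {phrase for phrase, _ in counts.most_common(max_size)}

# Toy data: each inner list holds the phrases that must be inserted to
# turn one source text into its target.
examples = [["and he"], ["and he", ","], [","], ["but"], ["and he"]]
vocab = build_phrase_vocabulary(examples, max_size=2)
print(vocab)
# -> {'and he', ','}  ("but" is too rare to make the cut)
```

With `max_size=2`, only the two examples needing “but” alone would become unrealizable, which illustrates the trade-off between vocabulary size and training-set coverage.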
One consequence of the high overlap between input and output text is that the required modifications tend to be local and independent of one another. This means the edit operations can be predicted in parallel with high accuracy, giving a significant end-to-end speedup over autoregressive seq2seq models, which must run inference sequentially.
The researchers evaluated LaserTagger on four tasks: sentence fusion, split-and-rephrase, abstractive summarization, and grammatical error correction. The results showed that with a large number of training samples LaserTagger scores comparably to a BERT-based seq2seq baseline, and that it is significantly better than the baseline when training samples are limited. The results on the WikiSplit dataset are shown below, where the task is to rewrite a long sentence into two coherent short sentences.
Note: LaserTagger and the BERT-based seq2seq baseline score equally when trained on the full dataset of 1 million samples, but when trained on subsamples of 10,000 or fewer, LaserTagger significantly outperforms the baseline (higher SARI scores are better).
LaserTagger’s Key Advantages
LaserTagger offers the following advantages over the traditional seq2seq method:
Control: by constraining the output phrase vocabulary (which can also be manually edited or curated), LaserTagger is less prone to “hallucination” than the seq2seq baseline.
Inference speed: LaserTagger performs inference up to 100 times faster than the seq2seq baseline, fast enough for real-time use in practical applications.
Data efficiency: LaserTagger produces reasonable output even when trained on only a few hundred or a few thousand samples. In the experiments, the seq2seq baseline required thousands of samples to reach comparable performance.
The Google team concluded: “The benefits of LaserTagger become even more apparent at large scale, for example, improving the formulation of voice responses in some services by reducing response length and repetitiveness. High inference speed allows the model to be plugged into an existing technology stack without adding noticeable latency on the user side, while improved data efficiency makes it possible to collect training data for many languages, thereby benefiting users from different language backgrounds.”
RELATED LINKS: https://ai.googleblog.com/2020/01/encode-tag-and-realize-control-and.html