Earlier this month, Chinese tech giant Baidu quietly pulled ahead of Microsoft and Google in an ongoing artificial-intelligence competition, according to MIT Technology Review. Specifically, Baidu’s AI model ERNIE now leads its competitors on the General Language Understanding Evaluation (GLUE) benchmark.
GLUE is a widely accepted benchmark of how well an AI system understands human language. It consists of nine different tests, including picking out the names of people and organizations in a sentence and figuring out what a pronoun such as “it” refers to when there are multiple potential antecedents. Language models that score highly on GLUE can therefore handle a variety of reading-comprehension tasks. Out of a possible 100 points, the previous leading scores sat around 87. Baidu is now the first team to surpass 90, with its model ERNIE.
GLUE’s public leaderboard changes constantly, and another team will likely surpass Baidu soon. But Baidu’s achievement is worth noting because it illustrates how AI research benefits from a diversity of contributors. To build ERNIE (short for “Enhanced Representation through kNowledge IntEgration”), Baidu’s researchers had to develop a technique specifically for Chinese. As it happens, that same technique also makes the model better at understanding English.
Natural-language models were not especially good until BERT (Bidirectional Encoder Representations from Transformers) arrived in late 2018. Earlier models were good at predicting the next word in a sentence (and therefore great for autocomplete), but they could not sustain a single train of thought over even a short passage. That is because they did not grasp meaning, such as what the word “it” might refer to.
But BERT changed that. Earlier models learned to predict and interpret a word’s meaning by considering only the context that appeared before it or the context that appeared after it, never both. In other words, they were unidirectional.
BERT, in contrast, considers the context before and after a word at the same time, making it bidirectional. It does this using a technique known as masking. In a given passage of text, BERT randomly hides 15 percent of the words and then tries to predict them from the remaining ones. Having twice as many clues to work from lets it make more accurate predictions. In the sentence “The man went to the ___ to buy milk,” for example, both the beginning and the end of the sentence hint at the missing word: it is a place you can go to and a place where you can buy milk.
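The masking step described above can be sketched in a few lines of Python. This is an illustrative toy, not BERT’s actual implementation: real BERT masks subword tokens and adds a few extra tricks (such as occasionally substituting random words), while this sketch simply hides whole words at a 15 percent rate. The `mask_tokens` helper and the example sentence are assumptions for illustration.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_rate=0.15, seed=1):
    """Randomly replace ~15% of tokens with a [MASK] placeholder.

    Returns the masked sequence plus a dict mapping each hidden
    position to the original token the model must predict there.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[i] = token  # training target at this position
        else:
            masked.append(token)
    return masked, targets

sentence = "the man went to the store to buy milk".split()
masked, targets = mask_tokens(sentence)
print(masked)   # some words replaced by [MASK]
print(targets)  # the hidden words the model must recover
```

During training, the model sees `masked` and is scored on how well it recovers every entry in `targets`, drawing on context from both directions at once.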
Masking is one of the core innovations behind major improvements on natural-language tasks, and it is part of the reason models such as OpenAI’s famous GPT-2 can write convincing essays without veering off topic.
When Baidu’s researchers began developing their own language model, they wanted to build on the masking technique. But they realized it needed to be adjusted for Chinese. In English, the word serves as the semantic unit, meaning a word pulled completely out of context still carries meaning. The same cannot be said of Chinese characters. While certain characters do have inherent meanings, such as those for fire, water, or wood, most mean little until they are strung together with others. The character 灵, for example, can mean either “clever” (as in 机灵) or “soul” (as in 灵魂), depending on what it is paired with. And the characters in a proper noun such as 波士顿 (Boston) or 美国 (the US), once split apart, no longer mean the same thing.
So the researchers trained ERNIE on a new version of masking that hides strings of characters rather than single ones. They also trained it to distinguish meaningful strings from random ones, so the right character combinations would be masked together. As a result, ERNIE has a deeper grasp of how Chinese encodes information and is far more accurate at predicting the missing pieces. This has proved useful for applications such as translation and retrieving information from text documents.
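A minimal sketch of the idea, assuming a small phrase lexicon stands in for ERNIE’s learned knowledge of which character strings are meaningful: first group characters into known multi-character units, then mask whole units rather than individual characters. The function names and sample sentence are illustrative, not Baidu’s actual code.

```python
import random

MASK = "[MASK]"

def group_units(tokens, phrases):
    """Greedily group tokens into known multi-token phrases (longest
    match first); anything unmatched stays a single-token unit."""
    by_length = sorted(phrases, key=len, reverse=True)
    units, i = [], 0
    while i < len(tokens):
        for phrase in by_length:
            if tokens[i:i + len(phrase)] == list(phrase):
                units.append(tokens[i:i + len(phrase)])
                i += len(phrase)
                break
        else:
            units.append([tokens[i]])
            i += 1
    return units

def mask_units(units, mask_rate=0.15, seed=1):
    """String-level masking: when a unit is chosen, hide ALL of its
    tokens, so the model must predict the whole string at once."""
    rng = random.Random(seed)
    masked, targets = [], []
    for unit in units:
        if rng.random() < mask_rate:
            masked.extend([MASK] * len(unit))
            targets.append(unit)
        else:
            masked.extend(unit)
    return masked, targets

# "I live in Boston": 波士顿 is grouped as one unit, never split.
chars = list("我住在波士顿")
units = group_units(chars, ["波士顿"])
masked, hidden = mask_units(units, mask_rate=1.0)  # mask everything for demo
print(units)
print(masked)
```

Character-level masking would let the model cheat by guessing 顿 from 波士; hiding the whole unit forces it to learn what the surrounding context implies about the entire name.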
The researchers soon discovered that the approach actually works for English as well. English, too, has strings of words that express a meaning different from the sum of their parts. Proper nouns such as “Harry Potter” and expressions such as “chip off the old block” cannot be meaningfully parsed word by word.
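The same span-level idea carries over to English directly: hide a multi-word expression as a single unit so the model must recover it whole from the surrounding context. A toy sketch, with a hypothetical helper and example sentence:

```python
MASK = "[MASK]"

def mask_phrase(tokens, phrase):
    """Replace every occurrence of a multi-word phrase with [MASK]
    placeholders, one per hidden word."""
    masked = list(tokens)
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            masked[i:i + n] = [MASK] * n
    return masked

tokens = "harry potter is a series of fantasy novels".split()
print(mask_phrase(tokens, ["harry", "potter"]))
# both words of the name are hidden together, not one at a time
```

Masking “potter” alone would let the model guess it trivially from “harry”; masking the phrase as a unit makes the prediction depend on real understanding of the rest of the sentence.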