Sun Maosong, Tsinghua Natural Language Processing Scientist: Let Algorithms Understand Human “Common Sense”

Voice assistants, AI customer service, machine translation… Applications of natural language processing (NLP) based on deep learning now achieve good results. But those results come out of a “black box”: we do not know how the machine arrived at them, and neither does the machine itself. The “black box” problem has long been a focus of AI research. Sun Maosong, a professor of computer science at Tsinghua University, has put forward what he sees as a distinctive way to “open the black box”.


Shu Hang, reporting from Beijing, November 4

Voice assistants that respond to people’s instructions, AI customer service that answers calls automatically, machine translation that replaces part of the manual work… These AI applications all fall into the category of natural language processing (NLP). At present, NLP applications based on deep learning achieve good results; in machine translation especially, the output is getting closer and closer to what people expect.

However, current NLP applications are built on the deep learning “black box”. Put simply, we do not know how the machine produced its result, and neither does the machine itself. It is like a parrot mimicking speech: it does not really “understand” the sentences it is handling.

The “black box” problem has long been a focus of AI research, and it has prompted discussion and reflection ever since the famous “Chinese room” thought experiment. We previously compiled a speech given at Tsinghua University by Hong Xiaowen, head of Microsoft Research Asia, which also touched on this.

On October 31, 2019, the Beijing Zhiyuan Conference was held at the National Convention Center in Beijing. The conference aims to be a leading academic event for the global AI research and innovation ecosystem. Sun Maosong, a professor of computer science at Tsinghua University and a chief scientist at Zhiyuan, gave an interview to the media there and presented what he sees as a distinctive plan to “open the black box”.

The most complex attempt yet to solve the AI explainability problem

Sun Maosong is an international leader in the field of natural language processing who has made outstanding contributions to its theory, methods and applications. A few months ago, Professor Sun was appointed by the Beijing Zhiyuan Institute of Artificial Intelligence as chief scientist for its major research direction in natural language processing.

The Zhiyuan Research Institute is a new type of research and development institution strongly supported by Beijing. The city hopes that, through institutional innovation, it will become a major strategic platform that brings together the world’s top scientists and produces internationally important results.

The “Beijing Zhiyuan – Jingdong cross-media dialogue intelligent joint laboratory” was also unveiled at the conference. Relying on Jingdong’s massive data accumulation and very large computing power in e-commerce retailing, the joint lab will focus on creating ultra-large-scale, open-domain data sets of complex real-world scenarios for cross-modal dialogue and human-computer interaction, and on building forward-looking demonstration applications for intelligent retail.


Sun Maosong hopes to solve the biggest challenge facing “black box” NLP applications such as machine translation: the output looks good, but the algorithm itself has no understanding of semantics, so the system is very fragile when it meets complex meaning. Sentences such as “Qianmen (a Beijing place name that literally means ‘front gate’) is coming up, please get off by the back door” still have not been overcome.

We also asked IBM experts about the same issue during the World Artificial Intelligence Conference (WAIC) in Shanghai. IBM tends to use “black boxes” to explain “black boxes”, interpreting the decisions of an AI model with the same kind of neural network-based approach. In August 2019, IBM released the open-source algorithm collection AI Explainability 360 to improve the interpretability of algorithms.

But Sun Maosong believes that the black box is a last resort. The black box played a relatively positive role at the start of the recent AI boom, but its problems are now more obvious. In his view, what matters is that the machine says things that accord with common sense and logic, rather than sentences that are merely grammatical but untrue or absurd in reality.

Sun Maosong’s team has defined its NLP research project at Zhiyuan as “Natural Language Processing with Big Data and Knowledge as a Two-Wheel Drive”, which requires building a human knowledge base that computers can read and operate on. This is obviously a more complex road than explaining a “black box” with another “black box”.

Building a World Knowledge Base and “Common Sense Library” with “Beijing Characteristics”

At present, pioneers such as Wikidata and WordNet have produced knowledge base systems, and enterprises have knowledge graphs of their own. But these are either not open, or not widely enough accepted around the world, or too shallow: “big but not strong.”


Sun Maosong believes that the key to giving NLP human-like logic is to give machines the “common sense” that everyone understands and on which there is global consensus. A knowledge base built for this purpose would be more appropriately called a “common sense library”.

Knowledge graphs such as WordNet and Wikidata rely heavily on manual editing, an effort that can take decades. Sun Maosong hopes that, by building on previously published open research and using existing deep learning algorithms, his team can parse the sentence structure of language material and turn a vast raw corpus into chains of relations between its elements. That would reduce the burden of manual editing and make the project sustainable.
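As a rough illustration of what turning raw sentences into chains of relations can look like, here is a minimal Python sketch. It is not the team’s method: two hand-written regular expressions stand in for the deep learning parser mentioned above, and the names `PATTERNS`, `extract_triples`, `build_graph` and `relation_chain` are hypothetical, chosen only for this example.

```python
import re
from collections import defaultdict

# Hypothetical toy patterns (an assumption for illustration only).
# A real system would use a learned parser instead of regular expressions.
PATTERNS = [
    (re.compile(r"^(?P<head>[\w ]+?) is an? (?P<tail>[\w ]+)$"), "is_a"),
    (re.compile(r"^(?P<head>[\w ]+?) is located in (?P<tail>[\w ]+)$"), "located_in"),
]


def extract_triples(sentences):
    """Turn sentences into (head, relation, tail) triples; skip unmatched ones."""
    triples = []
    for sentence in sentences:
        text = sentence.strip().rstrip(".")
        for pattern, relation in PATTERNS:
            match = pattern.match(text)
            if match:
                triples.append(
                    (match["head"].strip(), relation, match["tail"].strip())
                )
                break
    return triples


def build_graph(triples):
    """Index triples by head entity so relations can be followed."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def relation_chain(graph, start, max_hops=5):
    """Follow the first outgoing relation from each entity, hop by hop."""
    chain, node, seen = [], start, {start}
    for _ in range(max_hops):
        edges = graph.get(node)
        if not edges:
            break
        relation, tail = edges[0]
        chain.append((node, relation, tail))
        if tail in seen:  # stop on cycles
            break
        seen.add(tail)
        node = tail
    return chain


corpus = [
    "Tsinghua University is located in Beijing.",
    "Beijing is a city.",
    "The weather was nice.",  # no pattern matches, so no triple is produced
]
graph = build_graph(extract_triples(corpus))
print(relation_chain(graph, "Tsinghua University"))
```

In this sketch the first two sentences each yield one triple and the third yields none; following the links from “Tsinghua University” gives a two-step chain through “Beijing” to “city”. The point is only that once sentences are reduced to triples, they can be linked automatically rather than edited by hand.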

He wants his team to build a library that “reflects Beijing’s characteristics and goes deeper”, and if the whole thing cannot be done, to complete at least part of it.

We were interested in how “common sense” will be defined in this “common sense library”, and how the team will decide what goes in and what stays out. After all, some people believe the moon landing was a hoax, there may be geopolitical or other disagreements, and Wikipedia sees fierce “edit wars” driven by ideological conflict.

Sun Maosong wants to build a “common sense” system of relatively coarse granularity, one that covers only the more stable core of human knowledge. Whatever falls outside that scope counts as opinion, on which differences are allowed.

“You go to a restaurant, no matter which restaurant in the world, and you order, the food is served, and you pay after eating. Running off without paying the bill should not happen. That is common sense.”
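One simple way to picture such a piece of common sense in machine-readable form is as an ordered “script” of expected steps. The sketch below is only an assumption made for illustration, not the team’s representation: `RESTAURANT_SCRIPT` and `missing_steps` are hypothetical names, and the check looks only for skipped steps, not for steps that happen in the wrong order.

```python
# Hypothetical encoding of the restaurant example as an ordered script.
RESTAURANT_SCRIPT = ["order", "serve", "eat", "pay", "leave"]


def missing_steps(events, script=RESTAURANT_SCRIPT):
    """Return script steps that should precede the latest observed step
    but were never observed."""
    known = [event for event in events if event in script]
    if not known:
        return []
    last_index = max(script.index(event) for event in known)
    return [step for step in script[:last_index] if step not in known]


# A visit where the customer leaves without paying violates the script.
print(missing_steps(["order", "serve", "eat", "leave"]))
```

A story in which the customer orders, is served, eats and leaves would be flagged as missing the “pay” step, which is the kind of violation the quote describes.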

Opinions, on the other hand, are flexible and inexhaustible. They can be supplemented through big data mining built on top of “common sense”. Sun Maosong’s team will vet both the source material that goes into the “common sense library” and its output, which must not contain factual errors.

At present, Professor Li Zi of the team has extracted a number of bilingual knowledge bases from Wikipedia, and this library and other Tsinghua NLP projects have been open-sourced and distributed on GitHub. The number of stars the Tsinghua NLP group has earned for its open-source work is comparable to that of Manning’s group at Stanford, one of the best NLP groups in the world.

Helping reduce the amount of data needed to train algorithms in the long term

Sun Maosong believes that although the industry has called for using small data, and even for keeping data off the cloud and computing locally to protect privacy, this direction remains difficult and is unlikely to produce results easily. The reason is that small data lacks the kind of common, widely accepted solutions that deep learning has in CNN, LSTM, GPT-2 and the like.

Small data is currently used only in limited areas, such as iOS organizing a user’s own photo library, or a hospital’s collection of patient records. In these cases large-scale data is simply not available, and the work must be based on small data sets.

Correspondingly, however, solutions for such small data can only be built case by case for a specific purpose, and they are not reusable.

If NLP based on small data is to develop more general-purpose algorithms, it must involve pre-processing with “common sense”. From this point of view, the knowledge graph that Sun Maosong’s team is trying to create could, in the long run, help computing shed its reliance on sheer data volume.
