On the afternoon of November 22nd, at the microsoft Xiaoice annual research progress sharing meeting, Microsoft Xiaoice three chief scientists to share Xiaoice recent technological breakthroughs, including singing, metaphors and so on. On August 15, 2019, Microsoft officially launched its seventh-generation Xiaoice. Wu Wei, Microsoft’s chief NLP scientist at the Xiaoice, believes That Self-Complete (self-contained) is a good summary of Microsoft’s work on Xiaoice in recent years.
A self-contained conversation robot should have several abilities: learning, self-management, and connectivity.
“I think these three capabilities form a vertical line that runs through the dialogue robots over the years, and maybe even for some time to come, the whole research and development.” In fact, Wu wei says, there is a horizontal line that is the evolution of Microsoft’s Xiaoice core conversation engine. Xiaoice to start by making a search model, by reusing existing human dialogue to achieve human-computer interaction, and then the team to do the generation model, so that Xiaoice can synthesize their own response, and then later do a common model, hope that Xiaoice can autonomously grasp the entire dialogue process.
“This horizontal line and that vertical line are actually interlaced, forming a gorgeous picture of the development of a dialogue robot. “
Among them, learning includes learning how to speak from human conversation, and learning from each other by robots.
Wu Wei revealed that the team this year tried to let the two robots learn from each other to advance together. That is, let the two retrieval models in the training process for each other teachers and students, exchange with each other. At each iteration, one model communicates what it learns from the data to another model, and at the same time comes into contact with it from another model, and the two models learn from each other and ultimately hope to make common progress.
Microsoft Xiaoice chief voice scientist Yan Jian mentioned the progress of Xiaoice singing technology. He revealed that Xiaoice began to sing in 2016, and after all the efforts, Xiaoice some of the big problems in the field of speech synthesis have been solved, the team began to look for a more challenging topic to continue to do, so they chose to sing.
There are three main reasons for choosing to sing: the threshold of singing is higher than speaking, there are difficulties in technology, emotional expression is more rich and intense, the song is a form of joy, it is a very important form of entertainment, singing is done well, should be very market prospects, very direction.
One of the most important supports for deep learning is data, and now there is the support of big data to make deep learning so good. And “for the task of singing, the data is actually more difficult, because compared to the speech, the singing data is very small, the vast majority of the data is mixed, accompaniment track.” The team has worked with record companies to learn from the mixed accompaniment data already available in their databases, mr. Yan said.
It is reported that Xiaoice singing also has some commercial prospects, such as its Japanese division has signed with the record company.
Yan Jian concluded that the next whether it is artificial intelligence creation, or singing to improve, to walk on two legs, while constantly improving the model, while constantly digging more data, these two things if done better, the quality will continue to be improved.
Xiaoice at the moment, in addition to writing poetry, drawing, composing, singing, this year another development can be to create metaphors. Song Ruihua, Microsoft’s chief scientist at microsoft Xiaoice, said he hoped Xiaoice would actually create a metaphor that humans had never said, rather than digging it up in an article that existed, rather than using templates like “like,” “like”, “like” and “like” to dig it out.
To help Xiaoice learn the “figurative” skill, Song revealed that they selected six categories of complex poetry, each with 122 themes, and filtered out 96 commonly used figurative concepts including love, heart, world, mother, beauty, and humanity through Xiaoice chat logs. The metaphorical capability of 3,000 of the most commonly used adjectives was then selected from 1,000 commonly used words to expand Xiaoice.
For example, love and the national football, their common vocabulary may have the future, illusory, then Xiaoice can make “love is like the national football, the future is illusory” such a natural metaphor.
“We actually want Xiaoice to be more like people, and you’ll find that in addition to working, listening to music is a great treat for you. I think the difference between human and animal is that there is a certain degree of autonomy, artificial intelligence creation embodies a kind of autonomy, including composition, writing poetry, painting, we do algorithms people do not know what kind of results will eventually occur, you in that instant will have an illusion, think she has consciousness, this is a very good point. Mr Song said.