With the development of 4G and 5G communication technology, calls over the network are becoming more and more popular. But network instability is the norm, so from time to time during a call we find ourselves saying, “Can you repeat that? The network isn't very good just now.” To improve call quality, Google recently added a new technology, WaveNetEQ, to its video chat app Duo. When an audio packet is dropped, it estimates what the missing audio data might have been and fills it in.
The technology behind it comes from Google’s famous DeepMind team.
In an online call, the audio is split into small chunks, each of which is sent as a packet. On the way from sender to receiver, however, these packets often arrive out of order or at irregular intervals, which causes jitter-related problems, or they are lost outright, which leaves gaps in the audio.
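The reordering and loss described above are usually dealt with on the receiving side by a jitter buffer. The following is a minimal Python sketch of that idea, assuming each packet carries a sender-assigned sequence number. The names `Packet`, `JitterBuffer`, and `play_out` are hypothetical and are not taken from Duo or WebRTC.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    seq: int        # sequence number assigned by the sender (assumed)
    payload: bytes  # encoded audio for one short frame


class JitterBuffer:
    """Toy jitter buffer: restores packet order and exposes gaps."""

    def __init__(self, first_seq: int = 0) -> None:
        self._pending: Dict[int, Packet] = {}
        self._next_seq = first_seq

    def push(self, packet: Packet) -> None:
        # A packet whose playout slot has already passed is useless, so drop it.
        if packet.seq >= self._next_seq:
            self._pending[packet.seq] = packet

    def pop(self) -> Optional[Packet]:
        """Return the next packet in playout order, or None if it is missing."""
        packet = self._pending.pop(self._next_seq, None)
        self._next_seq += 1
        return packet


def play_out(arrivals: List[Packet], total: int) -> List[Optional[int]]:
    """Push all arrivals, then play `total` slots; None marks a lost packet.

    A real receiver interleaves push and pop over time; pushing everything
    first keeps the sketch short.
    """
    buf = JitterBuffer()
    for packet in arrivals:
        buf.push(packet)
    played: List[Optional[int]] = []
    for _ in range(total):
        packet = buf.pop()
        played.append(packet.seq if packet is not None else None)
    return played
```

Every `None` slot in the result is a gap that a concealment step has to fill.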
According to Google, 99% of Duo calls have to deal with packet loss, excessive jitter, or network latency. Of those calls, 20% lose more than 3% of their audio and 10% lose more than 8%, which means there is a lot of audio to replace in each call.
Every audio and video app handles dropped packets in some way. Google says these packet loss concealment (PLC) techniques struggle to fill gaps of 60 milliseconds or more. The algorithm used until now was NetEQ, one of the two core audio technologies in WebRTC (the other is audio pre- and post-processing, including echo cancellation (AEC), noise suppression (ANS), and gain control (AGC)). WebRTC, which Google open-sourced after acquiring GIPS, is the most influential real-time audio and video communication solution available, but the way it conceals lost packets in most cases sounds robotic or mechanically repetitive.
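To see why simple concealment can sound mechanical, consider a deliberately naive strategy: repeat the last good frame with a decaying gain. This sketch is not NetEQ's actual algorithm. The function name `conceal_by_repetition` and the `decay` parameter are assumptions made purely for illustration.

```python
from typing import List, Optional

Frame = List[float]


def conceal_by_repetition(
    frames: List[Optional[Frame]],
    frame_len: int,
    decay: float = 0.5,
) -> List[Frame]:
    """Fill lost frames (None) by repeating the last good frame, attenuated.

    Illustrative only: repeating the same waveform is what makes long gaps
    sound like a mechanical loop.
    """
    output: List[Frame] = []
    last_good: Optional[Frame] = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            # Real audio arrived: play it and reset the attenuation.
            last_good = frame
            gain = 1.0
            output.append(list(frame))
        elif last_good is None:
            # Nothing to repeat yet, so output silence.
            output.append([0.0] * frame_len)
        else:
            # Each consecutive lost frame is quieter than the previous one.
            gain *= decay
            output.append([gain * sample for sample in last_good])
    return output
```

With two consecutive losses, the same waveform is played three times in a row at falling volume, which is audibly repetitive once the gap grows.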
Google trained the WaveNetEQ model, which is based on DeepMind's WaveRNN technology, on a large amount of speech data. The training dataset comes from more than 100 volunteers speaking 48 different languages, which means the model can automatically fill in dropped audio in those 48 languages.
WaveNetEQ is a recurrent neural network model for speech synthesis made up of two parts: an autoregressive network and a conditioning network. The autoregressive network keeps the signal continuous, while the conditioning network steers the autoregressive network so that the generated audio stays consistent.
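The division of labor between the two parts can be sketched with a toy generation loop. This is only a structural illustration under stated assumptions: the two networks are replaced by simple hand-written stand-ins (`mean_level` and `clipped_linear_step`, both hypothetical), and the conditioning summary is computed once from the received audio. It does not reproduce Google's model.

```python
from typing import Callable, List, Sequence

# Stand-in types: a "conditioner" summarizes the received audio into a
# slow-moving feature, and an "AR step" predicts one sample from the
# history so far plus that feature.
Conditioner = Callable[[Sequence[float]], float]
ArStep = Callable[[Sequence[float], float], float]


def generate(
    context: Sequence[float],
    n_samples: int,
    conditioner: Conditioner,
    ar_step: ArStep,
) -> List[float]:
    """Autoregressive generation steered by a conditioning feature."""
    cond = conditioner(context)  # slow-moving summary of the past audio
    history = list(context)      # the AR part is warmed up on real samples
    generated: List[float] = []
    for _ in range(n_samples):
        sample = ar_step(history, cond)
        history.append(sample)   # feed each output back in as input
        generated.append(sample)
    return generated


def mean_level(context: Sequence[float]) -> float:
    """Hypothetical conditioner: average absolute level of the context."""
    return sum(abs(x) for x in context) / len(context)


def clipped_linear_step(history: Sequence[float], cond: float) -> float:
    """Hypothetical AR step: linear prediction from the last two samples,
    clipped to the level supplied by the conditioner."""
    pred = 1.5 * history[-1] - 0.75 * history[-2]
    return max(-cond, min(cond, pred))
```

For example, `generate([0.0, 1.0, 0.0, -1.0], 3, mean_level, clipped_linear_step)` produces three new samples, each computed from the samples before it and kept within the level derived from the context.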
Google replaced the original NetEQ PLC component with WaveNetEQ, which Google says delivers better sound quality than NetEQ. The WaveNetEQ model is also fast enough to run on a mobile phone, which avoids the data privacy concerns users might have. Google says all processing takes place on the device because Duo calls are end-to-end encrypted by default. Once real audio from the call arrives again, playback switches seamlessly back to the real conversation.
However, there are limits on the content and duration of what WaveNetEQ fills in. It currently covers gaps of up to 120 milliseconds, beyond which the generated audio fades out to silence. WaveNetEQ also does not generate complete words, only simple syllables.
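The duration limit can be expressed as a gain curve applied to the generated audio. The sketch below assumes a linear fade that reaches silence at 120 ms. The article only gives the 120 ms figure, so the `fade_start_ms` value and the linear shape are hypothetical.

```python
from typing import List


def concealment_gain(
    elapsed_ms: float,
    max_ms: float = 120.0,
    fade_start_ms: float = 60.0,  # assumed; not stated in the article
) -> float:
    """Gain for generated audio, `elapsed_ms` after the gap began."""
    if elapsed_ms <= fade_start_ms:
        return 1.0
    if elapsed_ms >= max_ms:
        return 0.0
    # Linear ramp from 1.0 at fade_start_ms down to 0.0 at max_ms.
    return (max_ms - elapsed_ms) / (max_ms - fade_start_ms)


def fade_generated(
    samples: List[float],
    sample_rate_hz: float,
    max_ms: float = 120.0,
    fade_start_ms: float = 60.0,
) -> List[float]:
    """Apply the gain curve to samples generated from the start of a gap."""
    faded: List[float] = []
    for index, sample in enumerate(samples):
        elapsed_ms = 1000.0 * index / sample_rate_hz
        faded.append(sample * concealment_gain(elapsed_ms, max_ms, fade_start_ms))
    return faded
```

Under these assumptions the gain is 1.0 at the start of the gap, 0.5 at 90 ms, and 0.0 from 120 ms onward.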
WaveNetEQ is now available in the Duo app on Pixel 4 phones, and Google says it is rolling it out to other Android phones.
Of course, this is not the first time machine learning has been used to deal with lost audio packets, and many companies are studying related technology. Take domestic companies as an example: some have core businesses that involve audio and video, such as Tencent;