Mozilla today released the latest version of Common Voice, its open source collection of transcribed voice data that startups, researchers, and hobbyists can use to build voice-enabled apps, services, and devices. Common Voice now contains more than 7,226 hours of contributed voice data in 54 languages, up from 1,400 hours in 18 languages in February 2019.
Common Voice consists not only of voice clips but also of voluntarily contributed metadata that is useful for training speech engines, such as speakers’ age, gender, and accent. It is designed to integrate with DeepSpeech, a suite of open source speech-to-text and text-to-speech engines and trained models maintained by Mozilla’s machine learning team.
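As a rough illustration of how such per-clip metadata might be used, the sketch below parses a small tab-separated sample and counts clips by a demographic field. The column names (path, sentence, age, gender, accent), the sample rows, and the helper functions are assumptions made for illustration; they are not taken from an actual Common Voice release.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in tab-separated form; the column names and rows are
# illustrative assumptions, not data from an actual Common Voice release.
SAMPLE_TSV = (
    "path\tsentence\tage\tgender\taccent\n"
    "clip_0001.mp3\tHello world\ttwenties\tfemale\tus\n"
    "clip_0002.mp3\tGood morning\t\t\t\n"
    "clip_0003.mp3\tHey Firefox\tthirties\tmale\tengland\n"
)


def load_clips(tsv_text):
    """Parse tab-separated clip metadata into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def count_by(clips, field):
    """Count clips per value of a metadata field.

    Metadata is voluntary, so blank values are grouped as 'unspecified'.
    """
    return Counter(clip[field] or "unspecified" for clip in clips)


clips = load_clips(SAMPLE_TSV)
gender_counts = count_by(clips, "gender")
print(dict(gender_counts))
```

Because the metadata is contributed voluntarily, any real analysis would likely need to treat missing values explicitly, as the "unspecified" bucket does here.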
Collecting the more than 5.5 million clips in Common Voice required a lot of work, and so far 5,591 of the 7,226 hours have been confirmed as valid by the project’s contributors. According to Mozilla, five languages in Common Voice (English, German, French, Italian, and Spanish) now each have more than 5,000 unique speakers, while seven languages (English, German, French, Kabyle, Catalan, Spanish, and Kinyarwanda) are singled out for their volume of recorded audio.
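For a sense of scale, the reported figures can be turned into a validated share and a growth factor. The short sketch below uses only numbers quoted in this article; the function and variable names are illustrative.

```python
def ratio(part, whole):
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Figures as reported in the article.
TOTAL_HOURS = 7226
VALIDATED_HOURS = 5591
HOURS_FEB_2019 = 1400
LANGUAGES_NOW = 54
LANGUAGES_FEB_2019 = 18

validated_share = ratio(VALIDATED_HOURS, TOTAL_HOURS)
hours_growth = ratio(TOTAL_HOURS, HOURS_FEB_2019)
language_growth = ratio(LANGUAGES_NOW, LANGUAGES_FEB_2019)

print(f"Validated share of hours: {validated_share:.1%}")
print(f"Growth in hours since February 2019: {hours_growth:.2f}x")
print(f"Growth in languages since February 2019: {language_growth:.0f}x")
```

On these numbers, roughly three quarters of the contributed hours have been validated, and the data set has grown about fivefold in hours and threefold in languages since February 2019.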
The release also includes Mozilla’s first-ever data set target segment, which aims to collect voice data for specific purposes and use cases. This segment comprises the digits “0” through “9” and the words “yes,” “no,” “hey,” and “Firefox,” spoken by 11,000 people for 120 hours in 18 languages.
The Common Voice refresh comes after a major update to DeepSpeech that incorporated one of the fastest open source speech recognition models to date. The latest version adds support for TensorFlow Lite, a version of Google’s TensorFlow machine learning framework optimized for compute-constrained mobile and embedded devices. It also cuts DeepSpeech’s memory consumption by a factor of 22 and speeds up its startup by a factor of more than 500.
Both Common Voice and DeepSpeech inform work on Mozilla projects such as Firefox Voice, a browser extension that adds voice recognition support to Firefox. For now, Firefox Voice understands commands such as “how’s the weather” and “find the Gmail tab,” but the goal is to enable “meaningful interaction” with websites using voice alone.