Since Amazon launched the Echo smart speaker in 2014, smart speakers such as the Amazon Echo, Google Home and Apple HomePod have sold in the millions, and big tech companies are working to integrate Amazon’s Alexa, Apple’s Siri, Google’s Assistant, Microsoft’s Cortana and Facebook’s similar services into people’s lives. Juniper Research, a consultancy, estimates that the global market for smart speakers will reach $11 billion a year by 2023, with about 7.4 billion voice-controlled devices worldwide.
Technology companies say smart speakers record only when the user activates them, but they are in effect introducing an always-on microphone into private spaces. Amazon and its competitors say the vast majority of voice requests are processed automatically by computers and do not require human review. However, these smart devices rely on thousands of low-paid workers for manual transcription, and users’ private conversations have become one of the companies’ most valuable data sets. The technology companies all see this as a reasonable way to improve their products.
In fact, we have been listened to all along.
Ruthy Hope Slatis couldn’t believe what she was hearing. She had been hired by a temporary agency outside Boston to transcribe audio files for Amazon, a job whose details Amazon kept vague. As a contract worker earning only $12 an hour, she and her colleagues (officially known as data assistants) had to listen to clips of random conversations and type every word they heard into their laptops.
Amazon would say only that the work was crucial to a top-secret voice recognition product, but the clips contained recordings of users’ intimate moments.
In the fall of 2014, Amazon launched the Echo smart speaker, which comes with the voice-activated virtual assistant Alexa. In its first Echo ad, Amazon presented Alexa as a miracle of artificial intelligence: a happy family asks Alexa for news updates, answers to questions, and help with the children’s homework. But Slatis soon became aware of the human labor behind the product.
She remembers thinking, “Oh my God, this is what I’m doing.” Amazon captured every voice command in the cloud and relied on data assistants like her to train the system. At first, Slatis thought she was listening to clips from paid testers who had volunteered their voice samples in exchange for a few dollars. She soon realized that was wrong.
The recordings she and her colleagues listened to were often intense and awkward, with users confessing their secrets and fears in front of the speakers. As the transcription project grew along with Alexa’s popularity, so did the amount of private information revealed in the recordings. Other contractors recalled hearing children share their home addresses and phone numbers, a man trying to order sex toys, and even a dinner guest wondering aloud whether Amazon was eavesdropping. “Users tend to just joke, but they don’t know they’re being overheard,” Slatis said. She resigned in 2016.
Tech companies say they’re making corrections.
In the five years since Slatis first felt that unease, a quarter of Americans have bought smart speakers such as the Echo, Google Home and Apple HomePod. So far, Amazon has won the sales battle, with users reportedly buying more than 100 million Alexa devices.
But now a new war is under way among the world’s biggest companies, which are embedding Alexa, Siri, Google Assistant and Cortana into people’s lives by building microphones into everything from phones and smartwatches to TVs, refrigerators and SUVs. Juniper Research, a consultancy, estimates that by 2023 the global market for smart speakers will reach $11 billion a year and there will be about 7.4 billion voice-controlled devices, roughly one for every person on the planet.
The question now is, how do we deal with this scale?
According to the technology companies, the machines don’t create audio files at every moment, because smart speakers record audio only when the user activates them. However, once always-on microphones are brought into the kitchen and bedroom, they may inadvertently capture sounds that the user does not want to share.
These so-called smart devices also rely on thousands of low-paid workers who annotate the sound clips so that technology companies can upgrade their “electronic ears.” By now, our faintest whispers have become one of the most valuable data sets for technology companies.
Earlier this year, Bloomberg first reported that the technology industry was using humans to review audio collected from users, without disclosing that fact to them. The companies involved include Apple, Amazon and Facebook. Executives and engineers say that building a vast network of human listeners can be problematic or intrusive, although it has been an obvious way to improve their products.
In addition, Apple has become more aggressive in collecting and analyzing people’s voices over the past few years, fearing that Siri’s comprehension and speed lag behind those of Alexa and Google Assistant. Apple sees Siri as a voice search engine, so it must be prepared to handle endless user queries, which has made it more dependent on audio analysis.
In 2015, when Apple CEO Tim Cook declared privacy a “fundamental human right,” Apple’s machines were processing more than a billion requests a week. By then, users could turn on a feature that kept the voice assistant always listening, so they no longer had to press a button to activate it. Apple said in the legal terms of its user agreement that it might record and analyze voice data to improve Siri, but there was no mention of human employees listening. “Listening to other people’s voices makes me feel very uncomfortable,” says a former contractor. John Burkey, who worked on Siri’s senior development team, said: “This is not espionage. It is the same as when an app crashes and asks whether you want to send the report to Apple.”
Many contractors say that while most Siri requests are mundane, they still hear sexually explicit audio, as well as racist or homophobic speech.
Apple says less than 0.2% of Siri requests require human analysis. Former managers regard the contractors’ allegations as exaggerated. Tom Gruber, the Siri co-founder who led the development team, said: “Actually, a lot of what we’re dealing with is noise. It’s not that the machine sets out to record certain sounds; in a sense, it’s just a question of probability.”
By 2019, after bringing Siri to products such as its wireless headphones and HomePod speakers, Apple was processing 15 billion voice commands a month. At 0.2%, that means human contractors handle 30 million voice commands a month, or 360 million a year. Mike Bastian, a former chief research scientist on the Siri team, says the risk of accidental recording grows as use cases multiply. He cited the Apple Watch’s raise-to-speak feature, which automatically activates Siri when it detects the wearer’s wrist being lifted. “This leads to a high false positive rate,” he said.
In 2016, Amazon created the Frequent Utterance Database (FUD) to help Alexa add answers to common requests. Former employees who worked with FUD say there was tension between product teams eager to mine more data and security teams responsible for protecting user information. In 2017, Amazon unveiled the camera-equipped Echo Look, billed as an AI stylist that recommends clothing pairings. People familiar with the matter said its developers considered programming the camera to turn on automatically when users asked Alexa to tell a joke. The idea was to record video of the user’s face and assess whether the user was laughing. Amazon eventually shelved the idea, they said. The company says Alexa does not currently use facial recognition technology.
Amazon has set up transcription “farms” around the world. This year, it has held a number of recruitment drives for overseas transcribers. A voice technologist who has spent decades developing recognition systems for technology companies says the latest hiring suggests that the scale of Amazon’s audio data analysis is staggering. Amazon said it takes the security of its customers and their recordings seriously, and that it needs a full understanding of regional accents and colloquialisms to make Alexa global.
Microsoft admitted in August that it was using human help to review speech data generated through its speech recognition technology. Companies such as BMW, HP and Humana are integrating this technology into their products and services. Chinese technology companies, including Alibaba, search giant Baidu and phone maker Xiaomi, collect voice data from millions of smart speakers every quarter.
Google Assistant draws on Google Search and handles queries from billions of devices, including Android smartphones and tablets, Nest thermostats and Sony TVs. Google has hired temporary workers overseas to transcribe clips to improve the accuracy of the system. Google has promised that the recordings reviewed will not be associated with any personal information. But this summer, a Google contractor shared more than 1,000 user recordings with Belgian broadcaster VRT NWS. The broadcaster was able to identify some of the people in the recordings from what they said, much to the shock of those identified. Ten percent of these recordings were made because the device wrongly detected the activation word and recorded without the user’s consent.
The big tech companies adjusted their virtual assistant programs this year after news reports continued to emerge.
Google suspended human transcription of Assistant audio. Apple began allowing users to delete their Siri history, made sharing recordings optional, and directly hired a number of former contractors to increase its control over human listening.
Facebook and Microsoft have added clearer disclaimers to their privacy policies.
Amazon has introduced similar disclosures and has begun allowing Alexa users to opt out of human review.
Some researchers say increased smartphone processing power and a form of computer modeling called federated learning may eventually eliminate this kind of listening, because devices will become smart enough to solve problems without the help of contract workers. For now, without stricter laws or strong consumer opposition, the ranks of human audio reviewers are almost certain to keep growing as voice devices proliferate.