Amazon’s Alexa isn’t just AI — thousands of humans are listening

Amazon, like many other tech companies investing heavily in artificial intelligence, has always been forthright about its Alexa assistant being a work in progress. “The more data we use to train these systems, the better Alexa works, and training Alexa with voice recordings from a diverse range of customers helps ensure Alexa works well for everyone,” reads the company’s Alexa FAQ.

What the company doesn’t tell you explicitly, as highlighted by an in-depth Bloomberg investigation published this evening, is that one of the only ways Alexa improves over time, and often the best way, is by having human beings listen to recordings of your voice requests. Of course, this is all buried in product and service terms few consumers will ever read, and Amazon has often downplayed the privacy implications of having cameras and microphones in millions of homes around the globe. But concerns about how AI is trained will only continue to raise alarms as the technology becomes an ever more pervasive force in our daily lives, especially since most of how it works remains behind closed doors and it improves using methods Amazon is loath to disclose.

In this case, the process is known as data annotation, and it’s quietly become a bedrock of the machine learning revolution that’s churned out advances in natural language processing, machine translation, and image and object recognition. The idea is that AI algorithms only improve over time if the data they have access to can be easily parsed and categorized, and they can’t necessarily do that for themselves. Perhaps Alexa heard you incorrectly, or the system thinks you’re asking not about the British city of Brighton, but instead the suburb in Western New York. When dealing with different languages, there are countless more nuances, like regional slang and dialects, that may not have been accounted for when Alexa’s support for that language was developed.
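To make that concrete, here is a minimal sketch of what a single annotation record for a misheard request might contain. The field names are hypothetical, not Amazon’s actual schema: the machine transcript, the human-corrected transcript, and the disambiguated entity that gets fed back into training.

```python
# Minimal, hypothetical annotation record (illustrative only, not Amazon's schema).
from dataclasses import dataclass

@dataclass
class AnnotationRecord:
    utterance_id: str          # identifier for the stored audio clip
    asr_transcript: str        # what the speech recognizer thought was said
    corrected_transcript: str  # what the human annotator actually heard
    entity_label: str          # disambiguated entity, e.g. which "Brighton"

record = AnnotationRecord(
    utterance_id="clip-00042",
    asr_transcript="what's the weather in brighton",
    corrected_transcript="what's the weather in brighton",
    entity_label="Brighton, England (not Brighton, New York)",
)
print(record)
```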

In many cases, human beings make those calls by listening to a recording of the exchange and correctly labeling the data so that it can be fed back into the system. That process is broadly known as supervised learning, and in some cases it’s paired with other, more autonomous techniques in what’s known as semi-supervised learning. Apple, Google, and Facebook all make use of these techniques in similar ways, and both Siri and Google Assistant improve over time thanks to supervised learning that requires human eyes and ears.
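As a toy illustration of that loop (using scikit-learn, which is not Amazon’s production stack, and invented example data), human-labeled transcripts simply become training data for an intent classifier, and each round of annotation grows the labeled set:

```python
# Toy supervised-learning sketch: human-labeled transcripts train an intent classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Transcripts paired with the intent a human annotator confirmed.
transcripts = [
    "play some jazz in the living room",
    "set a timer for ten minutes",
    "what's the weather in brighton",
    "add milk to my shopping list",
]
intents = ["PlayMusic", "SetTimer", "GetWeather", "AddToList"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(transcripts, intents)

# New requests are classified with whatever the labeled data has taught the model.
print(model.predict(["play the beatles", "weather tomorrow in brighton"]))
```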

In this case, Bloomberg is shedding light on an army of literally thousands of Amazon workers around the world, some contractors and some full-time employees, who are tasked with parsing Alexa recordings to help improve the assistant over time. While there’s certainly nothing inherently nefarious about this approach, Bloomberg does point out that most customers don’t realize it is happening. Additionally, there’s room for abuse. Recordings might contain identifiable characteristics and biographical information about who is speaking. It’s also not known exactly how long these recordings are stored, or whether the information has ever been stolen by a malicious third party or misused by an employee.

Bloomberg’s report calls out instances where some annotators have heard what they think might be a sexual assault or other forms of criminal activity, in which case Amazon has procedures to loop in law enforcement. (There have been a number of high-profile cases where Alexa voice data has been used to prosecute crimes.) In other cases, the report says workers in some offices share snippets of conversation with coworkers that they find funny or embarrassing.

In a statement, Amazon told Bloomberg, “We only annotate an extremely small sample of Alexa voice recordings in order [sic] improve the customer experience. For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone.” The company claims it has “strict technical and operational safeguards” and “a zero tolerance policy for the abuse of our system.” Employees are not given access to the identity of the person engaging in the Alexa voice request, and any information of that variety is “treated with high confidentiality” and protected by “multi-factor authentication to restrict access, service encryption, and audits of our control environment.”

Still, critics of this approach to AI advancement have been ringing alarm bells for some time, usually when Amazon makes a mistake and accidentally sends recordings to the wrong individual or reveals that it’s been storing such recordings for months or even years. Last year, a bizarre and exceedingly complex series of errors on Alexa’s part ended up sending a recording of a private conversation to a coworker of the user’s husband. Back in December, a resident of Germany detailed how he received 1,700 voice recordings from Amazon in response to a GDPR data request, even though he didn’t own an Alexa device. Parsing through the files, journalists at the German magazine c’t were able to identify the user who had actually been recorded, just from information gleaned from his interactions with Alexa.

Amazon is actively looking for ways to move away from the kind of supervised learning that requires extensive transcribing and annotation. Wired noted in a report late last year that Amazon is using newer, more cutting-edge techniques like so-called active learning and transfer learning to cut down on error rates and to expand Alexa’s knowledge base as it adds more skills, without requiring it to add more humans into the mix.
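Assuming the active learning Wired described works like standard uncertainty sampling (a common textbook formulation, not a confirmed description of Amazon’s system), the idea fits in a few lines: the model scores a pool of unlabeled utterances and only the ones it is least sure about get routed to human annotators, cutting down how much audio people have to review.

```python
# Generic uncertainty-sampling sketch of active learning (illustrative data and labels).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small classifier trained on already-labeled requests.
labeled = ["play some jazz", "set a timer for ten minutes", "what's the weather in brighton"]
intents = ["PlayMusic", "SetTimer", "GetWeather"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(labeled, intents)

def select_for_annotation(clf, unlabeled_texts, budget=2):
    """Return the `budget` utterances the classifier is least confident about."""
    probs = clf.predict_proba(unlabeled_texts)   # shape: (n_texts, n_classes)
    confidence = probs.max(axis=1)               # probability of the top class
    return [unlabeled_texts[i] for i in np.argsort(confidence)[:budget]]

unlabeled = [
    "turn it up",                    # ambiguous without context
    "set a timer for five minutes",  # clear-cut, probably skipped
    "brighton weather",              # which Brighton?
]
print(select_for_annotation(model, unlabeled))
```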

Amazon’s Ruhi Sarikaya, Alexa’s director of applied science, published an article in Scientific American earlier this month titled “How Alexa Learns,” in which he detailed how the goal for this type of large-scale machine learning will always be to reduce the amount of tedious human labor required just to fix its mistakes. “In recent AI research, supervised learning has predominated. But today, commercial AI systems generate far more customer interactions than we could begin to label by hand,” Sarikaya writes. “The only way to continue the torrid rate of improvement that commercial AI has delivered so far is to reorient ourselves toward semi-supervised, weakly supervised, and unsupervised learning. Our systems need to learn how to improve themselves.”
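One common semi-supervised recipe in that spirit is self-training, offered here purely as a generic illustration rather than Amazon’s published method: a model trained on a small human-labeled set pseudo-labels the unlabeled utterances it is confident about, and those pseudo-labels are folded back into training so the system improves with less human labeling.

```python
# Generic self-training sketch (illustrative only; threshold and data are arbitrary).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled = ["play some jazz", "set a timer for ten minutes", "what's the weather in brighton"]
intents = ["PlayMusic", "SetTimer", "GetWeather"]
unlabeled = ["play the beatles", "timer for two minutes", "weather in seattle", "turn it up"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(labeled, intents)

# Pseudo-label only the unlabeled utterances the model is reasonably confident about.
probs = model.predict_proba(unlabeled)
for text, p in zip(unlabeled, probs):
    if p.max() > 0.4:  # confidence threshold, arbitrary for this toy example
        labeled.append(text)
        intents.append(model.classes_[p.argmax()])

# Retrain on the expanded (human-labeled plus pseudo-labeled) set.
model.fit(labeled, intents)
print(len(labeled), "training examples after one self-training round")
```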

For now, however, Amazon may need real people with knowledge of human language and culture to parse those Alexa interactions and make sense of them. That uncomfortable reality means there are people out there, sometimes as far away as India or Romania, who are listening to you talk to a disembodied AI in your living room, bedroom, or even your bathroom. That’s the cost of AI-provided convenience, at least in Amazon’s eyes.
