Highlights
- Columbia Engineering researchers have developed a new system that generates whisper-quiet sounds you can play in any room, in any situation, to block smart devices from spying on you.
- The researchers built an AI algorithm that helps individuals protect their privacy from adversaries trying to capture their speech through rogue microphones.
Have you ever seen online ads that are uncannily close to something you recently discussed with family or friends? Today, most electronic devices, including mobile phones, watches, televisions, and voice assistants, come with built-in microphones that are always listening. Computers routinely run neural networks and artificial intelligence (AI) over your voice to extract information from it. What can you do if you want to stop this from happening?
In the past, as depicted in the hit TV show “The Americans,” you could turn up the music or run the water in the bathroom. But what if you do not want to shout over the noise just to hold a conversation? A research team at Columbia Engineering has designed a new system that generates whisper-quiet sounds you can play in any room, in any situation, to block smart devices from spying on you. It is easy to integrate with hardware such as computers and smartphones, giving people agency over protecting the privacy of their voice.
“A key technical challenge to achieving this was to make it all work fast enough. Our algorithm, which manages to block a rogue microphone from correctly hearing your words 80% of the time, is the fastest and the most accurate on our testbed. It works even when we don’t know anything about the rogue microphone, such as its location or even the computer software running on it. It camouflages a person’s voice over-the-air, hiding it from these listening systems, and without inconveniencing the conversation between people in the room,” said assistant professor of computer science Carl Vondrick.
Staying ahead of the conversation
Corrupting automatic speech recognition systems has long been known to be theoretically possible in AI; doing it fast enough for practical applications remained a major challenge. The problem is that a sound that disrupts a person’s speech now, at this particular moment, is not the sound that will disrupt speech a moment later. As people talk, their voices change continuously as they say different words and speak quickly, and those changes make it hard for a system to keep up with the fast pace of speech.
Introducing ‘predictive attacks’
The research team wanted an algorithm that could break neural networks in real time, that could be generated continuously as speech is spoken, and that would apply to the majority of the vocabulary in a language. Earlier work had met at least one of these three requirements, but none had achieved all three.
Chiquier’s framework uses what she calls “predictive attacks,” a signal that can disrupt any word that automatic speech recognition models are trained to transcribe. Furthermore, when the attack sounds are played over the air, they must be loud enough to disrupt any rogue “listening-in” microphone that might be far away; the attack sound must carry the same distance as the voice itself.
The researchers’ method achieves real-time performance by forecasting an attack on the future of the signal, or word, conditioned on two seconds of input speech. The attack is optimized to have a volume similar to normal background noise, so people in a room can converse naturally without being successfully monitored by an automatic speech recognition system. The team showed that the approach works inside real-world rooms with natural ambient noise and complex scene geometries.
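To make the idea concrete, here is a minimal sketch of such a streaming loop, not the authors’ implementation: the model class, the audio I/O interfaces, the chunk length, and the noise budget are all illustrative assumptions. Only the two ideas described in the article are reflected here: conditioning on two seconds of past speech to predict a perturbation for speech that has not been spoken yet, and keeping that perturbation near background-noise volume.

```python
# Minimal sketch (assumptions throughout): a streaming loop that, given the last
# two seconds of speech, predicts a quiet perturbation to play over the NEXT
# chunk of audio. Architecture, chunk size, and noise level are placeholders.
import torch
import torch.nn as nn

SAMPLE_RATE = 16_000          # 16 kHz mono audio (assumption)
CONTEXT_SEC = 2.0             # condition on two seconds of past speech
CHUNK_SEC = 0.5               # horizon of each predicted perturbation (assumption)
CONTEXT_LEN = int(SAMPLE_RATE * CONTEXT_SEC)
CHUNK_LEN = int(SAMPLE_RATE * CHUNK_SEC)


class PredictivePerturbationNet(nn.Module):
    """Hypothetical stand-in for a predictive-attack model: maps a 2 s context
    waveform to a perturbation covering the upcoming chunk of speech."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=400, stride=160), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.decoder = nn.Linear(64, CHUNK_LEN)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, CONTEXT_LEN) -> perturbation: (batch, CHUNK_LEN)
        h = self.encoder(context.unsqueeze(1)).squeeze(-1)
        return torch.tanh(self.decoder(h))


def scale_to_noise_floor(perturbation: torch.Tensor,
                         target_rms: float = 0.01) -> torch.Tensor:
    """Keep the attack whisper-quiet: rescale to roughly background-noise RMS."""
    rms = perturbation.pow(2).mean(dim=-1, keepdim=True).sqrt().clamp_min(1e-8)
    return perturbation * (target_rms / rms)


def streaming_camouflage(mic_stream, speaker, model: PredictivePerturbationNet):
    """mic_stream yields float tensors of shape (CHUNK_LEN,); speaker.play()
    emits a waveform. Both are hypothetical I/O interfaces."""
    context = torch.zeros(CONTEXT_LEN)
    model.eval()
    with torch.no_grad():
        for chunk in mic_stream:
            # Slide the 2 s context window forward with the newest audio.
            context = torch.cat([context[CHUNK_LEN:], chunk])
            # Predict a perturbation for speech that has not been spoken yet,
            # so it can be played in sync with the speaker's next words.
            attack = model(context.unsqueeze(0)).squeeze(0)
            speaker.play(scale_to_noise_floor(attack))
```

The key design point this sketch tries to capture is that the perturbation is generated ahead of time from past speech rather than reacting to the current sound, which is what allows the camouflage to keep pace with a live conversation.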
Expert’s view
“Our algorithm is able to keep up by predicting the characteristics of what a person will say next, giving it enough time to generate the right whisper to make. So far, our method works for the majority of the English language vocabulary, and we plan to apply the algorithm to more languages, as well as eventually make the whispering sound completely imperceptible,” said Mia Chiquier, lead author of the study and a Ph.D. student in Vondrick’s lab.