Humans have a common ability to tell apart the voices of different people in a crowd and to know who is talking to whom. You can listen to just one person while many other voices and noises compete for your attention. Have you ever thought about that? Though ordinary for the human mind, recognizing and focusing on a single voice is still considered a hard task for artificial intelligence.
The Google Research Blog reports that the company's researchers have developed a new deep learning model that works with both audio and video. The model can treat each voice in a mixture of sounds separately. The human ability it imitates is commonly known as the cocktail party effect.
The way it works is simple: the voice of the person you want to hear is enhanced, and the other sounds in the video are suppressed. Google researchers, together with an intern from The Hebrew University of Jerusalem, claim that the only input required is an ordinary video with a single audio track. You then choose the face of the person in the video you would like to listen to (or you can delegate this choice to an automatic algorithm). Long story short, apart from face recognition, we are now being introduced to voice separation.
We’ve already touched upon what makes the technology unique, but to make it clear once again: the new AI technology uses both the faces and the audio in the video. Not only does the system understand that the voices are different, but it also correlates them with the faces on screen.
As a result, you can extract a separate audio track for a single person, with the background noise suppressed.
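To make "enhance one voice, suppress the rest" a bit more concrete, here is a toy sketch in Python. It is not Google's model. It only illustrates one common way to separate sounds: multiplying the spectrum of the mixture by a mask with values between 0 and 1. Everything in it is an assumption made for illustration: the two "speakers" are pure tones, the function name `separate_with_mask` is hypothetical, and the mask is computed from the clean sources (an "oracle"), which a real system would have to predict from the video and the mixed audio instead.

```python
import numpy as np


def separate_with_mask(mixture, mask):
    """Apply a per-frequency mask (values in [0, 1]) to a mono signal."""
    spectrum = np.fft.rfft(mixture)
    return np.fft.irfft(spectrum * mask, n=len(mixture))


# Toy "voices": two pure tones standing in for two speakers (assumption).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of audio
speaker_a = np.sin(2 * np.pi * 220 * t)
speaker_b = 0.8 * np.sin(2 * np.pi * 1000 * t)
mixture = speaker_a + speaker_b  # the single audio track we start from

# Oracle ratio mask: computed from the clean sources, which a real
# system would NOT have. It would have to estimate the mask instead.
mag_a = np.abs(np.fft.rfft(speaker_a))
mag_b = np.abs(np.fft.rfft(speaker_b))
mask_a = mag_a / (mag_a + mag_b + 1e-8)

# Keep the frequencies dominated by speaker A, suppress the rest.
isolated_a = separate_with_mask(mixture, mask_a)
error = np.max(np.abs(isolated_a - speaker_a))
print(f"max reconstruction error: {error:.2e}")
```

Real voices overlap in frequency and change over time, so practical systems apply masks like this to short time windows (a spectrogram) rather than to the whole recording at once. The hard part is estimating the mask, and that is where the faces in the video come in.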
Google anticipates a bright future for its new technology. The deep learning algorithm is expected to find its way into various Google products (Hangouts, Duo, etc.).
To summarize, the new voice-isolating algorithm could be extremely handy for escaping our daily “cocktail parties” when speaking to friends or relatives via Skype and other messengers, or for picking out only the useful information from a noisy video.
That said, this artificial intelligence algorithm could also be dangerous, because anyone might be able to listen in on your private conversation. All they would have to do is record you on video. Though Google claims to care about privacy, it is still a bit scary how far the technology has come. Our task is to make sure it doesn’t go wrong.