Google Duplex - How it works & Implications
Imagine a future where every task can be achieved with a simple voice command. Google gave us a sneak peek into what this future will look like.
During Google I/O this past month, Google demonstrated the capabilities of Google Duplex, an ambitious project to advance artificial intelligence. Duplex is a virtual assistant that can book reservations and make phone calls for you, all with a seemingly human voice. This technology is arguably the most significant innovation yet in the artificial intelligence revolution. So how does it work?
Before I go in depth on how this technology works, I need to define a few high-level terms. At its core, Google Duplex is a recurrent neural network, which Google built using TensorFlow Extended. A neural network is a computing system that learns to perform a task from examples rather than from hand-written rules. During training, the network is shown problems paired with their known solutions. It guesses an answer to each problem, measures how far off the guess was, and adjusts itself to shrink that error. Using what it has learned, the neural net is then able to solve new problems it has not seen before.
Recurrent neural networks are a class of artificial neural networks in which connections between nodes form a directed graph along a sequence. At each step the network receives the current input together with an internal state carried over from earlier steps, which gives it a kind of memory. This makes recurrent networks well suited to tasks like speech recognition, where the data is unsegmented: a continuous stream of audio that has not been split into separate words or sounds ahead of time.
TensorFlow is an open-source software library from Google for high-performance numerical computation. It has strong support for machine learning and deep learning, built on a flexible numerical computation core. TensorFlow Extended (TFX) is Google's machine learning platform built on top of TensorFlow. Using these technologies, Google was able to make a very advanced robotic assistant that draws on the conversation data it was trained on to mimic human conversation.
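To make the idea of a network with memory a little more concrete, here is a minimal sketch in plain Python. It is not how Duplex is implemented; the function names and the weights `w_hidden` and `w_input` are made up for illustration. It only shows the core recurrent idea: each step combines the new input with a state carried over from the previous steps.

```python
import math

def rnn_step(hidden, x, w_hidden=0.5, w_input=1.0):
    """One recurrent step: mix the previous hidden state with the new input.

    The weights are arbitrary illustrative values, not learned ones.
    """
    return math.tanh(w_hidden * hidden + w_input * x)

def run_sequence(inputs):
    """Feed inputs one at a time, carrying the hidden state forward."""
    hidden = 0.0  # the "memory" starts empty
    for x in inputs:
        hidden = rnn_step(hidden, x)
    return hidden

# The same values in a different order leave a different final state,
# because every step depends on what came before it.
early = run_sequence([1.0, 0.0, 0.0])
late = run_sequence([0.0, 0.0, 1.0])
print(early, late)
```

Because the final state depends on the order of the inputs, a network like this can tell "a signal at the start" apart from "a signal at the end", which is the property that matters for a stream of speech.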
To make Google Duplex so realistic, Google trained the recurrent neural network on a corpus of anonymized phone conversation data. The model takes the output of Google's Automatic Speech Recognition (ASR) technology as input, along with features from the audio, the history of the conversation, and the parameters of the task. Google trained the understanding model separately for each task while sharing the conversation corpus across tasks. As a final step, Google used hyperparameter optimization from TensorFlow Extended, which further improved the model.
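Tuning a model's training settings can be illustrated with a toy example. The sketch below is plain Python and does not use TensorFlow Extended or its API; the `train` and `search` functions and the toy data are hypothetical. It trains a one-weight model by guessing and adjusting, then tries a few learning rates and keeps whichever gives the lowest error.

```python
def train(learning_rate, steps=50):
    """Fit y = w * x to toy data by guessing w and nudging it toward lower error."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data; the true w is 2
    w = 0.0  # initial guess
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad  # adjust the guess
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return w, loss

def search(candidates):
    """Try each learning rate and keep the one with the lowest final loss."""
    results = {lr: train(lr)[1] for lr in candidates}
    best = min(results, key=results.get)
    return best, results

best_lr, losses = search([0.001, 0.01, 0.1])
print(best_lr, losses)
```

Real systems search over many more settings than a single learning rate, but the principle is the same: train with each candidate setting, compare the results, and keep the best.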
To make Duplex sound natural, Google combined a concatenative text-to-speech (TTS) engine with a synthesis TTS engine (using Tacotron and WaveNet) to control the intonation of the assistant based on the situation. Additionally, for complex responses, Google added fillers such as "umm" or "hmm", which mimic the sounds a human makes while gathering their thoughts. Response times were also tuned to match what people expect in conversation: simple responses come back almost immediately, while latency was deliberately added to complex ones so that they seem to take a bit more time to process.
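The filler and latency behavior described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Google's implementation: the `respond` function, the `is_complex` flag, and the `think_seconds` delay are all hypothetical names chosen for this example.

```python
import time

FILLERS = ("umm", "hmm")  # the fillers mentioned above

def respond(reply, is_complex, think_seconds=0.3, sleep=time.sleep):
    """Return the text to speak, pausing and adding a filler for complex replies.

    `is_complex` and `think_seconds` are hypothetical knobs for illustration.
    """
    if not is_complex:
        return reply  # simple replies go out immediately
    sleep(think_seconds)  # simulated thinking time before a complex reply
    return f"{FILLERS[0]}, {reply}"

print(respond("Hello?", is_complex=False))
```

Passing `sleep` as a parameter is a small design choice that lets the delay be swapped out, for example to record the requested pauses instead of actually waiting.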
Occasionally, Duplex is faced with a task that is too complex for it to complete, in which case a human operator will take over. Duplex will take the data from this interaction and learn from the experience, which will gradually widen the range of conversations it can handle. As this technology continues to improve, it will continue to make our lives easier as we spend less time on mundane phone calls.
Google Duplex is the largest leap for robotic assistants that we have seen so far. It is capable of placing calls for its users for tasks such as making a restaurant reservation or booking a hair salon appointment. What makes this extremely impressive is that Google Duplex sounds like a human and can understand both complex statements and the context of the conversation. Currently, Google Duplex can only make restaurant reservations, book hair salon appointments, and ask a business for its hours. As this technology continues to improve, which it will do rapidly, the implications of Google Duplex will be far-reaching.
This is just the beginning.
Just as the rise of the autonomous vehicle has signaled the end of commercial driving and transportation jobs, so Google Duplex signals the end of customer service jobs across the board. For example, many tech support workers spend more than half of their calls answering simple questions that require simple solutions. With technology like Duplex, these jobs will become obsolete. As this technology evolves, it will be able to use your voice commands not only to make phone calls but also to physically interact with your environment. Having a robotic assistant in your house could help with daily chores or even make you a perfect cup of coffee! However, this rapid improvement of artificial intelligence (AI) could also have unintended consequences.
We can all imagine a future where AI is a part of life. We’ve been told that AI will reduce the need for physical human labor and increase the demand for human intelligence. Or will it? As we progressively increase the capabilities of AI, it will begin to accomplish more complex tasks.
Personally, I look at Google Duplex with optimism and with excitement for what the future holds. Ever since I witnessed the intelligence of J.A.R.V.I.S. in Iron Man in 2008, I have dreamt of a future where all of our mundane household tasks can be taken care of with simple voice commands. Voice assistants continue to grow smarter as smart home products become more prevalent, which points to the rise of the fully automated house in the near future. It may seem like artificial intelligence will take over our lives, but at present it serves to free up our time by accomplishing menial tasks for us. I look forward to the day I can stop booking my own hair appointments.
Inspirations: ai.googleblog.com & cnet.com