TPU - Has Google opened Pandora's box in the computing world?
Venkatasudhan Lakshminarayanan
Solutions Architect @ Broadcom | Cloud Solutions Expert
At last year's Google I/O conference, Sundar Pichai revealed that Google has been using custom-built compute hardware for its machine learning & AI workloads. Google calls it the "Tensor Processing Unit", or "TPU".
Given the volume of data these machine learning algorithms must process, relying on GPUs or FPGAs would have forced Google to build large datacenters with enormous compute capacity. Instead, Google's engineers & scientists came up with the idea of building a new compute chip that could improve cost-performance by 10X over GPUs and pack more power into existing hardware rather than adding new datacenters. They kept the TPU's design & architecture a tightly guarded secret.
Google has finally released a whitepaper on the TPU that gives insight into its architecture and performance benchmarks. I'm sharing a few important points from it.
Rather than being tightly integrated with a CPU, which could have delayed deployment, the TPU was designed as a coprocessor on the PCIe I/O bus, allowing it to plug into existing servers just as a GPU does. Each TPU chip can be installed in a datacenter rack on a board that fits into a hard disk drive slot.
The TPU was designed, verified, built, and deployed in datacenters in just 15 months. Google has put TPUs to work in products like Google Image Search, Google Photos and the Google Cloud Vision API, and they were a key factor behind Google DeepMind's victory over Lee Sedol, the first time a computer defeated a world champion at the ancient game of Go.
Google designed the chip specifically for neural networks, and it can run them 15 to 30 times faster than general-purpose chips built with similar manufacturing techniques. In tests, the TPU server delivered 17 to 34 times better total performance/Watt than the Haswell CPU server, which makes the TPU server 14 to 16 times the performance/Watt of the K80 GPU server (implying the K80 itself was only modestly more power-efficient than the CPU on these workloads).
The TPU leverages its advantage in MACs (multiply-accumulate operations) and on-chip memory to run short programs written with the domain-specific TensorFlow framework 15 times as fast as the K80 GPU, resulting in a performance/Watt advantage of 29 times, which correlates with performance per total cost of ownership.
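To make the MAC point concrete, here is a minimal sketch (not from the whitepaper) of the kind of TensorFlow workload in question, written in the TF 1.x graph style current at the time. A single fully connected neural network layer is essentially one big matrix multiply, and that matmul is where nearly all the multiply-accumulate operations happen; the layer sizes below are arbitrary, illustrative choices.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API

# One fully connected layer: y = relu(x @ W + b).
# The matmul dominates the arithmetic -- 256*128 MACs per input row --
# which is exactly the work a matrix-oriented accelerator targets.
x = tf.placeholder(tf.float32, shape=[None, 256])  # batch of input vectors
W = tf.Variable(tf.random_normal([256, 128]))      # layer weights (illustrative sizes)
b = tf.Variable(tf.zeros([128]))                   # layer biases
y = tf.nn.relu(tf.matmul(x, W) + b)                # MAC-heavy core of the layer

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(32, 256).astype(np.float32)
    out = sess.run(y, feed_dict={x: batch})
    print(out.shape)  # (32, 128)
```

Stack a few dozen such layers and run them over millions of requests, and it becomes clear why a chip built around a large matrix unit with generous on-chip memory pays off for inference.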
Google is not alone in this space; Nvidia & Intel are playing their cards by developing powerful processors to meet the growing demands of compute-hungry workloads like machine learning, AI & deep learning.
All the major cloud service providers now offer high-performance GPU servers for heavy workloads: AWS's cg1.4xlarge instances, Azure's N-Series servers, and Google Cloud GPU instances.
But the TPU has surely opened a Pandora's box that will wake up hardware vendors to rethink the design of compute chips for datacenter & cloud environments, to meet the growing demand for compute power from modern applications like machine learning, neural networks & AI.