Distributed Platform for Machine Learning
Sreehari S S
Senior Technical Architect @ IBS Software | Cloud Computing | AI | Innovations
When we think of computer processing power, our first instinct is to consider the speed and architecture of a computer’s central processing unit (CPU). In the AI era, two acronyms, GPU and TPU, have changed the nature of computation in many respects, such as latency and throughput.
Let’s take a brief journey through data processing with CPUs, GPUs, and TPUs. As it turns out, the consumer graphics cards produced for the PC gaming market have far more computing power for many types of tasks.
- Central Processing Unit (CPU): A processor designed to solve any computational problem in a general fashion. Its cache and memory hierarchy are built to be reasonable for any general programming workload.
- Graphics Processing Unit (GPU): A processor designed to accelerate the rendering of graphics. GPUs are optimized for matrix multiplication and other "data friendly" mathematical operations.
- Tensor Processing Unit (TPU): A co-processor designed to accelerate deep learning workloads developed using TensorFlow (a programming framework). Compilers that would open the TPU up to general-purpose programming have not been written, so general programming on a TPU requires significant effort. (A quick way to see which of these devices a runtime can actually reach is sketched below.)
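As a minimal sketch, assuming a machine with TensorFlow 2.x installed, the runtime can enumerate whichever of these three device types it can reach:

```python
import tensorflow as tf  # assumes TensorFlow 2.x is installed

# List every device the runtime can see; a plain laptop typically reports
# only the CPU, a workstation adds GPUs, and a Cloud TPU VM reports TPU cores.
for device_type in ("CPU", "GPU", "TPU"):
    devices = tf.config.list_physical_devices(device_type)
    print(f"{device_type}: {len(devices)} device(s) found")
    for d in devices:
        print("  ", d.name)
```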
GPU vs TPU
Two vendors lead the accelerator revolution: NVIDIA with its GPUs, and Google with the TPU (Tensor Processing Unit).
Originally built for high-speed graphics rendering, GPUs have found a new calling in training and serving machine learning models, as well as in deep learning and artificial intelligence more broadly. A GPU is a processor in its own right, just one optimised for vectorised numerical code; GPUs are the spiritual successor of the classic Cray supercomputers.
A TPU is a co-processor; it cannot execute code in its own right. All code execution takes place on the CPU, which feeds a stream of micro-operations to the TPU. TPUs are a pretty successful Google experiment: Google tries lots of crazy things, and many of them don’t work that well. TPUs do work, but they aren’t going to replace GPUs for every application, or even many applications; TPUs are focused squarely on machine learning work.
Practically speaking, both a TPU and a GPU can carry out the same computational task, provided the appropriate compiler support is available. The main difference is that TPUs are cheaper and use far less power, so they can complete very large prediction jobs more cheaply than GPUs, or make it simpler to serve predictions in a low-latency service. There’s no particular reason a TPU couldn’t run something other than a TensorFlow model; it’s just that nobody has written the compilers to do so yet. It would be hard, because the TPU is not a completely generic processor, and some of the strange-looking restrictions in TensorFlow exist precisely to make TPUs possible.
In the current landscape, a GPU can be used as a general-purpose processor, whereas for the TPU the supporting toolchain, such as general-purpose compilers, is not available. The TPU excels at deep learning tasks.
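As an illustration of that TensorFlow dependence, the standard TensorFlow 2.x attach-and-distribute sequence looks roughly like this. This is a sketch of the Colab/Cloud TPU workflow; the resolver arguments depend on your environment:

```python
import tensorflow as tf

# Attach to the TPU cluster; tpu="" auto-detects the TPU on Colab.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Any model built inside the strategy scope is replicated across TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```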
It will indeed be an exciting world, but it will also require vast amounts of hardware and computational power. In the next few years, our society will undoubtedly be driven by new developments in AI.
Data scientists need computing power. Whether you’re processing a big dataset with Pandas or running some computation on a massive matrix with NumPy, you need a powerful machine to get the job done in a reasonable amount of time. Currently, huge investments are being made in more and more data centers that rely on traditional CPU-based computing to perform machine learning tasks.
A dataset over 100 GB in size will have an enormous number of data points, in the millions or even billions. With that many points to process, it doesn’t matter how fast your CPU is; it simply doesn’t have enough cores for efficient parallel processing. If your CPU has 20 cores (which would be a fairly expensive CPU), you can only process 20 data points at a time! The sketch below makes that ceiling concrete.
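A minimal sketch using only Python’s standard library: however large the dataset, a process pool works on at most one record per core at a time.

```python
import multiprocessing as mp

def process_point(x):
    # Stand-in for real per-record work (feature extraction, scoring, ...).
    return x * x

if __name__ == "__main__":
    data = range(1_000_000)
    # Even a pool spanning every core processes at most cpu_count() records
    # concurrently; a 20-core CPU works on 20 points at a time.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(process_point, data, chunksize=10_000)
    print(len(results), "records processed on", mp.cpu_count(), "cores")
```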
CPUs are going to be better at tasks where clock speed matters more, or where you simply don’t have a GPU implementation. If a GPU implementation exists for the process you are trying to perform, a GPU will be far more effective whenever the task benefits from parallel processing.
Deep learning has already seen its fair share of leveraging GPUs. Many of the convolution operations in deep learning are repetitive and can therefore be greatly accelerated on GPUs, sometimes by factors in the hundreds. Every month the demand for AI computation is doubling, with costs increasing proportionately. Traditional suppliers of computing power, such as Amazon and Microsoft, use price as a lever to control usage, which restricts innovation.
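To see that gap in practice, a rough benchmark sketch (assuming TensorFlow 2.x; the actual speedup depends entirely on your hardware) might time the same convolution on each device:

```python
import time
import tensorflow as tf

x = tf.random.normal([64, 224, 224, 3])   # a batch of image-like tensors
kernel = tf.random.normal([3, 3, 3, 64])  # a 3x3 convolution kernel

def bench(device):
    with tf.device(device):
        start = time.perf_counter()
        for _ in range(10):
            y = tf.nn.conv2d(x, kernel, strides=1, padding="SAME")
        _ = y.numpy()  # force execution to finish before stopping the clock
        return time.perf_counter() - start

print("CPU:", bench("/CPU:0"))
if tf.config.list_physical_devices("GPU"):
    print("GPU:", bench("/GPU:0"))  # typically dramatically faster
```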
The more CUDA parallel-processing cores, NVIDIA Tensor Cores, GPU memory, tensor performance, and core clock speed a card has, the better the GPU performs.
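A hedged sketch for inspecting some of those properties from TensorFlow 2.4 or later; which detail keys are reported varies by platform and build:

```python
import tensorflow as tf

# Print the name and CUDA compute capability of each visible GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name,
          details.get("device_name"),
          details.get("compute_capability"))
```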
Blockchain and Machine Learning
The rise of cryptocurrency has created a surprising new market for GPUs: mining.
Everybody knows there is an insane number of GPUs working on crypto-mining today. It’s common knowledge that the Proof of Work algorithm the Ethereum blockchain depends on is largely computed by consumer graphics cards produced for the gaming market. These same cards also outperform all other options for machine learning and artificial intelligence, both in raw power and in cost per unit of performance.
Blockchain and Machine Learning (ML) have been making a lot of noise over the last couple of years, but not so much together.
Blockchain, or distributed ledger technology (DLT), may provide the computational resources AI needs by tapping the idle GPU computing power of machines across the network. In some ways, this is what the blockchain protocol was designed to do: part of the protocol requires miners to solve complex mathematical problems that no single computer can solve by itself, as a way to confirm and validate transactions on the blockchain. As the process evolved, virtual currency was born. If we can tokenize value, can’t we also tokenize computing power?
Blockchain-based projects are now working on connecting computers in a peer-to-peer network, allowing individuals to rent resources from each other. These resources can be used to complete tasks requiring any amount of computation time and capacity. Today, such resources are supplied by centralized cloud providers, which are constrained by closed networks, proprietary payment systems, and hard-coded provisioning operations.
Blockchain-based solutions are now building decentralized marketplaces for the GPU computing power that machine learning tasks need. These projects aim to match a computationally intensive job with connected platform members who share their system resources to complete it. With distributed ledger technology (DLT), AI innovation can dramatically reduce its cost of computing by accessing the globally distributed GPUs used by crypto miners and making them available to AI companies. A toy sketch of such task-to-provider matching follows.
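To make the matching idea concrete, here is a deliberately toy sketch; every name in it (Provider, ComputeTask, match_task, the token pricing) is hypothetical and not drawn from any real project:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    node_id: str
    gpu_memory_gb: int
    price_per_hour: float  # denominated in a hypothetical compute token

@dataclass
class ComputeTask:
    task_id: str
    min_gpu_memory_gb: int

def match_task(task, providers):
    """Pick the cheapest provider that satisfies the task's GPU needs."""
    eligible = [p for p in providers
                if p.gpu_memory_gb >= task.min_gpu_memory_gb]
    return min(eligible, key=lambda p: p.price_per_hour) if eligible else None

providers = [
    Provider("miner-a", gpu_memory_gb=8,  price_per_hour=0.20),
    Provider("miner-b", gpu_memory_gb=24, price_per_hour=0.55),
]
task = ComputeTask("train-convnet", min_gpu_memory_gb=12)
print(match_task(task, providers))  # -> miner-b
```

A real marketplace would of course add verification of completed work, escrowed token payments, and reputation, which is where the ledger itself comes in.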
Blockchains create an environment where data is private, immutable, transparent, and distributed, and which operates without the direction of a sovereign entity. Eventually, public mineable blockchains will be the AI superhighways, and not just in computation power: they will also act as the data feeds into AI models, which will be essential to preserving the validity of those models. Blockchain technologies hold the promise of adding structure and accountability to AI algorithms, and to the quality and usefulness of the intelligence they produce.