BigDL for Apache Spark: A Real Big Step For Deep Learning
I have been in a few meetings in the recent past where ‘productionizing’ a new Deep Learning based AI project was the central theme. In such meetings there is never just one elephant in the room but many, all roaming freely and rarely talked about. The two biggest and scariest of them all are these:
- How do we move the (terabytes of) data to feed the model?
- What is the REAL cost of compute (GPUs)?
Most of these ‘Go/No-go’ meetings end in an impasse at this point, as it’s extremely vexing to find the right answers to the questions above. Cool DL models pass with flying colours in the lab, but it’s almost impractical to move years’ worth of enterprise data to the infrastructure where the model runs. On the question of GPU cost, it’s almost impossible to predict with any accuracy how much GPU power the model will need in production, and GPU pricing from most of the cloud vendors is very difficult to decipher, given the newness of such infrastructure.
Here is where BigDL, Intel’s open-source deep learning library that runs on Apache Spark, really enters like a hero. Yes, I agree it’s very tempting to ask, “Do we need one more Deep Learning library in a space already crowded with the likes of TensorFlow, Caffe, Theano, Keras, Torch and fastai (yeah, my favourite)?”
Honestly, BigDL looks and behaves a lot like its peers, and by Intel’s own admission it offers ‘feature parity’ with them. But there the similarity ends. BigDL is natively integrated with Spark, and that’s a huge plus in terms of both performance and ease of programming.
But the most important point is this: BigDL lets you take the Deep Learning model to the existing Big Data infrastructure, and it is a more than willing co-habitant with your existing workloads like ETL. This is a HUGE win for data scientists who were otherwise struggling to move their terabytes of data to dedicated Deep Learning infrastructure. The next most important point is that BigDL lets Deep Learning models run on existing CPU-based infrastructure, both by exploiting the Intel® Math Kernel Library (Intel® MKL) in place of a GPU’s built-in vector operations and by scaling out to multiple nodes like any other Spark workload.
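To make that concrete, here is a minimal sketch of how a BigDL job lives inside an ordinary Spark application, assuming BigDL’s Python API (the 0.x series) and synthetic stand-in data. The same SparkContext that runs your ETL also feeds the model, so the data never leaves the cluster:

    import numpy as np
    from pyspark import SparkContext
    from bigdl.util.common import create_spark_conf, init_engine, Sample

    # One SparkContext serves both the existing ETL and the BigDL training:
    # create_spark_conf() merges BigDL's required settings into the Spark conf.
    sc = SparkContext(appName="etl-plus-bigdl", conf=create_spark_conf())
    init_engine()  # brings up BigDL (and its Intel MKL backend) on the executors

    # Stand-in for data that already lives on the cluster: 1,000 fake 28x28
    # grayscale images with labels 1..10 (BigDL labels are 1-based).
    raw = sc.parallelize(range(1000))
    train_rdd = raw.map(lambda i: Sample.from_ndarray(
        np.random.rand(28, 28),    # features: one 28x28 image
        np.array(i % 10 + 1)))     # label: a class in 1..10

In a real pipeline, that last map would simply sit at the end of your existing ETL, turning cleansed records into BigDL Samples in place.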
Over the past few weeks I have been experimenting on my regular CPU-based machines, running CNN models for image classification using Apache Spark and BigDL; I had used GPUs for my earlier experiments. While admitting that it’s a bit clumsy to get BigDL up and running (you need quite a bit of Linux hackery), I must say I am mightily impressed with the performance. While it’s too early for me to do a shoot-out, I am more than willing to invest my time in BigDL, and I would readily recommend that enterprises build their POCs on it.
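For reference, the kind of model I ran looks roughly like the sketch below: a small LeNet-style CNN built with BigDL’s Sequential API and trained with its distributed Optimizer, continuing from the train_rdd above. The layer sizes and hyperparameters here are illustrative, not my exact configuration:

    from bigdl.nn.layer import (Sequential, Reshape, SpatialConvolution,
                                SpatialMaxPooling, Tanh, Linear, LogSoftMax)
    from bigdl.nn.criterion import ClassNLLCriterion
    from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

    # A small LeNet-style CNN for 28x28 grayscale images, 10 classes.
    model = Sequential()
    model.add(Reshape([1, 28, 28]))             # to (channel, height, width)
    model.add(SpatialConvolution(1, 6, 5, 5))   # 28x28 -> 24x24, 6 feature maps
    model.add(Tanh())
    model.add(SpatialMaxPooling(2, 2, 2, 2))    # 24x24 -> 12x12
    model.add(SpatialConvolution(6, 12, 5, 5))  # 12x12 -> 8x8, 12 feature maps
    model.add(Tanh())
    model.add(SpatialMaxPooling(2, 2, 2, 2))    # 8x8 -> 4x4
    model.add(Reshape([12 * 4 * 4]))
    model.add(Linear(12 * 4 * 4, 100))
    model.add(Tanh())
    model.add(Linear(100, 10))
    model.add(LogSoftMax())

    # The Optimizer shards train_rdd across the executors and runs
    # synchronous mini-batch SGD on the CPUs; batch_size should be a
    # multiple of (number of executors x cores per executor).
    optimizer = Optimizer(model=model,
                          training_rdd=train_rdd,
                          criterion=ClassNLLCriterion(),
                          optim_method=SGD(learningrate=0.01),
                          end_trigger=MaxEpoch(5),
                          batch_size=128)
    trained_model = optimizer.optimize()

All of this runs as a plain Spark job; there is no separate parameter server or GPU cluster to stand up.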
For those who want to take a foray into BigDL, I would highly recommend the Cloudera distribution and their very succinct blog post (https://blog.cloudera.com/blog/2017/09/deep-learning-with-intels-bigdl-and-apache-spark/). It helped me get bootstrapped really fast.