BigDL for Apache Spark: A Real Big Step For Deep Learning
I have been in a few meetings in the recent past where ‘productionizing’ a new Deep Learning based AI project was the central theme. In such meetings there is never just one elephant in the room but many, all roaming freely and rarely talked about. The two biggest and scariest of them all are these:
- How do we move the (terabytes of) data to feed the model?
- What is the REAL cost of compute (GPUs)?
Most of these ‘Go/No-go’ meetings end in an impasse at this point, as it’s extremely vexing to find the right answers to the questions above. Cool DL models pass with flying colours in the lab, but it’s almost impractical to move years’ worth of enterprise data to the infrastructure where the model runs. On the question of GPU cost, it’s almost impossible to predict with any accuracy how much GPU power the model will need in production, and GPU pricing from most of the cloud vendors is very difficult to decipher, given the newness of such infrastructure.
Here is where BigDL, Intel’s open-source deep learning library that runs on Apache Spark, really enters like a hero. Yes, I agree it’s very tempting to ask, “Do we need one more Deep Learning library in a space already crowded with the likes of TensorFlow, Caffe, Theano, Keras, Torch and fastai (yeah, my favourite)?”
Honestly, BigDL looks and behaves a lot like its peers, and by Intel’s own admission it offers ‘feature parity’ with them. But there the similarity ends. BigDL is natively integrated with Spark, and that’s a huge plus in terms of both performance and ease of programming.
But the most important point is this: BigDL lets you take the Deep Learning model to the existing Big Data infrastructure, and it is a more than willing co-habitant with your existing workloads like ETL. This is a HUGE win for data scientists who were otherwise struggling to move their terabytes of data to dedicated Deep Learning infrastructure. The next most important point is that BigDL lets Deep Learning models run on existing CPU-based infrastructure, both by exploiting the Intel® Math Kernel Library (Intel® MKL) in place of a GPU’s built-in vector operations and by scaling out to multiple nodes like any other Spark workload.
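To make that concrete, here is a minimal sketch of how a BigDL job lives inside an ordinary Spark application, assuming BigDL’s Python API (the 0.x series) and synthetic stand-in data. The same SparkContext that runs your ETL also feeds the model, so the data never leaves the cluster:

    import numpy as np
    from pyspark import SparkContext
    from bigdl.util.common import create_spark_conf, init_engine, Sample

    # One SparkContext serves both the existing ETL and the BigDL training:
    # create_spark_conf() merges BigDL's required settings into the Spark conf.
    sc = SparkContext(appName="etl-plus-bigdl", conf=create_spark_conf())
    init_engine()  # brings up BigDL (and its Intel MKL backend) on the executors

    # Stand-in for data that already lives on the cluster: 1,000 fake 28x28
    # grayscale images with labels 1..10 (BigDL labels are 1-based).
    raw = sc.parallelize(range(1000))
    train_rdd = raw.map(lambda i: Sample.from_ndarray(
        np.random.rand(28, 28),    # features: one 28x28 image
        np.array(i % 10 + 1)))     # label: a class in 1..10

In a real pipeline, that last map would simply sit at the end of your existing ETL, turning cleansed records into BigDL Samples in place.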
Over the past few weeks I have been experimenting on my regular CPU-based machines, running CNN models for image classification using Apache Spark and BigDL; I had used GPUs for my earlier experiments. While admitting that it’s a bit clumsy to get BigDL up and running (you need quite a bit of Linux hackery), I must say I am mightily impressed with the performance. While it’s too early for me to do a shoot-out, I am more than willing to invest my time in BigDL, and I would readily recommend that enterprises build their POCs on it.
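For reference, the kind of model I ran looks roughly like the sketch below: a small LeNet-style CNN built with BigDL’s Sequential API and trained with its distributed Optimizer, continuing from the train_rdd above. The layer sizes and hyperparameters here are illustrative, not my exact configuration:

    from bigdl.nn.layer import (Sequential, Reshape, SpatialConvolution,
                                SpatialMaxPooling, Tanh, Linear, LogSoftMax)
    from bigdl.nn.criterion import ClassNLLCriterion
    from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

    # A small LeNet-style CNN for 28x28 grayscale images, 10 classes.
    model = Sequential()
    model.add(Reshape([1, 28, 28]))             # to (channel, height, width)
    model.add(SpatialConvolution(1, 6, 5, 5))   # 28x28 -> 24x24, 6 feature maps
    model.add(Tanh())
    model.add(SpatialMaxPooling(2, 2, 2, 2))    # 24x24 -> 12x12
    model.add(SpatialConvolution(6, 12, 5, 5))  # 12x12 -> 8x8, 12 feature maps
    model.add(Tanh())
    model.add(SpatialMaxPooling(2, 2, 2, 2))    # 8x8 -> 4x4
    model.add(Reshape([12 * 4 * 4]))
    model.add(Linear(12 * 4 * 4, 100))
    model.add(Tanh())
    model.add(Linear(100, 10))
    model.add(LogSoftMax())

    # The Optimizer shards train_rdd across the executors and runs
    # synchronous mini-batch SGD on the CPUs; batch_size should be a
    # multiple of (number of executors x cores per executor).
    optimizer = Optimizer(model=model,
                          training_rdd=train_rdd,
                          criterion=ClassNLLCriterion(),
                          optim_method=SGD(learningrate=0.01),
                          end_trigger=MaxEpoch(5),
                          batch_size=128)
    trained_model = optimizer.optimize()

All of this runs as a plain Spark job; there is no separate parameter server or GPU cluster to stand up.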
For those who want to take a foray into BigDL, I would highly recommend the Cloudera distribution and their very succinct blog post (https://blog.cloudera.com/blog/2017/09/deep-learning-with-intels-bigdl-and-apache-spark/). It helped me get bootstrapped really fast.