Is the future Berkeley's Ray or Spark or may be both?
Credit: Michael Jordan - Berkeley

Quite interesting! Both Ray and Spark are products of UC Berkeley, both are distributed frameworks, and both can handle AI/ML workloads. Spark has obviously come a long way with its 2.x releases, packing an incredible number of improvements: project Tungsten, the Catalyst optimizer, Datasets, and many other features.

Ray, although still in its infancy, promises further improvements, not just in I/O speed, where Spark already exceeds many frameworks, but by eliminating some bottlenecks Spark can encounter in very large-scale projects, especially in the AI domain. It aims to provide immediate feedback through tightly integrated feedback loops, which are extremely important in machine learning. One notable goal is the ability to go back in time and re-simulate an environment or use case, so you don't have to restart your iterations from zero. Ray also promises to reduce the cost of object serialization and make object sharing easy by saving object state in a shared store. Spark can integrate with frameworks such as Apache Arrow or Alluxio (previously Tachyon) to improve performance tremendously and reduce the penalty of object conversion, but Ray seems to promise quite a bit in this area natively, and looks like a wonderful addition to the open-source AI offerings, where immediate feedback is much needed to learn quickly and act.
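The object-sharing model described above revolves around futures: a remote call returns a handle immediately, and the result lives in a shared object store until you fetch it. Ray's actual API is `@ray.remote`, `f.remote()`, and `ray.get()`; the hedged sketch below illustrates the same pattern using only the Python standard library so it runs anywhere:

```python
# The futures pattern Ray builds on: submit work, get a handle back
# immediately, and fetch the value only when you need it. Sketched here
# with the standard library rather than Ray itself.
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    handles = [pool.submit(square, i) for i in range(4)]  # like square.remote(i)
    results = [h.result() for h in handles]               # like ray.get(handles)

print(results)  # [0, 1, 4, 9]
```

Because the caller holds only handles until `result()` is called, work can proceed asynchronously, which is part of what enables the tight feedback loops mentioned above.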

Courtesy of Michael Jordan (Director of the AMPLab, now the RISELab): the improvements are not just in the way Ray works but also in its architecture model.

Although support seems to be Python-only at this time, I am impressed by the speed Python gains when invoking the Ray framework: using 4 cores, a regular Python function took me about 4 seconds, while the same call through Ray came back in under a millisecond. Ray's speed also comes from its ability to take advantage of CPU pipelining, and it can take advantage of GPUs as well.
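For anyone who wants to reproduce a comparison like the one above, here is a minimal sketch of the sequential-versus-4-core setup, using the standard library's process pool as a stand-in for Ray. The workload function and job sizes are illustrative assumptions, absolute numbers will differ per machine, and process startup overhead can dominate on small workloads:

```python
# Rough 4-core timing sketch: a CPU-bound function run sequentially versus
# fanned out over four worker processes. Only the shape of the comparison
# matters; the numbers themselves depend on your machine and workload.
import time
from concurrent.futures import ProcessPoolExecutor

def work(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [200_000] * 4

    start = time.perf_counter()
    sequential = [work(n) for n in jobs]
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as pool:
        parallel = list(pool.map(work, jobs))
    t_par = time.perf_counter() - start

    assert sequential == parallel  # same answers either way
    print(f"sequential: {t_seq:.4f}s  parallel: {t_par:.4f}s")
```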

I would think it is worth running a few real use cases with large data sets under the same conditions, including the same language (e.g. Python) for now, and benchmarking both Ray and Spark to see where that gets us!
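One way to keep those conditions identical across frameworks is a tiny harness that times the same callable and reports the best of several runs, so scheduler noise doesn't skew the comparison. The function names below are illustrative, not from any framework:

```python
# Minimal benchmark harness: time a callable a few times under identical
# conditions and keep the best wall-clock run.
import time

def bench(fn, *args, repeat=5):
    """Return the best (lowest) wall-clock time of `repeat` runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def workload(n):
    return sum(i * i for i in range(n))

print(f"best of 5: {bench(workload, 100_000):.6f}s")
```

The same `bench` call can then wrap a Spark job and a Ray task on the same data set, keeping language and input identical.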

It would also be interesting, if and when Ray offers support for Scala, Java, and R, to see how it scales against Spark. I would also think it is worth testing Julia, which can parallelize tasks out of the box through its built-in support for distributed parallel execution (Spark has this in its core engine with the parallelize method), or perhaps we should think of a framework written in Julia from the start!

Please note I am not recommending anything at this point; this is food for thought. Spark has dozens if not hundreds of real-life production use cases. I just wanted to get the thinking going and possibly start some benchmarks and feedback on both frameworks, and others.
