Barrier Execution Mode in Spark 3.0 - Part 1: Introduction

Barrier execution mode is a new execution mode added to Spark in version 3.0. This marks a significant change to the platform, which had supported only Map/Reduce based execution till now. It will allow Spark to diversify the kinds of workloads it can support on its platform.

In this series of posts we will discuss this execution mode in detail. This is the first post in the series. In this post we will discuss what barrier execution mode is and why it is needed. You can access all the posts in this series here.

Execution Mode

An execution mode in Spark is the way jobs are executed on the platform. The mode dictates how jobs are divided into multiple parallel tasks and how those tasks are scheduled. The execution mode defines what kind of processing the platform can handle.

Map/Reduce has been a popular execution mode in the majority of big data frameworks, including Spark. This execution mode is flexible enough to handle a wide variety of workloads like ETL, SQL, ML etc.

Map/Reduce Execution Mode

In this section of the post we will look at Map/Reduce from an execution point of view. Understanding this will help us see how it differs from barrier execution mode.

In Map/Reduce:

  • A job is a collection of stages. Each stage can be a Map or a Reduce. Between these stages there will usually be a shuffle.
  • Each stage is a collection of tasks. These tasks are independent of each other. This approach is called shared nothing. It allows the system to scale as more resources become available.
  • As the tasks are independent of each other, when one of the tasks fails, only that task is retried.
  • The number of tasks in the Map phase is determined by the amount of data, and the number of tasks in the Reduce phase is determined by the developer.

The above points summarise the Map/Reduce approach at a very high level. Even though there are many implementation details, this information is enough for our discussion.
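The phases above can be sketched in plain Python. This is only an illustration of the model, not Spark code; the function names and the hash-based partitioning scheme are my own choices for the sketch.

```python
from collections import defaultdict

def map_phase(partitions):
    # Each map task works on its own partition only (shared nothing),
    # emitting (word, 1) pairs. These tasks could run on different machines.
    return [[(word, 1) for line in part for word in line.split()]
            for part in partitions]

def shuffle(mapped, num_reducers):
    # The shuffle routes every key to exactly one reduce task, here by hash.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for task_output in mapped:
        for key, value in task_output:
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets

def reduce_phase(buckets):
    # Each reduce task aggregates its own bucket independently.
    return {key: sum(values) for bucket in buckets
            for key, values in bucket.items()}

counts = reduce_phase(shuffle(map_phase([["spark spark hello"], ["hello world"]]), 2))
print(sorted(counts.items()))  # [('hello', 2), ('spark', 2), ('world', 1)]
```

Because every task touches only its own slice of data, a failed map task can simply recompute its partition without disturbing the others, which is exactly the per-task retry behaviour described above.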

Need for New Execution Mode

Map/Reduce execution mode has served well for many years across different workloads. Why do we need a different execution mode now?

One of the reasons is to support deep learning frameworks on Spark. Deep learning frameworks don't lend themselves to the Map/Reduce model. They work well with another kind of execution model called MPI (Message Passing Interface). For example, Horovod, an open source framework from Uber for doing deep learning at scale, uses MPI to implement distributed deep learning for a variety of DL frameworks. You can learn more here.

In order to support deep learning natively, Spark needs to support an execution model that is different from Map/Reduce. The new execution model is modeled after MPI.
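The contrast with shared-nothing tasks can be illustrated with Python's standard `threading.Barrier`. This is an analogy only, not the Spark API: the point is that every task must reach the synchronization point before any task can proceed, which is why such tasks have to be launched all together (gang scheduling) and why the failure of one task forces restarting the whole set rather than retrying just that task.

```python
import threading

NUM_TASKS = 4
barrier = threading.Barrier(NUM_TASKS)
results = []
lock = threading.Lock()

def task(task_id):
    # Phase 1: local computation, e.g. a gradient on this task's data shard.
    local_value = task_id * task_id
    # Synchronization point: no task continues until ALL tasks arrive here.
    # If any task were missing or dead, the others would wait forever,
    # so all tasks must be scheduled together.
    barrier.wait()
    # Phase 2: runs only after every task has finished phase 1.
    with lock:
        results.append((task_id, local_value))

threads = [threading.Thread(target=task, args=(i,)) for i in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

In Spark itself, the corresponding entry point is `RDD.barrier()`, which marks a stage for this all-or-nothing style of scheduling.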

https://blog.madhukaraphatak.com/barrier-execution-mode-part-1/

