Apache Flink, the "4G of Big Data" - Introduction and a Quickstart Tutorial
1. Objective
In this tutorial we will introduce Apache Flink: what Flink is, and why and where to use it. The tutorial will answer the question of why Apache Flink is called the 4G of Big Data, and will also briefly cover Flink's APIs and features.
2. Video Tutorial
3. Introduction
Apache Flink is an open-source streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink is a top-level Apache project. It is a scalable data analytics framework, fully compatible with Hadoop, and it can execute both stream processing and batch processing workloads.
Apache Flink started as a research project called Stratosphere. In 2008, Volker Markl formed the idea for Stratosphere and attracted co-principal investigators from HU Berlin, TU Berlin, and the Hasso Plattner Institute in Potsdam. Together they worked toward a shared vision and invested great effort in systems building and open-source deployment. Several decisive steps followed to make the project popular in the commercial, research, and open-source communities, and a commercial entity was formed around Stratosphere. When the project applied for Apache incubation in April 2014, the name Flink was chosen. Flink is a German word meaning swift or agile.
4. Why Flink?
The key vision behind Apache Flink is to reduce the complexity faced by other distributed data processing engines. It achieves this by integrating query optimization and concepts from database systems with efficient parallel in-memory and out-of-core algorithms, on top of the MapReduce framework. Because Apache Flink is based on a streaming model, it iterates over data using its streaming architecture, and the concept of iterative algorithms is tightly bound into Flink's query optimizer. Apache Flink's pipelined architecture allows it to process streaming data with lower latency than micro-batch architectures such as Spark.
5. Apache Flink APIs
Apache Flink provides APIs for building applications that run on the Flink engine:
i. DataStream APIs
A DataStream program is a regular program in Apache Flink that implements transformations on data streams, for example filtering, aggregating, or updating state. Results are returned through sinks, which may, for example, write the data to files or print it to the command-line terminal.
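As a concrete illustration, here is a minimal DataStream sketch in Java, assuming the Flink streaming dependency (`flink-streaming-java`) is on the classpath; the socket host and port are placeholder values:

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FilterErrorLines {
    public static void main(String[] args) throws Exception {
        // Set up the streaming execution environment
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: read text lines from a socket (host/port are placeholders)
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // Transformation: keep only the lines containing "ERROR"
        DataStream<String> errors =
                lines.filter((FilterFunction<String>) line -> line.contains("ERROR"));

        // Sink: print results to the command-line terminal
        errors.print();

        // Streaming programs run when execute() is called
        env.execute("Filter error lines");
    }
}
```

The filter transformation and the print sink here correspond directly to the filtering and terminal output mentioned above.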
ii. DataSet APIs
A DataSet program is a regular program in Apache Flink that implements transformations on data sets, for example joining, grouping, mapping, or filtering. This API is used for batch processing of data that is already available in a repository.
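A minimal batch sketch using the DataSet API, again assuming the Flink Java dependencies are available; the in-memory input stands in for data that would normally be read from a repository (e.g. with `env.readTextFile(...)`):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class BatchWordCount {
    public static void main(String[] args) throws Exception {
        // Set up the batch execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // A small in-memory data set; in practice this would be loaded from storage
        DataSet<String> text = env.fromElements("flink is swift", "flink is agile");

        // Map each line to (word, 1) pairs, then group by word and sum the counts
        DataSet<Tuple2<String, Integer>> counts = text
                .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                    for (String word : line.split("\\s+")) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .groupBy(0)
                .sum(1);

        // For batch programs, print() triggers execution and writes to the terminal
        counts.print();
    }
}
```

This shows the mapping, grouping, and aggregation transformations the DataSet API is typically used for on bounded data.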
iii. Table APIs
This API in Flink is used for handling relational operations. It is a SQL-like expression language for relational stream and batch processing, and it can also be integrated with the DataStream and DataSet APIs.
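A short Table API sketch, assuming the Flink Table dependency (`flink-table-api-java`) is available; the table contents and column names are illustrative:

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.row;

public class TableApiSketch {
    public static void main(String[] args) {
        // Batch-mode table environment
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inBatchMode().build());

        // A small in-memory table of (user, amount) rows
        Table orders = tEnv.fromValues(
                DataTypes.ROW(
                        DataTypes.FIELD("user", DataTypes.STRING()),
                        DataTypes.FIELD("amount", DataTypes.INT())),
                row("alice", 10), row("bob", 5), row("alice", 7));

        // SQL-like relational operations: group by user and sum the amounts
        Table totals = orders
                .groupBy($("user"))
                .select($("user"), $("amount").sum().as("total"));

        // Execute and print the result table to the terminal
        totals.execute().print();
    }
}
```

The same relational query could equivalently be expressed in SQL via `tEnv.sqlQuery(...)`, which is what makes the Table API convenient for users coming from database systems.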