Why Apache Spark?

Let's start by reconsidering the title. Maybe the real question here is "Why not Apache Spark?"

When the main focus is working with big data and streaming data, we need four things: speed, ease of use, generality (one tool that covers all our needs), and the ability to run on different platforms and data sources. I will go through these four points to see whether Spark is a good solution or not.

Speed

Run workloads up to 100x faster than Hadoop MapReduce, according to the Spark project. (Wow)

Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

Ease of Use

Write applications quickly in Java, Scala, Python, R, and SQL. (Amazing)

Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells.
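
To give a feel for what these high-level operators look like, here is a minimal word-count sketch in Python. It is my own illustration, not code from apache.org. The function names tokenize and word_counts are made up for this example, and spark is assumed to be an already created SparkSession from PySpark.

```python
import re


def tokenize(line):
    """Split a line of text into lowercase words (plain Python, no Spark needed)."""
    return re.findall(r"[a-z']+", line.lower())


def word_counts(spark, lines):
    """Count words with three RDD operators: flatMap, map and reduceByKey.

    Assumption: `spark` is an existing pyspark.sql.SparkSession.
    """
    rdd = spark.sparkContext.parallelize(lines)                # distribute the input lines
    pairs = rdd.flatMap(tokenize).map(lambda word: (word, 1))  # one (word, 1) pair per word
    counts = pairs.reduceByKey(lambda a, b: a + b)             # add up the 1s for each word
    return dict(counts.collect())                              # bring the result back to the driver
```

With PySpark installed, you could try this on a local session, for example one built with SparkSession.builder.master("local[*]").getOrCreate(). Calling word_counts(spark, ["to be or not to be"]) should then give a count of 2 for "to" and "be" and 1 for "or" and "not".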

Generality

Combine SQL, streaming, and complex analytics.

Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
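
As a small sketch of what "combining" can mean in practice, the function below mixes a SQL query with DataFrame operations in the same application. The sample rows, the view name sales and the function name revenue_by_category are all hypothetical and only there for illustration. spark is again assumed to be an existing SparkSession.

```python
# Hypothetical sample data: (category, amount) pairs.
SALES_ROWS = [("books", 12.5), ("games", 40.0), ("books", 7.5), ("music", 15.0)]

# SQL step: keep only the larger sales. The view name "sales" is an assumption.
LARGE_SALES_SQL = "SELECT category, amount FROM sales WHERE amount >= 10"


def revenue_by_category(spark, rows=SALES_ROWS):
    """Filter with SQL, then aggregate with the DataFrame API.

    Assumption: `spark` is an existing pyspark.sql.SparkSession.
    """
    df = spark.createDataFrame(rows, ["category", "amount"])
    df.createOrReplaceTempView("sales")   # make the DataFrame visible to SQL
    large = spark.sql(LARGE_SALES_SQL)    # the SQL result is itself a DataFrame
    totals = (
        large.groupBy("category")
        .sum("amount")                    # produces a column named "sum(amount)"
        .withColumnRenamed("sum(amount)", "revenue")
    )
    return {row["category"]: row["revenue"] for row in totals.collect()}
```

The point of the sketch is that the result of spark.sql is an ordinary DataFrame, so the SQL step and the DataFrame step chain together without any conversion in between.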

Runs Everywhere

Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.

You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
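
From the code side, switching between data sources can be as small as changing a format name and a path. The sketch below is my own illustration under that assumption. The function name load_events and the paths in the usage note are hypothetical, and spark is assumed to be an existing SparkSession.

```python
def load_events(spark, path, fmt="parquet"):
    """Load a dataset through Spark's generic reader.

    Assumptions: `spark` is an existing pyspark.sql.SparkSession, and `path`
    points at data the cluster can actually reach.
    """
    reader = spark.read.format(fmt)
    if fmt == "csv":
        # CSV files carry no schema, so ask Spark to read the header row and infer types.
        reader = reader.option("header", "true").option("inferSchema", "true")
    return reader.load(path)
```

For example, load_events(spark, "hdfs://namenode:8020/data/events") would read Parquet files from HDFS, while load_events(spark, "/tmp/events.csv", fmt="csv") would read a local CSV file. Both paths are placeholders. Sources such as Cassandra or HBase usually need an extra connector package on top of this.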

[ Source: apache.org ]
