Why use Spark?

The very first reason you'll find on the internet is that Spark lets you work with Big Data, but why not use pandas, Hadoop, or Dask instead? Let's find out in a few words.

Pandas is good for a dataset with millions of data points and works perfectly fine in most cases, but it loads the data into the memory of a single machine. When your data is too large to fit on one machine and is distributed across a network of machines, pandas is no longer a practical choice.

Hadoop is one of the leading Big Data technologies out there, so why do we need Spark? Hadoop also manages data very efficiently, but there is an important difference between the two. Hadoop's MapReduce engine does most of its computation by writing intermediate results to the hard disk, whereas Spark keeps as much as it can in memory (RAM). That makes Spark generally faster than Hadoop, especially for jobs that reuse the same data several times. RAM costs more than disk space, though, so Spark tends to turn out more expensive, and it's for you to decide between speed and cost.
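To make the disk-versus-memory idea concrete, here is a toy sketch in plain Python. It is not Spark or Hadoop code: the function names (`disk_pipeline`, `memory_pipeline`) and the two-stage "square, then sum" job are made up for illustration. One pipeline saves its intermediate result to a temporary file and reads it back, the other simply keeps it in RAM.

```python
import json
import os
import tempfile


def square(values):
    # Stage 1: transform every value.
    return [v * v for v in values]


def total(values):
    # Stage 2: aggregate the transformed values.
    return sum(values)


def disk_pipeline(values):
    """Hypothetical MapReduce-style flow: stage 1 output goes to disk first."""
    stage1 = square(values)
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(stage1, f)      # write the intermediate result to disk
        with open(path) as f:
            loaded = json.load(f)     # read it back before the next stage
    finally:
        os.remove(path)               # clean up the temporary file
    return total(loaded)


def memory_pipeline(values):
    """Hypothetical Spark-style flow: stage 1 output never leaves RAM."""
    return total(square(values))


data = list(range(1, 6))
disk_result = disk_pipeline(data)
memory_result = memory_pipeline(data)
print(disk_result, memory_result)  # prints "55 55"
```

Both pipelines give the same answer. The only difference is the extra write and read in the first one, and that round trip to disk is the cost that adds up when a job has many stages.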

Dask is a parallel-computing library written in Python that can also work with data stored in HDFS, but it still has drawbacks in comparison to Spark. Spark supports Scala, Java, Python, R, and SQL, whereas Dask only supports Python. Spark bundles SQL, streaming, and machine learning tools in one package, but Dask depends on other libraries such as NumPy and pandas to get its job done. A plus point is that both Dask and Spark can scale to clusters of around 1000 nodes.

Based on the aforementioned differences it's for you to decide whether or not you would need Spark.
