Why use Spark?
The first reason you'll find on the internet is to work with Big Data, but why not use Pandas, Hadoop, or Dask instead? Let's compare them in a few words.
Pandas handles datasets with millions of rows and works perfectly well in most cases, but when your data is too large to fit on one machine and must be distributed across a network of machines, Pandas is no longer a valid choice.
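To make that concrete, here is a minimal sketch of the same read-and-aggregate step in Pandas and in PySpark. The file name and column names are hypothetical; the point is that the Pandas call must load everything into one machine's RAM, while the Spark version can operate on data partitioned across a cluster.

```python
import pandas as pd
from pyspark.sql import SparkSession

# Pandas: the entire file must fit in this one machine's memory
pdf = pd.read_csv("large_data.csv")
print(pdf.groupby("country")["sales"].sum())

# PySpark: the same logic, but the data can live across many machines
spark = SparkSession.builder.appName("why-spark").getOrCreate()
sdf = spark.read.csv("large_data.csv", header=True, inferSchema=True)
sdf.groupBy("country").sum("sales").show()
```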
Hadoop is one of the leading Big Data technologies out there, so why do we need Spark? Hadoop also manages data very efficiently, but there is a thin line of difference between the two: Hadoop does most of its computation by reading from and writing to the hard disk, whereas Spark keeps data in memory (RAM). Because RAM is more expensive than disk, Spark tends to cost more to run, so it's for you to decide between speed and cost. Spark has been shown to be faster than Hadoop precisely because it works in memory.
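As a rough illustration of that in-memory behavior, PySpark lets you explicitly ask for a DataFrame to be cached in RAM so that later actions reuse it instead of re-reading from disk. This is a minimal sketch with a placeholder file path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# Hypothetical input path; any large dataset works the same way
df = spark.read.parquet("events.parquet")

# Keep the DataFrame in memory after the first action is computed;
# Hadoop MapReduce would instead write intermediate results to disk
df.cache()

df.count()                             # first action: reads from disk, then caches
df.filter(df.status == "ok").count()   # reuses the in-memory copy
```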
Dask is a Python library for parallel computing that also helps you work with data in HDFS, but it still has drawbacks compared to Spark. Spark supports Scala, Java, Python, R, and SQL, whereas Dask supports only Python. Spark gives you a complete package for all your needs, while Dask depends on other libraries to get its job done. On the plus side, both Dask and Spark can scale to clusters of around 1,000 nodes.
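One concrete consequence of Spark's broader language support is that you can express the same job in plain SQL on top of the DataFrame API, which Dask does not offer. A small sketch, assuming the same hypothetical CSV file as above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()
df = spark.read.csv("large_data.csv", header=True, inferSchema=True)

# Register the DataFrame as a temporary view so it can be queried in SQL
df.createOrReplaceTempView("sales")
spark.sql(
    "SELECT country, SUM(sales) AS total FROM sales GROUP BY country"
).show()
```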
Based on the differences above, it's for you to decide whether or not you need Spark.