Get ready for the Spark skill shortage
There has been a marked rise in the popularity of Spark in the Big Data environment, as it overtakes Hadoop as the tool of choice for many data professionals.
The transition has been helped by Spark's user-friendly features: it runs on YARN, the cluster resource manager that users of Hadoop 2 will already know, and it is commercially supported by Databricks. The outcome may be that Spark does not replace Hadoop completely, but instead becomes a replacement for individual Hadoop components.
It is the MapReduce component of Hadoop, in particular, that Spark is replacing at an increasing rate. A survey of more than 3,000 IT professionals, conducted by San Francisco-based development tool maker Typesafe, found that 22 per cent were already working with Spark. Given that only 59 per cent of respondents were developers, this is a strong sign that the trend is catching on.
Anand Iyer is a senior product manager at Cloudera, the first company to commercialise Hadoop. Commenting on the software's rapid growth, he told the TechTarget news platform: “Compared with MapReduce, Spark is almost an order of magnitude faster, it has a significantly more approachable and extensible API, and it is highly scalable. Spark is a fantastic, flexible engine that will eventually replace MapReduce.”
IBM is one of the first heavyweights to get on board with Spark, with a large number of its researchers using the tool in their analytics work.
What does all this mean for recruitment? As demand for Spark rises, so will the need for the associated skill sets. Given how quickly Spark has grown, it seems unlikely that Europe currently has enough qualified professionals to meet this demand. This looks set to trigger a scramble for the best talent with Spark experience, and until the shortage is addressed by more training programmes, both within organisations and at academic institutions, data professionals who are conversant with Spark look set to be hot property. How are you adapting your recruitment strategy to compete for Spark professionals? Get in touch with us here at Darwin and let's have a chat about Spark, upskilling in Data Science and securing the best engineer for your project.
AI Developer
With respect to Spark training: there are a lot of online resources (I'm only mentioning a couple of free ones):
- Databricks Community Cloud: has tutorials in the form of notebooks
- edX Spark specialization: actually uses the aforementioned Cloud, and the videos are also available there
- Coursera's last Scala specialization course, which is said to be available in February
- Big Data Academy courses
AI Developer
I didn't get the YARN part. Hadoop also uses it for scheduling. And Spark doesn't only run on YARN.