Fog Computing and Spark
While big data analytics may be getting a lot of attention, the concept that really sparks the tech community's imagination is the Internet of Things (IoT). The IoT embeds objects and devices with tiny sensors that communicate with each other and with users, creating a fully interconnected world. That world collects massive amounts of data, processes it, and delivers revolutionary new features and applications for people to use in their everyday lives. However, as the IoT expands, so too does the need for distributed, massively parallel processing of vast amounts and varieties of machine and sensor data, and that kind of processing is difficult to manage with current cloud-based analytics capabilities.
That’s where fog computing and Apache Spark come in.
Fog computing decentralizes data processing and storage, performing those functions at the edge of the network instead. Processing decentralized data this way brings new complexities, because it increasingly demands low-latency, massively parallel execution of machine learning and highly complex graph analytics algorithms. Fortunately, with key stack components such as Spark Streaming, an interactive real-time query tool (Shark, since superseded by Spark SQL), a machine learning library (MLlib), and a graph analysis engine (GraphX), Spark more than qualifies as a fog computing solution. In fact, as the IoT industry gradually and inevitably converges, many industry experts predict that, compared with other open source platforms, Spark has the potential to emerge as the de facto fog infrastructure.
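To make the streaming piece concrete, here is a minimal sketch of how the classic Spark Streaming (DStream) API might process edge sensor data: it computes a sliding-window average temperature per device. The socket source, host and port, window sizes, and the "deviceId,temperature" record format are illustrative assumptions, not details from the original text.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SensorStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FogSensorDemo")
    // Micro-batch interval of 1 second
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical source: a socket emitting lines of "deviceId,temperature"
    val lines = ssc.socketTextStream("localhost", 9999)

    // Parse each line into a (deviceId, temperature) pair
    val readings = lines.map { line =>
      val Array(id, temp) = line.split(",")
      (id, temp.toDouble)
    }

    // Sliding-window average per device: 10-second window, sliding every 2 seconds
    val windowedAvg = readings
      .groupByKeyAndWindow(Seconds(10), Seconds(2))
      .mapValues(temps => temps.sum / temps.size)

    windowedAvg.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In a fog deployment, a lightweight job like this could run on a gateway node near the sensors, forwarding only the aggregated windows upstream rather than every raw reading, which is precisely the latency and bandwidth advantage the paragraph above describes.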