Seeing the Big Picture: MapReduce, Hadoop and the Cloud
Dr. Shannon Block, CFE
Board Director?3-time CEO?President?Chief Digital Officer?Chief Strategy Officer?COO?CBDO?Doctorate in Computer Science?M.S. Physics?B.S. Applied Mathematics?B.S. Physics
Big data contains patterns and methods to inform companies about their customers and vendors, as well as help improve business processes. Some of the biggest companies in the world like Facebook have used MapReduce framework as a tool for their cloud computing applications sometimes through implementing Hadoop, an open source code of MapReduce. MapReduce was designed by Google for parallel distributed computing of big data.
Before MapReduce, companies needed to pay data modelers and buy supercomputers to process timely big data insights. MapReduce has been an important development in helping businesses solve complex problems across big data sets like determining the optimal price for products, understanding the return on the investment of advertising, performing long term predictions and mining web clicks to inform product and service development.
MapReduce works across a network of low-cost commodity machines allowing actionable business insights to be more accessible than ever before. It is strong computation tool for solving problems that can involve pattern matching, social network analysis, log analysis and clustering.
The logic behind MapReduce is basically dividing big problems into small manageable tasks that are then distributed to hundreds of thousands of server nodes. The server nodes operate in parallel to generate results. From a programming standpoint, this involves writing a map script where the data is mapped into a collection of key value pairs and writing a reduce script over all pairs with the same key. One challenge is the time it takes to convert and break the data into the new key-value pair which increases latency.
Hadoop is Apache’s open-source implementation of the MapReduce framework. In addition to the MapReduce distributed processing layer, Hadoop uses HDFS for reliable storage, YARN for resource management and has flexibility in dealing with structured and unstructured data. New nodes can be added easily to Hadoop without downtime and if a machine goes down, data can be easily retrieved. Hadoop can be a cost efficient solution for big data processing, allow terabytes of data to be analyzed within minutes.
But, cloud platforms like Amazon Web Services (AWS), Microsoft Azure, and Google’s Cloud Platform offer similar MapReduce components where the operational complexity is handled by the cloud vendors instead of the individual businesses. Hadoop is known for its strong combination of computation with storage, but in place of HDFS, cloud-based object stores have been built on models like AWS which given the ability to still compute and use virtualization technology like Kubernetes instead of YARN. With the growing shift to cloud vendors, there have been some increased concerns around the long-term vision for Hadoop.
Hortonworks was the data software company that supported open-source software, primarily Hadoop. But in January 2019, Hortonworks closed an all-stock $5.2 billion merger with Cloudera. While Cloudera also supports open source Hadoop, it has a different vendor-lock management suite that is supposed to help with both installation and deployment whereas Hortonworks was 100% open-source. In May 2109, another Hadoop provider, MapR, announced they were looking for a new source of funding. On June 6, 2019, Cloudera’s stock declined 43% and the CEO left the company.
Understanding the advantages and disadvantages of the MapReduce framework and Hadoop in big data analytics is helpful to making informed business decisions as this field continues to evolve. In terms of the drawbacks of Hadoop, Monte Zwebe, the CEO of Splice Machine, that creates relational databases for Hadoop says, “When we need to transport ourselves to another location and need a vehicle, we go and buy a car. We don’t buy a suspension system, a fuel injector, and a bunch of axles and put the whole thing together, so to speak. We don’t go get the bill of materials.”
What do you think? Please DM me or leave your feedback in the comments below.
#Hadoop #MapReduce #CloudComputing
?About the Author
Shannon Block is an entrepreneur, mother and proud member of the global community. Her educational background includes a B.S. in Physics and B.S. in Applied Mathematics from George Washington University, M.S. in Physics from Tufts University and she is currently completing her Doctorate in Computer Science. She has been the CEO of both for-profit and non-profit organizations. Currently as Executive Director of Skillful Colorado, Shannon and her team are working to bring a future of skills to the future of work. With more than a decade of leadership experience, Shannon is a pragmatic and collaborative leader, adept at bringing people together to solve complex problems. She approaches issues holistically, helps her team think strategically about solutions and fosters a strong network of partners with a shared interest in strengthening workforce and economic development across the United States. Prior to Skillful, Shannon served as CEO of the Denver Zoo, Rocky Mountain Cancer Centers, and World Forward Foundation. She is deeply engaged in the Colorado community and has served on multiple boards including the International Women's Forum, the Regional Executive Committee of the Young Presidents’ Organization, Children’s Hospital Quality and Safety Board, Women’s Forum of Colorado, and the Colorado-based Presbyterian/St. Luke’s Community Advisory Council. Follow her on Twitter @ShannonBlock or connect with her on LinkedIn.
Visit www.ShannonBlock.org for more on technology tools and trends.
Information Technology Consultant
5 年Great article Shannon! I am intrigued to learn more about MapReduce and especially Hadoop as this falls directly in line with my dissertation research. Big data processing is vital in today’s technological society with almost everything now operating on cloud services and virtuality. Thank you!