Hadoop

Hadoop

Hadoop is an Apache open source framework written in java that allows distributed processing of large datasets across clusters of computers using simple programming models. The Hadoop framework application works in an environment that provides distributed?storage?and?computation?across clusters of computers. Hadoop is designed to scale up from single server to thousands of machines, each offering local computation and storage.

Hadoop Architecture

At its core, Hadoop has two major layers namely ?

  • Processing/Computation layer (MapReduce), and
  • Storage layer (Hadoop Distributed File System).

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Hadoop's ability to process and store different types of data makes it a particularly good fit for big data environments. They typically involve not only large amounts of data, but also a mix of structured?transaction data?and semistructured and unstructured information, such as internet clickstream records, web server and mobile application logs, social media posts, customer emails and sensor data from the internet of things (IoT).

要查看或添加评论,请登录

Dipti Goyal的更多文章

  • Consumer Lending

    Consumer Lending

    Consumer lending is the provision of credit (loans or credit lines) to individuals for personal, family, or household…

  • Six Sigma

    Six Sigma

    Six Sigma is a set of methodologies and tools used to improve business processes by reducing defects and errors…

  • Scrapy

    Scrapy

    Scrapy is an open-source web crawling framework written in Python, designed for extracting data from websites. It is…

  • Scala

    Scala

    Scala is a coding language short for “Scalable Language.” Some professionals consider Scala to be a modern version of…

  • Oracle Essbase

    Oracle Essbase

    Oracle Essbase is a business analytics solution and multidimensional database management system (MDBMS) that provides a…

  • BigQuery

    BigQuery

    Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets. BigQuery…

  • Gap Analysis

    Gap Analysis

    A gap analysis is a method for comparing a business's current performance to its desired performance. It's a strategic…

  • Tableau

    Tableau

    Tableau is a visual analytics platform that empowers users to explore, visualize, and analyze data to gain insights and…

  • Jira

    Jira

    Jira is a project management and issue tracking tool developed by Atlassian, used by teams to plan, track, release, and…

  • Natural Language Processing

    Natural Language Processing

    Natural language processing (NLP) is the ability of a computer program to understand human language as it's spoken and…

社区洞察

其他会员也浏览了