Hadoop

Hadoop

Hadoop is an?open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.

Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers. It's at the center of an ecosystem of?big data?technologies that are primarily used to support advanced analytics initiatives, including?predictive analytics, data mining and?machine learning. Hadoop systems can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, analyzing and?managing data?than relational databases and?data warehouses?provide.

Hadoop's ability to process and store different types of data makes it a particularly good fit for big data environments. They typically involve not only large amounts of data, but also a mix of structured?transaction data?and semistructured and unstructured information, such as internet clickstream records, web server and mobile application logs, social media posts, customer emails and sensor data from the internet of things (IoT).

Formally known as Apache Hadoop, the technology is developed as part of an open source project within the?Apache Software Foundation. Multiple vendors offer commercial Hadoop distributions, although the number of Hadoop vendors has declined because of an overcrowded market and then competitive pressures driven by the increased deployment of big data systems in the cloud. The shift to the cloud also enables users to store data in lower-cost?cloud object storage?services instead of Hadoop's namesake file system; as a result, Hadoop's role is being reduced in some big data architectures.

要查看或添加评论,请登录

Dipti Goyal的更多文章

  • Keras

    Keras

    Keras is a high-level, open-source neural network API written in Python, designed to simplify the creation and training…

  • Sql Database

    Sql Database

    SQL databases, also known as relational databases, are systems that store collections of tables and organize structured…

  • Regulatory Reporting

    Regulatory Reporting

    Regulatory reporting is the process of collecting and submitting data to regulatory bodies to demonstrate compliance…

  • IFRS

    IFRS

    IFRS, or International Financial Reporting Standards, are a set of globally accepted accounting standards designed to…

  • Alteryx

    Alteryx

    Alteryx is a data analytics and visualization platform that allows users to easily prepare, blend, and analyze data…

  • Consumer Lending

    Consumer Lending

    Consumer lending is the provision of credit (loans or credit lines) to individuals for personal, family, or household…

  • Six Sigma

    Six Sigma

    Six Sigma is a set of methodologies and tools used to improve business processes by reducing defects and errors…

  • Scrapy

    Scrapy

    Scrapy is an open-source web crawling framework written in Python, designed for extracting data from websites. It is…

  • Scala

    Scala

    Scala is a coding language short for “Scalable Language.” Some professionals consider Scala to be a modern version of…

  • Oracle Essbase

    Oracle Essbase

    Oracle Essbase is a business analytics solution and multidimensional database management system (MDBMS) that provides a…

社区洞察

其他会员也浏览了