!! Big Data Concept with Distributed Storage Cluster and Hadoop !!
AK Panchal
DevOps Engineer | Cloud & Infrastructure Automation | Docker | Kubernetes | AWS | Open Source | Web3 Enthusiast | Developer Advocate | Community Management | Technical Evangelist | Explorer
Big Data Evolution/history:-
* The 21st century: In 2005, Roger Mougalas of O'Reilly Media coined the term Big Data for the first time, only a year after O'Reilly Media coined the term Web 2.0. Big Data refers to a large set of data that is almost impossible to manage and process using traditional business intelligence tools.
Big Data concept:-
Big Data:- Big data is a field that deals with ways to analyze, systematically extract information from, or otherwise handle data sets that are too large or complex for traditional data-processing application software. More simply, Big Data is defined as data that is huge in size: a collection of data that is already large and keeps growing exponentially with time.
Big data technology:-
Big Data is a term used for a collection of data sets so large and complex that they are difficult to process using traditional applications/tools. It typically refers to data exceeding terabytes in size. ... Here are the top technologies used to store and analyse Big Data. Examples of Big Data sources include stock exchanges, social media sites, and jet engines.
Real-World Big Data Examples:-
- Discovering consumer shopping habits
- Personalized marketing
- Fuel optimization tools for the transportation industry
- Monitoring health conditions through data from wearables
- Live road mapping for autonomous vehicles
- Streamlined media streaming
- Predictive inventory ordering
- Personalized health plans for cancer patients
- Real-time data monitoring and cybersecurity protocols
Types Of Big Data:-
Structured
Structured data is one of the types of big data. By structured data, we mean data that can be processed, stored, and retrieved in a fixed format. It refers to highly organized information that can be readily and seamlessly stored in, and accessed from, a database using simple search algorithms or queries.
Example: Tables in a relational database management system (RDBMS)
Unstructured
Unstructured data refers to the data that lacks any specific form or structure whatsoever. This makes it very difficult and time-consuming to process and analyze unstructured data.
Example: Audio files, images, video, etc.
Semi-structured
Semi-structured data is the third type of big data. It combines both of the formats mentioned above, that is, structured and unstructured data. To be precise, it refers to data that, although it has not been classified under a particular repository (database), still contains vital information or tags that separate individual elements within the data.
Example: Comma Separated Values (CSV) files, email
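To make the three types more concrete, here is a minimal Python sketch. The orders table, the JSON records, and the note text are hypothetical sample data, and an in-memory SQLite table stands in for a database management system; it only illustrates how differently each kind of data has to be handled.

```python
import json
import sqlite3

# Structured: every row follows the same fixed schema, so a simple query works.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 250.0), (2, 99.5)])
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Semi-structured: each record is tagged with keys, but records can differ in shape.
raw = '[{"user": "a", "tags": ["x"]}, {"user": "b", "email": "b@example.com"}]'
records = json.loads(raw)
users_with_email = [r["user"] for r in records if "email" in r]

# Unstructured: free text has no schema; any structure must be inferred by processing.
note = "Customer called twice about a late delivery. Very unhappy!"
word_count = len(note.split())

print(total, users_with_email, word_count)  # 349.5 ['b'] 9
```

The structured case needs only a query, the semi-structured case needs the program to check which tags each record carries, and the unstructured case needs actual text processing even for something as simple as counting words.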
Characteristics of Big Data:-
Big data architecture:-
A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems. Big data solutions typically involve one or more types of workload, such as batch processing of big data sources at rest.
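Batch processing of data at rest means reading stored data in chunks and combining the partial results. The following Python sketch shows the idea on a small scale; the log format and the `batches`/`count_errors` helpers are hypothetical, chosen only for illustration, and are not part of any big data framework.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batches(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def count_errors(log_lines: Iterable[str], batch_size: int = 1000) -> int:
    """Batch job over stored data: count lines that contain "ERROR"."""
    total = 0
    for batch in batches(log_lines, batch_size):
        # Each batch is processed independently; only the running total is kept.
        total += sum(1 for line in batch if "ERROR" in line)
    return total

lines = ["INFO start", "ERROR disk full", "INFO ok", "ERROR timeout"]
print(count_errors(lines, batch_size=3))  # 2
```

Because `log_lines` can be any iterable, the same function also works on a file object returned by `open(...)`, which Python reads lazily line by line, so memory use is bounded by the batch size rather than by the file size.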
Big Data impact on different industries:-
Agriculture
The role of Big Data in agriculture cannot be ignored. It helps farmers with yield prediction, which in turn helps them decide what to plant and where to plant it. One important area of application is risk management: by helping farmers evaluate the chances of crop failure and improve overall feed efficiency, this technology can reduce crop damage caused by changing weather conditions.
Online Retail
Top online retailers like Flipkart, Amazon, and Urban Clap have risen to great popularity in the last few years. They have created a whole new class of customers who have grown accustomed to shopping from the comfort of their homes instead of visiting physical stores. While shopping at home certainly beats the hassle of commuting, we cannot ignore how online retail companies use the information we provide to create opportunities for themselves. With the help of big data, online retailers can predict our consumption habits and persuade us to buy goods. This data also helps them manage their supply chains.
Consumer Goods Industry
Using big data analysis, organizations can anticipate upcoming price variations and adjust their purchases as needed. These price variations can be anticipated by tracking key parameters around the globe that affect cost, such as market data and weather data.
Big data challenges:-
- Dealing with data growth. ...
- Generating insights in a timely manner. ...
- Recruiting and retaining big data talent. ...
- Integrating disparate data sources. ...
- Validating data. ...
- Securing big data. ...
- Organizational resistance.
Distributed Storage:-
A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
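The core idea of splitting data across servers and keeping copies on several of them can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the node names, the tiny 4-byte block size, and the round-robin placement are all hypothetical choices for readability. Real systems such as HDFS use much larger blocks and smarter, rack-aware replica placement.

```python
from typing import Dict, List

def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes using round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }

def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(n != failed for n in replicas) for replicas in placement.values())

blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
print(blocks)     # [b'abcd', b'efgh', b'ij']
print(placement)  # {0: ['node1', 'node2', 'node3'], 1: ['node2', 'node3', 'node4'], 2: ['node3', 'node4', 'node1']}
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block, which is what lets a cluster keep serving data through hardware failures.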
Storage Clustering:-
Clustered storage is the use of two or more storage servers working together to increase performance, capacity, or reliability. ... A loose cluster offers performance, I/O, and storage capacity within the same node. As a result, performance scales with capacity and vice versa.
Hadoop:-
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
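Example:- The word count program is the “Hello World” of Hadoop and MapReduce. In this example, the program consists of a MapReduce job that counts the number of occurrences of each word in a file. The job consists of two parts, Map and Reduce: the Map phase emits a (word, 1) pair for every word it reads, the framework groups those pairs by word, and the Reduce phase sums them to get the total count for each word.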
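The flow of the word count job can be simulated in plain Python on a single machine. This is a sketch of the MapReduce logic only, not Hadoop's API: the function names are hypothetical, and the shuffle step (which Hadoop performs between the Map and Reduce phases) is done here with a dictionary. Words are lowercased and split on whitespace, with no punctuation handling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))

print(word_count(["Hello World", "hello Hadoop"]))  # {'hadoop': 1, 'hello': 2, 'world': 1}
```

On a real cluster, each mapper would process one split of the input file on the node that stores it, and reducers would each handle a subset of the words. With Hadoop Streaming, mapper and reducer logic like this can be written as scripts that read from standard input and write to standard output.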
Hadoop Architecture:-
Daily Data Usage:-
Here are some key daily statistics highlighted in the infographic:
- 500 million tweets are sent
- 294 billion emails are sent
- 4 petabytes of data are created on Facebook
- 4 terabytes of data are created from each connected car
- 65 billion messages are sent on WhatsApp
- 5 billion searches are made
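By 2025, it’s estimated that 463 exabytes of data will be created each day globally. At 4.7 GB per single-layer DVD, that’s the equivalent of roughly 98.5 billion DVDs per day (the often-quoted figure of 212,765,957 DVDs corresponds to about 1 exabyte).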
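The DVD comparison is easy to check with a few lines of Python. The sketch assumes decimal (SI) units, where 1 exabyte is 10^18 bytes, and a single-layer DVD capacity of 4.7 GB; both are assumptions stated in the code.

```python
EXABYTE = 10**18      # bytes, assuming decimal (SI) units
DVD_BYTES = 4.7e9     # assumed capacity of a single-layer DVD (4.7 GB)

daily_bytes = 463 * EXABYTE
dvds_per_day = daily_bytes / DVD_BYTES        # about 98.5 billion DVDs
dvds_per_exabyte = EXABYTE / DVD_BYTES        # about 212.8 million DVDs

print(f"{dvds_per_day:.3e} DVDs for 463 EB")
print(f"{dvds_per_exabyte:,.0f} DVDs for 1 EB")
```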