The Dawn of Big Data

Can you guess how the data generated across the globe was handled about 10-12 years ago? Until the advent of Hadoop, relational databases were the main tool for managing data. Did you know that roughly 95% of today's data is unstructured, and that this share keeps growing enormously? The data in the world was projected to grow 50 times between 2011 and 2020. So what is unstructured data? It is data such as video, images and free-form text, produced in incredible volumes, much of it generated on free, massively scalable social media networks. But how do the top tech companies process this huge amount of data and derive business decisions from it?

All of this data is what we call Big Data, and it can turn out to be both an angel and a demon.

The first question, then, is: why both?

Let's analyze this. The technologies booming most strongly in recent times are artificial intelligence, big data and cloud computing, among others. Companies across the world, from startups to mature tech players, have already shifted to big data analytics. One reason big data analytics is so popular today is storage that scales to enormous volumes at low cost. Another is the competent, fault-tolerant processing power that comes with big data platforms.

Imagine if everything we do on social media, chatting, sharing posts, uploading videos and photos, simply vanished the moment we closed the application. Obviously we would stop using those applications, because storing our data matters to us in many ways. But can we even imagine how platforms like Facebook, Instagram, Twitter and WhatsApp store all of that data without fail, or how so many other companies, from startups to mature tech players, solve the problem of big data?

In simple terms, if data is analyzed in the right manner it is an angel to us, useful in every way; if it is lost or analyzed poorly, it becomes a demon for us and for our business. Data analytics matters because it helps businesses optimize their performance: building it into the business model lets companies reduce costs by identifying more efficient ways of working and by making sense of the large amounts of data they store. Before moving ahead, here is some more background on big data:

As the term "big data" itself suggests, it refers to data that is so large and so complex that it is difficult or impossible to process using traditional methods. The practice of collecting and storing large amounts of information for analytics has been around a long time, but the concept of big data gained momentum in the early 2000s, when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V's:

  • Volume: Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, social media and many more. In the past, storing it would have been a problem, but today cheaper storage on platforms like data lakes and Hadoop has eased the burden.

  • Velocity: With the growth of the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID (radio-frequency identification) tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time.

  • Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data and financial transactions.

Two additional dimensions are also considered when it comes to big data:

Veracity: Veracity refers to the quality of data. Because data comes from so many different sources, it’s difficult to link, match, cleanse and transform data across systems. Businesses need to connect and correlate relationships, hierarchies and multiple data linkages. Otherwise, their data can quickly spiral out of control.

Value: Finally, the V for value sits at the top of the big data pyramid. It refers to the ability to transform a tsunami of data into business value.

[Figure: the five V's of big data]

Putting this together, we have the five V's of big data, as shown in the figure above.

Optimized Production with Big Data Analytics

At USG Corporation, using big data with predictive analytics is key to fully understanding how products are made and how they work. And in a market with a barrage of global competition, manufacturers like USG know the importance of producing high-quality products at an affordable price. Using the SAS Platform, USG has removed guesswork and optimized its production investments. The results: improved product quality and time to market.

 

Why Is Big Data Important?

The importance of big data doesn't revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable:

1) Cost reductions

2) Time reductions

3) New product development and optimized offerings

4) Smart decision making.

When we combine big data with high-powered analytics, we can accomplish business-related tasks such as:

  • Determining root causes of failures, issues and defects in near-real time.
  • Generating coupons at the point of sale based on the customer’s buying habits.
  • Recalculating entire risk portfolios in minutes.
  • Detecting fraudulent behavior before it affects your organization.
Deep learning craves big data, because large volumes of data are needed to isolate hidden patterns and to find answers without over-fitting. With deep learning, the more good-quality data you have, the better the results.

by Wayne Thompson, SAS Product Manager
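
As a toy illustration of the "detecting fraudulent behavior" task above, the sketch below uses scikit-learn's IsolationForest to flag unusual transactions in a batch of records. The simulated data, feature choices and contamination rate are all hypothetical assumptions; a real deployment would score streaming events against a model trained on far more history.

    # Minimal sketch: flagging anomalous transactions with an isolation forest.
    # The feature set and contamination rate are illustrative assumptions only.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(42)

    # Simulated transactions: [amount, seconds_since_last_txn]
    normal = rng.normal(loc=[50, 3600], scale=[20, 600], size=(1000, 2))
    fraud = rng.normal(loc=[900, 30], scale=[100, 10], size=(5, 2))
    transactions = np.vstack([normal, fraud])

    model = IsolationForest(contamination=0.01, random_state=42)
    labels = model.fit_predict(transactions)   # -1 marks suspected anomalies

    suspicious = transactions[labels == -1]
    print(f"Flagged {len(suspicious)} of {len(transactions)} transactions for review")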

How Big Data Works

Before businesses can put big data to work for them, they should consider how it flows among a multitude of locations, sources, systems, owners and users. There are five key steps to taking charge of this big “data fabric” that includes traditional, structured data along with unstructured and semi-structured data:

  • Set a big data strategy.
  • Identify big data sources.
  • Access, manage and store the data.
  • Analyze the data.
  • Make data-driven decisions.

1) Set a big data strategy

At a high level, a big data strategy is a plan designed to help you oversee and improve the way you acquire, store, manage, share and use data within and outside of your organization. A big data strategy sets the stage for business success amid an abundance of data. When developing a strategy, it's important to consider existing and future business and technology goals and initiatives. This calls for treating big data like any other valuable business asset rather than just a byproduct of applications.

2) Know the sources of big data

  • Streaming data comes from the Internet of Things (IoT) and other connected devices, flowing into IT systems from wearables, smart cars, medical devices, industrial equipment and more. You can analyze this big data as it arrives, deciding which data to keep, which to discard and which needs further analysis (a minimal sketch of this keep-or-discard filtering follows this list).
  • Social media data stems from interactions on Facebook, YouTube, Instagram, etc. This includes vast amounts of big data in the form of images, videos, voice, text and sound, useful for marketing, sales and support functions. This data is often in unstructured or semi-structured forms, so it poses a unique challenge for consumption and analysis.
  • Publicly available data comes from massive open data sources like the US government's data.gov, the CIA World Factbook or the European Union Open Data Portal.
  • Other big data may come from data lakes, cloud data sources, suppliers and customers.
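
To make the keep-or-discard idea for streaming data concrete, here is a minimal, self-contained sketch. It simulates a stream of IoT sensor readings and applies a simple rule: keep readings that cross a threshold for further analysis and drop the rest. The device names, fields and threshold are hypothetical; in practice this logic would sit inside a stream-processing framework such as a Kafka consumer or Spark Streaming job.

    # Minimal sketch: deciding which streaming readings to keep.
    # Device IDs, fields and the 75-degree threshold are illustrative assumptions.
    import random
    import time
    from typing import Dict, Iterator

    def sensor_stream(n: int) -> Iterator[Dict]:
        """Simulate n temperature readings arriving from IoT devices."""
        for _ in range(n):
            yield {
                "device_id": f"sensor-{random.randint(1, 5)}",
                "temperature_c": round(random.uniform(20.0, 95.0), 1),
                "ts": time.time(),
            }

    kept, dropped = [], 0
    for reading in sensor_stream(1000):
        if reading["temperature_c"] >= 75.0:   # keep only readings worth deeper analysis
            kept.append(reading)
        else:
            dropped += 1

    print(f"Kept {len(kept)} readings for analysis, discarded {dropped}")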

3) Access, manage and store big data

Modern computing systems provide the speed, power and flexibility needed to quickly access massive amounts and types of big data. Along with reliable access, companies also need methods for integrating the data, ensuring data quality, providing data governance and storage, and preparing the data for analytics. Some data may be stored on-premises in a traditional data warehouse – but there are also flexible, low-cost options for storing and handling big data via cloud solutions, data lakes and Hadoop.

4) Analyze big data

With high-performance technologies like grid computing or in-memory analytics, organizations can choose to use all their big data for analyses. Another approach is to determine upfront which data is relevant before analyzing it. Either way, big data analytics is how companies gain value and insights from data. Increasingly, big data feeds today’s advanced analytics endeavors such as artificial intelligence.
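
As one hedged example of analyzing big data with an in-memory engine, the sketch below uses Apache Spark (PySpark) to aggregate a clickstream dataset held in HDFS. The path and column names (user_id, page) are assumptions made for illustration; any tabular source Spark can read would work the same way.

    # Minimal sketch: in-memory aggregation over a large dataset with PySpark.
    # The HDFS path and column names (user_id, page) are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("clickstream-analysis").getOrCreate()

    clicks = spark.read.json("hdfs:///data/clickstream/2020/")

    top_pages = (clicks
                 .groupBy("page")
                 .agg(F.countDistinct("user_id").alias("unique_visitors"))
                 .orderBy(F.desc("unique_visitors"))
                 .limit(10))

    top_pages.show()
    spark.stop()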

5) Make intelligent, data-driven decisions

Well-managed, trusted data leads to trusted analytics and trusted decisions. To stay competitive, businesses need to seize the full value of big data and operate in a data-driven way – making decisions based on the evidence presented by big data rather than gut instinct. The benefits of being data-driven are clear. Data-driven organizations perform better, are operationally more predictable and are more profitable.

How to analyze Big Data: The Concept of DFS

A distributed file system (DFS) is a method of storing and accessing files based on a client/server architecture. In a distributed file system, one or more central servers store files that can be accessed, with proper authorization rights, by any number of remote clients in the network.

Much like an operating system organizes files in a hierarchical file management system, the distributed system uses a uniform naming convention and a mapping scheme to keep track of where files are located. When the client device retrieves a file from the server, the file appears as a normal file on the client machine, and the user is able to work with the file in the same ways as if it were stored locally on the workstation. When the user finishes working with the file, it is returned over the network to the server, which stores the now-altered file for retrieval at a later time.

Distributed file systems can be advantageous because they make it easier to distribute documents to multiple clients and they provide a centralized storage system so that client machines are not using their resources to store files.

Implementing DFS with: Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.

HDFS is a key part of many Hadoop ecosystem technologies, as it provides a reliable means for managing pools of big data and supporting related big data analytics applications.
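
To ground the NameNode/DataNode description, here is a small sketch that talks to HDFS over WebHDFS using the third-party Python hdfs package. The NameNode URL, user name and paths are assumptions for illustration; the same operations can be performed with the hdfs dfs command-line tool.

    # Minimal sketch: listing, writing and reading files in HDFS over WebHDFS.
    # The NameNode address, user and paths are hypothetical.
    from hdfs import InsecureClient

    client = InsecureClient("http://namenode.example.com:9870", user="hadoop")

    # Write a small text file into the distributed file system
    client.write("/user/hadoop/demo/hello.txt", data="hello big data\n",
                 encoding="utf-8", overwrite=True)

    # List the directory; the NameNode tracks where each block actually lives
    print(client.list("/user/hadoop/demo"))

    # Read the file back as if it were local
    with client.read("/user/hadoop/demo/hello.txt", encoding="utf-8") as reader:
        print(reader.read())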

So how do top tech companies process this huge amount of data to derive business decisions from it, and how do they leverage big data to serve their customers? Let's see.

Google: Did you know that Google processes about 3.5 billion search queries in a single day, and that each query runs against an index of roughly 20 billion pages? Google derives its search results from its knowledge graph database, indexed pages and Google bots crawling a plethora of web pages. User requests are processed by Google's application servers; the application server looks up results in GFS (the Google File System) and logs the search queries in a logs cluster for quality testing. Google uses Dremel, a query execution engine, to run near-real-time, ad-hoc queries, an advantage not present in MapReduce. Google also launched BigQuery, which runs aggregation queries over billion-row tables in a matter of seconds. Google is really advanced in its implementation of big data technologies.
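
As a hedged illustration of the kind of interactive, aggregation-heavy queries described above, here is a sketch using the google-cloud-bigquery Python client. The project, dataset and table names are hypothetical, and credentials are assumed to be configured in the environment.

    # Minimal sketch: an ad-hoc aggregation query against BigQuery.
    # Project, dataset and table names are hypothetical.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes application default credentials

    sql = """
        SELECT country, COUNT(*) AS page_views
        FROM `my-project.web_analytics.events`
        GROUP BY country
        ORDER BY page_views DESC
        LIMIT 10
    """

    for row in client.query(sql).result():
        print(row.country, row.page_views)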

Facebook: Did you know that Facebook users upload 500+ terabytes of data per day? To process such large chunks of data, Facebook uses Hive for parallel map-reduce operations and Hadoop for its data storage; in fact, Facebook is said to run the largest Hadoop cluster in the world. It also uses Cassandra, a fault-tolerant, distributed storage system designed to manage large amounts of structured data across a variety of commodity servers, and Scuba for real-time, ad-hoc analysis of massive data sets. Hive is also used to move large data sets into an Oracle data warehouse, and Prism manages multiple namespaces instead of the single namespace managed by Hadoop. Facebook also uses many other big data technologies, such as Corona and Peregrine, among others.
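
For a flavor of how Hive turns SQL-like queries into map-reduce jobs over data stored in Hadoop, here is a small sketch using the PyHive library. The HiveServer2 host, table name and partition column are assumptions made for illustration.

    # Minimal sketch: running a HiveQL aggregation through HiveServer2.
    # Host, table name and the ds partition column are hypothetical.
    from pyhive import hive

    conn = hive.Connection(host="hiveserver.example.com", port=10000,
                           username="analyst", database="default")
    cursor = conn.cursor()

    cursor.execute("""
        SELECT action, COUNT(*) AS events
        FROM user_activity
        WHERE ds = '2020-01-01'
        GROUP BY action
    """)

    for action, events in cursor.fetchall():
        print(action, events)

    conn.close()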

Oracle: Connected devices have grown explosively, to an estimated 12.5 billion devices, not counting phones, tablets and PCs. This has boosted research and development in the Internet of Things and increased storage requirements, which in turn demand database management support. Oracle users rely on Oracle Advanced Analytics, which requires data to be loaded into an Oracle database and provides functionality such as text mining, predictive analytics, statistical analysis and interactive graphics, among many others. HDFS data can be loaded into an Oracle data warehouse using Oracle Loader for Hadoop, which links data and query results from Hadoop to the Oracle data warehouse. Oracle Exadata Database Machine provides scalable, high-end performance for all database applications. Oracle is leveraging big data mainly to expand its business in database management systems.

Here are the top five Hadoop technology companies expected to contribute to this fast-growing market:

Amazon Web Services: “Amazon Elastic MapReduce provides a managed, easy to use analytics platform built around the powerful Hadoop framework. Focus on your map/reduce queries and take advantage of the broad ecosystem of Hadoop tools, while deploying to a high scale, secure infrastructure platform.”
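
As a hedged sketch of what "managed Hadoop" looks like in practice, the snippet below uses boto3 to spin up a small Amazon EMR cluster that runs a single Spark step and then terminates. The bucket, script path, instance types and IAM role names are assumptions; defaults vary by account and region.

    # Minimal sketch: launching a transient EMR cluster with one Spark step.
    # Bucket, script, instance types and role names are hypothetical.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="wordcount-demo",
        ReleaseLabel="emr-6.3.0",
        Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "spark-wordcount",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://my-bucket/jobs/wordcount.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )

    print("Started cluster:", response["JobFlowId"])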

Cloudera: “Cloudera develops open-source software for a world dependent on Big Data. With Cloudera, businesses and other organizations can now interact with the world’s largest data sets at the speed of thought — and ask bigger questions in the pursuit of discovering something incredible.”

IBM: “IBM InfoSphere BigInsights makes it simpler for people to use Hadoop and build big data applications. It enhances this open-source technology to withstand the demands of your enterprise, adding administrative, discovery, development, provisioning and security features, along with best-in-class analytical capabilities from IBM Research. The result is a more developer- and user-friendly solution for complex, large-scale analytics.”

Microsoft: “Quickly build a Hadoop cluster in minutes when you need it, and delete it when your work is done. Choose the right cluster size to optimize for time to insight or cost. Seamlessly integrate HDInsight into your existing analysis workflows with Windows Azure PowerShell and Windows Azure Command-Line Interface.”

ScienceSoft: ScienceSoft provides a full range of Hadoop-related services: health checks, architecture design, implementation, integration and support. Alongside the core Hadoop framework, the company offers a combination of other big data frameworks and technologies to deliver the most efficient big data solution to its customers. ScienceSoft is ready to back up a big data project at any stage to secure optimized costs, uninterrupted performance and a system speed-up.

Thanks a lot for taking the time to read this article; I hope you found it worthwhile. Please share your views and thoughts on the article in the comments section of the post. This article was put together from various blogs, articles, case studies and, of course, with the help of the internet. Thanks again!

