Big Data Archi


Introduction

Data is everywhere you look, and increasingly so as we adopt more gadgets and wearables, and even more so in the enterprise.


Figure 1 – Different Data types & Sources

As the diagram above illustrates, the types and sources of data vary widely, from traditional structured databases to unstructured data such as streaming video and IoT device feeds.

Ten years ago, when people started talking about data, it was described as the new oil. Now, with 81 ZB (yes, zettabytes) of data created in 2021, data is the new oxygen that organizations need to survive.

The demand from today's companies to turn raw data into business insight is higher than ever. So it's no surprise that data-driven companies are showing significant interest in new products and services that deliver this on a single platform.

We need a common architecture that enterprises and end users can adopt on any cloud or hybrid platform, so that we can uncover hidden patterns, unknown correlations, markets, trends, and other useful predictions.

The Big Data Architecture

To harness the value of big data, a robust data processing architecture must be designed. One of the cornerstones of a big data architecture is its processing layer, which must scale to accommodate almost unlimited amounts of data.
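One common way to make processing scale like this is to split the data into partitions, process each partition independently, and then merge the partial results (the map/reduce pattern). The sketch below is a single-machine illustration of that idea, assuming events carry a hypothetical `key` field; threads stand in for the worker nodes a real cluster engine would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(records):
    """Map step: count events per key within one partition."""
    return Counter(record["key"] for record in records)


def merge_counts(left, right):
    """Reduce step: combine two partial counts (order does not matter)."""
    return left + right


def partitioned_count(partitions, max_workers=4):
    # Each partition is processed independently; a distributed engine would
    # run these on separate nodes rather than local threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data split into three partitions.
partitions = [
    [{"key": "sensor-a"}, {"key": "sensor-b"}],
    [{"key": "sensor-a"}, {"key": "sensor-a"}],
    [{"key": "sensor-c"}],
]
totals = partitioned_count(partitions)
print(dict(totals))  # {'sensor-a': 3, 'sensor-b': 1, 'sensor-c': 1}
```

Because the merge step is associative, adding more partitions (and more workers) increases throughput without changing the result.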

The big data architecture framework serves as a reference blueprint for big data infrastructures and solutions, logically defining how big data solutions will work, which components will be used, how information will flow, and how it will be secured.

The Big Data Architecture is split into core functions, with an overlaying orchestration layer that holds it all together. Its main benefit is its flexibility: it can be applied to static (batch) data, real-time data, or both at the same time.
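A practical consequence of this flexibility is that the same business logic can serve both paths. The sketch below assumes a hypothetical `enrich` transformation (converting a `temp_c` reading to Fahrenheit) and applies it once to a whole static dataset and once, lazily, to records as they arrive.

```python
from typing import Iterable, Iterator


def enrich(record: dict) -> dict:
    """Shared business logic (hypothetical): add a Fahrenheit reading."""
    return {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}


def run_batch(records: list[dict]) -> list[dict]:
    # Static data: process the whole dataset in one pass.
    return [enrich(r) for r in records]


def run_stream(source: Iterable[dict]) -> Iterator[dict]:
    # Real-time data: process each record as it arrives.
    for record in source:
        yield enrich(record)


history = [{"id": 1, "temp_c": 20.0}, {"id": 2, "temp_c": 25.0}]
batch_result = run_batch(history)
live_result = next(run_stream(iter([{"id": 3, "temp_c": 30.0}])))
print(batch_result[0]["temp_f"], live_result["temp_f"])  # 68.0 86.0
```

Keeping the transformation in one function means the batch and real-time paths cannot drift apart in how they interpret the data.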


Figure 2 - Recommended Big Data Architecture

As noted in the diagram, each cloud provider has its own native tools to simplify the process, and common code can be written to automate repetitive analysis.
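To make the idea of automating repetitive steps concrete, here is a minimal orchestration sketch using Python's standard `graphlib` module. The task names (`ingest`, `clean`, `report`) and their logic are hypothetical placeholders; a managed orchestrator on your cloud of choice would add scheduling, retries and monitoring on top of this dependency-ordering core.

```python
from graphlib import TopologicalSorter


def ingest(ctx):
    ctx["raw"] = [3, 1, 2, 2]  # hypothetical source data


def clean(ctx):
    ctx["clean"] = sorted(set(ctx["raw"]))  # drop duplicates


def report(ctx):
    ctx["report"] = f"{len(ctx['clean'])} unique values"


# Each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest": set(),
    "clean": {"ingest"},
    "report": {"clean"},
}
TASKS = {"ingest": ingest, "clean": clean, "report": report}


def run_pipeline(pipeline, tasks):
    ctx = {}  # shared state passed between tasks
    order = list(TopologicalSorter(pipeline).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


order, ctx = run_pipeline(PIPELINE, TASKS)
print(order)          # ['ingest', 'clean', 'report']
print(ctx["report"])  # 3 unique values
```

Declaring dependencies rather than hard-coding the run order lets the same pipeline definition be re-run on every new load of data.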

The Big Data Architecture Components

Data Sources - Relational databases, data warehouses, cloud-based data warehouses, SaaS applications, real-time data from company servers and sensors, CRM systems, etc.

Data Storage - Often referred to as a data lake: a distributed file store holding large volumes of files in different formats, which are subsequently used for batch processing operations.

Batch Processing - The filtering, aggregation, and preparation of the data files through long-running batch jobs.

Real Time Ingestion - A way to capture and store messages from real-time sources for stream processing.

Machine Learning - Combining supervised and unsupervised machine learning algorithms to make predictions (for example, high insurance claims in a season, crime levels in an area, the risk of a heart attack, or even stock prices).

Stream Processing - Once real-time messages are captured, this step processes the streaming data, either in time windows or as continuous streams, and writes the results to a sink (an output store).

Analytics & Reporting - Native reporting and analysis tools that use embedded technologies and solutions to produce useful graphs, analysis, insights and recommendations for the business.

Orchestration - Automating the workflows involved in repeated data processing operations, such as transforming data structures or moving data between sources and sinks.
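The windowing mentioned under Stream Processing can be illustrated with a tumbling (fixed, non-overlapping) window. This is a simplified sketch, assuming events arrive as hypothetical `(timestamp_seconds, key, value)` tuples and using a plain dictionary as the sink; real stream engines emit each window as it closes and handle late or out-of-order events, which this example ignores.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_sum(
    events: Iterable[tuple[float, str, float]], window_seconds: int
) -> dict:
    """Sum values per key within fixed, non-overlapping time windows."""
    sink = defaultdict(float)  # stand-in for an output table or topic
    for ts, key, value in events:
        # Align each event to the start of the window it falls into.
        window_start = int(ts // window_seconds) * window_seconds
        sink[(window_start, key)] += value
    return dict(sink)


events = [
    (5.0, "sensor-a", 1.5),
    (30.0, "sensor-a", 2.5),
    (65.0, "sensor-a", 4.0),
    (70.0, "sensor-b", 1.0),
]
result = tumbling_window_sum(events, window_seconds=60)
print(result)
# {(0, 'sensor-a'): 4.0, (60, 'sensor-a'): 4.0, (60, 'sensor-b'): 1.0}
```

Sliding or session windows follow the same idea but assign an event to windows by different rules.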

Recommended Best Practices

Based on our experience of architecting and working with big data, we recommend that you adopt the following best practices:

1. Eliminate internal data silos – make the data accessible to anyone who needs it, when they need it.

2. Ensure all your data is trustworthy – modernize your big data architecture so that you can ingest, cleanse, de-duplicate and validate data at any time.

3. Solid data governance – implement a robust data governance policy as part of the data architecture and orchestration.

4. Account for different data formats and structures – see Figure 1.

5. Align the big data architecture with your business vision – align it with your key business drivers, business strategies and governance models.

6. Do not underestimate machine learning – this is where you can use your data to predict pricing and outcomes, enabling you to make appropriate decisions. Go beyond graphical analysis of the data and use it to make the right decisions.
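Best practice 2 (ingest, cleanse, de-duplicate and validate) can be sketched as a small preparation step. The fields (`customer_id`, `email`) and the validation rules below are hypothetical examples, not a prescribed schema; the point is the order of operations and that rejected records are kept for data-quality reporting rather than silently dropped.

```python
def cleanse(record: dict) -> dict:
    # Normalise formatting so that equivalent records compare equal.
    return {
        "customer_id": str(record.get("customer_id", "")).strip(),
        "email": str(record.get("email", "")).strip().lower(),
    }


def is_valid(record: dict) -> bool:
    # Hypothetical rules: an ID must be present and the email must contain "@".
    return bool(record["customer_id"]) and "@" in record["email"]


def prepare(records):
    seen = set()
    valid, rejected = [], []
    for raw in records:
        record = cleanse(raw)
        key = (record["customer_id"], record["email"])
        if key in seen:  # de-duplicate on the cleansed key
            continue
        seen.add(key)
        if is_valid(record):
            valid.append(record)
        else:
            rejected.append(raw)  # keep the original for quality reporting
    return valid, rejected


raw = [
    {"customer_id": " 42 ", "email": "Ann@Example.com"},
    {"customer_id": "42", "email": "ann@example.com "},
    {"customer_id": "", "email": "nobody@example.com"},
]
valid, rejected = prepare(raw)
print(valid)          # [{'customer_id': '42', 'email': 'ann@example.com'}]
print(len(rejected))  # 1
```

Cleansing before de-duplication matters: the first two records only match once whitespace and letter case have been normalised.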

Final Thoughts


Figure 3 - Convergence of Data, AI and Analytics

Big data architecture is an overarching system that manages huge volumes of data so that it can be analysed, driving big data analytics and providing a suitable environment in which big data analytics tools can extract and validate vital business information.

With the big data architecture best practices above at your fingertips, you can design a system that handles all the ingestion, processing and analysis needs for data that is too large and complex for traditional database systems. Message me if you want to know more or would like an assessment of your architecture.

Freya Juniper-Nine

Data and Engineering Leader (AWS Solutions Architect Professional)

2 yr

Good read, thanks for sharing. Have you got any thoughts on Data Mesh?

Roman Khromin

Transforming with Data | Managing Director | ERP, Data and AI-Driven $100M+ Business Transformations | 24+ Years Digital Leadership | Cloud | Consulting | Engineering | Programme Delivery | Ex-Amazon

2 yr

Thanks for sharing, Hassan Shuman! Keen to work with you to progress this further.


More articles by Hassan Shuman
