Big Data Archi
Hassan Shuman
CTO | GenAI Pioneer | AWS & Azure Expert Transforming Enterprises with GenAI, Cloud Migration, and Innovation | CIO/CTO Advisor | ex-IBM, Accenture,
Introduction
Data is everywhere you look, and its volume keeps growing as gadgets and wearables multiply, most of all in the enterprise.
Figure 1 – Different Data types & Sources
As the diagram above illustrates, the types and sources of data vary widely, from traditional structured databases to unstructured sources such as streaming video and IoT devices.
Ten years ago, when people started talking about data, it was described as the new oil. Now, with 81 ZB (yes, zettabytes) of data created in 2021, data is the new oxygen that organizations need to survive.
The demand to turn raw data into business insight is now higher than ever. So it’s no surprise that data-driven companies show significant interest in new products and services delivered on a single platform.
We need a common architecture that enterprises and end users can adopt on any cloud or hybrid platform, so that we can uncover hidden patterns, unknown correlations, markets, trends, and other useful predictions.
The Big Data Architecture
To harness the value of big data, a robust data processing architecture must be designed. Processing is one of its cornerstones: the architecture must be able to scale to almost unlimited amounts of data.
The big data architecture framework serves as a reference blueprint for big data infrastructures and solutions, logically defining how big data solutions will work, the components that will be used, how information will flow, and security details.
The Big Data Architecture is split into core functions with an overlaying orchestration layer that holds it all together. Its main benefit is its flexibility: it can be applied to static data, real-time data, or both at the same time.
Figure 2 - Recommended Big Data Architecture
As noted in the diagram, each cloud provider has native tools that simplify the process, and common code can be written to automate repetitive analysis.
The Big Data Architecture Components
Data Sources - Relational databases, data warehouses, cloud-based data warehouses, SaaS applications, real-time data from company servers and sensors, CRM systems, etc.
Data Storage - Often referred to as a data lake: a distributed file store holding large volumes of files in different formats, which are subsequently used for batch processing operations.
Batch Processing - The filtering, aggregation, and preparation of the data files through long-running batch jobs.
Real Time Ingestion - A way to capture and store messages from real-time sources for stream processing.
Machine Learning - Combines supervised and unsupervised machine learning algorithms to make predictions (for example, high insurance claims in a season, crime levels in an area, the risk of a heart attack, or even stock prices).
Stream Processing - Handles the streaming data in windows or as continuous streams and writes the results to a sink.
Analytics & Reporting - Native reporting and analysis tools that use embedded technologies to produce graphs, analysis, insights, and recommendations that benefit the business.
Orchestration - Automates the workflows involved in repeated data processing operations, such as transforming data structures or moving data between sources and sinks.
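As a rough illustration of how the batch processing, stream processing, and orchestration components fit together, here is a minimal Python sketch. It is not tied to any cloud provider's tooling. The event shape, the function names, the in-memory sink, and the 60-second window are all assumptions made for the example; a real deployment would use the provider's native services instead.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical event shape: (timestamp_seconds, sensor_id, value)
Event = tuple[int, str, float]


def batch_aggregate(events: Iterable[Event], min_value: float = 0.0) -> dict[str, float]:
    """Batch step: drop readings below min_value, then average per sensor."""
    values: dict[str, list[float]] = defaultdict(list)
    for _, sensor, value in events:
        if value >= min_value:
            values[sensor].append(value)
    return {sensor: sum(vals) / len(vals) for sensor, vals in values.items()}


def tumbling_windows(events: Iterable[Event], size_s: int) -> dict[int, int]:
    """Stream step: count events per fixed, non-overlapping window of size_s seconds."""
    counts: dict[int, int] = defaultdict(int)
    for ts, _, _ in events:
        window_start = (ts // size_s) * size_s  # start of the window this event falls in
        counts[window_start] += 1
    return dict(counts)


def run_pipeline(events: list[Event], sink: Callable[[str, dict], None]) -> None:
    """Orchestration step: run each stage in order and hand its result to the sink."""
    sink("batch_avg", batch_aggregate(events))
    sink("stream_counts", tumbling_windows(events, size_s=60))


# Illustrative data only; the sink here is just a dictionary in memory.
results: dict[str, dict] = {}


def memory_sink(name: str, data: dict) -> None:
    results[name] = data


events = [(0, "a", 1.0), (30, "a", 3.0), (61, "b", 5.0), (125, "b", -1.0)]
run_pipeline(events, memory_sink)
print(results)
```

The same list of events feeds both the batch and the stream stage, which mirrors the point that one architecture can serve static and real-time data at the same time.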
Recommended Best Practices
Based on our experience of architecting and working with big data, we recommend that you adopt the following best practices:
1. Eliminate Internal Data Silos – make the data accessible to anyone who needs it, when they need it
2. Ensure all your data is trustworthy – modernize your big data architecture so that you can ingest data, cleanse it, de-duplicate it and validate it at any time
3. Solid Data Governance – implement a robust data governance policy as part of the data architecture and orchestration
4. Account for different data formats/structures – refer to Figure 1 – Different Data types & Sources
5. Align the big data architecture with your business vision – align it with your key business drivers, business strategies and governance models
6. Do not under-estimate Machine Learning – this is where you can use your data to predict pricing and outcomes and to make appropriate decisions. Go beyond graphical analysis of the data and use it to make the right decisions
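To make the second practice more concrete, here is a minimal Python sketch of a cleanse, validate and de-duplicate pass over incoming records. It is only an illustration: the Customer record, its fields, and the validation rules are hypothetical, and real pipelines would apply much richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical record type used only for this example."""
    customer_id: str
    email: str


def cleanse(raw: dict) -> Customer:
    """Normalise whitespace and letter case so equivalent records compare equal."""
    return Customer(
        customer_id=str(raw.get("customer_id", "")).strip(),
        email=str(raw.get("email", "")).strip().lower(),
    )


def is_valid(record: Customer) -> bool:
    """Deliberately simple rule: an ID must be present and the email has exactly one '@'."""
    return bool(record.customer_id) and record.email.count("@") == 1


def prepare(rows: list[dict]) -> tuple[list[Customer], list[dict]]:
    """Return (clean records, rejected raw rows); later duplicates of an ID are dropped."""
    seen: set[str] = set()
    clean: list[Customer] = []
    rejected: list[dict] = []
    for raw in rows:
        record = cleanse(raw)
        if not is_valid(record):
            rejected.append(raw)  # keep the raw row so it can be inspected later
        elif record.customer_id not in seen:
            seen.add(record.customer_id)
            clean.append(record)
    return clean, rejected


rows = [
    {"customer_id": " 1 ", "email": "Ann@Example.com "},
    {"customer_id": "1", "email": "ann@example.com"},  # duplicate after cleansing
    {"customer_id": "2", "email": "not-an-email"},     # fails validation
    {"customer_id": "", "email": "x@y.org"},           # missing ID
]
clean, rejected = prepare(rows)
print(clean)
print(len(rejected))
```

Keeping the rejected rows, rather than discarding them, gives the governance process something to audit.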
Final Thoughts
Figure 3 - Convergence of Data, AI and Analytics
Big data architecture is an overarching system that manages huge volumes of data so that they can be analysed, and provides an environment in which big data analytics tools can extract and validate vital business information.
With the best practices above at your fingertips, you can design a system that handles the ingestion, processing, and analysis of data that is too large and complex for traditional database systems. Message me if you want to know more or would like an assessment of your architecture.
Data and Engineering Leader (AWS Solutions Architect Professional)
2 years ago: Good read, thanks for sharing. Have you got any thoughts on Data Mesh?
Transforming with Data | Managing Director | ERP, Data and AI-Driven $100M+ Business Transformations | 24+ Years Digital Leadership | Cloud | Consulting | Engineering | Programme Delivery | Ex-Amazon
2 years ago: Thanks for sharing, Hassan Shuman! Keen to work with you to progress this further.