Design Your Modern Data Architecture by Making These Six Fundamental Shifts

To succeed in today’s environment, businesses need to lead through growing complexity and volatility, drive operational excellence, enable collaboration across enterprise functions, develop higher-quality leadership and talent, manage change, and unlock new possibilities grounded in data. Organizations are under ever-greater pressure to innovate faster and at larger scale. In response, data and analytics are being leveraged to create radically new business models and disrupt traditional industry structures. How are data and analytics driving innovation, and what can executives do to build data-driven innovation more directly into their strategies and initiatives?

Innovation, and the value that it can bring, does not occur in a vacuum. Data is key to motivating and driving innovation.

Analytics, business intelligence (BI), and data management can help organizations innovate, for example by making interdependencies between people, institutions, entities, and processes more apparent through the study of data relationships. These tools can help organizations better understand how changes to one process or function will affect others. None of this will happen, however, if an organization lacks a data architecture that can drive innovation and support business growth. This article draws on a point of view published by McKinsey and aims to give you a clear picture of what you can do to drive innovation in your organization.

Yesterday’s data architecture can’t keep up with today’s demands for speed, flexibility, and creativity. Agility is the key to a successful upgrade, and the potential benefits are substantial. Over the last few years, organizations have had to move quickly to deploy new data technologies alongside legacy infrastructure to deliver market-driven innovations such as personalized offers, real-time alerts, and predictive maintenance. These technical additions, from data lakes to customer analytics platforms to stream processing, have greatly increased the complexity of data architectures, making it difficult for organizations to deliver new capabilities, maintain existing infrastructure, and ensure the integrity of artificial intelligence (AI) models.

Such slowdowns are untenable in the current market environment. Leaders like Amazon, Microsoft, and Google have been using artificial intelligence to disrupt traditional business models. Cloud providers have introduced cutting-edge offerings, such as serverless data platforms, that can be deployed quickly, giving early adopters a shorter time to market and greater agility. Analytics users want more integrated tools, such as automated model-deployment platforms, so they can put new models to work more quickly.

Many companies are using application programming interfaces (APIs) to expose data from various systems to their data lakes and incorporate insights into front-end apps quickly. Now, companies' need for flexibility and speed has increased as they manage the enormous humanitarian crisis created by the COVID-19 pandemic and prepare themselves for the new normal.

For companies to build a competitive edge, or even to maintain parity, they will need a new approach to defining, implementing, and integrating their data stacks, leveraging the cloud as well as new concepts and components.

To that end, there are six key shifts to make in order to build a game-changing data architecture, one that enables more rapid delivery of new capabilities and vastly simplifies existing architectural approaches.

These shifts touch almost every aspect of data management, including acquisition, processing, storage, analysis, and exposure. Although organizations can implement some of them while leaving their core technology stack intact, many require careful re-architecting of the existing data platform and infrastructure, including both legacy technologies and newer ones previously bolted on.

Organizations need a clear strategic plan, and data and technology leaders must make bold decisions to prioritize the shifts that will most affect business goals and to invest in the appropriate level of architectural sophistication. As a result, data-architecture blueprints often look very different from one company to another.

The return on investment can be substantial if done correctly. Benefits can be derived from a variety of sources: IT cost savings, productivity improvements, reduced regulatory and operational risk, and the delivery of wholly new capabilities, services, and even entire businesses.

So, what are the major changes and adjustments that enterprises must consider?

1. Cloud-based data platforms (Old-fashioned: On-premise)

Cloud is the most disruptive driver of a new data-architecture approach, as it allows businesses to rapidly scale AI tools and capabilities for competitive advantage. Major cloud service providers such as AWS, Azure, and GCP have revolutionized the way organizations deploy and run data infrastructure, platforms, and applications at scale.

To modularize application capabilities, for example, an organization can combine a cloud-based data platform with container technology hosting microservices, such as one that searches billing data or adds new features to an account. This allows the company to roll out new self-service capabilities to more business customers in days rather than months, deliver large amounts of real-time inventory and transaction data to end users for analytics, and reduce costs by "buffering" transactions in the cloud rather than on more expensive on-premise systems.


Practical Hints:

  • Serverless data platforms such as Amazon S3 and Google BigQuery enable businesses to build and run data-centric applications at scale without having to install and configure software or manage workloads. Such services reduce the expertise required, cut deployment time from weeks to minutes, and carry almost no operational overhead (a BigQuery query is sketched after this list).
  • Companies can decouple and automate the deployment of additional compute power and data storage using containerized data solutions based on Kubernetes, which are available from cloud providers as well as open source and can be integrated and deployed quickly. This capability is particularly useful for ensuring that data platforms with more complex setups, such as those that must retain data from one application session to the next or that require complex backup and recovery, can scale to meet demand.
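
To make the serverless idea concrete, here is a minimal sketch, assuming the google-cloud-bigquery Python client and configured credentials; the project, dataset, and table names are hypothetical placeholders, not part of the original article. The point is that the query runs with no servers to provision or manage.

```python
# Minimal sketch: querying a serverless data platform (Google BigQuery).
# Assumes `pip install google-cloud-bigquery` and credentials configured via
# GOOGLE_APPLICATION_CREDENTIALS; the table below is hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # no clusters to size, no software to install

query = """
    SELECT customer_id, SUM(amount) AS total_spend
    FROM `my-project.sales.transactions`  -- hypothetical table
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
"""

# BigQuery allocates compute on demand; we only submit SQL and read results.
for row in client.query(query).result():
    print(row.customer_id, row.total_spend)
```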


2. Real-time data processing (Old-fashioned: Batch)

Real-time data messaging and streaming capabilities have become much less expensive, opening the door to widespread adoption. These technologies enable a host of new business applications: transportation companies, for example, can provide customers with arrival predictions accurate to the second as their taxi approaches; insurance companies can use real-time behavioral data from smart devices to customize rates; and manufacturers can predict infrastructure problems based on real-time sensor data.

Real-time streaming functionalities, such as a subscription mechanism, enable data consumers, including data marts and data-driven employees, to subscribe to "topics" in order to receive a constant feed of the transactions they require. In most cases, a shared data lake serves as the "brain" for such services, storing all detailed transactions.


Practical Hints:

  • Messaging platforms like Apache Kafka offer fully scalable, durable, and fault-tolerant publish/subscribe services that can process and store millions of messages per second for immediate or later consumption. They support real-time use cases, bypass existing batch-based systems, and have a far lighter footprint (and cost base) than traditional enterprise messaging queues (a producer/consumer pair is sketched after this list).
  • Kafka Streams, Apache Flume, Apache Storm, and Apache Spark Streaming are examples of stream-processing and analytics solutions that allow messages to be analyzed directly in real time. The analysis may be rules-based or use advanced analytics to extract events or signals from the data, and it frequently incorporates historical data to compare patterns, which is particularly important in recommendation and prediction engines.
  • Alerting platforms such as Graphite or Splunk can trigger business actions for users, for example notifying salespeople if they are short of their daily sales targets, or can integrate these actions into existing processes in ERP or CRM systems.
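
To make the publish/subscribe pattern concrete, the hedged sketch below uses the kafka-python client; the broker address and the "transactions" topic are assumptions for illustration. A source system publishes each transaction, and any number of consumers, such as data marts or analytics jobs, subscribe to the same topic.

```python
# Publish/subscribe sketch with kafka-python (`pip install kafka-python`).
# Assumes a broker at localhost:9092; the "transactions" topic is hypothetical.
import json

from kafka import KafkaConsumer, KafkaProducer

# Producer side: publish each transaction as a JSON-encoded message.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("transactions", {"order_id": 42, "amount": 99.90})
producer.flush()  # make sure the message is actually sent

# Consumer side: a data mart or analytics job receives a continuous feed.
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # e.g. {'order_id': 42, 'amount': 99.9}
```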


3. Modular platforms (Old-fashioned: Pre-integrated commercial solutions)

To scale applications, businesses must frequently go beyond the limits of the legacy data ecosystems provided by large solution vendors. Many organizations are therefore adopting a highly modular data architecture that uses best-of-breed and, in many cases, open-source components that can be replaced with new technologies as needed without affecting other parts of the architecture.

A company can use this strategy to quickly deliver new, data-heavy digital services to millions of users and to integrate cloud-based applications at scale, for example to provide accurate daily views of customer behavior. The firm can establish an independent data layer that combines commercial databases with open-source components. Data is synchronized with back-end systems via a proprietary enterprise service bus, and business logic on the data is implemented by microservices hosted in containers.


Practical Hints:

  • Data pipelines and API-based interfaces make it easier to integrate disparate tools and platforms by shielding data teams from the complexity of the underlying layers, speeding time to market, and lowering the risk of breaking existing applications. These interfaces also make it easier to replace individual components as requirements change (a small interface-driven pipeline is sketched after this list).
  • Analytics workbenches such as Amazon SageMaker and Kubeflow make it possible to build end-to-end solutions in a highly modular architecture. These tools can connect to a wide range of databases and services, so individual components can be combined and swapped as needed.
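
The modularity principle itself can be shown without any particular vendor. In the hedged Python sketch below, each pipeline stage is written against a narrow interface, so a best-of-breed or open-source implementation can be swapped in without touching its neighbors; all class and file names are hypothetical.

```python
# Sketch of a modular pipeline: stages depend only on small interfaces,
# so any component can be replaced without changing the others.
import csv
from typing import Iterable, Protocol


class Source(Protocol):
    def read(self) -> Iterable[dict]: ...


class Sink(Protocol):
    def write(self, records: Iterable[dict]) -> None: ...


class CsvSource:
    """One interchangeable source; a Kafka or S3 source could replace it."""

    def __init__(self, path: str):
        self.path = path

    def read(self) -> Iterable[dict]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)


class ConsoleSink:
    """Stand-in sink; a warehouse loader could sit behind the same interface."""

    def write(self, records: Iterable[dict]) -> None:
        for record in records:
            print(record)


def run_pipeline(source: Source, sink: Sink) -> None:
    # The pipeline knows the interfaces, not the concrete technologies.
    sink.write(source.read())


if __name__ == "__main__":
    run_pipeline(CsvSource("orders.csv"), ConsoleSink())  # hypothetical file
```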


4. Decoupled data access (Old-fashioned: Point-to-point)

Exposing data via APIs can limit and secure direct access to view and modify data while providing faster, up-to-date access to common data sets. Data can then be easily reused across teams, speeding access and enabling seamless collaboration among analytics teams, so AI use cases can be developed more rapidly and efficiently.

Instead of relying on proprietary interfaces, one pharmaceutical company is setting up an internal "data marketplace" for all employees, using APIs to simplify and standardize access to core data assets.


Practical Hints:

  • To design and publish data-centric APIs, establish usage policies, control access, and monitor usage and performance, an API management platform (also known as an API gateway) is required. This platform also lets developers and users search for existing data interfaces and reuse them rather than building new ones. An API gateway is frequently implemented as a separate zone within a data hub, but it can also be built as a stand-alone capability (a minimal data API is sketched after this list).
  • A data platform is often needed to "buffer" transactions outside of core systems. Such buffers can be provided by a central data platform such as a data lake, or by a distributed data mesh, an ecosystem of best-fit platforms (including data lakes and data warehouses) built for each business domain's expected data usage and workloads. For example, a firm can use a structured database to serve customer information, such as the most recent financial transactions, directly to online- and mobile-banking applications, reducing costly mainframe workloads.
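
As a hedged illustration of API-based data access, the sketch below exposes a read-only endpoint with Flask; the route, records, and field names are hypothetical, and a production version would sit behind an API gateway that adds authentication, usage policies, and monitoring.

```python
# Minimal read-only data API sketch using Flask (`pip install flask`).
# The endpoint and records are hypothetical stand-ins for a curated data set
# served from a buffer outside the core systems.
from flask import Flask, abort, jsonify

app = Flask(__name__)

TRANSACTIONS = {
    "c-1001": [{"id": 1, "amount": 250.00, "type": "deposit"}],
    "c-1002": [{"id": 2, "amount": 75.50, "type": "withdrawal"}],
}


@app.route("/api/v1/customers/<customer_id>/transactions")
def get_transactions(customer_id: str):
    """Expose recent transactions without granting direct database access."""
    if customer_id not in TRANSACTIONS:
        abort(404)
    return jsonify(TRANSACTIONS[customer_id])


if __name__ == "__main__":
    app.run(port=8080)
```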


5. Domain-based architecture (Old-fashioned: Enterprise warehouse)

To improve time to market for new data products and services, several data-architecture leaders have shifted away from a central enterprise data lake toward "domain-driven" designs that can be customized and made "fit for purpose". While the data sets may still reside on the same physical platform, "product owners" in each business domain (for example, marketing, sales, or manufacturing) are made responsible for organizing their data sets in an easily consumable way, both for users within their domain and for downstream data consumers in other business domains. This approach requires a careful balance to avoid becoming fragmented and inefficient, but in return it can cut the up-front time spent building new data models into the lake, often from months to just days, and it can be a simpler, more effective choice when mirroring a federated business structure or adhering to regulatory limits on data mobility.

A distributed, domain-based architecture can be set up so that sales and operations employees can serve customer, order, and billing data to data scientists for use in AI models, or to customers directly via digital channels. Rather than building one central data platform, the firm establishes logical platforms that are managed by product owners within the sales and operations departments. The product owners are incentivized to promote the use of the data for analytics and drive adoption through digital channels, forums, and hackathons.


Practical Hints:

  • Providing data infrastructure as a platform offers common capabilities for storage and management, making deployment easier and relieving data producers of the burden of building their own data-asset platforms.
  • Data virtualization techniques, which were first employed in specialized areas like customer data, are increasingly being applied throughout companies to manage access to and integrate distributed data assets.
  • Data-cataloging tools allow business data to be searched and explored without requiring full access to the data or up-front preparation. The catalog typically includes metadata definitions and an end-to-end interface that make access to data assets easier (a toy catalog is sketched after this list).
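
To make the domain idea tangible, here is a hedged, in-memory sketch of a minimal data catalog: product owners in each domain register data sets with metadata, and consumers discover them by keyword without touching the data itself. All names and locations are hypothetical.

```python
# Toy data-catalog sketch: domains publish data products with metadata only.
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    domain: str       # owning business domain, e.g. "sales"
    name: str         # data set name
    description: str  # business-facing metadata
    location: str     # where the data actually lives (URI, table, topic)
    tags: list = field(default_factory=list)


class Catalog:
    def __init__(self):
        self._products = []

    def register(self, product: DataProduct) -> None:
        """Called by a domain product owner to publish a data set."""
        self._products.append(product)

    def search(self, keyword: str) -> list:
        """Keyword search over metadata; no access to the data is needed."""
        kw = keyword.lower()
        return [
            p for p in self._products
            if kw in p.name.lower()
            or kw in p.description.lower()
            or any(kw in t.lower() for t in p.tags)
        ]


catalog = Catalog()
catalog.register(DataProduct(
    domain="sales",
    name="orders_daily",
    description="Curated daily customer orders for analytics",
    location="s3://sales-domain/orders/daily/",  # hypothetical location
    tags=["orders", "customers"],
))

for hit in catalog.search("orders"):
    print(hit.domain, hit.name, "->", hit.location)
```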


6. Flexible, extensible data schemas (Old-fashioned: Rigid data models)

To reduce redundancy, the predefined data models of software vendors, and proprietary data models built for specific business-intelligence purposes, are usually designed as highly normalized schemas with rigid database tables and data elements. While this approach is still the standard for reporting and regulatory use cases, it also requires lengthy development cycles and deep system knowledge to incorporate new data elements or data sources, because any modification can compromise data integrity.


To gain greater flexibility and a competitive edge when exploring data or supporting advanced analytics, companies are transitioning to "schema-light" approaches, which use denormalized data models with fewer physical tables to organize data for maximum performance. This approach offers several advantages: faster data exploration, more flexibility in storing structured and unstructured data, and reduced complexity, since data leaders no longer need to add abstraction layers, such as multiple "joins" between highly normalized tables, to query relational data.


Practical Hints:

  • Techniques like data-point modeling in Data Vault 2.0 can ensure that data models remain extensible, so data elements can be added or removed later with minimal disruption.
  • Graph databases, a type of NoSQL database, have attracted a great deal of interest in recent years. Because they can tap into unstructured data, NoSQL databases are well suited to digital applications that demand massive scalability and real-time capabilities, and to data layers serving AI applications. Many organizations use graph databases to build master-data repositories that can accommodate changing information models, since they model relationships within data in a powerful, flexible way.
  • Azure Synapse Analytics, to name one technology service, allows file-based data to be queried like a relational database by dynamically applying table structures to the files. Users can thus keep using familiar interfaces such as SQL while accessing data stored in files.
  • Storing data in JSON lets organizations change database structures without changing business information models (a minimal sketch follows this list).
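
As a hedged illustration of that last point, the sketch below stores records as JSON documents in SQLite: a new attribute can appear in new records without an ALTER TABLE migration, while the business information model stays the same. Table and field names are hypothetical, and the json_extract function requires a SQLite build with JSON support (the default in recent versions).

```python
# Schema-light storage sketch: JSON documents in a single SQLite column.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, doc TEXT)")

# Original record shape.
conn.execute(
    "INSERT INTO customers (doc) VALUES (?)",
    (json.dumps({"name": "Acme Corp", "country": "DE"}),),
)

# Later, a new attribute appears; no schema change is required.
conn.execute(
    "INSERT INTO customers (doc) VALUES (?)",
    (json.dumps({"name": "Globex", "country": "US", "segment": "enterprise"}),),
)

# SQLite's JSON functions can still query the documents like columns.
rows = conn.execute(
    "SELECT json_extract(doc, '$.name'), json_extract(doc, '$.segment') "
    "FROM customers"
)
for name, segment in rows:
    print(name, segment)  # segment is None for the older record
```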



The Next Step: How To Start

Because data technologies are evolving rapidly, traditional efforts to define and build toward three-to-five-year target architecture states are both risky and inefficient. Data and technology leaders will get the most benefit from adopting practices that let them evaluate and deploy new technologies quickly so they can adapt. Four practices are critical here:

  1. Apply a test-and-learn approach to architecture construction and experiment with different components and concepts. Agile practices have long been used in application development but have only recently made their way into the data world. Rather than engaging in lengthy discussions to identify the "perfect" designs, products, and vendors, leaders can start with smaller budgets and create minimum viable products (MVPs), or string together existing open-source tools into a provisional product, and release them into production (using the cloud to accelerate) to demonstrate their value before expanding and evolving further.
  2. Create data "tribes," in which teams of data stewards, data engineers, and data modelers collaborate with end-to-end accountability for building the data architecture. These tribes also work to establish standardized, repeatable data- and feature-engineering processes that support the development of highly curated data sets ready for modeling. Such agile data practices can help new data services get to market faster.
  3. Invest in DataOps (enhanced DevOps for data), which accelerates the development and deployment of new components into the data architecture, helping teams implement and update solutions quickly based on feedback.
  4. Build a data culture in which employees are motivated to use and apply new data services in their work. One important way to achieve this is to ensure that the data strategy is tied to business goals and reflected in C-level messages to the company, which reinforces the value of the effort to business teams.
