
Metadata-Driven (MDD) Framework Meets Data Warehouse Automation

Koustubh Dhopade

High-Impact Leader, Innovator in implementing Modern Data Platforms & Programs, Data evangelist, Mining & Engineering, Solution architect and Machine learning enthusiast

Published: June 1, 2022

Data is fast becoming the currency of the future, rivalling oil and gold in value in today's insight-driven world. To maximize its utility and drive powerful business decisions, most organizations place a data warehouse at the centre of their analytics strategy. Speed and accuracy of information are of the essence when you need to make game-changing decisions in a competitive landscape.

Because data warehouses have many moving parts, and the platform itself takes a long time to get ready, it makes sense to put automated processes in place so that IT teams can deliver actionable insights at the speed of business. This is where configuration and automation come in. Modern tech stacks encourage a shift to an automation mindset. With digital transformation, it is imperative to ensure a faster data-to-value journey, which leaves more room for innovation and makes work more purpose-oriented and enjoyable for your IT team.

After data modeling, perhaps the most time-consuming part is writing ETL/ELT code to populate your data warehouse. Automation introduces developers to zero-code or low-code (configurable) development, where they work at a logical (design) level to create the integration flows. This means that IT teams no longer have to fight with SQL and can get data from source systems to the destination warehouse in hours or days, bringing value to the business sooner.
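To make the low-code idea concrete, here is a minimal sketch of an integration flow described as configuration rather than hand-written SQL. The `ColumnMapping` class, the `build_load_sql` function, and the table and column names are all hypothetical and chosen only for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One declarative source-to-target column mapping (hypothetical shape)."""
    source: str          # column name in the source table
    target: str          # column name in the warehouse table
    transform: str = ""  # optional SQL expression template using {col}


def build_load_sql(source_table: str, target_table: str,
                   mappings: list[ColumnMapping]) -> str:
    """Render an INSERT ... SELECT statement from the mappings."""
    targets = ", ".join(m.target for m in mappings)
    selects = ", ".join(
        m.transform.format(col=m.source) if m.transform else m.source
        for m in mappings
    )
    return (f"INSERT INTO {target_table} ({targets}) "
            f"SELECT {selects} FROM {source_table}")


# The developer edits this configuration, not the SQL itself.
mappings = [
    ColumnMapping("cust_id", "customer_id"),
    ColumnMapping("cust_nm", "customer_name", "TRIM({col})"),
]
sql = build_load_sql("stg.customers", "dw.dim_customer", mappings)
print(sql)
```

Adding a column to the load then becomes a one-line change to `mappings`, and the generated statement follows.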

Metadata in digital systems is abundant and pervasive. The role of records and information management is to identify which metadata in business applications, systems and cloud environments is necessary for the creation, capture and management of authoritative records and information used for operational as well as orchestration activities. We implemented the metadata-driven architecture (operational and technical metadata) for domains such as Banking, Security, Retail, Logistics and Life Sciences, and eventually brought it to Insurance.

In a data warehouse, metadata can be many things: data types, data formats, source and destination database tables, entity relationships, SCD patterns, ETL mappings and transformations, and more. A metadata-driven architecture allows you to bring a source database schema into a data model, customize its structure based on your business requirements, and make the data model available for subsequent processes such as data analytics. When the metadata-driven approach is coupled with automation, the two become natural partners that streamline design, development, and deployment, leading to a robust data warehouse implementation. The combination gives IT teams what they need to build agile and sustainable processes that deliver high-quality outputs consistently. It also comes in handy when source data drifts.
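As an illustration of how captured metadata helps with drift, the sketch below compares the column metadata registered for a source table with what the source currently exposes. The `detect_drift` function and the sample columns are assumptions made for this example, not part of any specific framework.

```python
def detect_drift(registered: dict[str, str],
                 observed: dict[str, str]) -> dict[str, list[str]]:
    """Compare registered column metadata (name -> type) with the observed schema."""
    common = set(registered) & set(observed)
    return {
        "added": sorted(set(observed) - set(registered)),
        "removed": sorted(set(registered) - set(observed)),
        "retyped": sorted(c for c in common if registered[c] != observed[c]),
    }


# Hypothetical metadata captured when the table was onboarded.
registered = {"policy_id": "INTEGER", "premium": "DECIMAL(10,2)", "start_dt": "DATE"}
# Hypothetical schema read from the source today.
observed = {"policy_id": "BIGINT", "premium": "DECIMAL(10,2)", "channel": "VARCHAR(20)"}

drift = detect_drift(registered, observed)
print(drift)
```

A pipeline could run a check like this before each load and decide, per category of drift, whether to adapt the target model or stop and alert.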

In order to be authoritative, metadata should possess:

a description of the content of records and information
the structure of records and information
the business context in which records and information were created or received and used
relationships with other records, information and metadata
business actions and events
information that may be needed to retrieve and present records and information
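One way to make this checklist operational is to model the six elements as fields of a record and report which ones are still empty. The `RecordMetadata` class and its field names are hypothetical; they simply mirror the list above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RecordMetadata:
    """The six elements listed above, as illustrative fields."""
    description: str = ""
    structure: str = ""
    business_context: str = ""
    relationships: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)
    retrieval_info: str = ""


def missing_elements(meta: RecordMetadata) -> list[str]:
    """Return the names of elements that have not been filled in."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


meta = RecordMetadata(
    description="Daily claims extract",
    structure="CSV, 12 columns",
    business_context="Received from the claims system for reserving",
)
print(missing_elements(meta))
```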

The effective implementation of metadata is based on the following principles, which we learnt while developing configurable designs and architectures for data management platforms, on-premise or in the cloud:

Metadata requirements should be considered as part of the process: identify what metadata is necessary for the creation, capture, management and orchestration of data management jobs, and what metadata supports organisational auditing and business lineage.
Metadata is scalable: determine which levels of metadata best meet the organisation's various business needs, and to what depth.
Metadata should be described, stored and managed: use metadata schemas and encoding schemes to promote the entry of meaningful, standardised and consistent metadata.
Metadata is dynamic and grows over time: be aware that records and information will continue to accrue metadata throughout their existence.
Metadata should be persistently linked with records and information, so that reports can be generated on top of it: ensure that metadata stays linked with the records and information to which it relates when they are transferred out of their original creating environment and through subsequent migrations.
Metadata should be managed as a record: document how the organisation has configured and applied metadata in its systems.

The ideas in this blog are divided into two categories. Principles are those concepts judged to be common to all domains of metadata and which might inform the design of any metadata schema or application. Practicalities are the rules of thumb, constraints, and infrastructure issues that emerge from bringing theory into practice in the form of useful and sustainable systems.

Principles:

A. Modularity: Metadata modularity is a key organizing principle for environments characterized by vastly diverse sources of content, styles of content management, and approaches to resource description. It allows designers of metadata schemas to create new assemblies based on established metadata schemas and benefit from observed best practice, rather than reinventing elements anew.

B. Extensibility: Metadata systems must allow for extensions so that particular needs of a given application can be accommodated. Some metadata elements are likely to be found in most metadata schemas (the concept of creator or identifier of an information resource, for example). Others will be specific to particular applications or domains (degree of cloud cover, for example, in remote sensing data).

C. Refinement: Application domains will differ according to the degree of detail that is necessary or desirable. The design of metadata standards should allow schema designers to choose a level of detail appropriate to a given application. Populating databases with metadata is costly, so there are strong economic incentives to create metadata with sufficient detail to meet the functional requirements of an application, but not more.
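Modularity and extensibility can be sketched as composing a schema from a shared core element set plus a domain-specific extension. The element sets, `compose_schema` and `conforms` below are hypothetical names used only for illustration; the element names come from the examples in the text.

```python
# Core elements expected in most schemas, plus one domain extension.
CORE_ELEMENTS = {"identifier": str, "creator": str, "title": str}
REMOTE_SENSING_EXT = {"cloud_cover_pct": float}


def compose_schema(*element_sets: dict[str, type]) -> dict[str, type]:
    """Assemble a schema from existing element sets, rejecting name clashes."""
    schema: dict[str, type] = {}
    for element_set in element_sets:
        clash = schema.keys() & element_set.keys()
        if clash:
            raise ValueError(f"duplicate elements: {sorted(clash)}")
        schema.update(element_set)
    return schema


def conforms(record: dict, schema: dict[str, type]) -> bool:
    """True if every element in the record is known and has the expected type."""
    return all(
        name in schema and isinstance(value, schema[name])
        for name, value in record.items()
    )


imagery_schema = compose_schema(CORE_ELEMENTS, REMOTE_SENSING_EXT)
record = {"identifier": "IMG-001", "creator": "sat-team", "cloud_cover_pct": 12.5}
print(conforms(record, imagery_schema))   # extension element is accepted
print(conforms(record, CORE_ELEMENTS))    # core alone does not know cloud cover
```

Refinement then becomes a question of which element sets an application chooses to include, so that it pays only for the detail it needs.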

Practicalities:

A. Application Profiles: No single metadata element set will accommodate the functional requirements of all applications, and it is becoming increasingly important to be able to cross discovery boundaries as well. Application profiles facilitate this by allowing designers to 'mix and match' schemas as appropriate. They achieve this modularity through cardinality enforcement. Cardinality refers to constraints on the appearance of an element: is it optional, mandatory, or conditional?
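A minimal sketch of cardinality enforcement follows, assuming a profile is a mapping from element name to a cardinality and, for conditional elements, a predicate on the record. The `Cardinality` enum, the `validate` function and the SCD-related element names are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class Cardinality(Enum):
    MANDATORY = "mandatory"
    OPTIONAL = "optional"
    CONDITIONAL = "conditional"


Rule = tuple[Cardinality, Optional[Callable[[dict], bool]]]

# Hypothetical profile for describing a warehouse table.
profile: dict[str, Rule] = {
    "identifier": (Cardinality.MANDATORY, None),
    "title": (Cardinality.MANDATORY, None),
    "creator": (Cardinality.OPTIONAL, None),
    # Required only when the table is loaded as SCD type 2.
    "scd_effective_date": (Cardinality.CONDITIONAL,
                           lambda r: r.get("scd_type") == 2),
}


def validate(record: dict, profile: dict[str, Rule]) -> list[str]:
    """Return one error message per required element that is absent."""
    errors = []
    for name, (cardinality, condition) in profile.items():
        present = record.get(name) not in (None, "")
        required = cardinality is Cardinality.MANDATORY or (
            cardinality is Cardinality.CONDITIONAL
            and condition is not None
            and condition(record)
        )
        if required and not present:
            errors.append(f"missing {name}")
    return errors


print(validate({"identifier": "T1", "title": "dim_customer", "scd_type": 2}, profile))
print(validate({"identifier": "T1", "title": "dim_customer", "scd_type": 1}, profile))
```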

B. Syntax and Semantics: Semantics is about meaning; syntax is about form. Agreement on both is necessary for two development communities, departments or LoBs to share metadata. Two communities may agree about the meaning of the term title or creator or identifier, but until they have a shared convention for identifying and encoding values, they cannot easily exchange their metadata. Such agreement helps standardise data across the organization and avoids many integration challenges.
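To illustrate the gap between semantics and syntax, assume two lines of business agree on what a "created" date means but encode it differently. The sketch below normalises both to ISO 8601; the LoB names and their formats are invented for the example.

```python
from datetime import datetime

# Hypothetical per-LoB encodings of the same "created" element.
ENCODINGS = {
    "claims": "%d/%m/%Y",   # e.g. 01/06/2022
    "policy": "%Y-%m-%d",   # e.g. 2022-06-01
}


def to_shared_date(value: str, lob: str) -> str:
    """Parse a LoB-specific date string and return it in ISO 8601 form."""
    return datetime.strptime(value, ENCODINGS[lob]).date().isoformat()


print(to_shared_date("01/06/2022", "claims"))
print(to_shared_date("2022-06-01", "policy"))
```

Both calls yield the same string, which is what lets the two departments exchange the value without ambiguity.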

Kudos to all the data & quality management team members - Swati, Nainish, Joydeep, Thanga, Aarti, Dhiraj, Tanya, Yogita, Sadhana, Aaditya, Vaishali, Lalchand, Harshal, Kalai, Madhuri, Madhura, Pooja, Shubhada, Vini & all. Thank you for making it a memorable journey.

Conclusion:

The Saama MDD (metadata-driven) framework simplifies and automates data warehouse development end to end, using the agile metadata-driven approach. The product fetches metadata directly from source databases and allows you to use it in the design, development, and deployment phases of your data warehouse. Once it is implemented, introducing changes to the design is easy, because the captured metadata allows you to propagate changes across the board while ensuring the integrity of existing models, integration flows, and deployments.

Want to see the power of the metadata-driven approach and how these two technologies work together in action? Reach out at [email protected] for more details.
