Boost BI & Development with a Metadata-Driven ETL Framework: Data-First, Insight-Led Organisations with Metadata-Driven ETL Architecture.
Photo by Kevin Ku (https://unsplash.com/photos/aiyBwbrWWlo)


Abstract:

All businesses must quickly, effectively, and efficiently source and report the data that drives daily operations.


In today’s economy, a key trend for Insurance (Health, Property and Casualty, Life), Healthcare (Hospitals, Clinics, Doctors’ Groups), and other Financial Services (Banking and Investment Management) organisations is continued growth and adaptation, where success is measured in speed. The faster analysts and decision-makers can reach actionable insights, the greater the chance that information can be translated into value for the organisation.


To implement an agile approach, organisations must reconsider the siloed handoffs from one division to another that characterise traditional Enterprise Data Warehouse (EDW) development and prevent open communication between business users, developers, and architects. Instead, they should adopt a suitable method of incorporating metadata to shorten development cycles and make them iterative.

Why Metadata-Driven Architecture:

Organisations must provide standard, accessible, and generic services and data analytics, enabling each team level to report on relevant data in a near real-time manner. However, traditional methods can limit organisations from growing and adapting because they are tied to particular technologies or predefined structures that are too difficult or expensive to change, not to mention the challenges of integrating business systems and data from similar organisations or divisions.


It becomes easier and more efficient to provide standard, accessible, near real-time services and reports on relevant data by implementing a flexible Enterprise Data Warehouse (EDW)/Analytics Architecture built on a metadata-driven Extract, Transform and Load (ETL) framework and a simplified Service-Oriented Architecture (SOA) services approach. Such an architecture provides visibility of the procedures and processes in the EDW/Analytics Architecture by allowing all users in an organisation to quickly manage and control their data without going into the code itself, reducing the dependency on strong technical knowledge to deliver business solutions.


Definition of Metadata-Driven Data ETL Architecture:

Extracting the required data has traditionally meant using a variety of tools. However, using those tools effectively requires strong technical knowledge and experience with each software vendor’s toolset. This dependency on solid technical understanding makes integrating new data sources challenging, often requiring complicated, time-consuming and error-prone customisation.


Metadata-driven ETL frameworks simplify the technology through abstraction layers, establishing an easier-to-use and more flexible method for implementing new data sources. The learning curve becomes gentler and easier to adopt, while implementation time is reduced.


Metadata-driven ETL frameworks create templates for data structures (including structural artefacts with entity relationships and data formats that define the architecture of the EDW), data migration controls, exception/error handling, and rules management. In addition, transformation and integration rules can be created via templates made available in accessible, ubiquitous tools such as Excel spreadsheets by non-technical, domain specialist users.
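As a rough illustration of such a template, the sketch below represents transformation rules as a domain specialist might enter them in a spreadsheet, here shown as exported rows. All column names, rule keywords, and field names are illustrative assumptions, not a specific framework's format.

```python
# Hypothetical rule rows, as if exported from a spreadsheet template.
RULES = [
    {"source": "cust_nm", "target": "customer_name", "transform": "upper"},
    {"source": "dob",     "target": "date_of_birth", "transform": "none"},
    {"source": "premium", "target": "annual_premium", "transform": "float"},
]

# The framework supplies the implementations behind each rule keyword.
TRANSFORMS = {
    "upper": lambda v: str(v).upper(),
    "float": lambda v: float(v),
    "none":  lambda v: v,
}

def apply_rules(record, rules=RULES):
    """Map one source record to the target schema using the rule table."""
    return {r["target"]: TRANSFORMS[r["transform"]](record[r["source"]])
            for r in rules}

row = {"cust_nm": "jane doe", "dob": "1990-01-01", "premium": "1200.50"}
print(apply_rules(row))
```

The point is that the `RULES` table, not code, carries the business logic: a domain specialist edits the spreadsheet and the same generic loader applies it.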


Similarly, data source locations, schemas, error handling logic, and job control parameters can be stored in physical configuration files that the Framework can easily maintain and process to generate executable ETL jobs. Data schemas should be created using data models to build the schema dynamically, allowing different data storage engines to be used without breaking functionality. A configuration file can generate the code for the specific storage engine.
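A minimal sketch of the dynamic-schema idea follows: one logical data model rendered into DDL for different storage engines. The model, type names, and engine mappings are assumptions for illustration only.

```python
# Hypothetical logical model, as it might be read from a configuration file.
MODEL = {
    "table": "policy",
    "columns": [("policy_id", "int"), ("holder_name", "string"),
                ("premium", "decimal")],
}

# Per-engine type mappings; a real framework would cover far more types.
TYPE_MAP = {
    "postgres": {"int": "INTEGER", "string": "TEXT",         "decimal": "NUMERIC(12,2)"},
    "mysql":    {"int": "INT",     "string": "VARCHAR(255)", "decimal": "DECIMAL(12,2)"},
}

def generate_ddl(model, engine):
    """Build a CREATE TABLE statement for the chosen storage engine."""
    cols = ", ".join(f"{name} {TYPE_MAP[engine][ltype]}"
                     for name, ltype in model["columns"])
    return f"CREATE TABLE {model['table']} ({cols});"

print(generate_ddl(MODEL, "postgres"))
```

Swapping storage engines then means changing one configuration value, not rewriting schemas by hand.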


A metadata repository stores and manages the data structures, data migration controls, exception/error handling, and business rules. There are several challenges with sharing and administering the metadata. As such, different architectures have emerged to combat these challenges. The three most common metadata architectural approaches are:

  • Centralized Metadata Management
  • Distributed Metadata Management
  • Hybrid Metadata Management


How to implement:

The simple configuration and rule-based implementation streamlines loading data into traditional EDWs and makes data available sooner for analytics, reporting, and use by other applications. This is achieved by removing the bottleneck of highly specialised technical staff and allowing domain specialists to create and maintain their own pipelines. The technical team can then support standardised, easy-to-review code for their platform, while non-technical staff gain the ability to assemble flexible, streamlined applications that give end users access to the data through portals, dashboards, and other applications, without the technical overhead, by replicating and adding new data sources and business logic quickly and effectively.


A metadata-driven ETL framework provides an abstraction layer that makes it trivial to define and reuse mappings, define multiple sources and targets of data, and define and reuse transformation rules. As a result, a metadata-driven ETL framework makes it faster and easier to process, load, and transform data by managing the EDW environments and quickly replicating standard reporting services and applications.
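One way such reuse can look in practice is sketched below: transformation rules are registered once and then shared across many source/target mappings. All names here are illustrative, not any specific product's API.

```python
# A simple rule registry: each rule is defined once, reused everywhere.
REGISTRY = {}

def rule(name):
    """Decorator that registers a reusable transformation rule by name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@rule("trim")
def trim(value):
    return value.strip()

@rule("cents_to_currency")
def cents_to_currency(value):
    return int(value) / 100

def run_mapping(record, mapping):
    """Apply (source, target, rule-chain) mappings to one record."""
    out = {}
    for src, tgt, rules in mapping:
        value = record[src]
        for r in rules:               # rules compose left to right
            value = REGISTRY[r](value)
        out[tgt] = value
    return out

# The same registered rules serve two entirely different feeds:
claims_map = [("amt_cents", "claim_amount", ["cents_to_currency"])]
member_map = [("full_nm", "member_name", ["trim"])]
print(run_mapping({"amt_cents": "12550"}, claims_map))
```

Because mappings are plain data, new feeds reuse the rule library without any new code.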


Regardless of the application’s architecture, code can be generated for each layer using suitable metadata attributes. This means an “n-tier architecture” is achievable and can be augmented by a metadata-driven ETL framework: each of the UI, service, persistence, data access and storage layers can be augmented using layer-specific patterns and practices.


Advantages:

A metadata-driven ETL framework provides an approach for standardising incoming and outgoing data by simplifying complicated processes. The uniform, generic way of data ingestion makes it easy to review existing configurations or add new ones. A metadata-driven ETL framework provides unique agility in developing or changing configurations. Changes typically do not require any code, which provides scalability: new sources, configurations, and environments are created by writing metadata and configuration rules. Maintenance time and effort are reduced by the ease and accessibility of the metadata and configurations, from business logic to data flow, through ubiquitous tools such as Excel spreadsheets.


Many of the advantages listed assume a no-compilation architecture, meaning the elements can be loaded at runtime. Any functionality designed can be instantly previewed and published, making it available to end users and testers sooner and without delays. This means users benefit from the systems and data immediately, are empowered, and can take ownership of delivering insight-led decisions.
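A minimal sketch of this no-compilation idea: business rules arrive as data (JSON here) and take effect the moment they are reloaded, with no build or deploy step. The rule format and field names are assumptions for illustration.

```python
import json

# Hypothetical rule document, as it might be fetched from the repository.
RULES_JSON = """
{"derive": [{"target": "full_name",
             "expr": ["first_name", "last_name"]}]}
"""

def load_rules(text):
    """Parse the rule document; could be re-read on every run."""
    return json.loads(text)

def apply_derivations(record, rules):
    """Derive new fields by joining the listed source fields with spaces."""
    out = dict(record)
    for d in rules["derive"]:
        out[d["target"]] = " ".join(record[f] for f in d["expr"])
    return out

rules = load_rules(RULES_JSON)   # editing RULES_JSON changes behaviour instantly
print(apply_derivations({"first_name": "Sam", "last_name": "Lee"}, rules))
```

Updating the JSON and re-running is the whole release cycle; nothing is recompiled.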


Metadata-driven ETL frameworks do not need to replace one’s existing ETL platforms. Instead, a metadata-driven ETL framework can be an accelerator or code generator for rapid development in the native ETL platform. Furthermore, since the metadata-driven ETL framework provides the configuration and instructions, translating the configurations into functional code allows existing platforms to be used and extended without much impact.
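The accelerator idea can be sketched as follows: rather than executing metadata directly, the framework translates it into native code (SQL here) for the existing ETL platform to run. Table names, column names, and the config shape are all illustrative assumptions.

```python
# Hypothetical mapping config for one load, e.g. parsed from a config file.
CONFIG = {
    "source": "staging.claims_raw",
    "target": "dw.claims",
    "mappings": {"claim_no": "claim_id", "amt": "claim_amount"},
    "filter": "amt > 0",
}

def generate_load_sql(cfg):
    """Emit an INSERT ... SELECT statement from the mapping config."""
    targets = ", ".join(cfg["mappings"].values())
    sources = ", ".join(cfg["mappings"].keys())
    return (f"INSERT INTO {cfg['target']} ({targets}) "
            f"SELECT {sources} FROM {cfg['source']} "
            f"WHERE {cfg['filter']};")

print(generate_load_sql(CONFIG))
```

The generated statement runs on the native platform unchanged, so existing tooling, scheduling, and tuning all still apply.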


Using a metadata-driven ETL framework over traditional development is estimated to reduce time to deployment by roughly 30% when integrating new data sources into a data warehousing and business analytics environment.


Performance Concerns:

Generating code from a configuration does not mean that the code is inefficient.


Iterations and reviews of changes can be given more focus, allowing similar tasks to be grouped. This is possible because users can access the information they require themselves, freeing the technical staff to review and monitor the system instead of delivering business functionality and reports. This process improvement should be worked into the life-cycle and approval process (which can be systemised and, in some instances, automated). These processes can protect sensitive data (considering GDPR and POPIA) by ensuring that only people who require the data get access to it, and that business users do not submit duplicate requests for the same data. In addition, these processes can vastly improve the performance of business logic, bringing the system much closer to real-time communication and reporting, and allowing all users in an organisation to make far more effective, insight-led business decisions.


Implementation Concerns:

With every user in an organisation now able to update and access information from the EDW, managing user access to the specific information they should and should not see is critical. This access should not be limited to the design of the reports but rather enforced in the ETL system at runtime, since the rules and data are decoupled. Because each metadata point can now be accessed, more granular attribute-based user access rights should be implemented. These access rights could initially be controlled and set up at a role level and inherited by users, but allowing more granular control will become paramount to the success of the metadata-driven ETL framework. These permissions should extend from the standard Create/Read/Update/Delete (CRUD) permissions to a data entity access level (potentially even an attribute within a data entity) as well as elements on a form; as such, attribute-based authorisation lends itself well to the metadata-driven ETL framework ecosystem.
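A minimal attribute-based access sketch is shown below; all roles, entities, actions, and attribute names are assumptions, and a real deployment would delegate this to a proper authorisation service.

```python
# Hypothetical policy table: permissions go beyond CRUD down to
# individual entities and attributes; "*" means all attributes.
POLICIES = [
    {"role": "claims_analyst", "entity": "claim",
     "actions": {"read"}, "attributes": {"claim_id", "claim_amount"}},
    {"role": "claims_admin", "entity": "claim",
     "actions": {"read", "update"}, "attributes": {"*"}},
]

def is_allowed(role, entity, action, attribute):
    """Check a role/entity/action/attribute request against the policies."""
    for p in POLICIES:
        if (p["role"] == role and p["entity"] == entity
                and action in p["actions"]
                and ("*" in p["attributes"] or attribute in p["attributes"])):
            return True
    return False

print(is_allowed("claims_analyst", "claim", "read", "claim_amount"))    # True
print(is_allowed("claims_analyst", "claim", "update", "claim_amount"))  # False
```

Because the policies are themselves metadata, they can live in the same repository as the ETL rules and be reviewed through the same process.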


Another concern is users creating incomplete, inaccurate, or unoptimised rules, since templated tools such as Excel spreadsheets do not enforce data integrity. The metadata-driven ETL framework should therefore run a pre-process validation of the configuration and rule-based information to prevent long run times and to stop broken queries from running inside the EDW.
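Such a pre-process validation step might look like the sketch below, which checks rule rows before they ever reach the EDW. The required keys and known transform names are illustrative assumptions matching a hypothetical spreadsheet template.

```python
# Validation runs before any ETL job is generated or executed.
REQUIRED_KEYS = {"source", "target", "transform"}
KNOWN_TRANSFORMS = {"upper", "float", "none"}

def validate_rules(rules):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, r in enumerate(rules):
        missing = REQUIRED_KEYS - r.keys()
        if missing:
            problems.append(f"rule {i}: missing {sorted(missing)}")
        elif r["transform"] not in KNOWN_TRANSFORMS:
            problems.append(f"rule {i}: unknown transform {r['transform']!r}")
    return problems

bad = [{"source": "a", "target": "b", "transform": "reverse"},
       {"source": "c"}]
print(validate_rules(bad))
```

Rejecting a bad rule here costs milliseconds; letting it reach the warehouse could cost a failed overnight load.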


Metadata-driven ETL frameworks are not an all-or-nothing approach. A hybrid system provides the ability to reverse engineer and generate dynamic code where applicable while allowing for customisations when needed. This means dependency on the metadata-driven ETL framework should not create vendor lock-in nor prevent growth and development in IT or business areas.


A good metadata-driven ETL framework will implement version control: every change to the metadata files is archived, enabling rollbacks when necessary.
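The version-control behaviour can be sketched as follows. A real framework might simply delegate this to Git; the in-memory store here is illustrative, and all field names are assumptions.

```python
# Minimal versioned metadata store: full history, oldest first.
class MetadataStore:
    def __init__(self):
        self.versions = []

    def save(self, metadata):
        """Archive a new version and return its 1-based version number."""
        self.versions.append(dict(metadata))
        return len(self.versions)

    def current(self):
        return self.versions[-1]

    def rollback(self, version):
        """Restore an earlier version by re-archiving it as the newest."""
        return self.save(self.versions[version - 1])

store = MetadataStore()
store.save({"target": "dw.claims", "filter": "amt > 0"})
store.save({"target": "dw.claims", "filter": "amt >= 0"})  # a bad change
store.rollback(1)                                          # undo it
print(store.current())
```

Note that rollback appends rather than deletes, so the audit trail of every change, including the mistake, is preserved.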


Conclusion:

Metadata-driven ETL frameworks provide visibility of the procedures and processes in the Enterprise Data Warehouse/Analytics Architecture by allowing all users in an organisation to quickly manage and control their data without going into the code itself, reducing the dependency on strong technical knowledge to deliver business solutions.


Metadata-driven ETL frameworks create templates for data structures, data migration controls, exception/error handling, and rules management. The simple configuration and rule-based implementation streamline loading data into traditional EDWs and make data available sooner for analytics, reporting, and use by other applications. In addition, data source locations, schemas, error handling logic, and job control parameters can be stored in physical configuration files that the Framework can easily maintain and process to generate executable ETL jobs.


A metadata-driven ETL framework provides the following:

  • Uniformity & standardization
  • Agility & flexibility
  • Easy to scale
  • Maintainability
  • Acceleration & code generation


Using a metadata-driven ETL framework over traditional development is estimated to reduce time to deployment by roughly 30% when integrating new data sources into a data warehousing and business analytics environment. In addition, using metadata-driven ETL frameworks allows all users in an organisation to quickly manage and control their data without going into the code itself, reducing the dependency on strong technical knowledge to deliver business solutions.

