Metadata-Driven (MDD) Framework Meets Data Warehouse Automation
Koustubh Dhopade
High-Impact Leader, Innovator in implementing Modern Data Platforms & Programs, Data evangelist, Mining & Engineering, Solution architect and Machine learning enthusiast
Data is often described as the fastest-growing currency, one whose value rivals that of oil and gold in today's insight-driven world. To maximize its utility and drive sound business decisions, most organizations place a data warehouse at the centre of their analytics strategy. Speed and accuracy of information are of the essence when you need to make game-changing decisions in a competitive landscape.
Because data warehouses have many moving parts, and because getting the platform ready usually involves a long wait, it makes sense to put automated processes in place that let IT teams deliver actionable insights at the speed of business. This is where configuration-driven automation comes in. Modern tech stacks broadly endorse this shift to an automation mindset. In a digital transformation, it is imperative to ensure a faster data-to-value journey, which gives more room for innovation and makes work more purpose-oriented and enjoyable for your IT team.
After data modeling, perhaps the most time-consuming part is writing the ETL/ELT code that populates your data warehouse. Automation introduces developers to a zero-code or low-code (configurable) approach, in which they work at a logical (design) level to create the integration flows. IT teams no longer have to wrestle with hand-written SQL and can move data from source systems to the destination warehouse in hours or days, delivering value to the business sooner.
Metadata in digital systems is abundant and pervasive. The role of records and information management is to identify which metadata in business applications, systems and cloud environments is necessary for creating, capturing and managing the authoritative records and information used for operational as well as orchestration activities. We implemented the metadata-driven architecture (operational and technical metadata) for domains such as Banking, Security, Retail, Logistics and Life Sciences, and eventually brought it to Insurance.
In a data warehouse, metadata can be many things: data types, data formats, source and destination database tables, entity relationships, SCD patterns, ETL mappings and transformations, and more. A metadata-driven architecture allows you to bring a source database schema into a data model, customize its structure based on your business requirements, and make the data model available for subsequent processes, such as data analytics. When the metadata-driven approach is coupled with automation, the two streamline design, development, and deployment, leading to a robust data warehouse implementation. The combination gives IT teams what they need to formulate agile and sustainable processes that deliver high-quality outputs consistently. It also comes in handy when dealing with data drift.
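To make the idea concrete, here is a minimal Python sketch of how mapping metadata could drive code generation, so that a load statement is rendered from configuration rather than written by hand. The class names, table names, columns and the `{col}` transformation template are all hypothetical and chosen for illustration. This is not the implementation of any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One source-to-target column mapping (hypothetical metadata record)."""
    source_column: str
    target_column: str
    target_type: str          # e.g. "INTEGER", "VARCHAR(100)"
    transformation: str = ""  # optional SQL expression template using {col}


@dataclass(frozen=True)
class TableMapping:
    """Metadata describing how one source table loads into one target table."""
    source_table: str
    target_table: str
    columns: tuple


def build_insert_select(mapping: TableMapping) -> str:
    """Render an INSERT ... SELECT statement purely from mapping metadata."""
    targets = ", ".join(c.target_column for c in mapping.columns)
    expressions = []
    for c in mapping.columns:
        # Apply the configured transformation, or pass the column through.
        if c.transformation:
            expr = c.transformation.format(col=c.source_column)
        else:
            expr = c.source_column
        expressions.append(f"CAST({expr} AS {c.target_type})")
    select_list = ", ".join(expressions)
    return (
        f"INSERT INTO {mapping.target_table} ({targets}) "
        f"SELECT {select_list} FROM {mapping.source_table}"
    )


# Illustrative metadata only; in practice this would live in a metadata store.
policy_mapping = TableMapping(
    source_table="src.policy",
    target_table="dw.dim_policy",
    columns=(
        ColumnMapping("policy_id", "policy_key", "INTEGER"),
        ColumnMapping("holder_nm", "holder_name", "VARCHAR(100)", "TRIM({col})"),
    ),
)

sql = build_insert_select(policy_mapping)
print(sql)
```

Adding a column or changing a transformation then means editing a metadata record, not the generator, which is the low-code effect described above.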
In order to be authoritative, metadata should possess:
The effective implementation of metadata is based on the following principles we learnt while developing configurable design/architecture for data management platforms on-premise or on-cloud:
The ideas in this blog are divided into two categories. Principles are those concepts judged to be common to all domains of metadata and which might inform the design of any metadata schema or application. Practicalities are the rules of thumb, constraints, and infrastructure issues that emerge from bringing theory into practice in the form of useful and sustainable systems.
Principles:
A. Modularity : Metadata modularity is a key organizing principle for environments characterized by vastly diverse sources of content, styles of content management, and approaches to resource description. It allows designers of metadata schemas to create new assemblies based on established metadata schemas and benefit from observed best practice, rather than reinventing elements anew.
B. Extensibility : Metadata systems must allow for extensions so that particular needs of a given application can be accommodated. Some metadata elements are likely to be found in most metadata schemas (the concept of creator or identifier of an information resource, for example). Others will be specific to particular applications or domains (degree of cloud cover, for example, in remote sensing data).
C. Refinement : Application domains will differ according to the degree of detail that is necessary or desirable. The design of metadata standards should allow schema designers to choose a level of detail appropriate to a given application. Populating databases with metadata is costly, so there are strong economic incentives to create metadata with sufficient detail to meet the functional requirements of an application, but not more.
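The three principles above can be sketched in a few lines of Python. In this illustrative example a shared element set is reused across applications (modularity), a domain-specific set is added next to it (extensibility), and a detail level decides how much of each set is included (refinement). The element names, the "basic"/"full" levels and the `compose_schema` helper are assumptions made for the sketch, not part of any metadata standard.

```python
# Hypothetical shared element set, reused across applications (modularity).
CORE_ELEMENTS = {
    "identifier": {"type": "string", "detail": "basic"},
    "creator": {"type": "string", "detail": "basic"},
    "created_at": {"type": "timestamp", "detail": "full"},
}

# Hypothetical domain-specific extension for remote sensing (extensibility).
REMOTE_SENSING_ELEMENTS = {
    "cloud_cover_pct": {"type": "float", "detail": "basic"},
    "sensor_model": {"type": "string", "detail": "full"},
}

# Ordering of detail levels used for refinement.
DETAIL_LEVELS = {"basic": 0, "full": 1}


def compose_schema(*element_sets, detail="basic"):
    """Merge element sets, keeping elements at or below the requested detail level."""
    schema = {}
    seen = set()
    for elements in element_sets:
        for name, spec in elements.items():
            # Reject clashes so that modules stay independent of each other.
            if name in seen:
                raise ValueError(f"element defined twice: {name}")
            seen.add(name)
            if DETAIL_LEVELS[spec["detail"]] <= DETAIL_LEVELS[detail]:
                schema[name] = spec["type"]
    return schema


basic_schema = compose_schema(CORE_ELEMENTS, REMOTE_SENSING_ELEMENTS)
full_schema = compose_schema(CORE_ELEMENTS, REMOTE_SENSING_ELEMENTS, detail="full")
print(sorted(basic_schema))
print(sorted(full_schema))
```

An application that only needs discovery-level metadata can ask for the basic schema and avoid the cost of populating the extra elements.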
Practicalities:
A. Application Profiles: No single metadata element set will accommodate the functional requirements of all applications, and it becomes increasingly important to be able to cross discovery boundaries as well. Application profiles facilitate this by allowing designers to 'mix and match' schemas as appropriate. Application profiles achieve this modularity through cardinality enforcement. Cardinality refers to constraints on the appearance of an element: is it optional, mandatory, or conditional?
B. Syntax and Semantics: Semantics is about meaning; syntax is about form. Agreements about both are necessary for two development communities, different departments, or lines of business (LoBs) to share metadata. Two communities may agree about the meaning of the term title or creator or identifier, but until they have a shared convention for identifying and encoding values, they cannot easily exchange their metadata. Such agreements help to standardise data across the organization and avoid many integration challenges.
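As a rough illustration of cardinality enforcement in an application profile, the Python sketch below checks a metadata record against a profile that marks each element as mandatory, optional, or conditional. The profile contents, the rule encoding and the `validate_record` helper are hypothetical; a real profile would follow whatever convention the participating teams have agreed on.

```python
# Hypothetical application profile: element name -> cardinality rule.
PROFILE = {
    "identifier": "mandatory",
    "title": "mandatory",
    "creator": "optional",
    "license": "optional",
    # Conditional: required only when the named element is present.
    "license_url": ("conditional", "license"),
}


def is_present(record, element):
    """Treat missing, None and empty-string values as absent."""
    return record.get(element) not in (None, "")


def validate_record(record, profile):
    """Return a list of cardinality violations for one metadata record."""
    errors = []
    for element, rule in profile.items():
        if rule == "mandatory" and not is_present(record, element):
            errors.append(f"missing mandatory element: {element}")
        elif isinstance(rule, tuple) and rule[0] == "conditional":
            depends_on = rule[1]
            if is_present(record, depends_on) and not is_present(record, element):
                errors.append(f"{element} is required when {depends_on} is set")
    # Elements outside the profile are reported instead of silently accepted.
    for name in sorted(set(record) - set(profile)):
        errors.append(f"element not in profile: {name}")
    return errors


good = {"identifier": "A1", "title": "Claims extract"}
bad = {"identifier": "A1", "license": "CC-BY", "owner": "finance"}
print(validate_record(good, PROFILE))
print(validate_record(bad, PROFILE))
```

Running the same check on both sides of an exchange is one simple way for two teams to confirm they share the same form as well as the same meaning.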
Kudos to all the data & quality management team members - Swati, Nainish, Joydeep, Thanga, Aarti, Dhiraj, Tanya, Yogita, Sadhana, Aaditya, Vaishali, Lalchand, Harshal, Kalai, Madhuri, Madhura, Pooja, Shubhada, Vini & all. Thank you for making it a memorable journey.
Conclusion:
The Saama MDD (metadata-driven) framework simplifies and automates data warehouse development end-to-end, using the agile metadata-driven approach. The product fetches metadata directly from source databases and allows you to use it in the design, development, and deployment phases of your data warehouse. Once implemented, introducing changes to the design is easy because the captured metadata allows you to propagate changes across the board while preserving the integrity of existing models, integration flows, and deployments.
Want to see the power of the metadata-driven approach and how these two technologies work together in action? Reach out at [email protected] for more details.
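The general idea of using captured metadata to spot and propagate changes can be sketched as follows. This is a generic Python illustration under stated assumptions, not the Saama framework's implementation: the column snapshots are made up, and `diff_schema` is a hypothetical helper that compares the metadata captured earlier with what the source reports now.

```python
def diff_schema(captured, current):
    """Compare two {column: type} snapshots and report what changed."""
    added = {c: t for c, t in current.items() if c not in captured}
    removed = {c: t for c, t in captured.items() if c not in current}
    retyped = {
        c: (captured[c], current[c])
        for c in captured.keys() & current.keys()
        if captured[c] != current[c]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Illustrative snapshots: metadata captured at design time vs. the source today.
captured = {"policy_id": "INTEGER", "holder_nm": "VARCHAR(50)", "region": "CHAR(2)"}
current = {"policy_id": "INTEGER", "holder_nm": "VARCHAR(100)", "start_dt": "DATE"}

changes = diff_schema(captured, current)
for kind, columns in changes.items():
    print(kind, columns)
```

A report like this can feed the downstream steps (model update, regenerated load code, redeployment) so that a source change is handled once, from the metadata.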