Episode 5: Solving the Mysteries of Data Management

Introduction:

I am Beshoy Gamal, a big data and machine learning geek. I have worked on implementing data-driven solutions for more than 9 years, across countries and across technologies from on-premises to cloud, and I am now working at Vodafone Group as a Senior Data Architect.

From all my experience, I have found that many organizations invest in a central data lake and a central data team, expecting to drive their business with data. However, after a few initial quick wins, they notice that the central data team often becomes a bottleneck, as it cannot answer the analytical questions of management and product owners quickly enough.

So I have decided to write this series of articles about the Data Mesh, Data Products, Self-Service, and Data Democratization.


Data Mesh Principle 4: Federated Governance

In the previous episode, we studied Self-Service Analytics enabled by Cloud Technologies.


Now we are going to start studying Data Mesh Principle 4: Federated Governance, but before diving into data governance, let's take a step back and talk about data management.


Why is Data Management Critical to Your Business?



While most businesses today have put together a documented data strategy, a majority of those businesses have yet to become truly data-driven. Most still don’t treat data as a business asset to help them successfully compete in the marketplace. As a result, there’s a massive opportunity for organizations that recognize the importance of creating a holistic data infrastructure. By implementing a combination of data virtualization, master data management (MDM), metadata management, and other essential data management technologies, businesses can better meet business objectives and place data at the center of their business.

  1. What is data management?
  2. Types of Data Management
  3. Data Management Goals and Challenges
  4. Data Management Processes and Strategy
  5. Data Mesh Management: Federated Governance

What is data management?

Data management is the practice of effectively collecting, storing, protecting, delivering, and processing data. In business, data is usually associated with customers, prospects, employees, deals, competitors, and finances. When an organization manages its data effectively, it gains insights that drive business decisions.

Protecting your data should be a priority throughout the entire process, especially as data privacy concerns rise and ransomware attacks become rampant.

Since business applications and the databases within them come in all sizes, each company should take its own approach to these stages. You should do so considering your particular technology ecosystem and, if necessary, define and add new steps to the process.

Data cleansing, for instance, might be a small and short step for a startup with limited data. But an enterprise-level company might need to prioritize it early on in the process.


Types of Data Management



Managing data is a complicated job that impacts every facet of your business. Data management might include daily tasks, policy creation, or maintaining processes. So, whether you’re researching big data or master data, you'll use many types of data management.


Data Lifecycle Management



In simple terms, data lifecycle management (DLM) identifies the different stages that information flows through and creates policies to manage each of those stages.

The ultimate goal of this framework is to maximize the useful life of your data.

The stages or steps of DLM are:

  • Collection
  • Access
  • Usage
  • Storage
  • Transfer
  • Deletion or destruction

DLM is mostly used by big companies working with massive amounts of data that need to be categorized into tiers, often with complex automation.

For smaller businesses, it can also be a useful structure to keep in mind to create scalable data management strategies.
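To make the stages above a bit more tangible, here is a minimal Python sketch of how DLM stages could be captured as explicit, machine-readable policies rather than tribal knowledge; the stage names, owners, and retention values are purely illustrative assumptions, not a standard.

```python
# A minimal sketch of DLM stage policies as plain configuration.
# All stage names and policy fields are illustrative, not a standard.

from dataclasses import dataclass


@dataclass
class StagePolicy:
    owner: str              # team accountable for this stage
    retention_days: int     # how long data may live in this stage
    encrypted: bool         # whether encryption at rest is required


# One policy entry per DLM stage listed above.
DLM_POLICIES = {
    "collection": StagePolicy(owner="ingestion-team", retention_days=30,  encrypted=True),
    "access":     StagePolicy(owner="platform-team",  retention_days=365, encrypted=True),
    "usage":      StagePolicy(owner="analytics-team", retention_days=365, encrypted=True),
    "storage":    StagePolicy(owner="platform-team",  retention_days=730, encrypted=True),
    "transfer":   StagePolicy(owner="platform-team",  retention_days=7,   encrypted=True),
    "deletion":   StagePolicy(owner="governance",     retention_days=0,   encrypted=False),
}


def policy_for(stage: str) -> StagePolicy:
    """Look up the policy for a lifecycle stage, failing loudly on unknown stages."""
    if stage not in DLM_POLICIES:
        raise KeyError(f"No DLM policy defined for stage '{stage}'")
    return DLM_POLICIES[stage]


if __name__ == "__main__":
    print(policy_for("storage"))
```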

Data Pipelines



A data pipeline is the path that a group of data takes from one system to another. Sometimes following these paths changes the data, but other times the data stays the same.

For example, say you are a HubSpot customer working on a Google Ads campaign. Your paid ad data moves from Google Ads into your HubSpot dashboard through the integration. This allows you to analyze paid ad data from multiple platforms in one spot.

For ease of comparison, you can transform the data along the way, for example by matching time zones, or you can leave it unchanged.
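As a small illustration of that kind of in-flight change, the following sketch (using pandas, with hypothetical column names and source time zones) normalizes timestamps from two sources to UTC so they can be compared in one place:

```python
# A minimal pipeline-step sketch: normalizing timestamps from two sources
# to one time zone before comparison. Column names and zones are illustrative.

import pandas as pd

ads_clicks = pd.DataFrame({
    "clicked_at": pd.to_datetime(["2024-03-01 09:00", "2024-03-01 17:30"]),
})
crm_events = pd.DataFrame({
    "created_at": pd.to_datetime(["2024-03-01 12:05", "2024-03-02 01:10"]),
})

# Assume the ads platform exports times in US/Pacific and the CRM in UTC.
ads_clicks["clicked_at_utc"] = (
    ads_clicks["clicked_at"].dt.tz_localize("US/Pacific").dt.tz_convert("UTC")
)
crm_events["created_at_utc"] = crm_events["created_at"].dt.tz_localize("UTC")

print(ads_clicks["clicked_at_utc"])
print(crm_events["created_at_utc"])
```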

ETLs (Extract, Transform, Load)



ETLs are a popular type of data pipeline. They make it easier for businesses to pull data from multiple sources into a single source. During the process, the data moves through three steps:

  • Extraction: pulling data from a source database.
  • Transformation: manipulating the data with code to format and prepare it for analysis.
  • Loading: writing the transformed data into the new location.
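Below is a minimal, self-contained sketch of those three steps in Python, using pandas and SQLite; the table names, file paths, and the currency conversion are illustrative assumptions, and it presumes a source database that already contains an orders table:

```python
# A minimal ETL sketch with pandas and sqlite3. Table names, file paths,
# and the transformation are illustrative assumptions, not a fixed recipe.

import sqlite3
import pandas as pd


def extract(conn: sqlite3.Connection) -> pd.DataFrame:
    """Extract: pull raw orders from the source database."""
    return pd.read_sql_query("SELECT order_id, amount, currency FROM orders", conn)


def transform(orders: pd.DataFrame, eur_rate: float = 1.1) -> pd.DataFrame:
    """Transform: normalize all amounts to a single currency for analysis."""
    orders = orders.copy()
    is_eur = orders["currency"] == "EUR"
    orders.loc[is_eur, "amount"] = orders.loc[is_eur, "amount"] * eur_rate
    orders["currency"] = "USD"
    return orders


def load(orders: pd.DataFrame, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned data into the analytics destination."""
    orders.to_sql("orders_clean", conn, if_exists="replace", index=False)


if __name__ == "__main__":
    source = sqlite3.connect("source.db")        # assumed source system
    warehouse = sqlite3.connect("warehouse.db")  # assumed destination
    load(transform(extract(source)), warehouse)
```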

Data Processing

Data processing is the practice of collecting data and translating it into useful information, typically carried out by data scientists.

There are three typical methods for data processing — electronic, mechanical, and manual. Many businesses today rely on automated data processing.

Inaccurate data processing can have serious impacts on data output. The wrong data can lead companies to act on the wrong ideas and strategies.

Data Architecture

Data architecture is a structure that helps your team support your data strategy. It shows how your company gets its data and where that data goes. It also covers data storage, usage, and security. Data architecture is where most data strategy begins.

Your data architecture helps your business understand your data. It also makes it easier to create guidelines for data governance.

Data Modeling



Data models are simple diagrams of your systems and the data those systems contain. Data modeling makes it easier for teams to see how data flows through your systems and business processes.

Here are some examples of information a data model might include:

  • Product data
  • Partner information
  • Customer data
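As a toy illustration, here is how those three kinds of information and the relationships between them might be sketched as simple Python data classes; the field names and links are assumptions made for the sake of the example:

```python
# A toy data model sketch: entities and the relationships between them.
# Field names and relationships are illustrative only.

from dataclasses import dataclass, field


@dataclass
class Partner:
    partner_id: str
    name: str


@dataclass
class Product:
    sku: str
    name: str
    supplied_by: Partner          # product data links to partner information


@dataclass
class Customer:
    customer_id: str
    email: str
    purchased: list[Product] = field(default_factory=list)  # customer data links to product data


acme = Partner("p-1", "Acme Supplies")
shoe = Product("sku-42", "Trail Runner", supplied_by=acme)
jane = Customer("c-7", "jane@example.com", purchased=[shoe])
```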

Data Catalogs



Data catalogs are inventories of data resources within a business. They usually use metadata to organize these resources. A data catalog can make business data more transparent and searchable for users.

For example, vendors like Google offer data catalogs as a complementary product for data management. These products are essentially search bars to make data assets easy to find and categorize.

If you are running a small business, you can replicate the function of data catalogs by creating an inventory of all the data assets your company has. This data catalog can help your different teams easily find the data they need to access. Tags and labels are a great way to categorize groups of data to find them easily later on.
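Here is a minimal sketch of such an inventory in Python: a list of data assets described by a little metadata and searchable by tag. The asset names, owners, locations, and tags are all made up for illustration:

```python
# A minimal data-catalog sketch: an inventory of data assets described by
# metadata and searchable by tag. Asset names and tags are illustrative.

from dataclasses import dataclass


@dataclass
class DataAsset:
    name: str
    owner: str
    location: str
    tags: list[str]


CATALOG = [
    DataAsset("orders_clean", "analytics-team", "warehouse.orders_clean", ["sales", "curated"]),
    DataAsset("web_sessions", "marketing",      "lake/raw/web_sessions",  ["web", "raw"]),
    DataAsset("customers",    "crm-team",       "crm.customers",          ["customer", "pii"]),
]


def find_by_tag(tag: str) -> list[DataAsset]:
    """Return every catalogued asset carrying the given tag."""
    return [asset for asset in CATALOG if tag in asset.tags]


print([a.name for a in find_by_tag("pii")])  # -> ['customers']
```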

Having a clear and complete inventory of your data assets is also useful when you want to build workflows or integrations between databases.

Data Integration



Data integrations combine data from different systems to create a unified data set.

Data is rarely collected by a single platform. Usually, there are several applications in place for specialized processes. Separate teams often have their own databases and each gathers a section of your company's data.

For example, let's say you have an online shop where you sell running shoes. You might have one app gathering the information your customers fill out when they make a purchase, a second app collecting billing or accounting information, and a third app running a chatbot that answers customer questions.

Each app collects data about each customer. The goal of integration is to pull those fragments together and offer a single customer view (SCV).

When you integrate data, its quality improves because you can compare data for accuracy and relevance. Integration also allows you to track users throughout the entire customer journey.
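The following sketch shows the idea of a single customer view with pandas, merging fragments of the same (hypothetical) customer from three assumed apps on a shared email key; outer joins keep customers that appear in only some of the systems:

```python
# A minimal single-customer-view (SCV) sketch: merging fragments of the same
# customer from three assumed apps on a shared email key.

import pandas as pd

checkout = pd.DataFrame({"email": ["jane@example.com"], "shoe_size": [42]})
billing  = pd.DataFrame({"email": ["jane@example.com"], "total_spent": [240.0]})
support  = pd.DataFrame({"email": ["jane@example.com"], "open_tickets": [1]})

# Outer joins keep customers that appear in only some of the systems.
single_customer_view = (
    checkout
    .merge(billing, on="email", how="outer")
    .merge(support, on="email", how="outer")
)
print(single_customer_view)
```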

If your company is working with in-house software applications, you might need a team of engineers with an ad-hoc solution to integrate your data. For those small and medium-sized organizations that work with cloud-based platforms, iPaaS can be a great solution.

ETL is one type of data integration, but the terms are not interchangeable: ETL refers to a specific extract-transform-load pipeline, typically feeding a warehouse, while data integration covers any approach to combining data from different systems.

Data Governance



Data governance is the set of rules and procedures that define data management at a company. Often a dedicated team or individual owns data governance and is responsible for things such as:

  • Access requests
  • Column name definitions
  • Database record maintenance

Effective data governance creates consistent and trustworthy data. It also helps keep data secure.

Data Security

Companies use data security to protect data from theft, corruption, and more throughout the data lifecycle.

Data security includes:

  • Hardware
  • Software
  • Storage
  • Backups
  • User devices
  • Access
  • Admin controls
  • Data governance

For example, CAPTCHAs are a popular way to deter bots and attackers from submitting malicious input through web forms.

Data Storage

Data storage is the practice of recording and preserving data for the future. Electronic storage is more common than paper document storage because of the increased volume of data.

Companies might use magnetic tape, optical discs, or mechanical media to store data. Other options include:

  • Physical file storage
  • Block storage in storage area networks (SANs)
  • Object storage, which stores objects like videos from Facebook or files from Dropbox

Customer Data Platforms and Data Warehouses

Data warehouses and customer data platforms are two common ways companies collect and store data.

A data warehouse is a database that a company transfers all its data to, usually from diverse sources. Related terms include data lakes and data marts, although each has a different scope and structure. You may also be familiar with the term enterprise data warehouse (EDW) for larger companies.

A customer data platform is a more user-friendly platform. It also collects data relevant to your customers and displays the data to end-users in tailored, visual reports. Often, a customer data platform is simply the 'front end' of a behind-the-scenes data warehouse.

In both cases, a business may store all the data from its CRM, help desk, web analytics, financial, and other internal systems in one of these locations.

Other Data Management Concepts

Metadata

Metadata is data that describes other data within a database or data warehouse.

Business Intelligence

Business intelligence is the practice of analyzing and presenting data to offer insights that can help companies make business decisions. This data often takes the form of a metrics dashboard or report.

Data Cleansing

This process is also sometimes called data scrubbing. It's the process of detecting and correcting corrupt, inaccurate data.

Data cleansing might include removing duplicates, correcting errors, or removing outliers. Because this process can be tedious and time-consuming, validation of data accuracy is important.
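A minimal pandas sketch of those cleansing steps might look like the following; the column names, the casing fix, and the outlier threshold are illustrative choices, and a real pipeline would validate much more:

```python
# A minimal data-cleansing sketch with pandas: drop duplicates, fix an obvious
# error, and flag outliers. Column names and thresholds are illustrative.

import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 4],
    "country":  ["US", "US", "us", "DE", "US"],
    "amount":   [19.99, 19.99, 24.50, 18.00, 99999.0],
})

cleaned = (
    orders
    .drop_duplicates(subset="order_id")                     # remove duplicate rows
    .assign(country=lambda df: df["country"].str.upper())   # correct inconsistent casing
)

# Flag (rather than silently drop) amounts far outside the typical range.
threshold = cleaned["amount"].quantile(0.99)
cleaned["is_outlier"] = cleaned["amount"] > threshold

print(cleaned)
```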

Data Quality

Data quality is one of the main obstacles companies face today.

In a 2021 Experian study, over 50% of business leaders say they don’t fully trust their data assets.

And there are many reasons for that lack of faith. Ambiguous, incomplete, and duplicate data, different formatting, and access difficulties all impact data quality.

Data quality standards also depend on business priorities, and those data irregularities can erode employee and customer trust.

When talking about high-quality data, there are three concepts to highlight:

  • Accessibility
  • Consistency
  • Relevance
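Accessibility and relevance are largely judgment calls, but consistency and completeness lend themselves to simple automated checks. Here is an illustrative Python sketch, with hypothetical column names and rules:

```python
# A minimal consistency-check sketch: simple, automatable rules that catch some
# of the quality issues listed above. Rules and column names are illustrative.

import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example.com", "not-an-email"],
})


def quality_report(df: pd.DataFrame) -> dict:
    """Return a handful of basic quality metrics for the customers table."""
    return {
        "duplicate_ids": int(df["customer_id"].duplicated().sum()),
        "missing_emails": int(df["email"].isna().sum()),
        "malformed_emails": int((~df["email"].fillna("").str.contains("@")).sum()),
    }


print(quality_report(customers))
# -> {'duplicate_ids': 1, 'missing_emails': 1, 'malformed_emails': 2}
```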


Data Management Processes: How to Carry Out Your Data Strategy

Developing and putting a data management strategy in place has many benefits.

Whether you are refining enterprise data processes or creating a data strategy for your small business, it’s an exacting process.

While many steps will be unique to your business and its data, a clear, documented sequence of steps is a good place to start.


Data Mesh Management: Federated Governance



Data mesh implementation requires a governance model that embraces decentralization and domain self-sovereignty, interoperability through global standardization, a dynamic topology and most importantly automated execution of decisions by the platform. I call this a federated computational governance. A decision making model led by the federation of domain data product owners and data platform product owners, with autonomy and domain-local decision making power, while creating and adhering to a set of global rules - rules applied to all data products and their interfaces - to ensure a healthy and interoperable ecosystem. The group has a difficult job: maintaining an equilibrium between centralization and decentralization; what decisions need to be localized to each domain and what decisions should be made globally for all domains. Ultimately global decisions have one purpose, creating interoperability and a compounding network effect through discovery and composition of data products.

The priorities of the governance in data mesh are different from traditional governance of analytical data management systems. While they both ultimately set out to get value from data, traditional data governance attempts to achieve that through centralization of decision making, and establishing global canonical representation of data with minimal support for change. Data mesh's federated computational governance, in contrast, embraces change and multiple interpretive contexts.

Many practices of pre-data-mesh governance, as a centralized function, are no longer applicable to the data mesh paradigm. For example, the past emphasis on certification of golden datasets - the datasets that have gone through a centralized process of quality control and certification and marked as trustworthy - as a central function of governance is no longer relevant. This had stemmed from the fact that in the previous data management paradigms, data - in whatever quality and format - gets extracted from operational domain’s databases and gets centrally stored in a warehouse or a lake that now requires a centralized team to apply cleansing, harmonization and encryption processes to it; often under the custodianship of a centralized governance group. Data mesh completely decentralizes this concern. A domain dataset only becomes a data product after it locally, within the domain, goes through the process of quality assurance according to the expected data product quality metrics and the global standardization rules. The domain data product owners are best placed to decide how to measure their domain’s data quality knowing the details of domain operations producing the data in the first place. Despite such localized decision making and autonomy, they need to comply with the modeling of quality and specification of SLOs based on a global standard, defined by the global federated governance team, and automated by the platform.
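To illustrate the "computational" part, here is a purely hypothetical Python sketch of how a global SLO standard defined by the federated governance group could be applied automatically by the platform to SLOs declared locally by each domain; all field names, thresholds, and domain names are assumptions, not part of any data mesh specification:

```python
# A purely illustrative sketch of computational governance: a global SLO
# floor defined once by the federated governance group, each domain declaring
# its own targets, and the platform checking compliance automatically.
# All field names, thresholds, and domain names are assumptions.

from dataclasses import dataclass

# Global standard: every data product must declare SLOs at least this strong.
GLOBAL_MINIMUMS = {"freshness_minutes": 24 * 60, "completeness_pct": 95.0}


@dataclass
class DataProductSLO:
    domain: str
    product: str
    freshness_minutes: int   # maximum staleness the domain commits to
    completeness_pct: float  # share of expected records actually delivered


def complies(slo: DataProductSLO) -> bool:
    """Platform-side check that a domain-declared SLO meets the global floor."""
    return (
        slo.freshness_minutes <= GLOBAL_MINIMUMS["freshness_minutes"]
        and slo.completeness_pct >= GLOBAL_MINIMUMS["completeness_pct"]
    )


orders_slo = DataProductSLO("sales", "orders", freshness_minutes=60, completeness_pct=99.5)
print(complies(orders_slo))  # -> True: local autonomy within global rules
```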

In summary, the contrast between the centralized (data lake, data warehouse) model of data governance and data mesh governance comes down to who decides and how: centralized governance relies on a central team, manual processes, and a single canonical data model, while data mesh relies on a federation of domain and platform product owners, global rules automated by the platform, and domain-local modeling that embraces change.

