Accelerating data value through data lineage

Today, organizations generate huge volumes of data. Increasingly, this information is published and shared through data portals with employees, the public, partners and other stakeholders.

Publishing and sharing generates new uses for this data and transforms its value, helping to build agile, innovative, data-driven ecosystems.

However, it is currently difficult for businesses to know which datasets are being used, and what they are being used for.

Tracking these uses and reuses manually is incredibly time-consuming, and simply not possible on larger portals with many datasets.

This means that organizations lack insights that can be used to enhance and accelerate data value. For example:

  • Which types of data are reused most? Are there similar datasets that could also be shared, or could data be updated more regularly?
  • Which datasets are used least? Should they be removed or promoted more heavily to users?
  • Are there invalid relationships between datasets? How can quality be improved?
  • How is usage split between external and internal visitors? How can audiences be encouraged to increase their usage?

What is needed is a new approach to data lineage that analyzes and accelerates data use inside and outside your organization.


The changing face of data lineage

Traditional data lineage tools focus on the technical aspects of how data flows through your organization. They aim to trace errors back to their source and remove them, improving data quality. These tools provide a record of data throughout its lifecycle, including its origin and any transformations or joins that have been applied.

The information provided is technical and focused on factors such as consistency and accuracy. For example, data lineage could be used to show how changes in a source system could affect downstream uses of data or to track down any changes made by users to datasets that impacted other systems and processes. These changes could then be reversed before any lasting damage was done.
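
To make the idea of downstream impact concrete, lineage can be pictured as a directed graph in which each dataset points to the datasets built from it. The following is a minimal sketch in Python, not the way any particular tool works: the dataset names, the `LINEAGE` mapping and the `downstream_of` helper are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its value.
LINEAGE = {
    "source_system": ["raw_orders"],
    "raw_orders": ["clean_orders"],
    "clean_orders": ["sales_dashboard", "monthly_report"],
    "sales_dashboard": [],
    "monthly_report": [],
}


def downstream_of(node, lineage):
    """Return every dataset reachable from `node`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


# Everything that could be affected by a change to "raw_orders".
impacted = downstream_of("raw_orders", LINEAGE)
print(impacted)
```

Walking the same graph in the opposite direction (from a dataset back to its inputs) would support the other use described above: tracing an error back to its source.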

Next-generation tools, such as Opendatasoft’s new data lineage feature, go beyond technical monitoring to deliver more strategic insights that drive greater data sharing. They still model the dataset journey from its creation to final destination, providing information on its origin and how it has been modified, joined and transformed. However, by tracking key performance indicators (KPIs), they answer crucial questions that help enhance your data sharing strategy, enabling you to:

  • Better understand the needs of data consumers, by identifying the datasets and formats that they reuse most
  • Improve data maintenance by tracking the status of relationships and identifying which datasets deliver the most value
  • Track relationships with other data providers, showing how much external data is on your portal
  • Understand who is using your data portal, by measuring the split of traffic between internal and external visitors
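
As an illustration of the kind of KPIs listed above, the sketch below derives three of them from a small log of reuse events. It is a simplified example under stated assumptions: the event records, their fields and the `reuse_kpis` function are invented for this illustration and do not represent the data model of any specific product.

```python
from collections import Counter

# Hypothetical reuse events: (dataset, export format, visitor type).
EVENTS = [
    ("air_quality", "csv", "external"),
    ("air_quality", "json", "external"),
    ("budget_2023", "csv", "internal"),
    ("air_quality", "csv", "internal"),
    ("bike_stations", "geojson", "external"),
]


def reuse_kpis(events):
    """Summarize reuse events into a few illustrative KPIs."""
    total = len(events)
    by_dataset = Counter(dataset for dataset, _, _ in events)
    by_format = Counter(fmt for _, fmt, _ in events)
    by_visitor = Counter(visitor for _, _, visitor in events)
    return {
        # Dataset and format with the highest number of reuse events.
        "most_reused_dataset": by_dataset.most_common(1)[0][0] if total else None,
        "most_reused_format": by_format.most_common(1)[0][0] if total else None,
        # Fraction of events that came from external visitors.
        "external_share": by_visitor["external"] / total if total else 0.0,
    }


kpis = reuse_kpis(EVENTS)
print(kpis)
```

In practice these figures would be compared against pre-set targets, for example a desired share of external traffic, rather than read in isolation.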


The features required for effective data lineage

Data lineage has to be easy to understand, without requiring detailed technical knowledge of data. Tools should therefore contain these key features:

The ability to map data

To better view data flows, you need your tool to provide detailed mapping of each and every dataset, from its creation to final destination. This has to be presented in an intuitive, easily understandable way, allowing you to drill down to see the current status of data, the relationships between datasets, the objects and processors used in the journey, and which modifications have taken place.

Dashboard and KPIs

Managing data lineage effectively is critical to maximizing value from your data. That’s why you need a tool built around a clear dashboard that provides instant analysis of data usage, showing you what is happening across your data portal compared to pre-set KPIs.


Data lineage in action

These examples from Opendatasoft clients demonstrate how data lineage can drive greater value from data sharing.

UK Power Networks

Electricity distribution network and system operator UK Power Networks (UKPN) is using data lineage to better understand what its users, who range from energy developers to local authorities, are doing with the data shared on its portal. Data lineage provides additional insight into the maps and charts that have been created, while preserving user anonymity. This helps UKPN to better plan for and meet user needs. In this example, a user has combined datasets showing different overhead electricity lines and models of forecast load and potential network development for a specific geographic area.

Lamie mutuelle

Lamie mutuelle specializes in health and property insurance, and gives all of its employees a 360° customer view through access to data. It is using data lineage to map how data flows through its systems and to better understand how it is used internally. This is especially important as much of its data comes from its external sales partners, before flowing into systems such as Lamie mutuelle’s CRM solution. This means the data has to be checked for quality, for example against the government database of companies, to ensure it is correct before it is reused on the corresponding company page for that organization.

Observatoire des finances et de la gestion publique locales (OFGL)

OFGL collects, analyzes and shares information on the financial management of French local authorities, which is then made available via its rapidly growing data portal. As the portal is managed by a large team, it uses data lineage to understand flows and links between different datasets, or between datasets and pages, controlling operations and ensuring consistency and reliability. OFGL uses a mixture of publicly available data (such as from the French national statistics agency) and its own information, which is then used to build pages on its portal and is also reused by other government agencies, such as La Caisse des Dépôts.


As data drives greater efficiency and innovation inside and outside organizations, understanding how it is reused is increasingly vital to accelerating its use and enabling data democratization. Data lineage is central to creating this understanding, whatever your data sharing strategy.


Keep up to date on the latest data democratization and data experience news by following this monthly newsletter, our LinkedIn account and our Twitter account.
