At least three good reasons to document data

To record or report in detail, to support with citations or references, to support with evidence or proof, to provide information: these are the meanings of the verb "to document". Applied to data, documentation aims to collect information about the data in order to facilitate their understanding, interpretation, transformation, use, transmission, etc.

Today, data documentation is proving to be a necessity, and sometimes an obligation, for any company that handles a wide variety of data every day or in which data are increasingly shared.

This article answers two key questions: why and what? Asking why data should be documented raises the question of both motivations and benefits. Asking what should be documented concerns the attributes relevant to the intended uses (regulatory reporting, personal data processing, 360° customer view, sales reporting, financial reporting, etc.), which may vary depending on the motivations.

Why document data?

In the course of its mission, any company builds relationships with customers, suppliers, partners, administrations and other authorities, etc. Within these relationships, the company either creates and delivers data, or receives and uses data: it acts respectively as a "data producer" or as a "data consumer". The creation, delivery, reception and use of data are thus carried out through data processing.

Before any use, the company may be obliged (by the legal and regulatory framework) or may find it worthwhile to document the data, in order:

  • As a data consumer, to state its uses, requirements and desired service levels;
  • As a data producer, to specify the authorizations, restrictions or prohibitions of use, the characteristics of the data and the actual service levels.

During implementation, the documentation then serves as the basis for the data consumer's acceptance of the data delivery and for the discharge of the data producer for the service delivered (see Figure 1).

Figure 1: Simplified relationship between data producer and data consumer - actor to actor

For the company, documenting data can therefore appear as a necessity arising from:

  • External relationships (Client - Supplier, Company - Supervisory Authority, Company - Administration, etc.), which take place in a legal, regulatory, economic, social and cultural environment that governs the authorizations, restrictions and prohibitions of use related to the data;
  • Internal relationships, which take the form of data processing between actors, roles, organizational units, etc.

In the latter context, the data consumer may find it useful to document data in order to state its intended uses, requirements and service levels (see Figure 2). The data producer may likewise find it useful to specify the authorizations, restrictions or prohibitions of use, the data characteristics and the actual service levels. Finally, the company may find it worthwhile to coordinate these individual interests to help meet its legal and regulatory commitments.

Figure 2: Simplified relationship between data producer and data consumer – activity to activity

What should be documented?

Data documentation consists of setting up a documentary system, that is, "a structured and organized set of documents of different natures" within the meaning of the AFNOR FD S 99-31 standard. Such a system has the advantage of:

  • Preserving knowledge and facilitating its transmission;
  • Building an accessible, valid, timely and shared information base;
  • Preventing gaps, risks and malfunctions;
  • Providing an information and knowledge basis for the harmonization of practices.

Despite this evident and long-standing usefulness, developing a corporate data documentation strategy is a recent practice. Several factors have contributed to this change:

The first is a productivity challenge (individual or collective) in a context shaped by the digital revolution. Digital transformation has given companies the ability to represent, collect and process data, for example by digitizing a character, sound, shape, color, word, text, photograph, piece of music, film, etc. Acquiring these capabilities came with the need to document the data thus collected, in order to facilitate their identification, location, search and use.

In this context, the useful documentary attributes are:

  • Descriptive, to characterize the data content;
  • Structural, to specify the data format and syntax;
  • Location, to characterize the data location;
  • Administrative, to protect intellectual property or define preservation rules.

Equipped with thorough data documentation, users are more productive: they devote their time to exploiting the data rather than to evaluating it beforehand. In addition, all users benefit from the same documentation quality, unlike a situation where evaluation is left to each user's discretion.
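To make the four attribute types more concrete, here is a minimal Python sketch of a single documentation record. The class name `DataDocumentation`, its fields and the sample `customer_email` entry are hypothetical illustrations, not a standard schema. The structural attribute is assumed to be expressible as a regular expression, so that a user can check a value's syntax directly instead of re-evaluating the data.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDocumentation:
    """Hypothetical record grouping the four documentary attribute types."""
    name: str
    # Descriptive: what the data means
    definition: str
    # Structural: expected type and syntax (syntax assumed to be a regex)
    data_type: str
    pattern: str
    # Location: where the data can be found
    location: str
    # Administrative: ownership and preservation rules
    owner: str
    retention_years: int

    def conforms(self, value: str) -> bool:
        """Check a raw value against the documented syntax."""
        return re.fullmatch(self.pattern, value) is not None


# Illustrative entry; every value here is made up for the example.
customer_email = DataDocumentation(
    name="customer_email",
    definition="Email address used to contact the customer",
    data_type="string",
    pattern=r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}",
    location="crm.customers.email",
    owner="Sales operations",
    retention_years=3,
)

print(customer_email.conforms("jane.doe@example.com"))  # True
print(customer_email.conforms("not-an-email"))          # False
```

Keeping all four attribute types in one record means every user reads the same definition, syntax, location and preservation rule, which is the shared documentation quality described above.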

The following table lists some attributes by type within this context:

Risk control is the second issue. It takes shape at the crossroads of regulation, norms and standardization.

Indeed, many regulators have become concerned about the risks to data subjects' rights and freedoms. Texts have emerged in various places (HIPAA on the processing and exchange of health data in the United States, GDPR in Europe, etc.), sharing the same aim of empowering businesses and data subjects. Other initiatives (Solvency II in the insurance sector, Basel II and III or CRD IV in the banking sector, etc.) have taken shape to address prudential risk, develop a culture of self-assessment and promote the monitoring of major risks. Finally, others, based on the use of standards (IFRS, etc.) or on the promotion of a common language (BCBS 239), have been developed to control data quality and risk assessment issues and to facilitate reconciliations and comparisons.

In this context, the useful documentary attributes are:

  • Descriptive, to share the semantics of the data;
  • Prescriptive, to document authorizations, restrictions and prohibitions of use, as well as the functional and operational requirements related to regulations;
  • Location, to characterize the origination and destination of the data and to describe the path followed by the data.

Shared semantics aims to improve understanding and reduce errors of interpretation or implementation; it also limits the effort needed to reconcile data. Prescriptive attributes shed light on the conditions of lawfulness and fairness of personal data processing and make it possible to justify its purpose; they also specify the requirements (quality, security, protection or life cycle) that apply to the data. Knowing the path followed by the data (processing steps) allows rapid impact or dependency analyses, which meet the expectations of supervisory authorities and of the company's internal actors.
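As an illustration of how prescriptive attributes might be consulted before a data use, here is a minimal Python sketch. The `Prescription` structure, the `check_use` function and the purpose labels are hypothetical. The rules that an explicit prohibition overrides an authorization, and that any undocumented purpose is flagged for review, are assumptions chosen for the example, not regulatory requirements.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    AUTHORIZED = "authorized"
    PROHIBITED = "prohibited"
    UNDOCUMENTED = "undocumented"


@dataclass
class Prescription:
    """Hypothetical prescriptive attributes attached to one data item."""
    data_name: str
    authorized_purposes: set[str] = field(default_factory=set)
    prohibited_purposes: set[str] = field(default_factory=set)


def check_use(prescription: Prescription, purpose: str) -> Decision:
    """Decide whether a requested purpose is allowed by the documentation."""
    # Assumption: an explicit prohibition always wins over an authorization.
    if purpose in prescription.prohibited_purposes:
        return Decision.PROHIBITED
    if purpose in prescription.authorized_purposes:
        return Decision.AUTHORIZED
    # Assumption: anything not documented must be reviewed before use.
    return Decision.UNDOCUMENTED


health_status = Prescription(
    data_name="health_status",
    authorized_purposes={"claims_processing"},
    prohibited_purposes={"marketing"},
)

print(check_use(health_status, "claims_processing").value)  # authorized
print(check_use(health_status, "marketing").value)          # prohibited
print(check_use(health_status, "pricing").value)            # undocumented
```

Returning a distinct "undocumented" outcome, rather than defaulting to yes or no, makes gaps in the prescriptive documentation visible instead of silently allowing or blocking a use.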

The table below lists additional attributes within this context:

Traceability is the third key issue. The continuing evolution of regulations, data-driven decision-making and digital transformation all contribute to new data traceability requirements. In this context, the company must document data to control:

  • The origin: its identification, semantics, quality levels at source (which may be non-contractual), etc.;
  • The destination: its identification, the result and its quality levels at the target;
  • The transformation cycle that leads to this result: the rules applied, the controls carried out and the responsibilities involved.

This documentation plays an important role in monitoring and assessing service levels. It allows the company to:

  • Act curatively to rectify detected defects (incident management);
  • Conduct an upstream analysis (dependency analysis) prior to corrective actions;
  • Conduct a downstream analysis (impact analysis) prior to a change;
  • Preemptively integrate relevant documentary attributes into the design (requirement management);
  • Take legal action, either to challenge a data producer that has caused harm to the company or, as a data producer, to produce evidence in its defense in the event of a data breach by a data consumer.

Data documentation brings transparency to the data processing carried out. It informs the data consumer of the impacts associated with exploiting the data and gives them more confidence in their actions and decisions.

The documentary attributes useful in this context are:

  • Structural, to characterize the composition and relationships between data;
  • Location, to define the origination and destination of the data;
  • Processes, to identify the transformation stages since origination;
  • Rules, to specify the transformations;
  • Commitments, to characterize levels of service;
  • Administrative, to define the data lifecycle.
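Location and process attributes are what make the upstream (dependency) and downstream (impact) analyses listed above possible. The Python sketch below assumes lineage is recorded as a directed graph in which an edge from A to B means "B is derived from A"; the `LineageGraph` class and the dataset names are hypothetical. Under that assumption, both analyses are breadth-first traversals run in opposite directions.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage graph: an edge source -> target means target is derived from source."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_flow(self, source: str, target: str) -> None:
        """Record one documented processing step from source to target."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _reachable(start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first traversal; the visited set also guards against cycles.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def dependencies(self, node: str) -> set[str]:
        """Upstream analysis: everything the node is derived from."""
        return self._reachable(node, self._upstream)

    def impacts(self, node: str) -> set[str]:
        """Downstream analysis: everything derived from the node."""
        return self._reachable(node, self._downstream)


graph = LineageGraph()
graph.add_flow("crm.customers", "staging.customers")
graph.add_flow("billing.invoices", "staging.revenue")
graph.add_flow("staging.customers", "dwh.customer_360")
graph.add_flow("staging.revenue", "dwh.customer_360")
graph.add_flow("dwh.customer_360", "report.sales")

print(sorted(graph.impacts("staging.customers")))
# ['dwh.customer_360', 'report.sales']
print(sorted(graph.dependencies("report.sales")))
# ['billing.invoices', 'crm.customers', 'dwh.customer_360', 'staging.customers', 'staging.revenue']
```

Storing both edge directions costs a little memory but lets an incident on a report be traced back to its sources, and a planned change on a source be traced forward to its consumers, with the same traversal.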

Finally, more than a necessity, an obligation: compliance is a fourth issue that could be cited. The company today faces a proliferation of regulations, which brings various requirements:

  • Enhanced requirements for reporting and evidence management at the functional level;
  • Tighter requirements in terms of data granularity and depth as well as service levels (notification period, data completeness, etc.) at the operational level;
  • Increased requirements in terms of means (roles and responsibilities) of production (process) and control (policies and procedures) at the organizational level.

To respond appropriately, the company needs to identify and locate the data subject to the regulatory provisions. In this case, useful documentary attributes are:

  • Descriptive, to characterize the content of the data;
  • Prescriptive, to state what a regulatory provision authorizes, makes conditional or prohibits;
  • Location, to characterize the origination, the waypoints and the destination of the data;
  • Stewardship, to clarify data rules and ensure that they are enforced;
  • Administrative, to control the data lifecycle.

In other words, a company needs to base its data processing on a data dictionary. The dictionary should be the reference baseline for data, covering description as well as prescription, stewardship, structure, processes, rules, commitments, etc. In addition, the company needs to base the actual use of data and evidence management on data lineage.
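As a rough illustration of using the dictionary as a compliance baseline, the Python sketch below locates every entry subject to a given regulation and returns its documented path (origination, waypoints, destination). The `DictionaryEntry` fields, the `locate_regulated` helper and the sample entries are hypothetical; a real dictionary would carry many more attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """Hypothetical data dictionary entry; field names are illustrative."""
    name: str
    description: str                           # descriptive attribute
    steward: str                               # stewardship attribute
    regulations: frozenset[str] = frozenset()  # prescriptive: applicable provisions
    path: tuple[str, ...] = ()                 # location: origination, waypoints, destination


def locate_regulated(entries: list[DictionaryEntry], regulation: str) -> dict[str, tuple[str, ...]]:
    """Map each entry subject to `regulation` to its documented data path."""
    return {e.name: e.path for e in entries if regulation in e.regulations}


dictionary = [
    DictionaryEntry("customer_email", "Customer contact address", "Marketing data steward",
                    frozenset({"GDPR"}), ("crm.customers", "staging.customers", "dwh.customer_360")),
    DictionaryEntry("exposure_amount", "Credit exposure per counterparty", "Risk data steward",
                    frozenset({"BCBS 239"}), ("loans.core", "risk.datamart", "report.regulatory")),
    DictionaryEntry("product_label", "Commercial product name", "Product data steward",
                    path=("catalog.products",)),
]

for name, path in locate_regulated(dictionary, "GDPR").items():
    print(name, "->", " > ".join(path))
```

Because the regulatory scope and the path live in the same entry, answering "where is the data covered by this provision?" becomes a lookup rather than an investigation.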

What can be expected from documenting data?

Documenting data is therefore as much a necessity as an obligation. For the company, the expected benefits can be expressed in terms of:

Individual or collective productivity

  • When definitions and other information about a given data item (rules, owner, etc.) are documented, time is saved searching for information (productivity in information search);
  • When definitions and other information about a given data item (rules, owner, etc.) are documented, requests and other operations requiring these data (decision making) are processed more quickly (greater responsiveness);
  • When processing stages are documented, time is saved implementing controls (control effectiveness);
  • When processing stages are documented, time is saved in impact analysis (impact analysis effectiveness);
  • With fewer errors, money is saved on error correction (reduction of non-quality costs and, more generally, of operational support costs);
  • With fewer errors, data is increasingly delivered on time and to the right place (productivity in data delivery);
  • With less time spent correcting errors, time is freed for other tasks (users devote more time to their main activity);
  • When staff share the same data documentation, they understand each other better and spend less time in meetings (fluid collaboration);
  • When systems share the same metadata repositories, data exchanges between systems and organizational units (regions, business units, etc.) are easier;
  • Etc.

Usability

  • When data definitions, rules and local specificities are documented, interpretability and comparability improve (fluidity of cognitive processes);
  • When data definitions, rules and local specificities are documented, transposition, reconciliation and consolidation improve (fluidity of upstream analytical processes);
  • When data rules, controls and quality levels are documented, downstream analytical processes and data usages can be clarified and justified;
  • Etc.

Risk and cost control

  • When data authorizations, restrictions and prohibitions are documented, data are used wisely;
  • When the risks associated with the data are documented, controls can be implemented in the right place;
  • When reliable impact assessments are carried out, project failures are limited;
  • When data definitions and related information are documented, the impact of staff turnover is controlled;
  • Etc.

Regulatory compliance

  • Data definitions and related information are documented and can be expected to reflect regulatory requirements;
  • When processing stages are documented, results can be justified;
  • When processing stages are documented, results can be replayed;
  • When processing stages are documented, the checks performed and related information can be proven;
  • Etc.

In conclusion

Documenting data is as much an opportunity as an obligation. The data concerned may differ in each case: legal, commercial and operational data for the opportunity, accounting and risk data for the obligation.

For a company moving in this direction, the effort required is significant and must be guided by data classification. The decision to document data must prioritize critical data (whose unavailability hampers the execution of a process), sensitive data (whose disclosure is likely to affect data subjects' rights and freedoms) and regulatory data (data subject to regulations). For all of these:

  • Data quality, protection or security rules will have to be stated and documented;
  • Data lineage will have to be developed;
  • Data quality will have to be measured;
  • Data quality issues will have to be managed and resolved;
  • Etc.
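The prioritization rule above (critical, sensitive and regulatory data first) can be sketched in Python as follows. The `Classification` flags, the equal weighting of the three criteria and the sample items are assumptions made for illustration; in practice the weighting would be a governance decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Hypothetical classification flags for one data item."""
    name: str
    critical: bool = False    # unavailability penalizes the execution of a process
    sensitive: bool = False   # disclosure may affect data subjects' rights and freedoms
    regulatory: bool = False  # subject to regulations

    def priority(self) -> int:
        # Assumption: each criterion counts equally (a simple count of flags).
        return sum((self.critical, self.sensitive, self.regulatory))


def documentation_backlog(items: list[Classification]) -> list[str]:
    """Keep items with at least one flag, highest priority first."""
    flagged = [item for item in items if item.priority() > 0]
    # sorted() is stable even with reverse=True, so ties keep their input order.
    ranked = sorted(flagged, key=lambda item: item.priority(), reverse=True)
    return [item.name for item in ranked]


items = [
    Classification("product_label"),
    Classification("order_status", critical=True),
    Classification("customer_email", sensitive=True, regulatory=True),
    Classification("exposure_amount", critical=True, regulatory=True),
]

print(documentation_backlog(items))
# ['customer_email', 'exposure_amount', 'order_status']
```

Items with no flag (here `product_label`) are left out of the backlog rather than ranked last, reflecting the idea that the documentation effort should start with critical, sensitive and regulatory data.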

Adopting such a strategy requires the company to develop three types of capabilities:

  • Organizational, to define (policy, stewardship model, governance bodies, roles and responsibility matrix), execute and control (evaluation, decision, arbitration, resolution, audit) the metadata processing;
  • Functional, in terms of metadata modeling (enterprise, conceptual, logical, physical and operational), processing (acquisition, storage, retrieval, collaboration, administration, distribution) and delivery (dictionary, catalog, lineage, reporting);
  • Technological, in terms of metadata architecture, software and infrastructure, enabling users of the documentation (business, technical or operational) to quickly find: the meaning of the data, the data rules and the associated levels of quality, protection or security; consuming applications, producing applications and golden sources; the data processing steps, from creation to use, including the transformations undergone; the roles involved in the processing steps and the responsibilities for defining the data and associated rules; etc.





More articles by Charles NGANDO BLACK
