Zen and the Art of Data Quality

What is “Quality”?

In his 1974 book “Zen and the Art of Motorcycle Maintenance”, Robert Pirsig treats this as a philosophical question. He describes a fictional journey with his friend John Sutherland, who has a brand-new motorcycle which he does not maintain, as he prefers to be ‘in the moment’ and simply enjoy riding his machine. When it goes wrong, he seeks professional help. By contrast, Robert rides an older machine which he maintains himself. This requires knowledge of its inner mechanics, but allows him to make proactive adjustments as they travel. John thinks Robert is very boring! The book shows that motorcycle maintenance can be dull, tedious drudgery or an enjoyable pastime; it all depends on attitude. It concludes that to truly experience Quality one must embrace both sides and apply whichever best fits the situation.

As in philosophy, the search for Truth in data (or at least a single version of it) has several dimensions to consider.

How is Data Quality defined?

In October 2017, the National Bank of Belgium (NBB) issued some very detailed guidance on Data Quality. It defined Data Quality as “the adequacy of the data for the ultimate user goal” and specified six dimensions by which it would be assessed, as described below.

Accuracy, or - “Get it Right”

This represents the extent to which data values correctly describe the underlying concept / data definition. Has the correct calculation been applied? Is that an actual ISO country code? Is it displayed in the right currency and unit of measure (millions / thousands)? Were there any errors in the calculation process?
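As a rough illustration, two of the accuracy questions above can be turned into automated row-level checks. This Python sketch assumes hypothetical field names (`country`, `unit`), an expected reporting unit of thousands, and a deliberately tiny subset of ISO 3166-1 alpha-2 codes; a real check would load the full, maintained code list.

```python
# Hypothetical subset of ISO 3166-1 alpha-2 codes; load the full list in practice.
ISO_COUNTRY_CODES = {"BE", "DE", "FR", "GB", "LU", "NL", "US"}

# Assumed unit labels and their multipliers.
UNIT_MULTIPLIERS = {"units": 1, "thousands": 1_000, "millions": 1_000_000}


def accuracy_errors(row: dict, expected_unit: str = "thousands") -> list:
    """Return the accuracy problems found in one reporting row."""
    errors = []
    if row.get("country") not in ISO_COUNTRY_CODES:
        errors.append(f"unknown ISO country code: {row.get('country')!r}")
    unit = row.get("unit")
    if unit not in UNIT_MULTIPLIERS:
        errors.append(f"unknown unit of measure: {unit!r}")
    elif unit != expected_unit:
        errors.append(f"reported in {unit}, expected {expected_unit}")
    return errors


def to_expected_unit(amount: float, unit: str, expected_unit: str = "thousands") -> float:
    """Rescale an amount from its reported unit into the expected unit."""
    return amount * UNIT_MULTIPLIERS[unit] / UNIT_MULTIPLIERS[expected_unit]
```

For example, a row with country `XX` reported in millions yields two errors: one for the country code and one for the unit.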

Reliability, or - “Spot the Difference”

This refers to the difference between ‘versions’ of data submitted. What was the extent of the change? Do we know / understand the difference between the old values and the new?
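A minimal sketch of how the difference between two submitted versions might be summarised, assuming each version is held as a mapping from a (hypothetical) cell identifier to a numeric value:

```python
def compare_versions(old: dict, new: dict) -> dict:
    """Summarise the differences between two submitted versions of the same data.

    Both arguments map a cell identifier to a numeric value.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # For cells present in both versions, keep only those whose value moved.
    changed = {
        key: new[key] - old[key]
        for key in sorted(old.keys() & new.keys())
        if new[key] != old[key]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Reporting added, removed and changed cells separately makes it easier to explain a resubmission item by item.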

Completeness, or - “You Missed a Bit”

This dimension is about ensuring that all the relevant data items have been supplied. A key factor here is actually knowing which of your data items are relevant, i.e. which ones are ultimately used to produce the prudential reports. If you know your data lineage, you know the scope for checking that all of your reporting inputs are complete.
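If lineage is recorded as a mapping from each report to the data items that feed it, the completeness check follows directly. In this Python sketch the report names, item names and the `LINEAGE` mapping itself are hypothetical; an item counts as missing if it is absent or supplied as `None`.

```python
# Hypothetical lineage: the input data items that feed each report.
LINEAGE = {
    "OWN_FUNDS": {"cet1_capital", "at1_capital", "tier2_capital"},
    "BALANCE_SHEET": {"total_assets", "total_liabilities"},
}


def required_inputs(reports) -> set:
    """Union of all input items that feed the given reports."""
    return set().union(*(LINEAGE[report] for report in reports))


def missing_inputs(required: set, supplied: dict) -> list:
    """Required items that were not supplied, or were supplied empty (None)."""
    return sorted(item for item in required if supplied.get(item) is None)
```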

Consistency, or - “One Version of the Truth”

Described as “logical concordance between data subsets”, this basically means that where the same value (or set of values) appears in more than one report, the figures should not contradict each other. Are you using the same version of source data as your colleagues in the other department? Does “Total Assets” (for example) match across reports produced by different systems?
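A consistency check of this kind can be as simple as comparing the same metric across reports against a chosen reference, with a small tolerance for rounding. The report names, the choice of reference and the 0.5 tolerance in this Python sketch are illustrative assumptions:

```python
import math


def check_consistency(metric: str, values_by_report: dict, reference: str,
                      abs_tol: float = 0.5) -> list:
    """Compare one metric across reports against a reference report.

    Returns a message for every report whose value differs from the
    reference by more than abs_tol (an assumed rounding allowance).
    """
    ref_value = values_by_report[reference]
    problems = []
    for report, value in sorted(values_by_report.items()):
        if report == reference:
            continue
        if not math.isclose(value, ref_value, rel_tol=0.0, abs_tol=abs_tol):
            problems.append(f"{metric}: {report}={value} vs {reference}={ref_value}")
    return problems
```

For instance, with hypothetical values `{"FINREP": 1000.0, "COREP": 1000.3, "MgmtPack": 998.0}` and `FINREP` as reference, only the management pack is flagged.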

Plausibility, or - “That Doesn’t Usually Happen”

Using time series analysis across reporting periods, it should be possible to identify variables that have deviated significantly from their usual values.
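One simple way to do this (chosen here for illustration, not prescribed by the NBB guidance) is a z-score test: flag the current value if it lies more than a few standard deviations from the mean of previous periods. The three-standard-deviation threshold below is an assumption and would need tuning per variable:

```python
from statistics import mean, stdev


def is_implausible(history: list, current: float, threshold: float = 3.0) -> bool:
    """Flag the current value if it lies more than `threshold` sample standard
    deviations from the mean of previous reporting periods."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold
```

Seasonal series (for example, year-end effects) would need a more careful model; this only catches gross deviations.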

Timeliness, or - “Why are we waiting?”

This ultimately refers to the amount of time between the end of the reporting period and the point at which results are submitted. At an individual / system level, it refers to the time between the point at which the last required input is received and the point at which the final output is produced. This is especially important for data processes on the ‘critical path’. Is your data produced in a reasonable timeframe?
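Both views of timeliness reduce to simple date arithmetic. In this Python sketch the input names and timestamps are hypothetical; `submission_lag_days` measures the regulatory view (period end to submission), and `critical_path_window` the system view (last input received to final output):

```python
from datetime import date, datetime


def submission_lag_days(period_end: date, submitted_at: datetime) -> int:
    """Calendar days from the end of the reporting period to submission."""
    return (submitted_at.date() - period_end).days


def critical_path_window(input_arrivals: dict, output_produced: datetime):
    """Return the last input to arrive and the time from its arrival to the final output."""
    last_input = max(input_arrivals, key=input_arrivals.get)
    return last_input, output_produced - input_arrivals[last_input]
```

Tracking which input arrives last each period shows where the critical path actually lies.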

Principles of Data Quality

Whilst John would get his motorcycle fixed after it went wrong, Robert would proactively inspect his machine on a regular basis. In the same spirit, the NBB defines a set of principles with specific action points that can be considered “Preventative Maintenance” for Data Quality.

Principle 1 - Governance

The process of preparing, verifying, and submitting the prudential data to the Bank should be supported by a robust, documented governance system. The main impact of this for regular users is the identification of roles / responsibilities - who is responsible for the data? Given that this information is always changing, it is important that it (the evidence) can be captured in an automated way.

In addition, there ought to be a separation between those who prepare the data, and those who validate it (sign it off) - which can be described as a ‘4 eyes’ principle.
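To show how the 4 eyes rule and automatically captured evidence can go together, here is a minimal Python sketch. The `Submission` class and its fields are hypothetical, not taken from the NBB circular or any product: it refuses sign-off by the person who prepared the change and records each step in an audit log.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Submission:
    """A hypothetical data change awaiting sign-off."""
    change_id: str
    prepared_by: str
    approved_by: Optional[str] = None
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Evidence of who prepared the change is captured automatically.
        self.audit_log.append(("prepared", self.prepared_by))

    def approve(self, approver: str) -> None:
        # 4 eyes rule: whoever prepared the data may not sign it off.
        if approver == self.prepared_by:
            raise PermissionError(
                f"{approver} prepared {self.change_id} and cannot approve it"
            )
        self.approved_by = approver
        self.audit_log.append(("approved", approver))
```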

Principle 2 - Technical Capacities

Institutions should design, establish and manage such data architecture and IT infrastructure as are appropriate for producing and verifying prudential reporting. There are a number of points here. Firstly, do your systems have the capacity to ensure reporting can be produced even in times of stress / crisis? What monitoring do you have in place to ensure this? Performance tuning solutions such as ESM are invaluable when running heavy analytic workloads in a shared environment during a busy month end.

Secondly, are tools in place to ensure timely detection and resolution of Data Quality errors and inconsistencies? Is this information archived and appropriately followed up? Are these tools periodically reviewed and maintained?
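A hypothetical sketch of the minimum such a tool needs: every detected issue is logged with a timestamp (the archive), and anything left open beyond an agreed period is surfaced for follow-up. The class names and the five-day threshold are assumptions for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DQIssue:
    rule: str            # which data quality rule failed
    detail: str          # what was found
    raised_at: datetime
    resolved_at: Optional[datetime] = None


class IssueLog:
    """Hypothetical register of data quality issues; entries are never deleted."""

    def __init__(self) -> None:
        self._issues: List[DQIssue] = []

    def raise_issue(self, rule: str, detail: str, when: datetime) -> DQIssue:
        issue = DQIssue(rule, detail, when)
        self._issues.append(issue)
        return issue

    def resolve(self, issue: DQIssue, when: datetime) -> None:
        issue.resolved_at = when

    def overdue(self, now: datetime, max_age: timedelta = timedelta(days=5)) -> List[DQIssue]:
        """Open issues older than max_age, i.e. those needing follow-up."""
        return [
            issue for issue in self._issues
            if issue.resolved_at is None and now - issue.raised_at > max_age
        ]
```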

Finally, it is important that tools for information management are as automated and integrated as possible. Each unconnected End-User Computing (EUC) application requires a secured, verified and documented process to ensure its reliability. Where that EUC involves manual data processing, it is also necessary to document the reason for manual processing, the associated risks, and the measures taken to compensate for those risks. Now if only there were an easy way to integrate and validate that EUC data...

Principle 3 - Process

The process of preparing, verifying and submitting the prudential data to the Bank should follow a documented internal process. This is by far the most onerous of the principles! A long list of requirements follows:

  • A description of the flow of information within the institution
  • A description of the processes for verifying that reporting is fit for purpose (instructions in force)
  • A description of the processes for correction and final validation of reporting before the numbers are submitted
  • A documented list of all the mandatory quarterly external reports to be submitted, with the associated tests & checks (plus any additional internal tests applied)
  • A glossary of all the concepts and instructions used in reporting (and the linkages between them). This glossary may contain the assumptions and interpretations drawn from the various frameworks.
  • A record of ALL reconciliations between different systems and divisions within the institution itself (!)
  • Per reporting table - a list of divisions involved in the preparation, validation, application of controls, and final approval
  • Documentation of the key controls and steps taken to ensure data quality - this should show for instance, who operates a particular control, when, using which tool.
  • Procedures in place for detecting, reporting and explaining data integrity issues & errors, which should be fully integrated and consistent across the entity or group
  • A robust and documented user access policy
  • Processes should be periodically reviewed & improved to ensure compliance with regulatory policy at all times
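Several of these requirements (the list of mandatory reports with their checks, and who operates each control, when and with which tool) amount to maintaining a control register. A minimal Python sketch, with hypothetical report and control names, shows how gaps in such a register could be detected automatically:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Control:
    report: str     # reporting table the control covers
    name: str       # what the control checks
    owner: str      # who operates it
    frequency: str  # when it runs, e.g. "quarterly"
    tool: str       # which tool performs it


def uncovered_reports(mandatory_reports: List[str], controls: List[Control]) -> List[str]:
    """Mandatory reports with no documented control."""
    covered = {control.report for control in controls}
    return [report for report in mandatory_reports if report not in covered]


def controls_without_owner(controls: List[Control]) -> List[str]:
    """Controls whose operator has not been recorded."""
    return [control.name for control in controls if not control.owner]
```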

As if that wasn’t enough - accredited statutory external auditors are also required to examine Data Quality.

Where is the Zen?

If your company is based in Belgium, or a country with equivalent requirements for Data Quality, you’d be forgiven for thinking the above requirements are horrifically manual and not particularly zen-like.

The good news is that if you have a SAS BI platform, chances are that a significant chunk of the above can be automated. The Data Controller is a modern HTML5 web application for real-time data modification, with a 4 eyes approval workflow and full audit trail. Data Quality rules are applied at source, and ‘hook scripts’ allow SAS jobs to execute after data updates are approved, enabling full automation and integration with existing systems.
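The approve-then-trigger pattern behind hook scripts can be sketched generically. The Python below is a hypothetical illustration of the pattern only; it is not the Data Controller's API (whose hooks run SAS jobs), and the class, table and hook names are assumptions:

```python
from datetime import datetime, timezone
from typing import Callable, List, Tuple


class ApprovalWorkflow:
    """Hypothetical approve-then-trigger workflow; not the Data Controller API."""

    def __init__(self) -> None:
        self.hooks: List[Callable[[str], None]] = []
        self.audit: List[Tuple[datetime, str, str, str]] = []

    def register_hook(self, hook: Callable[[str], None]) -> None:
        """Add a downstream job to run after each approval."""
        self.hooks.append(hook)

    def approve(self, table: str, approver: str) -> None:
        self._record("approved", table, approver)
        # Downstream jobs only run once the change has been approved.
        for hook in self.hooks:
            hook(table)
            self._record(f"hook {hook.__name__} ran", table, approver)

    def _record(self, event: str, table: str, user: str) -> None:
        # Every event is timestamped (UTC) to form the audit trail.
        self.audit.append((datetime.now(timezone.utc), event, table, user))
```

Keeping the trigger inside the approval step means no downstream job can run on unapproved data, and every run leaves an audit entry.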

More information on how this tool can help automate the requirements of the NBB_2017_27 circular is available below:

https://datacontroller.io/data-quality-and-the-nbb_2017_27-circular/

“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.”

Kirk Paul Lafler

A Data Scientist, Consultant, Educator, Developer, Programmer, and problem solver who transforms organizations and people with intelligent data-driven solutions and analytics.

1 year ago

Nice article, Allan B. !

Leonid Batkhan

SAS Consultant & Blogger

3 years ago

Great, well-written article, Allan! However, the definition of Data Quality by the Bank as the “adequacy of data for the ultimate user goal” seems odd. I don't think data quality is connected in any way with the "user goal", but rather with the depiction of reality. If the ultimate user goal is to deceive, then incorrect/made-up data is of great data quality.

Allan B.

SAS App Migration, Modernisation, and Manifestation

5 years ago

For the programming geeks, this is a MUST follow-on read: https://www.slideshare.net/DavidHorvath22/20190413-zen-and-the-art-of-programming (credit: David Horvath)

Cameron Steward

Strategy | Design | Research | Analytics

5 years ago

Great article, I love the link to Pirsig, one of my fav books
