Exploring Data Quality: Insights from 'Data Quality Engineering in Financial Services' Book

Recently, I've been immersed in "Data Quality Engineering in Financial Services" by Brian Buzzelli, making my way through the first couple of chapters. It's been an enlightening journey so far, especially delving into the concept of data as a valuable asset for businesses.

One of the ideas introduced in the book is the concept of Data Quality Specifications (DQS). Essentially, it's about collaborating with process/function experts to define what makes data acceptable, both when it's extracted or generated and when it's consumed by people or other systems.

To achieve this, the book suggests a framework of "Data Dimensions": measurable criteria against which we can gauge the quality of our data.

In simpler terms, it's about checking our datasets against specific criteria:

  • Completeness: Are all the required data fields present?
  • Timeliness: Is the data available when it's needed, and is it stamped with when it was captured?
  • Accuracy: Is the data correct, as defined by someone who knows the business or function well?
  • Precision: Are values recorded at the right level of detail, for example with enough decimal places?
  • Conformity: Does the data follow the required formats, standards, and regulations?
  • Congruence: Is the data consistent with itself across different time periods?
  • Cohesion: Are values consistent across different sets of data?

By setting these Data Quality Specifications, we're equipped to evaluate how well different datasets measure up; a minimal code sketch of what such checks might look like follows.
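
To make this concrete, here's a minimal sketch of a DQS expressed as runnable checks over a pandas DataFrame. This is my own illustration, not the book's implementation: the dataset, column names, and thresholds are hypothetical.

    import pandas as pd

    # Hypothetical trades extract -- names and values invented for illustration.
    trades = pd.DataFrame({
        "trade_id": [1, 2, 3],
        "price": [101.25, 99.80, None],
        "logged_at": pd.to_datetime(["2024-05-01 09:30", "2024-05-01 09:31", None]),
    })

    # Each dimension becomes a predicate over the dataset; the DQS is the
    # collection of predicates the data must satisfy to count as acceptable.
    dqs = {
        "completeness: no missing prices": lambda df: df["price"].notna().all(),
        "timeliness: every row carries a timestamp": lambda df: df["logged_at"].notna().all(),
        "precision: prices have at most 2 decimal places": lambda df: (
            df["price"].dropna().round(2).eq(df["price"].dropna()).all()
        ),
    }

    for name, check in dqs.items():
        print(f"{'PASS' if check(trades) else 'FAIL'} - {name}")

In practice, each predicate and its threshold would come from the process/function experts the book describes, rather than being hard-coded by the engineer.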


Here's an example illustrating how we might assess these dimensions (a short code sketch of two of these checks follows the list):

  • Completeness: All customer records include name, address, and contact information.
  • Timeliness: Sales transactions are logged with a timestamp within one hour of occurrence.
  • Accuracy: Customer account balances match actual financial statements.
  • Precision: Stock prices are recorded to two decimal places.
  • Conformity: Data follows GDPR regulations regarding personal information.
  • Congruence: Quarterly revenue figures align across different reports.
  • Cohesion: Customer IDs are consistent across sales and support databases.
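
As a quick sketch of how two of these example checks might be expressed (again, the tables and column names here are hypothetical, not from the book):

    import pandas as pd

    # Hypothetical sales and support extracts -- names invented for illustration.
    customers = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "name": ["Ana", "Ben", None],  # a missing name -> completeness failure
        "address": ["1 Main St", "2 Oak Ave", "3 Elm Rd"],
        "contact": ["ana@x.com", "ben@y.com", "carl@z.com"],
    })
    support_tickets = pd.DataFrame({"customer_id": [1, 2, 9]})  # 9 has no sales record

    # Completeness: every customer record includes name, address, and contact.
    complete = bool(customers[["name", "address", "contact"]].notna().all(axis=None))

    # Cohesion: every customer ID in the support data also exists in the sales data.
    cohesive = bool(support_tickets["customer_id"].isin(customers["customer_id"]).all())

    print(f"completeness: {'PASS' if complete else 'FAIL'}")
    print(f"cohesion:     {'PASS' if cohesive else 'FAIL'}")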


As I progress through the book, I'm eager to explore how these Data Quality Specifications can refine my own data processing workflows.

Matthew Lyberg, CFA

Head of Asset Management AI at Manulife | MSCF (MFE) Carnegie Mellon | Quant, Investing, AI, Analytics

11 months ago

Applications to investment data in particular are important. Some attributes of investment data make it materially different from, for example, operational data.

Steven Moore

Business Intelligence and Analytics Architect | Microsoft Azure Data | Microsoft Fabric Engineer + Power BI Engineer | CCH Tagetik | ERP and CPM Implementation | Microsoft Dynamics 365 ERP Finance and Business Central

11 months ago

Very good advice from the book. Indeed, it's important for companies to invest time and effort in data quality.
