DW Rule 2: There's no point in measuring anything if the data team can't measure itself.

We've all heard the statistics: data warehouse initiatives have a HUGE failure rate. They constantly miss functional expectations, budget expectations, timeline expectations, or worse, all of the above.

A big part of why this happens is that data teams aren't measuring themselves, so they can't communicate what's going on or adjust strategically.

One of the best tools for resolving this is simple logging. For every activity that occurs in the data warehouse, we need a log. When did the ingestion/summarization start? When did it end? What were the inserted and updated row counts? Did the process fail integrity checks? If we log these metrics for every process, even if summarized by hour, we have the ability to actually manage both the warehouse and the team.
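To make this concrete, here is a minimal sketch of what that log can look like. Everything here is illustrative, not prescriptive: the dw_process_log table, its columns, and the logged_process helper are assumptions of mine, shown in Python with SQLite purely for portability.

```python
import sqlite3
import time
from contextlib import contextmanager

# Hypothetical log table; the name and columns are illustrative, not a standard.
DDL = """
CREATE TABLE IF NOT EXISTS dw_process_log (
    process_name   TEXT,
    started_at     REAL,     -- unix timestamp
    ended_at       REAL,
    rows_inserted  INTEGER,
    rows_updated   INTEGER,
    passed_checks  INTEGER,  -- 1 = integrity checks passed, 0 = failed
    status         TEXT      -- 'success' or 'failure'
)
"""

@contextmanager
def logged_process(conn, process_name):
    """Record start/end time, row counts, and outcome for one DW activity."""
    record = {"rows_inserted": 0, "rows_updated": 0, "passed_checks": 1}
    started = time.time()
    status = "failure"
    try:
        yield record               # the ETL step fills in the counts
        status = "success"
    except Exception:
        record["passed_checks"] = 0
        raise
    finally:
        # Whatever happens, the run leaves a timestamped, countable trace.
        conn.execute(
            "INSERT INTO dw_process_log VALUES (?, ?, ?, ?, ?, ?, ?)",
            (process_name, started, time.time(),
             record["rows_inserted"], record["rows_updated"],
             record["passed_checks"], status),
        )
        conn.commit()

conn = sqlite3.connect("warehouse_metrics.db")
conn.execute(DDL)
with logged_process(conn, "orders_ingestion") as rec:
    # ... the actual ingestion/summarization runs here ...
    rec["rows_inserted"], rec["rows_updated"] = 1200, 45
```

One row per run (or per hour of summarized runs) is enough; the point is that no activity happens without leaving a trace.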

This will fulfill the requirements Malcolm Baldrige challenged us to meet:

On time: Did the activity occur within SLA? For hour X, how long did it take to finalize ingestion?

Backlog: Things sometimes go sideways and a process stalls. When this happens, how many batches stack up waiting to be processed? If at noon today I'm three hours' worth of batches behind, we need to know now AND maintain a record of it.

Volume: For a given ingestion, how many rows were affected? We can't manage physical capacity unless we know volumetrics.

Rate: This is a derivative of volume: volume over time. We may see the rate increase for one period and decrease for another, which gives us visibility into things like resource contention.

Quality: How often are processes failing? For a given time period, we need to know the ratio of failures to successes. (All five metrics roll up from the log; see the sketch below.)
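As a hedged sketch of those rollups, again against my illustrative dw_process_log table: the one-hour SLA and the NULL-end-time convention for in-flight runs are assumptions for the example, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect("warehouse_metrics.db")
SLA_SECONDS = 3600  # assumed one-hour SLA, purely for illustration

# On time: share of completed runs that finished within SLA.
on_time = conn.execute(
    "SELECT AVG(CASE WHEN ended_at - started_at <= ? THEN 1.0 ELSE 0 END) "
    "FROM dw_process_log WHERE ended_at IS NOT NULL",
    (SLA_SECONDS,),
).fetchone()[0]

# Backlog: runs started but not yet finished. Assumes a variant of the
# logger that writes a row at start with a NULL end time.
backlog = conn.execute(
    "SELECT COUNT(*) FROM dw_process_log WHERE ended_at IS NULL"
).fetchone()[0]

# Volume and Rate: rows touched, and rows per second of runtime.
volume, runtime = conn.execute(
    "SELECT SUM(rows_inserted + rows_updated), SUM(ended_at - started_at) "
    "FROM dw_process_log WHERE ended_at IS NOT NULL"
).fetchone()
rate = (volume or 0) / runtime if runtime else 0.0

# Quality: failures as a share of all runs.
failure_rate = conn.execute(
    "SELECT AVG(CASE WHEN status = 'failure' THEN 1.0 ELSE 0 END) "
    "FROM dw_process_log"
).fetchone()[0]

print(f"on-time={on_time:.0%}  backlog={backlog}  "
      f"rate={rate:.1f} rows/s  failure-rate={failure_rate:.0%}")
```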

Once we have these key metrics on every process, by subject matter, by hour/day/month, etc., we can do wonderful things like compare today's performance against the standard deviation over the past three months. This is where high performance happens. Alerting on any anomaly in these metrics that breaks half a standard deviation from the norm will drive the data warehouse team to improve. Constantly.
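The alerting rule itself is a few lines. The half-sigma threshold comes straight from the rule above; the trailing daily_runtime series is an assumed input pulled from the log:

```python
from statistics import mean, stdev

def breaks_norm(history, today, threshold_sigmas=0.5):
    """True when today's value deviates from the trailing mean by more
    than threshold_sigmas standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return abs(today - mu) > threshold_sigmas * sigma

# Illustrative daily ingestion runtimes (seconds) over the trailing window.
daily_runtime = [610, 595, 640, 580, 605, 620, 615, 590, 600, 630]
if breaks_norm(daily_runtime, today=710):
    print("ALERT: ingestion runtime outside 0.5 sigma of the trailing norm")
```

A half-sigma trigger is deliberately tight, and that is the point: it surfaces drift long before it becomes an outage.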

Often more important, though, is that these metrics become the source of truth for how the warehouse team is doing. When management wants to know the value you bring, it's a simple report.
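That "simple report" can literally be one rollup over the same illustrative log, e.g. a monthly summary:

```python
import sqlite3

conn = sqlite3.connect("warehouse_metrics.db")
report = conn.execute(
    "SELECT strftime('%Y-%m', started_at, 'unixepoch') AS month, "
    "       COUNT(*)                          AS runs, "
    "       SUM(rows_inserted + rows_updated) AS rows_touched, "
    "       AVG(CASE WHEN status = 'failure' THEN 1.0 ELSE 0 END) "
    "                                         AS failure_rate "
    "FROM dw_process_log "
    "GROUP BY month ORDER BY month"
).fetchall()
for month, runs, rows_touched, failure_rate in report:
    print(month, runs, rows_touched, f"{failure_rate:.0%}")
```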

Simon Wright

Experienced Operations Director

1y

Great article. Would love to see more. Sometimes the simplest things are overlooked. All makes absolute sense

Giles Middleton

Head of data, Orbis Investments

1y

Great post. We can't assume such insights will be provided out of the box and must be carefully crafted into ETL and analysis jobs.

Ram Prasad R M

Making markets work for smallholder farmers | Data Engineering at Samunnati

1y

Enjoyed reading it. Curious to know more of your thoughts around articulating the value to Management. I'd think there are two value levers. 1) Operational lever - With cloud based platforms, the operational improvements can be converted to monthly savings in costs which is tangible and easier to convey to management (money always speaks loud and clear). 2) Business lever - For example, the value brought to business through improvements in data quality, timeliness etc. At best, this can only be notional (say, an increase in business revenue need not necessarily be caused only by the data improvement, many factors could be involved.) Can this even be measured as a tangible value?

Alex Helvig

Data Engineering Leader at Progressive Leasing

1y

Nuzaif Sayyed This is a great summary of what I was saying the other day. Hara Chakravarthy I think you’d like this too.
