Nothing magical happens at year end, why does your data stop there?

Adam Schwartz

Published: November 29, 2021

I ran into a frustration today, and it's not a new frustration, but a recurring one with data. I wanted to do some research on historical project data and I couldn't get it. It wasn't that the data was missing from the database entirely. Some of it was there, but it stopped at a very strange point in time: January 1st, 2020. So far as I can recall, the company has been in business longer than that and the systems that the data were sourced from have been around longer than that, but the data only goes back to January of the prior year. Why is that?

If you've come from a world of operational reporting, the year end as some sort of magical boundary is a common occurrence. There's limited space on most reports, dashboards, etc. and once the data gets old enough, you kind of stop looking at it. After all, the reports are operational in nature, so why should anything over a year old be that interesting? In fact, there are a lot of reasons you should have more data than this.

First and foremost, just because you don't want to display it currently (or don't know why to display it) doesn't mean you shouldn't keep it. I get that it's a convenience that all your reports just nicely update themselves if the data is truncated to a year boundary, but you can definitely handle this on the reporting side of things rather than on the data ingestion/storage side of things. And you should. Here's why:

Some things in your data are cyclic. For example, it's common in IT to have a year-end freeze so that financial reporting, etc. can be done without risk of system changes impacting those time-sensitive tasks. When the freeze lifts, usually in January, a whole bunch of stuff has queued up to be put into production and when you do that you're going to get a spike in production defects. At one company I worked for, without fail there'd be a panic at the start of the year when defects spiked. If you looked at the operational reports, which were truncated, this spike looked huge. But if you went back and looked at the data, it happened every year, it lasted exactly one month (as a result of all the installs), and it went away like clockwork. If you had kept, and displayed, year-over-year data you'd quickly realize that this was normal. In fact, there are ways of seasonally adjusting the data to account for these spikes and lulls so that you are only looking at the unexplainable change. This can be helpful when an increase in defects occurs where there should be a lull (like a quiet summer season for installs) or when you're trying to separate out yearly recurring noise to see if there's a larger pattern.
Some things take longer than 12 months to really show up. Have you heard of the idea of "boiling a frog" (it's a myth, by the way)? The myth is that if you slowly heat the water a frog is sitting in, it'll simply adjust to the temperature and never realize anything is wrong until it's too late. Generally speaking, you can lose sight of a bigger trend if the changes are sufficiently small over time. If you only look at a year's worth of data, a slowly rising or falling trend will practically disappear. Whether it's slowly degrading quality or system performance, or rising usage of system resources, if you don't see the bigger picture it can look like nothing's happening when it's happening over a longer timescale than you're aware of.
If a project was struggling in December, it isn't automatically all better in January. I know this seems obvious, but organizations often cut their financial plans for projects at year boundaries. This makes sense from the perspective of finance, which is only concerned with the yearly budget, but projects don't suddenly reset at year end. If a project was struggling in the prior year and you reset the financials in the next year, suddenly what was a flaming red cost overrun disappears entirely. Why? New year, new budget. But the information in the prior year about the overruns is a really good clue that this project is probably going to overspend in the current year (and maybe have other issues too).
Your history is a predictor of your future. Unlike the stock market which is very hard to predict, your organization's past is an excellent predictor of its future. If I wanted to make a guess about how this year's work was going to go, the best thing to do would be to look back at prior years. Companies don't magically reset at year boundaries, so the behaviors and processes and all that are going to carry over into the new year. Why ignore these things and pretend like it never happened?
You don't generate enough data to be able to afford losing so much. IT is a slow generator of data for the most part. And it's not because you aren't doing work, but the work tends to occur in relatively large chunks (yes, even if you are doing Agile, which is somewhat better, but it's still not a lot of data) compared to say a manufacturer (who might be generating tens of thousands of parts a day). This speed and quantity of data generation is an issue because most statistical processes are really data hungry. They need hundreds or thousands of observations (particularly when the outcomes can be highly variable like IT) in order to construct all but the simplest of prediction models. That's not to say that simple models can't be helpful, but they're often unsatisfying in that they don't help you separate out more complex causes for what you're seeing. You might be able to predict a trend, but not the underlying components causing it, for example.
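
As a rough illustration of the seasonal adjustment mentioned in the first point above, here is a minimal Python sketch. It assumes you have kept several years of monthly defect counts. The data, the `seasonal_adjust` name, and the choice to subtract each calendar month's multi-year average are all illustrative assumptions, not a description of any particular tool or of the only way to adjust for seasonality.

```python
from collections import defaultdict
from statistics import mean


def seasonal_adjust(monthly_counts):
    """Subtract each calendar month's multi-year average from its count.

    monthly_counts maps (year, month) -> defect count.
    Returns a dict with the same keys holding the deviation from what
    is "normal" for that month of the year.
    """
    by_month = defaultdict(list)
    for (_year, month), count in monthly_counts.items():
        by_month[month].append(count)
    baseline = {month: mean(counts) for month, counts in by_month.items()}
    return {key: count - baseline[key[1]] for key, count in monthly_counts.items()}


# Hypothetical history: a defect spike every January after the freeze lifts.
history = {}
for year in (2019, 2020, 2021):
    for month in range(1, 13):
        history[(year, month)] = 30 if month == 1 else 10
history[(2021, 7)] = 25  # an unusual increase during a normally quiet summer

adjusted = seasonal_adjust(history)
print(adjusted[(2021, 1)])  # January spike, relative to a normal January
print(adjusted[(2021, 7)])  # July increase, relative to a normal July
```

With only one year of data the January spike is all you would see. With three years, the adjusted January value drops to zero and the July increase stands out instead.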

I'll add a corollary to the year-end boundary - your data should not end when you change systems. If you move from Portfolio and Project Management (PPM) tool A to PPM tool B, you need to find a way not to lose everything you had recorded in PPM tool A. Don't fool yourself into accepting that all that old data has to be lost with the transition. It's really important that you don't lose it. Like I noted above, IT doesn't generate a whole lot of data, and if you switch systems you don't want to discard all that history. Yet this is commonplace. Yes, it's work to keep your data and even more work to come up with ways to merge potentially differing levels of detail together, but if you can keep the data at some level it'll provide value when you want to understand how your organization has operated (and therefore how it will likely operate in the future).
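
One way to picture merging differing levels of detail is to roll the finer-grained system up to the coarsest level both systems share. The sketch below is purely hypothetical: the two export layouts, the field names, and the `to_monthly` helper are assumptions made for illustration, and a real migration would depend on what each tool actually records.

```python
from collections import defaultdict

# Hypothetical export from the retired tool: one row per project per month.
tool_a_rows = [
    {"project": "P1", "month": "2019-11", "cost": 1000.0},
    {"project": "P1", "month": "2019-12", "cost": 1500.0},
]

# Hypothetical export from the new tool: one row per task per day.
tool_b_rows = [
    {"project": "P1", "task": "T1", "date": "2020-01-06", "cost": 200.0},
    {"project": "P1", "task": "T2", "date": "2020-01-20", "cost": 300.0},
    {"project": "P1", "task": "T1", "date": "2020-02-03", "cost": 250.0},
]


def to_monthly(rows):
    """Roll task/day rows up to project/month, the detail level tool A kept."""
    totals = defaultdict(float)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[(row["project"], month)] += row["cost"]
    return [
        {"project": project, "month": month, "cost": cost, "source": "tool_b"}
        for (project, month), cost in sorted(totals.items())
    ]


# One continuous history, tagged by source so differences stay traceable.
combined = [dict(row, source="tool_a") for row in tool_a_rows] + to_monthly(tool_b_rows)
for row in combined:
    print(row["month"], row["project"], row["cost"], row["source"])
```

The task-level detail from the new tool is lost in the combined view, but the history no longer stops at the system change.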

Neatly filing away data may seem like it'll never be worth it, but with some inspiration about how you might use it, a rich history of data can reap real rewards for you and your organization. Here's to a Happy New Year with more data from prior years.

Andre DeNardo, MBA

IT Portfolio Lead at MassMutual

3 years ago

Thanks for sharing, Adam Schwartz. This was a great read.

