I ran into a frustration today, and it's not a new one but a recurring one with data. I wanted to do some research on historical project data and I couldn't get it. It wasn't that the data was missing from the database entirely. Some of it was there, but the history stopped at a very strange point in time - January 1st, 2020. So far as I can recall, the company has been in business longer than that, and the systems the data was sourced from have been around longer than that, but the data only goes back to January of the prior year. Why is that?
If you've come from a world of operational reporting, the year end as some sort of magical boundary is a common occurrence. There's limited space on most reports, dashboards, etc., and once the data gets old enough, you kind of stop looking at it. After all, the reports are operational in nature, so why should anything over a year old be that interesting? In fact, there are a lot of reasons you should have more data than this.
First and foremost, just because you don't want to display it currently (or don't know why you would display it) doesn't mean you shouldn't keep it. I get that it's convenient when all your reports nicely update themselves because the data is truncated to a year boundary, but you can definitely handle this on the reporting side of things rather than on the data ingestion/storage side of things. And you should. Here's why:
- Some things in your data are cyclic. For example, it's common in IT to have a year-end freeze so that financial reporting, etc. can be done without risk of system changes impacting those time-sensitive tasks. When the freeze lifts, usually in January, a whole bunch of stuff has queued up to be put into production, and when you do that you're going to get a spike in production defects. At one company I worked for, without fail there'd be a panic at the start of the year when defects spiked. If you looked at the operational reports, which were truncated, this spike looked huge. But if you went back and looked at the data, it happened every year, it lasted exactly one month (as a result of all the installs), and it went away like clockwork. If you had kept, and displayed, year-over-year data you'd quickly realize that this was normal. In fact, there are ways of seasonally adjusting the data to account for these spikes and lulls so that you are only looking at the unexplained change. This can be helpful when an increase in defects occurs where there should be a lull (like a quiet summer season for installs) or when you're trying to separate out yearly recurring noise to see if there's a larger pattern.
- Some things take longer than 12 months to really show up. Have you heard of the idea of "boiling a frog" (it's a myth, by the way)? The myth is that if you slowly heat the water a frog is sitting in, it'll simply adjust to the temperature and never realize anything is wrong until it's too late. Generally speaking, you can lose sight of a bigger trend if the changes are sufficiently small over time. If you only look at a year's worth of data, a slowly rising or falling trend will practically disappear. Whether it's slowly degrading quality or system performance, or rising usage of system resources, if you don't see the bigger picture it can look like nothing's happening when it's actually happening over a longer timescale than you're looking at.
- If a project was struggling in December, it isn't automatically all better in January. I know this seems obvious, but organizations often cut their financial plans for projects at year boundaries. This makes sense from the perspective of a finance team that is only concerned with the yearly budget, but projects don't suddenly reset at year end. If a project was struggling in the prior year and you reset the financials in the next year, suddenly what was a flaming red cost overrun disappears entirely. Why? New year, new budget. But the information in the prior year about the overruns is a really good clue that this project is probably going to overspend in the current year (and maybe have other issues too).
- Your history is a predictor of your future. Unlike the stock market, which is very hard to predict, your organization's past is an excellent predictor of its future. If I wanted to make a guess about how this year's work was going to go, the best thing to do would be to look back at prior years. Companies don't magically reset at year boundaries, so the behaviors and processes and all that are going to carry over into the new year. Why ignore these things and pretend like they never happened?
- You don't generate enough data to be able to afford losing so much. IT is a slow generator of data for the most part. It's not because you aren't doing work, but because the work tends to occur in relatively large chunks (yes, even if you are doing Agile, which is somewhat better, but it's still not a lot of data) compared to, say, a manufacturer (who might be generating tens of thousands of parts a day). This speed and quantity of data generation is an issue because most statistical processes are really data hungry. They need hundreds or thousands of observations (particularly when the outcomes can be highly variable, as they are in IT) in order to construct all but the simplest of prediction models. That's not to say that simple models can't be helpful, but they're often unsatisfying in that they don't help you separate out more complex causes for what you're seeing. You might be able to predict a trend, but not the underlying components causing it, for example.
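To make the January defect spike from the first point concrete, here's a minimal Python sketch of what I mean by comparing against prior years. It isn't a full seasonal decomposition, just a same-month baseline: take the average for each calendar month across earlier years and look at how far the new number sits from it. The function names and all of the defect counts are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def same_month_baseline(history):
    """Average count per calendar month.

    history maps (year, month) -> defect count.
    """
    by_month = defaultdict(list)
    for (_year, month), count in history.items():
        by_month[month].append(count)
    return {month: mean(counts) for month, counts in by_month.items()}


def excess_over_baseline(history, year, month, count):
    """How far an observed count sits above the same-month average of prior years."""
    prior_years = {key: value for key, value in history.items() if key[0] < year}
    baseline = same_month_baseline(prior_years)
    return count - baseline[month]


# Hypothetical history: every January spikes to 40 defects, other months sit at 20.
history = {}
for year in (2017, 2018, 2019):
    for month in range(1, 13):
        history[(year, month)] = 40 if month == 1 else 20

# 42 defects in January looks alarming on a one-year report...
jan_excess = excess_over_baseline(history, 2020, 1, 42)
# ...but 35 defects in a normally quiet July is the real surprise.
jul_excess = excess_over_baseline(history, 2020, 7, 35)
print(jan_excess, jul_excess)
```

Notice that none of this works if the history has been truncated to the current year. The baseline needs those prior Januaries to exist.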
I'll add a corollary to the year-end boundary - your data should not end when you change systems. If you move from Portfolio and Project Management (PPM) tool A to PPM tool B, you need to find a way not to lose everything you had recorded in PPM tool A. Don't fool yourself into accepting that all that old data has to be lost with the transition. It's really important that you don't lose it. Like I noted above, IT doesn't generate a whole lot of data, and if you switch systems you don't want to discard all that history. Yet this is commonplace. Yes, it's work to keep your data and even more work to come up with ways to merge potentially differing levels of detail together, but if you can keep the data at some level it'll provide value when you want to understand how your organization has operated (and therefore how it will likely operate in the future).
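As a rough sketch of what "keeping the data at some level" could look like, suppose the old tool exported one row per defect and the new tool only gives you monthly totals. You can roll the detailed rows up to the coarser monthly level and keep a single continuous history. The field names, project name, and numbers below are all hypothetical and aren't taken from any real PPM tool.

```python
from collections import Counter
from datetime import date

# Hypothetical export from the old tool: one row per defect, with an exact date.
tool_a_defects = [
    {"project": "billing", "opened": date(2019, 11, 4)},
    {"project": "billing", "opened": date(2019, 11, 20)},
    {"project": "billing", "opened": date(2019, 12, 2)},
]

# Hypothetical export from the new tool: already summarized per project per month.
tool_b_monthly = [
    {"project": "billing", "year": 2020, "month": 1, "defects": 5},
    {"project": "billing", "year": 2020, "month": 2, "defects": 1},
]


def combined_monthly_counts(detail_rows, monthly_rows):
    """Roll detailed rows up to (project, year, month) and add the summarized rows."""
    counts = Counter()
    for row in detail_rows:
        opened = row["opened"]
        counts[(row["project"], opened.year, opened.month)] += 1
    for row in monthly_rows:
        counts[(row["project"], row["year"], row["month"])] += row["defects"]
    # Sort by key so the history reads in chronological order per project.
    return dict(sorted(counts.items()))


combined = combined_monthly_counts(tool_a_defects, tool_b_monthly)
for key, defects in combined.items():
    print(key, defects)
```

You lose the day-level detail from the old tool in the merged view, but the monthly history now runs straight through the system change instead of starting over.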
Neatly filing away data may seem like it'll never be worth it, but with some inspiration about how you might use it, a rich history of data can reap real rewards for you and your organization. Here's to a Happy New Year with more data from prior years.