Interesting data model problems #1 - temporal reference data
I want to write about a common problem, but we often think it as a one-off issue when we encounter it in consulting. I came across it again at a government project recently.
Consider the dropdown below, the data like (1 Family, 2 Family et. c) is what we typically refer to as Reference Data.
Reference Data changes very slowly relative to other data in the application. It may sometimes be hard coded in the program, and at other times it is stored in a database table. A reasonable solution to this might be:
BuildingTypeId Description 1 1 Family 2 2 Family 3 3-4 Family 4 Motel-Hotel
But how do we deal with values that might no longer be valid? Perhaps something like this?
BuildingTypeId Description IsActive 1 1 Family 1 2 2 Family 1 3 3-4 Family 1 4 Motel-Hotel 0
With a government project, we needed to know when those reference values were actually valid, so we we can change IsActive to a date range.
BuildingTypeId Description FromDate ToDate 1 1 Family 1-Jan-2008 NULL2 2 Family 1-Jan-2008 NULL3 3-4 Family 1-Jan-2008 NULL4 Motel-Hotel 1-Jan-2008 1-Jan-20185 Restaurant 1-Jan-2008 1-Jan-2018
Suddenly, our problem space has become complicated.
- When we display old data, we need to include all past values in our drop down, i.e. include Motel-Hotel
- When we display a blank data entry form, we should only display current values. i.e. exclude Motel-Hotel
- If we allow users to edit old data, our UI and validation rules will have to allow for building type to be Motel-Hotel, but not other building types that were valid.
- When data is shifted across boundaries and systems, what is the date of record?
I am not trying to solve the problem here, but to enumerate how a simple business requirement can gather complexity very quickly, how this can lead to mis-estimates.
Useful questions to ask are:
- Are these values subject to change over time?
- Are these values subject to removal over time?
- Does this mean this attribute is static and should not change once it is committed?
- Which field represents the time event for the building type entry? (see Martin Fowlers post on temporal data on the difference between actual time and record time).
Solutions
1. Look up fields
The simplest and most flexible way of dealing with temporal reference data is to stop dealing with simple methods like this
getValidBuildingTypes -> BuildingTypes[]
instead we need to deal with
getValidBuildingTypes(recordDate) -> BuildingTypes[]
2. Clear documentation
The table BuildingTypes should be properly annotated with the name of the field which constitutes the recordDate. It could be the date the application was lodged, it could be the date the application was paid, or the date application was granted?