Agile SSIS series (7/12): Prototyping the agile way!
Petr Podrouzek
?? Global Tech Leader | ?? SVP at Emplifi | ?? Strategy & Engineering Excellence
In sprint 3 the development of FLOGS started. In sprint 3 I wanted to develop working prototype of FLOGS and measure it against F1 – F4. As FLOGS is data driven application, the first step is to further develop entity-relationship diagram (first version was developed in sprint 2). The first step was done in the previous sprint and now the initial design will be enhanced.
Entity-relationship diagram (physical)
The model from the previous sprint has been implemented and reviewed. It has been concluded that the model is too complex. There are two reasons for this. (1) Every single DIM table is basically a look up table. This means that instead of putting actual value into the log row:
INSERT INTO LOG_TABLE(value) VALUES('Load data into the table')
link (normally a surrogate key) will be inserted:
INSERT INTO LOG_TABLE(valueLookUpID) VALUES(5)
where 5 is a link to dimensional table under which value 'Load data into the table' is. This means that for every single look up table table there needs to be a task and a variable in SSIS package to get and store the look up id. This would be rather tedious to implement and maintain. (2) As mentioned above, the DIM tables are look up tables. Some of these can be pre filled once (like date dimension), others need to be regularly updated in order to keep relevant look up data. The more look up tables there will be the more difficult it will be to maintain those.
After considering the arguments above and the advantages the dimensional normalisation would have for the solution, the model has been substantially reduced. There will be only two tables – Event (formerly EventHeader) and EventDetail. These will be in one-to-many relationship linked though ExecutionInstanceGUID field. However this link is only logical – there will be no constraint implemented.
Reviewed data model
Detailed data model
Few last words...
Here you go! The final data model! It has been simplified and contains only two tables now. The reason is to simplify implementation and enhance maintainability of the logging platform. Now we need to ask the question how the architecture performs. Is it fast enough? Does it cause the DTSX execution to slow down significantly? The performance testing has been part of the sprint 3 as well and the results will be revealed in the next article.
Previous articles in the series
- Agile SSIS series (1/12): High performance SSIS logging platform
- Agile SSIS series (2/12): Top 4 features that SSIS logging platform must have
- Agile SSIS series (3/12): Why do we REALLY use agile approach?
- Agile SSIS series (4/12): Different ways how to do SSIS logging
- Agile SSIS series (5/12): What is the smartest way to log SSIS execution?
- Agile SSIS series (6/12): Well designed data structures for SSIS logging
Want to know more about Agile BI? Please read my agile BI series...
Business Geek/Data Leader
8 年To remove dimension tables for prototyping is a Pentaho standard practice. By the means of its AgileBI process, very rapid OLAP and ad hoc reporting prototyping is enabled by degenareting all dimensions as attributes to the fact table. So, yeah, I guess your approach is ok - for prototyping. The next step would be to refactor the first model into a dimensional one. (In fact, I teach a course with exactly that premisse: https://www.4shot.com.br/pt-br/curso/agilebi-prototipagem-agil-para-bi-com-pentaho .)