Agile SSIS series (7/12): Prototyping the agile way!

Agile SSIS series (7/12): Prototyping the agile way!

In sprint 3 the development of FLOGS started. In sprint 3 I wanted to develop working prototype of FLOGS and measure it against F1 – F4. As FLOGS is data driven application, the first step is to further develop entity-relationship diagram (first version was developed in sprint 2). The first step was done in the previous sprint and now the initial design will be enhanced.

Entity-relationship diagram (physical)

The model from the previous sprint has been implemented and reviewed. It has been concluded that the model is too complex. There are two reasons for this. (1) Every single DIM table is basically a look up table. This means that instead of putting actual value into the log row:

INSERT INTO LOG_TABLE(value) VALUES('Load data into the table')

link (normally a surrogate key) will be inserted:

INSERT INTO LOG_TABLE(valueLookUpID) VALUES(5)

where 5 is a link to dimensional table under which value 'Load data into the table' is. This means that for every single look up table table there needs to be a task and a variable in SSIS package to get and store the look up id. This would be rather tedious to implement and maintain. (2) As mentioned above, the DIM tables are look up tables. Some of these can be pre filled once (like date dimension), others need to be regularly updated in order to keep relevant look up data. The more look up tables there will be the more difficult it will be to maintain those.

After considering the arguments above and the advantages the dimensional normalisation would have for the solution, the model has been substantially reduced. There will be only two tables – Event (formerly EventHeader) and EventDetail. These will be in one-to-many relationship linked though ExecutionInstanceGUID field. However this link is only logical – there will be no constraint implemented.

Reviewed data model

Detailed data model

Few last words...

Here you go! The final data model! It has been simplified and contains only two tables now. The reason is to simplify implementation and enhance maintainability of the logging platform. Now we need to ask the question how the architecture performs. Is it fast enough? Does it cause the DTSX execution to slow down significantly? The performance testing has been part of the sprint 3 as well and the results will be revealed in the next article.

Previous articles in the series

Want to know more about Agile BI? Please read my agile BI series...

Fábio de Salles

Business Geek/Data Leader

8 年

To remove dimension tables for prototyping is a Pentaho standard practice. By the means of its AgileBI process, very rapid OLAP and ad hoc reporting prototyping is enabled by degenareting all dimensions as attributes to the fact table. So, yeah, I guess your approach is ok - for prototyping. The next step would be to refactor the first model into a dimensional one. (In fact, I teach a course with exactly that premisse: https://www.4shot.com.br/pt-br/curso/agilebi-prototipagem-agil-para-bi-com-pentaho .)

要查看或添加评论,请登录

Petr Podrouzek的更多文章

  • Innovating with Emplifi Unified Analytics ??

    Innovating with Emplifi Unified Analytics ??

    Last week marked a significant milestone for Emplifi with the launch of Unified Analytics ??. This feature symbolizes…

    3 条评论
  • My goals for 2022

    My goals for 2022

    My goal for January was to come up with what I would like to achieve in 2022 professionally. I asked myself how can I…

    2 条评论
  • Top 3 things I did in 2021

    Top 3 things I did in 2021

    I truly believe it is important to reflect on past experiences and learnings. Sometimes we are so preoccupied with the…

    3 条评论
  • Data warehouse release nightmares (2/2)

    Data warehouse release nightmares (2/2)

    In the previous text, I have discussed the issues I have encountered when releasing DWH based on my 10 years of…

  • Data warehouse release nightmares (1/2)

    Data warehouse release nightmares (1/2)

    I have been a BI/DWH developer for nearly 10 years now and most of the projects I have worked on had one particular…

    1 条评论
  • What value can MDM bring to organisations and at what cost? (3/3)

    What value can MDM bring to organisations and at what cost? (3/3)

    In the previous articles, I have discussed the benefits of implementing MDM but also the costs it can bring. Now let's…

  • What value can MDM bring to organisations and at what cost? (2/3)

    What value can MDM bring to organisations and at what cost? (2/3)

    In the previous article, I discussed the benefits of implementing MDM. As with any technology, it does come with a cost…

  • What value can MDM bring to organisations and at what cost? (1/3)

    What value can MDM bring to organisations and at what cost? (1/3)

    There are many applications supporting various processes in organizations - there is not a single process that would…

    3 条评论
  • Snowflakes and why not to use them (4/4): Conclusion

    Snowflakes and why not to use them (4/4): Conclusion

    In the previous articles, I have introduced simple snowflake utilizing SCD2 model. I have also build a simple star…

    1 条评论
  • Snowflakes and why not to use them (3/4): The problem

    Snowflakes and why not to use them (3/4): The problem

    As stated in the previous article, snowflakes can be more efficient in using storage compared to stars. And believe me,…

    2 条评论

社区洞察

其他会员也浏览了