We Were Happy and Didn't Know It

Many years ago, we had a large central system and several satellite systems that faithfully sent their data via extracted files to our Data Warehouses (DWH) at night or several times during the day. This data was processed and organized in complex three-layer Data Warehouses, culminating in an analytical layer where our Business Intelligence (BI) systems did their work.

We had a Mainframe system or an ERP that, through ETL tools like DataStage or Informatica, sent data to databases such as DB2, Teradata, or Netezza. This data was rationalized and unified by subject area, independent of the source system, using Stored Procedures or the ETL tools themselves. That led us to star or snowflake schemas, where Business Objects, MicroStrategy, or Cognos gave us dynamic reports, fixed paginated reports, and interactive dashboards.

Although these tools were costly, they were integrated: execution control, access control, workflow, row-level security, a catalog built into the data engine, data quality (DQ) mechanisms, and reconciliation, among others. We had everything in just a few systems.

Then Big Data came along, promising cheap storage with acceptable I/O times and data redundancy that guaranteed at least three nines of availability. In practice, this meant racks of hundreds of small servers acting as nodes, providing the redundancy needed to handle gigabyte-scale files.

Someone had the idea to emulate a database by turning files into columnar, heavily indexed formats like Parquet and emulating schemas with open-source systems like Hadoop or Impala. Being open source, there were no licensing costs, just the hardware and the labor.
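To make the idea concrete, here is a minimal sketch of what that emulation looks like: an extract file lands on the cluster, gets rewritten as Parquet, and is then registered so it can be queried with SQL. This is PySpark, and the paths, table, and column names are illustrative assumptions, not our actual setup.

```python
from pyspark.sql import SparkSession

# Hypothetical sketch: turn a nightly extract file into a Parquet-backed "table".
spark = SparkSession.builder.appName("file-to-table").getOrCreate()

# Read the raw extract (the path and the schema inference are assumptions for illustration).
raw = spark.read.csv("hdfs:///landing/sales_extract.csv", header=True, inferSchema=True)

# Rewrite it as partitioned Parquet, the columnar format standing in for database storage.
raw.write.mode("overwrite").partitionBy("load_date").parquet("hdfs:///warehouse/sales")

# Register the files as a view so a SQL engine can treat them as a schema and query them.
spark.read.parquet("hdfs:///warehouse/sales").createOrReplaceTempView("sales")
spark.sql("SELECT load_date, SUM(amount) AS total FROM sales GROUP BY load_date").show()
```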

We succumbed to the temptation and abandoned our loyal providers to go all in on Big Data.

Soon, we realized that maintaining all that hardware was madness, and we were offered a journey to the clouds. In the cloud, we no longer had to worry about so much hardware, and we also had additional tools to manage our data in S3 buckets.

However, we still needed something to play the role of a database or DWH, so we opted for solutions like Snowflake, Cloudera, or Databricks to emulate our DWH. But we also needed an access manager and a Data Catalog, and we had to replace our ETL with Spark SQL, rewrite our code, and find something to orchestrate our processes.
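For instance, a transformation that used to live in a Stored Procedure or a DataStage/Informatica job ends up expressed as Spark SQL over files in object storage. The sketch below assumes hypothetical bucket, table, and column names; it only illustrates the shape of the rewrite, not our actual pipelines.

```python
from pyspark.sql import SparkSession

# Hypothetical sketch of one ETL step rewritten as Spark SQL over S3-backed Parquet data.
spark = SparkSession.builder.appName("etl-rewrite").getOrCreate()

# Expose staging and dimension data as views (bucket and table names are assumptions).
spark.read.parquet("s3a://dwh/staging/orders").createOrReplaceTempView("stg_orders")
spark.read.parquet("s3a://dwh/dim/customers").createOrReplaceTempView("dim_customers")

# What used to be a Stored Procedure becomes a SQL statement executed by Spark.
fact_orders = spark.sql("""
    SELECT c.customer_key,
           o.order_date,
           SUM(o.amount) AS total_amount
    FROM stg_orders o
    JOIN dim_customers c ON o.customer_id = c.customer_id
    GROUP BY c.customer_key, o.order_date
""")

# Persist the result to the analytical layer; orchestration, catalog, and access control
# still have to come from separate tools.
fact_orders.write.mode("overwrite").parquet("s3a://dwh/fact/orders")
```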

In short, everything had to be acquired separately. Our old BI systems also had to be upgraded or replaced with "cloud-friendly" versions.

Ten years and millions of dollars later, we almost reached the same level of cutting-edge technology we had before.
