Buy don't build

One of the principles of enterprise architecture is that we should buy applications rather than build them in-house. The same principle applies in data architecture. A data platform consists of many components: data stores, file stores, data ingestion tools, a data catalog, reporting tools, and so on. Clearly it is not feasible to build all of them in-house. We have to buy these tools and platforms from other companies, off-the-shelf so to speak.

But the data lake or data warehouse itself cannot be bought off-the-shelf. Say you are a fashion retailer or a cake factory. There is no off-the-shelf data lake or data warehouse out there that can analyse your fashion sales numbers or break down your cake production costs. Why? Because the vendor does not have your sales data or your production cost data. So your company will have to build it. You need to build data pipelines that bring data from your various sources into your data lake or warehouse, and build the reports and dashboards that support your business analysis of sales and costs. That, my friends, has been going on for 30 years, and it will still be happening in the next 30 years.

So it is all very well to say that your company, as a principle, does not build systems or applications but buys them instead. In reality, there are two categories of system that you can't buy and have to build. The first is the data warehouse, BI, and data lake (DW/BI/DL). The second is AI systems. Whether it is forecasting your sales, clustering your customers and products, or processing credit applications, an AI system has to be built. Just like DW/BI/DL, you can't buy it off-the-shelf. Of course, you can hire a company or a team of contractors to build it for you.

So the "buy, don't build" principle applies to the data stores, file stores, data ingestion tools, data catalog, reporting tools, and so on. But the data warehouse, BI, data lake, and AI systems themselves will have to be built in-house. You cannot buy them off-the-shelf.

Shaun Ryan

Data Eng | DeltaLake | Databricks | AI & BI | Views are mine

1 yr

Had to laugh once. Someone said to me "or get the off-the-shelf enterprise data warehouse, SAP"... oh yeah, that's what they sell you, then you pay the cost of a small country in services and plugins configuring it to your specific requirements. You're trading bespoke and nimble for fat, overcomplicated configuration. Nearly every successful data stack I've worked on (and we all know most of them historically fail) ditched the "enterprise". Model and build data marts for specific business requirements using agile, and conform their dimensions so you can drill across when you need to. So many "enterprise" data warehouses cost a small fortune, are modelled on the transactional operational process instead of strategic business processes and decision making, and ultimately are a fat waste of money.

John Kirby

Data Consultant, Advisor, Leader, Mentor, Data Architect, Data Engineer, Community Organiser, Charity Trustee

1 yr

From a data warehouse/lake point of view, business rules can be so bespoke that even off-the-shelf packages require customisation. This can still be better than a build from scratch... until the vendor needs you to upgrade. Depending on the level of customisation, that upgrade can be just as costly as a build.

More articles by Vincent Rainardi

  • DQ Engineering

    DQ stands for Data Quality. If you don't have a background in data quality, read this first: https://www.

  • Data Product

    For those of you who don't know what a data product and “data as a product” are, please read this first:…

  • Snowflake vs SQL Server

    Sometimes we need to remind ourselves that Snowflake is not an OLTP database. I know today is the era of Hybrid tables…

  • Data engineer becoming solution architect

    Are you a data engineer thinking about transitioning to a cloud solution architect? Data engineer are good with…

  • Asset Mgt vs Fund Mgt vs Investment Mgt vs Wealth Mgt: What's the difference?

    If you work in banking or investment or any other sector in financial services, you might be wondering about the above.…

  • Data Warehousing Basics: Cost

    If you call yourself a data engineer you need to be aware of 2 additional things compared to a developer. The first one…

  • My Linkedin post & articles

    The list below goes back to Nov 2024. For older than that see here.

  • Data Warehousing Basics: Single Customer View

    Imagine that you work for an insurance company who sell health insurance (HI), life insurance (LI), general insurance…

  • Data Warehousing Basics: NFR

    What I’m about to tell you today failed a lot of data warehousing projects which is why it’s worth paying attention so…

  • ML and AI - What's the difference?

    Machine Learning covers about 20-30 algorithms such as Logistic Regression, Decision Tree, Gradient Boosting, Random…