Capability vs. Features


As many of you know, I've been critical of the fact that a high percentage of data projects seem to fail, which was one of the motivating factors for me developing the Hook approach. However, this post isn't about Hook, but rather about why these kinds of projects fail. I've worked in this industry for a loooooong time now, and when I first started back in the early 90s, projects were run very differently. My first job was as a trainee COBOL programmer, and the company I worked for had a very simple development process. At any given time, there were four major software releases in flight. The development lifecycle consisted of four seven-week blocks (sprints, of a sort): Analysis (requirements gathering), Design, Build and Test. It was not exactly agile, but it worked, and I don't recall any significant failures.

If this approach was so successful, why don't we use that waterfall-style process any more? We want agile, right? We want fast turnaround and continual delivery. But in my experience, this never seems to happen. Even though modern projects are run in an agile manner, the rate of delivery rarely seems much better than if we had employed a waterfall approach. Why is this? I think I know the answer, but I'm not sure how we can work around it. For me, there are two aspects to the delivery of a data solution: the first is something I call 'capability', and the second is something we might refer to as 'features'.

Capability is the functionality of the data platform that facilitates the delivery of features; it's the standards, governance processes, metadata management, metadata design and automation tooling that help us to deliver rapidly and consistently. Features are the tables, reports and dashboards we provide to the business for analytical and decision-making purposes. In other words, the things that the business is really interested in and, more importantly, what they are paying for. Typically, the business doesn't care how you deliver these features. The problem is that to deliver features efficiently, we need a well-developed capability. So who pays for that?
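To make 'capability' a little more concrete, much of it comes down to metadata-driven automation: features get generated from definitions rather than hand-coded one at a time. The sketch below is purely illustrative (the TableSpec structure and the audit-column conventions are my own inventions, not a reference to any particular platform), but it shows the idea: one metadata definition stamps out consistent DDL for every new table.

    from dataclasses import dataclass

    @dataclass
    class ColumnSpec:
        name: str
        dtype: str  # e.g. "VARCHAR", "DATE", "NUMBER(18,2)"

    @dataclass
    class TableSpec:
        schema: str
        name: str
        columns: list[ColumnSpec]

    def generate_ddl(spec: TableSpec) -> str:
        """Render CREATE TABLE DDL from a metadata definition, adding
        standard audit columns so every table follows the same conventions
        without relying on each developer to remember them."""
        cols = [f"    {c.name} {c.dtype}" for c in spec.columns]
        cols += ["    load_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP",
                 "    record_source VARCHAR"]
        body = ",\n".join(cols)
        return f"CREATE TABLE {spec.schema}.{spec.name} (\n{body}\n);"

    spec = TableSpec("raw", "customer", [ColumnSpec("customer_id", "NUMBER"),
                                         ColumnSpec("customer_name", "VARCHAR")])
    print(generate_ddl(spec))

The point isn't the twenty lines of code; it's that the hundredth table costs the same to deliver as the first, which is exactly the rapid, consistent delivery that capability is supposed to buy.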

I see four potential approaches to developing these two essential data platform components.

Features Only


What if we dispense entirely with capability? We dive in and start building stuff. Well, this happens more frequently than I would care to admit. Essentially, we go into Proof of Concept (PoC) mode, smash out some reports and deliver them to the business in record time. Unfortunately, that sets a very high bar. The first iteration is fast, but what about the second or third? The technical debt baked into the solution in the first iteration starts to rear its ugly head, and before very long, the solution becomes unmaintainable.

Capability → Features

It would make more logical sense to focus initially on building out the capability before attempting any feature development. It is rare for a project to devote time and thought to properly designing a solution that will enable us to get features out the door as quickly as possible, and I've only ever seen it done once in the real world. That project devoted an entire year to building out the foundations upon which to build the reports the business wanted. It was a well-run project, but delivery wasn't as prolific as it should have been.

While time is spent building capability, no features are being developed. It may take many months, a year or more, to properly design, develop and test a level of capability that will suit our needs. Are businesses willing to wait that long, and are they willing to pay for it? After all, their immediate need is the features they have been promised.

Even if we do spend the time designing and building capability, how do we know we got it right? We might quickly find that we have data sources that don't fit our beautifully designed solution, forcing extensive changes and new functionality, which ultimately impacts our ability to deliver features.

Capability and Features In Parallel

How about developing the capability and features in parallel? This seems to be the most common approach, though I'm not sure why. With this approach, we have two streams of work, one for capability and one for features. Does this work any better? Yes, we can get some features out quickly, but is that rate of delivery sustainable? Although the two work streams run in parallel, they depend on one another, and the impact runs in both directions: a change to capability may require features to be re-engineered, and a new feature may require new capability to be built. What tends to happen is that development resources are directed more towards features, leading to ever-growing levels of technical debt and a gradual stalling of delivery pace. We've all seen it.

Front Loading Capability

There is a fourth option, which I haven't seen in practice. It is essentially a hybrid of the approaches already discussed. At the start of the programme of work, the majority of effort goes into the design and build of capability. Some features can be developed, but they take a lower priority. Changes to capability that impact features must be resolved immediately to prevent the build-up of technical debt. More time can be devoted to feature development as the project proceeds and the capability stabilises. However, it must be recognised that capability is king: if changes to it are required, they take priority, along with any refactoring effort to existing features. Even though the rate of feature delivery is initially slow, at least something is coming out, unlike the capability-first approach. Delivery velocity should increase over time.

This hybrid approach does require a different mindset; I would describe it as a defensive one. As we design the capability, we assume that we will have to change it at some point, so we build it in a way that makes it easy to enhance. There is often a temptation to design to the current requirements, and this invariably leads to technical debt. Technical debt is inevitable in any case, so we must develop solutions that allow us to tackle it with minimum effort.
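As a hypothetical illustration of that defensive mindset (the SourceConnector interface and the CsvSource class below are my inventions, not a prescribed design): if every source is loaded through one small, stable interface, the inevitable change arrives as a new implementation behind that interface rather than a rewrite of everything built on top.

    import csv
    from typing import Iterator, Protocol

    class SourceConnector(Protocol):
        """The stable seam: pipeline code only ever sees this interface."""
        def extract(self) -> Iterator[dict]: ...

    class CsvSource:
        """One concrete source; a new API or file source would be another
        class implementing extract(), with no change to load() below."""
        def __init__(self, path: str) -> None:
            self.path = path

        def extract(self) -> Iterator[dict]:
            with open(self.path, newline="") as f:
                yield from csv.DictReader(f)

    def load(connector: SourceConnector) -> int:
        """Depends on the protocol, not on any concrete source system."""
        count = 0
        for record in connector.extract():
            # ... validation, audit columns, write to the platform ...
            count += 1
        return count

Designing to the current requirements would have hard-wired the CSV handling into load(); designing defensively assumes a second and third source are coming.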


So, what thoughts do you have? What approaches have you used to shift the dial away from the woeful, and widely quoted, 75% failure rate for data projects? Is investing in capability worthwhile, and if so, when should we do it?

Front Loading Capability aligns with how I think about having a "product mindset". Thank you for this write-up!

Nick Pinfold

Principal Data Analyst at Wellington City Council

1 week

The reality is that requirements gathering is slow; initial dev is quick, but all the little changes add up to longer times. Automation can help, and AI is improving things like comments. We have some users who just want their source data in the data platform so they don't have to load and report on it themselves. Using Fivetran we can easily source their tables, and automated views in the database replicate the views and schemas across dev, UAT and prod over the tables in the raw database. We have automated script generation for the staging and dataset layers. Once we can access the data via Fivetran, users can view it within a few days. That is the start for the business and the slow requirements stage, BUT users have some data and value quickly. For S3 sources we use configurations to build all the code for the stage, pipe, stream, task and proc, so everything is built off the configuration. When we had duplicates in files, we could implement a fix once and roll it out to all sources.
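For what it's worth, a rough sketch of the config-driven pattern Nick describes might look something like the following (the config shape, object names and bucket path are hypothetical, not his actual setup): one configuration entry drives the generation of every Snowflake object for a source, so a template fix, like his duplicate-file handling, rolls out to all sources at once.

    # Hypothetical configuration for one S3 source feed.
    config = {
        "source": "orders",
        "s3_path": "s3://example-bucket/orders/",  # assumed path
        "file_format": "CSV",
    }

    def generate_objects(cfg: dict) -> list[str]:
        """Render the stage, pipe, stream and task DDL for one source from
        its configuration; everything is built off the configuration."""
        src = cfg["source"]
        return [
            f"CREATE STAGE {src}_stage URL = '{cfg['s3_path']}' "
            f"FILE_FORMAT = (TYPE = {cfg['file_format']});",
            f"CREATE PIPE {src}_pipe AS COPY INTO raw.{src} FROM @{src}_stage;",
            f"CREATE STREAM {src}_stream ON TABLE raw.{src};",
            f"CREATE TASK {src}_task WAREHOUSE = load_wh SCHEDULE = '5 MINUTE' "
            f"AS CALL load_{src}();",
        ]

    for stmt in generate_objects(config):
        print(stmt)

The value shows up exactly where Nick says it does: a fix goes into the generator once rather than into dozens of hand-written scripts.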
