Coding in the Negative Space

In typical software engineering, the de facto approach is to write some code and accompany it with an appropriate test. The test runs at build time and ensures that the code behaves as expected. In my time as a software engineer, I have not seen this paradigm challenged substantially (apart from the obvious chorus of "Let's just not write tests!!!").

That was, at least, until I discovered Tiger Style. No, I've not taken up Kung Fu. I was listening to the Software Engineering Daily podcast, and the eternally interesting Joran Greef was speaking about his journey with TigerBeetle, an absurdly fast transactions database. And he mentioned something fascinating that I would like to showcase for you.

Let's begin with a simple code example.

const userInput = '' // Some input from the user.
const letters = userInput.split('')

const invokeService = (letter) => {
    console.log(letter)
}

for(let x = 0; x < letters.length; x++) {
    const letter = letters[x]

    if(letter === 'h' || letter === 'H') {
        invokeService('h')
        invokeService('H')
    }
}        

In this example, we have some simple JavaScript. It receives some input from the user and breaks it up into letters; if a letter is h or H, it invokes some service with both of those values. Testing this would traditionally be done with some external class, something like this:

const testLetterProcessor_normalInput = () => {
    // ... Testing code for normal input
}

const testLetterProcessor_null = () => {
    // ... Testing code for null input
}

const testLetterProcessor_emptyString = () => {
    // ... Testing code for empty string input
}

const testLetterProcessor_longString = () => {
    // ... Testing code for long string input
}        

These tests would be invoked during a CI/CD process and, having served their purpose, would be stripped from the final bundle. This has been the way for as long as I can remember, but what are some of the drawbacks here?

  • Tests are separate from the code that they test, driving drift, and creating two tiers of code quality - production code and testing code.
  • Tests only run at compile/build time, meaning the rigour and safety of those tests doesn't extend to runtime. This is better than nothing, but bad code slips through this testing process all the time.
  • Wiring up testing frameworks makes me, and many engineers, consider giving up and becoming a farmer or something. Not only is it serious overhead, most testing frameworks are notoriously reflective and slow.

So how does Tiger Style work?

Tiger Style seeks to embed much of this testing logic into the code itself, using a mainstay of most programming languages: the assert keyword. This keyword tests a given predicate and, if it is false (or indeed falsey), triggers an exception. What one does with this exception is entirely up to you, but this is a very interesting idea.
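In JavaScript, for instance, assert isn't strictly a keyword but a function provided by Node's built-in assert module. Here's a tiny sketch of the mechanism (the predicate and message are made up purely for illustration):

const assert = require('assert')

try {
    // Passes silently if the predicate is truthy; throws an AssertionError otherwise.
    assert(1 + 1 === 3, 'Arithmetic is broken')
} catch (err) {
    // What you do with the exception is up to you: log it, rethrow it, page someone.
    console.error(err.message) // "Arithmetic is broken"
}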

Let's take the letter-processing example from earlier, break it down into some simple stages, and think about how we can test the code at runtime. There's an interesting fact about that code: we are able to compute the upper bound of how many times the service should be invoked. If a string with the value hhhhhh is inputted, that would result in two calls per letter, or, in more programmatic terms, len(hhhhhh) * 2. This is exactly where Tiger Style becomes interesting.

const maxIterations = letters.length
const maxServiceCalls = maxIterations * 2

let invocationCount = 0

const invokeService = (letter) => {
    console.log(letter)
}

for(let x = 0; x < maxIterations; x++) {
    const letter = letters[x]

    if(letter === 'h' || letter === 'H') {
        invokeService('h')
        invokeService('H')

        invocationCount += 2
    }
}        

We haven't included our asserts yet, but just to take this in stages, we've added some extra logic. We're counting the number of times the downstream service has been invoked. We also know how many invocations there should be, at most. This enables us to make some assertions.

assert(invocationCount <= maxServiceCalls, `Invoked the service more times than expected for the input ${userInput}`)        

This assertion runs every single time the code is executed. This has some pretty interesting consequences:

  • Your tests are brought into your codebase, minimising the chances of drift.
  • Functionality is literally not possible without the assertions passing successfully. In other words, there's no skipping out on the tests. They're not symbolically part of your production code, they are your production code.
  • Your tests don't run at build time. They run every time. If someone is misusing your system, or causing the code to run in some unintended way, your assertion has a much better chance of detecting it.
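To make the example self-contained, here's a minimal sketch of how the pieces might fit together in a single Node.js file. The only additions over the snippets above are the require for Node's assert module and the file layout; everything else is unchanged:

const assert = require('assert')

const userInput = '' // Some input from the user.
const letters = userInput.split('')

const maxIterations = letters.length
const maxServiceCalls = maxIterations * 2 // Upper bound: two calls per letter.

let invocationCount = 0

const invokeService = (letter) => {
    console.log(letter)
}

for(let x = 0; x < maxIterations; x++) {
    const letter = letters[x]

    if(letter === 'h' || letter === 'H') {
        invokeService('h')
        invokeService('H')

        invocationCount += 2
    }
}

// Runs on every single execution, not just in CI.
assert(invocationCount <= maxServiceCalls, `Invoked the service more times than expected for the input ${userInput}`)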

Even the error that the code produces is clear:

AssertionError [ERR_ASSERTION]: Invoked the service more times than expected for the input h
    at Object.<anonymous> (/Users/chriscooney/Projects/assert-demo/app.js:25:1)
    at Module._compile (node:internal/modules/cjs/loader:1103:14)
    ...
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='        

Does this have an impact on Observability?

This is where my imagination started to spin a little. This type of functional testing has never been directly in the realm of observability, because it was not something that was checked at runtime. It was checked at build time. Now, there are all sorts of metrics around test failures, but are there decent metrics for misuse?

By introducing the assert keyword, we now have a new stream of telemetry, indicating when the code has functioned in a way it was not intended to. Rather than tracking a myriad of metrics and computing this in alarms (which works okay!), we can build this logic into the code, and our alarms become much simpler! And so, being the nerd I am, I thought I'd see what I could do with this data in Coralogix.
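As a sketch of what that telemetry stream might look like, here's one hypothetical way to catch an assertion failure and emit it as a structured log line for whatever is tailing the log file. The assertAndReport helper and its field names are my own invention, not something Tiger Style or Coralogix prescribes:

const assert = require('assert')

// Hypothetical helper: check a predicate and, on failure, emit a structured
// log line before rethrowing, turning the failure into a telemetry event.
const assertAndReport = (predicate, message, context = {}) => {
    try {
        assert(predicate, message)
    } catch (err) {
        console.error(JSON.stringify({
            level: 'error',
            kind: 'assertion_failure',
            message: err.message,
            timestamp: new Date().toISOString(),
            ...context
        }))
        throw err
    }
}

// Following the earlier example:
// assertAndReport(invocationCount <= maxServiceCalls,
//     'Invoked the service more times than expected',
//     { userInput, invocationCount, maxServiceCalls })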

Visualizing Misuse

My first step was to ingest the data into Coralogix. I did this by deploying a simple OpenTelemetry instance with a file watcher. I could have transformed the code to log everything in JSON... but where's the fun in that?

It only took a few seconds, and I could see the logs arriving in LiveTail, which gave me a realtime view of my telemetry as it entered my Coralogix account.

LiveTail in Coralogix provides an instant view of logs, even before indexing.

Now that we have our data, our challenge is to parse these logs. They're a little ugly, but luckily regular expressions are pretty flexible, and the Coralogix platform makes it straightforward to transform these messy logs into something a little prettier.
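Purely as an illustration of the kind of extraction involved (the real rule lives in the Coralogix parsing UI, and the exact pattern depends on the shape of your logs), a regex over the assertion message might look something like this:

// Illustrative only: pull the offending input out of the assertion message.
const line = 'AssertionError [ERR_ASSERTION]: Invoked the service more times than expected for the input h'

const pattern = /Invoked the service more times than expected for the input (?<userInput>.+)/
const match = line.match(pattern)

if (match) {
    console.log(match.groups.userInput) // "h"
}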

Coralogix uses a powerful parsing engine to transform messy logs into structured events

Now that we know what data we have, it's time to use one of Coralogix's most powerful and unique features: the TCO Optimizer. This will allow me to dictate how valuable this data is and determine whether the data should be indexed, or simply pushed to cloud storage in my account.

With one policy, I've just cut down ingestion cost by 70%.

And when we define this policy, the TCO Optimizer instantly reacts, informing me of what I'm saving and how much of my data is being indexed.

No indexing. Living, laughing, loving life. Thriving.

At this stage, we're already in new territory. We have logical errors, caused by poorly configured code, produced by our application in realtime, and transformed into something that can be analyzed and aggregated. In its current form, for example, we can use DataPrime to build reports from our data that give us the key insights we need, right in the Coralogix UI:

This is a simple DataPrime query.

And if you don't feel like writing any DataPrime, you can always just ask for what you want in plain English, using the Coralogix AI query assistant.

Yes, that's using Coralogix's natural language query!

And of course, we can choose to visualize this data in all sorts of weird and wonderful ways! For example, check out this dashboard:

Coralogix has an awesome custom dashboarding feature too!

But of course, we don't need to stop with DataPrime. Let's run through a few more of the fascinating things we can do with this data, like generating new metrics:

These metrics are also stored in your cloud storage, meaning you never hand your data over to us.

We could define simple alarms, so that we know the second our code has run in an unintended way, even if it didn't trigger an error:

This alarm has been configured to immediately trigger if an assertion failure log is found.

Or we can define completely unique alarms that provide in-stream alerting correlation between multiple services, data types, data sources, and more.


This flow alert will draw a line from detected code misuse to performance degradation or data exfiltration, and the possibilities are endless!

We can even begin an investigation when our alerts fire, so we can collaborate and get to the bottom of the problem. We can attach alarms, logs, traces, even browser data, to help us to align around the root cause of an issue.

Yes I tagged myself. I didn't have anyone else to talk to :(

All in all, this was fun

Tiger Style has the potential to be remarkable for engineers, because it takes what is typically a build-time check and makes it a runtime check. What I haven't yet seen discussed, however, is the potential this has for a whole new school of thought in Observability, around how we track misuse of code in realtime.



