Architecting a Queuing Solution With Claude Sonnet 3.5

The other day, I did some Yak shaving.

I had a little problem which, upon reflection, turned into a big problem with lots of structure. To get there, I used my faithful partner in crime, Claude-Bob the AI Code builder. It turns out that AI is particularly good at pulling you into a rabbit hole, where questions lead to more questions and the sun has time to set without you noticing it. You may also lose some weight.

But - can it actually be useful? Or is it just a sophisticated BS document generator degenerating into nonsensical spaghetti bromides?

After several months of ramping up my AI Coding through progressively more complex projects, I thought this was an interesting opportunity to see if AI can be made to work at a higher level than individual code units, or even than refactoring existing code across several files, and truly help create a technical plan - the so-called "Software Architecture" - practical enough to implement, or better yet, to feed back in and get the code out.

However, I've found that going from a plan to the code has its own set of challenges even when I create the plan myself, so that's going to be a topic of its own for a later article. This time, I'll be satisfied if I end up with something helpful and compelling that I could use to build the designed functionality if I were to code it myself.

The Problem

Imagine developing a SaaS app that incorporates Generative AI, but building it on a serverless stack. After initial experimentation, you start crafting prompts of increasing complexity and length, and chaining them for better accuracy and richer content. Inevitably, you will hit the serverless platform's hard timeout limit on how long you can run your function.


The limits imposed by serverless platforms are fine for more traditional back-end processing such as SQL transactions. But when it comes to AI applications, there is a need to run multiple inference API requests which don't always complete in the same amount of time and, not surprisingly, can exceed the platform's execution time limits. Most of the time is spent just waiting for a response from the LLM, so no actual CPU is being used, but you get timed out just the same. Or, if you're going native cloud, you're going to be charged for that time just the same, which is a bummer if you like to make money.
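To make the failure mode concrete, here is a minimal sketch (my own illustration - the callLLM helper and its endpoint are placeholders, and the exact timeout varies by platform) of the pattern that hits the wall: a handler chaining prompts, where each awaited call burns wall-clock time against the platform timeout even though almost no CPU is used.

```typescript
// Hypothetical sketch: a serverless handler chaining LLM calls.
// callLLM() stands in for whatever inference client you use.
type PromptStep = { name: string; prompt: string };

async function callLLM(prompt: string): Promise<string> {
  // Placeholder for an inference API call that can take tens of seconds.
  const res = await fetch("https://api.example.com/v1/complete", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  return (await res.json()).completion as string;
}

export async function handler(steps: PromptStep[]) {
  let context = "";
  for (const step of steps) {
    // Each awaited call is mostly idle waiting, but the platform's
    // wall-clock timeout keeps ticking, so a 3-4 step chain easily
    // exceeds the limit and the whole request is killed mid-flight.
    context = await callLLM(`${step.prompt}\n\nContext so far:\n${context}`);
  }
  return { statusCode: 200, body: context };
}
```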

"In a goldrush, sell shovels!" -- Mark Twain        

Well, I couldn't find the kind of shovel I needed at the General Store, so after a drink too many down at the saloon, a brilliant idea came to me: why don't I build my own?? And of course, do it with AI!

Concepts of a Big Picture

So I punted the issue to Claude with a pithy but potent description:

Within this little box I found this diagram:

This came with some explanatory text:

  1. Architecture Overview:
     • Frontend submits jobs through an API endpoint
     • Jobs are broken into smaller work items and queued
     • Serverless workers process items within timeout limits
     • Results are stored in a persistent store
     • Status can be polled through API endpoints
  2. Key Components:
     • QueueService: Core service managing queue operations
     • Worker Lambda: Processes queue items until near timeout
     • API Handlers: Endpoints for job submission and status checking
     • DynamoDB Tables: Store job metadata and results
     • SQS Queue: Manage work items
  3. Important Features:
     • Chunking of large jobs into smaller work items
     • Automatic retry with exponential backoff
     • Progress tracking and status updates
     • Configurable timeouts and chunk sizes
     • Proper error handling and status reporting
  4. Scaling Considerations:
     • Multiple workers can process items concurrently
     • SQS handles message distribution and deduplication
     • DynamoDB provides scalable storage
     • Workers respect Lambda timeout limits
  5. Best Practices:
     • Initialization outside handlers for connection reuse
     • Strong typing with Zod validation
     • Proper error handling and logging
     • Status tracking and progress updates
     • Configurable parameters for flexibility

And it goes on generating sample code.
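For flavour, the piece that stuck with me was the worker's "process until near timeout" loop. Here's a minimal sketch of that idea (mine, not Claude's verbatim code; the queue and storage helpers are stand-ins for the SQS/DynamoDB calls):

```typescript
// Sketch of the "process until near timeout" worker idea. A Lambda context
// exposes the remaining time, so the worker keeps pulling work items only
// while there is enough margin left to finish one more and record its result.
interface WorkItem { id: string; payload: unknown }
interface LambdaContext { getRemainingTimeInMillis(): number }

// Stand-ins for the SQS/DynamoDB calls in Claude's sample.
const pending: WorkItem[] = [];
const results = new Map<string, unknown>();
async function dequeueNextWorkItem(_jobId: string): Promise<WorkItem | null> { return pending.shift() ?? null; }
async function processWorkItem(item: WorkItem): Promise<unknown> { return `processed ${item.id}`; }
async function storeResult(_jobId: string, itemId: string, result: unknown): Promise<void> { results.set(itemId, result); }

const SAFETY_MARGIN_MS = 30_000; // stop ~30 seconds before the hard timeout

export async function workerHandler(event: { jobId: string }, context: LambdaContext) {
  while (context.getRemainingTimeInMillis() > SAFETY_MARGIN_MS) {
    const item = await dequeueNextWorkItem(event.jobId);
    if (!item) break; // queue drained for this job
    const result = await processWorkItem(item);
    await storeResult(event.jobId, item.id, result);
  }
  // Anything still queued is picked up by the next worker invocation.
}
```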

Quite a firehose from one prompt! This output does resemble an architecture and does itemize and flesh out something that looks like a solution to what I asked, though it seems more complicated than I expected. The game was afoot!

A Twist

Nevertheless, I take the win and throw a bit more challenge at it. I like to work on Netlify or Vercel rather than directly in AWS as much as possible. This is because I like not to waste my time. The automation and tooling provided with these platforms are amazing, for the low, low price of $0 when you're still at the prototyping stage. So even though I have to code directly to a lambda to interface with the AWS queuing, I would like this framework to be "deploy and forget": once the queuing handler is deployed, the rest of the development would happen on these other platforms where the CI/CD and DX are significantly better - sorry Jeff! Vercel even has v0, the original UI-generating AI Coding platform.

In this new box, a new diagram, this one with some colors:

AWS Buzzword compliant

This solution does indeed reflect the big picture I had painted.

Iterating the Details

Full disclosure: I have a bit of imposter syndrome as I am far from a certified AWS architect. Sure, I've fooled around with the API Gateway and lambda functions like your average cloud-based dev these days, but having me drive the AI here is a bit like the one-eyed man leading the blind AI. Well, more like two one-eyed men, considering we both have smatterings of AWS knowledge. Or one-eyed creatures, since an AI is not a man. AWS cyclops, if you will. Mr. Magoos of the AWS cloud. But I digress.

Showing my superficial understanding, I ask it if it could just call one side directly from the other and avoid paying API Gateway charges. It also looks like the application part that will be called back to invoke the LLM API is not going to live on my serverless platform of choice, but directly in a lambda instead.

Ever the agreeable AI Designer, Bob-Claude adjusts the high-level vision of the system and provides some sample code on how to do this:

  1. Sample Direct AWS Access:


  2. Callback Mechanism:
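The screenshots aren't reproduced here, but the callback sample boiled down to something like the following (a sketch of the idea rather than Claude's exact code; the environment variable name is illustrative):

```typescript
// Sketch of the callback mechanism: when a job finishes, the worker Lambda
// POSTs the result back to an application endpoint instead of the app polling.
interface JobResult {
  jobId: string;
  status: "completed" | "failed";
  output?: unknown;
  error?: string;
}

export async function notifyCompletion(result: JobResult): Promise<void> {
  // Illustrative variable name - this is the detail questioned just below.
  const callbackUrl = process.env.NETLIFY_CALLBACK_URL;
  if (!callbackUrl) throw new Error("Callback URL not configured");

  const res = await fetch(callbackUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(result),
  });
  if (!res.ok) {
    throw new Error(`Callback failed with HTTP ${res.status}`);
  }
}
```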

But what's this? Netlify Callback URL? This is actually not what I asked for in my last requirement. Easy to miss if you read quickly, which I did on both counts. More on this shortly; I decided to plow along regardless.

Putting it together

Now we're ready to try to take diagrams, snippets, and bullets, and make something of a good read out of them. Besides the entertainment value, it will help me keep the AI on the straight-and-narrow path and actually deliver what I need.

Here's what it came up with:






Flow diagrams



Pretty, pretty good! I especially like the mention of the AWS permissions which are truly from the nether regions of computing. 'Nuff said.

The diagrams are simplistic, but they give a general idea of how things are supposed to be put together. In fact, this is very important for AI Coding because what AI (Apparent Intelligence) lacks most is intentionality. So although the details are slim, it's the "big picture" that our AI needs to be given in order to avoid going off the hallucinatory rails.

Module Breakdown

The document had identified a structure in terms of software components, but a component might be big enough that it needs to be implemented as multiple files. Or it could be small enough that it's not worth putting it in its own file.

A key task when developing with AI is to have it itemize the files to create, then iterate to get a sensible breakdown. We especially have to emphasize creating lots of small files over a few large files, because there are limits on the output which can interrupt the generation process. Also, when doing updates, the process still has to regenerate everything that didn't change, so by creating separate units we reduce the amount of unnecessary regeneration.

This makes full updates to a file cheaper in terms of tokens. It's probably a good practice for humans too, although it can make it more complex to keep track of what's in what. But considering LLMs are kind of like savants, they can easily manage that. Claude Sonnet 3.5 is particularly good at avoiding the "needle in a haystack" problem, whereby an LLM focuses on certain parts of the information (e.g. the start and end of a large document) and ignores the rest. When loading the content into the Project Knowledge or attachments, I haven't experienced obvious blind spots in Claude's ingestion of the files.

To ensure the AI still has the full and correct context, I use my trusty Project Packager utility to glom up all the individual files and provide the full code in the Project Knowledge or as an attachment. Alternatively individual files can be provided if the project gets too big to be completely uploaded.

Claude is good at playing Where's Waldo with project files

This is easily achieved with a simple prompt:


Here's the proposed list of framework files to create:

  1. Types and Interfaces:
     • /src/types/QueueTypes.ts - Core type definitions for jobs, work items and results
     • /src/interfaces/IQueueService.ts - Queue service interface
     • /src/interfaces/IStateManager.ts - State management interface
  2. Core Services:
     • /src/services/QueueService.ts - Base queue service implementation
     • /src/services/StateManager.ts - Base state management service
     • /src/services/WorkerService.ts - Worker coordination service
     • /src/services/ResultsHandler.ts - Results processing service
  3. Platform Adapters:
     • /src/adapters/AWSAdapter.ts - AWS native implementations
     • /src/adapters/NetlifyAdapter.ts - Netlify-specific implementations
     • /src/adapters/VercelAdapter.ts - Vercel-specific implementations
  4. Utilities:
     • /src/utils/RetryUtils.ts - Retry and backoff implementations
     • /src/utils/ValidationUtils.ts - Common validation functions
     • /src/utils/ConfigUtils.ts - Configuration management
  5. Configuration:
     • /src/config/QueueConfig.ts - Queue configuration types and validation
     • /src/config/WorkerConfig.ts - Worker configuration types and validation
  6. Error Handling:
     • /src/errors/QueueErrors.ts - Custom error types
     • /src/errors/ErrorHandler.ts - Error handling utilities

Would you like me to proceed with implementing any of these files?
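Before drilling in, here is a rough sketch of what the first of these, /src/types/QueueTypes.ts, might contain (the field names are my illustration, not what Claude generated):

```typescript
// /src/types/QueueTypes.ts - sketch of the core job/work item types.
// Field names are illustrative assumptions, not the generated code.
export type JobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

export interface JobConfig {
  jobId: string;
  callbackUrl: string;      // where results get POSTed back
  timeoutMs: number;        // per-work-item budget
  maxRetries: number;
}

export interface WorkItem {
  jobId: string;
  itemId: string;
  payloadRef: string;       // a reference (ID), not the data itself
  attempt: number;
}

export interface JobResult {
  jobId: string;
  status: JobStatus;
  resultRef?: string;       // reference to stored output
  error?: string;
}
```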

Claude's list is a good high-level breakdown; now we drill down. We ask for each file's purpose - in fact the LLM puts them there because they look like the kind of files you would expect, and asking for explanations after the fact is a bit like asking a poet what a hard-to-understand verse means.

The honest answer is probably: "I don't really know, but I can see it subconsciously made sense". In fact LLMs are great at coming up with explanations, like a television script writer is good at ret-conning backstories for characters that were originally called "clerk 1", "french-fries attendant", "angry man", "man in black" etc. and fleshing out lots of details that were "clearly" intended - after the fact.

So while the scriptwriter and the LLM are both lying when they give you an "explanation", there is still value in it. In fact, under the guise of explaining its previous action, we're really helping to narrow down additional details that are plausible because they could have been the reason for the previous answers.

LLMs are story-oriented, maybe because of the sequential nature of their inference process? So we get better results eliciting a story than asking for cartesian, cut-and-dried reasoning. In fact LLMs are incapable of reasoning, they only appear to. In the same way that a poem or song may appear to have an underlying wisdom, you may also get to the end and think, "what was that??".

But it's this LLM propensity to generate meaning out of a stream-of-consciousness flow that we're tapping here and why, even though we know it only created the design through a stochastic process, we can use it to discover the underlying structure and create more levels of detail that it will pretend were there all along, like the ending of Lost.


When we review the document, we can find things that don't make sense. There's no point in expecting perfection, any more than you'd expect every result of a Google search to actually be pertinent to your query - noise gets in, which you can now clean up before it gets expanded out into code that represents many more tokens. It's truer than ever that catching something in the Design stage is ten times cheaper than catching it at the Coding stage.

When Use Cases Come Last

So now that we have a design and some sample code, we can generate the use cases!

If this seems odd, remember that LLMs don't proceed in rational sequences of thought - they work on the principle of winnowing data, channeling patterns.

The use cases we generate now are elaborations of the initial requirements, but they are generated under the constraints of the other artifacts, kind of like a midpoint between the departure and the arrival.

In effect, there are already use cases baked into what was already generated, so the ones we are now generating act as a kind of cross-referencing of what was generated versus what was asked.

Why do we need them if we already have a design?

  • They will help ensure we have complete coverage of what we already intended
  • They will help reveal some unspoken assumptions that are training patterns clashing with our intent
  • They save time in refining the functionality further while keeping it from straying

Part of my motivation is that some of the elements in the design so far seem a bit out of left field - automatic "improvements" LLMs are wont to add unbidden - and unwanted! So I make this one of its goals.

The 5 WHYs method is from the Toyota/lean school, which I find LLMs are pretty good at. In particular, I incorporated this into the troubleshooting procedure I use for challenging bugs.



WHY? WHY? WHY? BECAUSE!

I find that making the justifications explicit helps me go through the proposed organization and see if it makes sense or if the justification is tenuous. I'm hoping it will "help" the LLM too! And I've noticed it actually does, especially when applying this to the investigation of technical problems.

Now you might find that the style of this document is pretty abrupt. Sometimes LLMs take directions a bit too literally - I did expect more than just the 5 WHYs reasoning, which gives a skeletal appearance to this design, leaving the impression that these are notes rather than something coherent.

We can also see some of the unspoken design underpinnings that have come unbidden, or perhaps through misunderstanding or bad explanations, and are pushing the design the wrong way. This is the time to push back and change the shape with far fewer tokens than will be needed once the code is generated.

My next prompt provides the feedback to "steer it straight":

Buzzword Police

Sometimes LLMs like to throw in cool words because they are abundant in their training examples. This is not unlike certain resume artists peppering their bios with unverifiable details to get through the AI screening. However, here there are real consequences: left unchecked, this will lead to an unnecessary and inextricable mess of nonsensical code that gives AI a bad name.

One thing that slipped into the design is "Vercel Edge functions". This is a special feature of the Vercel platform that allows you to distribute functions on their Content Delivery Network (CDN) to execute close to the end users' geographical location. However, the intent of the Edge network is to allow distributed execution of simple things, such as authentication code, that don't need centralized resources like a central database. This is therefore irrelevant to having a queue, so I need to get it removed.

Another thing the LLM has fixated on is the idea of queueing up processing large datasets by chunking up the records and dispatching them to be processed in consecutive jobs. While interesting, this is not a main use case of mine - I must have accidentally mentioned something to the effect of having a lot of data to process which would explain the lengthy execution, and now, like a weed, the unnecessary idea is growing and left unchecked would doubtlessly cause problems. If you don't remove the weeds as soon as you see them, your planet becomes overgrown with Baobabs and a complete mess.

You must constantly remove the weeds
Baobabs left unchecked will ruin your day


Therefore I instruct Claude to do away with these extravagances.

The high-level description is pretty good as an organization description for the whole thing, however I don't see much about security so I get it to elaborate, since this is a particular pain point with AWS, and I saw some vague things mentioned that might turn into something big.

Planning for iterations

Having a big document with a lot of details is not conducive to having an LLM produce the described software. This usually leads to a runaway LLM with endless cycling if left alone. The problem is that, unguided, the LLM will start randomly straying from the end goal and go down rabbit holes of unnecessary "improvements", in the worst cases rendering the code unusable and unfixable.

What we need to do is have it evolve the solution from a small kernel to the full thing, adding layers and features step by step. The LLM is pretty good at coming up with this type of progressive plan given proper guidance.

I decide that we will start by building a simplified desktop version that I can use for developing the actual application. Then we'll build out the AWS infrastructure and, to keep things simple, a very simplified application also on AWS. Then we'll build the serverless platform support, one platform at a time, with sample apps on each.

We'll use automated tests along the way to verify that the code is always capable of running, though incomplete. This is also important for identifying when the LLM goes off the rails and starts corrupting the code with illogical changes. Luminaries like Kent Beck have promoted the idea of Test Driven Development for human programmers, but it turns out to be crucial for LLM-driven coding because when LLMs decide to "improve" designs by removing parts, the tests keep them honest.
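To give a flavour of what "keeping them honest" looks like, here is a minimal test sketch (Jest-style, which is only one of the candidate frameworks, against an assumed LocalQueueService API and module path):

```typescript
// Sketch of a guard-rail test for job lifecycle rules (Jest-style).
// If the LLM "simplifies" away a state or a transition rule, this fails loudly.
import { LocalQueueService } from "../src/services/QueueService"; // assumed path

describe("job lifecycle", () => {
  it("moves a job from queued to completed after one worker pass", async () => {
    const queue = new LocalQueueService();
    const jobId = await queue.submitJob({ payloadRef: "doc-123" });

    expect(await queue.getStatus(jobId)).toBe("queued");
    await queue.processNext(); // simulate one worker pass
    expect(await queue.getStatus(jobId)).toBe("completed");
  });

  it("discards a cancelled job instead of processing it", async () => {
    const queue = new LocalQueueService();
    const jobId = await queue.submitJob({ payloadRef: "doc-456" });

    await queue.cancelJob(jobId);
    await queue.processNext();
    expect(await queue.getStatus(jobId)).toBe("cancelled");
  });
});
```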

This is also where we can have the Use Cases intervene to shape the testing.

Future Self

Like a victim of Korsakoff syndrome, LLMs don't remember anything from one chat to the next, but unfortunately the longer the chat, the more tokens accumulate in the context and the costlier each message becomes. In fact, with Claude, if you don't voluntarily stop a chat and start a new one, you will eventually become unable to make any further requests as all context window tokens will have been used up.

Because of this fact of life, I have taken to asking Claude to introspect about the conversation to add details into the document, so that I can start the next chat by having the LLM read the details from the document and be ready to go.

Claude decides to call this a "brief", like something that would be handed off to the President, which I find a little pompous, but it reflects the perceived importance of the information to be produced. I chose to say it was for an unspecified AI Coding assistant to avoid an identity crisis and GPU meltdown - too bad it will never know it is the AI assistant!


Checking it twice

One thing LLMs are good at is cross-referencing one file's content against another. I use this to get it to go through the brief and see if there's anything missing. This is kind of amusing - an AI playing the role of an AI!

Questions per section:

Context and Purpose

  1. What constitutes a "long-running job"? Is it defined solely by duration (>15 minutes) or are there other characteristics?
  2. Should the framework support job cancellation?

Core Design Principles

  3. How should data references (IDs) be structured? Are there format requirements or just strings?
  4. What's the expected strategy for workers to fetch data using references? Should this be abstracted by the framework?

Technical Constraints

  5. For Netlify/Vercel HTTP APIs, are there any retry/throttling requirements for the HTTP calls?
  6. What's the expected behavior when approaching the timeout limit? Should work be checkpointed?

Implementation Requirements

  7. For progress updates, what granularity is required? Is it percentage-based only?
  8. What constitutes "handle failures gracefully"? Need specific error handling requirements
  9. What's the maximum expected payload size for results?

Core Data Structures

  10. What are all possible values for JobStatus?
  11. Should WorkItem include any additional metadata for tracking/debugging?

Platform-Specific Notes

  12. For Netlify/Vercel, what's the expected endpoint URL structure?
  13. Are there specific AWS service regions to use?

Security Implementation

  14. What's the expected token expiry time for time-based tokens?
  15. For key rotation, what's the overlap period for old/new keys?

Implementation Approach

  16. What level of emulator fidelity is required for local testing?
  17. Are there specific AWS service mocking requirements?

Critical Success Factors

  18. What metrics define "reliability"?
  19. What's the expected logging format/structure?

Resource Requirements

  20. Are there specific AWS service tier requirements (SQS standard/FIFO, DynamoDB provisioned/on-demand)?
  21. Any specific Node.js version requirements?

Implementation Process

  22. What defines "runnable code" at each step? Minimum test coverage?

Testing Requirements

  23. Are there performance benchmarks that tests should verify?
  24. What's the expected test framework (Jest, Mocha, etc.)?

Known Limitations

  25. Should the framework implement any specific cost-control measures?
  26. Are there specific rate-limiting requirements for API calls?

Additional General Questions:

  27. Should the framework handle job prioritization?
  28. Are there specific monitoring/alerting requirements?
  29. Should the framework support different environments (dev/staging/prod)?
  30. What's the expected deployment strategy for the framework itself?

I've identified these questions that I believe need clarification before starting implementation. Would you like me to:

  1. Prioritize these questions?
  2. Group them by implementation phase?
  3. Add more specific technical details to any question?

Please let me know if you'd like me to continue with any of these approaches or if you want to address the questions first.

I am a bit surprised and a bit disappointed that there are so many questions to answer, but at least it's a sign it's not a slouch. To be fair, if you ask an LLM if it forgot something, it interprets it as: "Can you make up something that resembles something you might have forgotten", so I have to take it with a grain of salt. But for the purposes of this exercise I play along and resolve to skip/be brief on questions that may seem irrelevant.

I can ask it to go through the questions one by one, which is a very powerful and universal trick to gather information from users. It's easy to let the LLM take me through the list, and for each point I can make various adjustments and ask questions, then pick up with the next item whenever I feel ready. This "interview" format is uncannily humanlike and something very useful to bear in mind when designing AI workflows/AI prompts.

Because I don't like repeating myself, I also added that maybe I already answered it earlier in the conversation, and Claude is able to go back and extract that information too, which reduces the number of questions I actually end up having to answer. Now you might expect that it would have done that when it generated the original document, and to some extent it did. But this is an opportunity to do a second pass and infer more information/complement it.

So for this first question, it does find a lot of things already mentioned, but I can give it additional information if I find it relevant. I see that the misguided idea of automatically splitting up datasets into separate jobs is still taking root, so I tell it not to!

I tell it to "update your answer" but in fact this is just a kind of "virtual" update, no document is immediately modified. But having the data in the chat will allow us later to gather everything and generate a complete document.

The next question about cancellation is insightful because it is indeed something that could be important to support and requires some forethought.

I can provide some additional business rules that it "records".
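To pin down what I mean, the rule boils down to something like this in the worker (a sketch; the state-store calls are stand-ins):

```typescript
// Sketch of the cancellation rule as recorded: a job cancelled before it is
// dequeued is simply discarded; a job cancelled while running is the
// application's responsibility to notice via periodic status checks.
type Status = "queued" | "running" | "cancelled" | "completed";
const statuses = new Map<string, Status>(); // stand-in for the state store

async function getJobStatus(jobId: string): Promise<Status> { return statuses.get(jobId) ?? "queued"; }
async function markDiscarded(jobId: string): Promise<void> { statuses.delete(jobId); }
async function runJob(jobId: string): Promise<void> { statuses.set(jobId, "completed"); }

export async function handleDequeuedJob(jobId: string): Promise<void> {
  if ((await getJobStatus(jobId)) === "cancelled") {
    await markDiscarded(jobId); // cancelled before pickup: drop it, no processing
    return;
  }
  // If the job is cancelled while running, the application's callback service
  // is expected to check the job's status periodically and stop on its own.
  await runJob(jobId);
}
```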

The next one is pretty detailed - really implementation details that I don't care much about.

Basically Claude was overcomplicating, so I tell it to fuggetaboutit and move on.

The next question answers itself - I can still verify it though, which is good.

Specific call performance isn't a big concern especially up front - this is more something we can ease into once we are looking at the real cloud-based implementations.

Some of the next topics seem a bit overblown for what I want to do in the early stages, but eventually if a lot of users are on the platform concurrency is going to be important. I'll want to tackle them when working on the platform-specific code.

Error handling is important, otherwise a temporary failure of a platform can lead to jobs clogging up the queue, or jobs being lost. Some platform details are unknown to me but from a big picture point of view, I think we can box them in and plan to do further elaboration during the detailed design for that part. Good questions nevertheless!

In the initial design, there was stuff about partitioning and managing large datasets. Though I tried to dispel it, the notion is lurking in the shadows, ready to push unnecessary code, in this instance, unnecessary progress reporting logic. I can shut it down before we go too far.

And so on, and so forth...

Brief wrap-up

Eventually we go through the last question. If I had to go back and pick up all of the little adjustments identified for each, and edit my original document, I would probably spend another 20 - 30 minutes. But now with the magic of Generative AI I just ask it to go through all of what was said and do the edits. By telling it to "regenerate" the brief, it creates a complete document incorporating the information, in a matter of seconds! This is where we see the power of this AI - Apparent Intelligence it may be, but useful nevertheless.

To close the loop, I asked it to do another pass of review. Hopefully there will be few additions to add - although there is a risk that its natural penchant for embroidering answers could add infinite levels of details...

Thankfully we seem to be converging towards a completed document with just a few more questions - which are in fact relevant. I provide some additional thoughts - full disclosure, I had NOT previously thought about it, so the AI is doing its job of helping me come up with a more complete solution. Brainstorming is after all one of the big Gen AI use cases.

This time for sure?

Now I ask it the same question to see if we finally have eliminated ALL ambiguities. Of course it's an impossible goal, but I'll settle for the AI not complaining.

OK, this time it's just nit-picking, so I close the discussion with a few peremptory answers and have it move on.

In case you're curious, click here to see the full details of the implementation brief.

El plan

Now we go back to the plan that was at the source of these questions and update it with the brief!

A lot of documents, you may say, but they serve their purpose:

  • The brief summarizes different aspects
  • The plan is a step-by-step way to put things in place

And as always it's better to have multiple small, independently ingestible units of text than a single long document.

There followed another iteration with more details revealed, which jogged my own thinking. I have a few notes to improve it.

Unfortunately it decides it is still not ready to generate the plan, and asks me to confirm the plan changes, some of which are shown below:

So no plan but more questions:

Glad to see that the last point is what I actually asked it - to regenerate the plan!

Time to get on with it:

The final plan - was it worth all of the prompting?

This is just a portion of the plan, and I can see it's starting to shape up. Like a consultant who borrows your watch to tell you the time and bills you $200, the LLM is in fact helping me organize my development thoughts and pushing me on questions that I might otherwise overlook, bringing mainstream knowledge about them to the table. This is actually pretty valuable, though it was a multi-step process. However, I think I'll be able to use PromptScript to automate it - an experiment for another day.

Perplexity to the rescue:

Now I have a nagging suspicion that having an AWS lambda function blocked on an API call is going to be a problem. If you are a regular reader of this newsletter, you'll know that Perplexity is my go-to search engine for research, and I recently built a Perplexity Search extension for Claude, discussed in a previous article, using the MCP protocol released by Anthropic.

I use Perplexity because of its quality search results without ads and dummy entries (we're looking at you, Google!), as well as its integration with none other than Claude Sonnet 3.5 to summarize the pertinent information and structure the answer according to potentially complex prompts. It can also orchestrate multiple chained search steps, in agentic fashion, to provide a truly complete answer. It can even generate code!


Create a function that posts to an AWS queue with the information needed to call OpenAI completion, and a lambda to receive it
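The generated code isn't reproduced here, but a sketch of what such a pair might look like (my reconstruction, not Perplexity's verbatim output; the queue URL, model and environment variable names are placeholders):

```typescript
// Sketch: enqueue an OpenAI completion request on SQS, and a Lambda that
// consumes it, calls the OpenAI API, and posts the result to a callback URL.
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import type { SQSEvent } from "aws-lambda";

const sqs = new SQSClient({});
const queueUrl = process.env.COMPLETION_QUEUE_URL ?? ""; // placeholder

// Producer: enqueue the prompt and a callback URL for the result.
export async function enqueueCompletion(prompt: string, callbackUrl: string) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({ prompt, callbackUrl }),
  }));
}

// Consumer: Lambda triggered by SQS, calls OpenAI, posts the result back.
export async function handler(event: SQSEvent) {
  for (const record of event.Records) {
    const { prompt, callbackUrl } = JSON.parse(record.body);
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "gpt-4o-mini", // placeholder model
        messages: [{ role: "user", content: prompt }],
      }),
    });
    const completion = (await res.json()).choices[0].message.content;
    await fetch(callbackUrl, { method: "POST", body: JSON.stringify({ completion }) });
  }
}
```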

(That shouldn't be so surprising as most chatbots are now powered by LLMs, here's a little test I did with Anthropic's help bot:

but I digress!)

As is my custom, then, I turn to Perplexity, obtain the following information, and provide it to Claude to expand the design:

On AWS, If I have a lambda function that makes an API call to an LLM, do I get charged for CPU during the time that the function is just blocked waiting for a response?

Yes, you would be charged for the entire duration that your Lambda function is running, including the time it spends waiting for a response from an external API call[1][4].

AWS Lambda charges based on two main factors:

1. Number of requests

2. Duration of execution

The duration is calculated from the time your code begins executing until it returns or terminates, rounded up to the nearest 1 millisecond[3]. This includes any time spent waiting for external API calls or other operations to complete.

Even if your function is in a waiting state and not actively using CPU cycles while blocked on an API call, AWS still considers this part of the function's execution time[4]. The reasoning behind this is that AWS has allocated resources to your function for the entire duration it's running, regardless of whether it's actively computing or waiting for an external response.

To optimize costs in scenarios like this, you could consider the following approaches:

1. Use asynchronous programming techniques to handle multiple API calls concurrently, reducing overall execution time.

2. Split your function into multiple smaller functions, with one function making the API call and another processing the response.

3. Utilize Step Functions for orchestrating complex workflows that involve waiting for external processes.

4. Optimize your code to minimize execution time before and after the API call.

Remember that Lambda's pricing model is designed to be cost-effective for short-running, event-driven functions. For long-running processes or those with significant wait times, other AWS services might be more suitable and cost-effective.



Can a step function be invoked by SQS instead of a lambda? How does the cost of running a step function compare with the cost of a lambda? How is the step function billed while waiting for an API call response? Do I need to use a specific API library to make such a call?

Yes, a Step Function can be invoked by Amazon SQS (Simple Queue Service) instead of a Lambda function. This integration allows you to trigger Step Functions workflows based on messages in an SQS queue. Here's a comparison of Step Functions and Lambda, along with information on billing and API calls:

## Step Functions Invocation by SQS

You can configure an Amazon EventBridge rule to listen for messages in an SQS queue and trigger a Step Functions execution. This setup allows you to start Step Functions workflows based on queue events without using a Lambda function as an intermediary.

## Cost Comparison: Step Functions vs. Lambda

The cost comparison between Step Functions and Lambda depends on various factors:

1. Step Functions Pricing:

   - Charged per state transition

   - Standard Workflows: $0.025 per 1,000 state transitions

   - Express Workflows: $1.00 per 1 million synchronous executions

2. Lambda Pricing:

   - Charged based on the number of requests and the duration of execution

   - $0.20 per 1 million requests

   - $0.0000166667 per GB-second of execution time

## Billing for API Call Waiting Time

Step Functions bills differently depending on the workflow type:

1. Standard Workflows:

   - Billed per state transition

   - Not charged for time spent waiting for API call responses

   - Ideal for long-running processes or workflows with significant wait times

2. Express Workflows:

   - Billed based on the number of executions and their duration

   - Charged for the entire execution time, including time waiting for API responses

   - Better suited for short-lived workflows

## Making API Calls from Step Functions

You don't need a specific API library to make API calls from Step Functions. Step Functions provides built-in integrations for calling APIs:

1. AWS SDK Integrations:

   - Call over 9,000 API actions across 200+ AWS services directly from your state machine

   - Use the Resource field in your Task state to specify the service and action

2. HTTP/HTTPS API Calls:

   - Use the arn:aws:states:::http:invoke resource in your Task state

   - Specify the API endpoint, method, and other details in the Parameters field

   - Supports various authentication methods, including IAM roles and API keys

3. Optimized Integrations:

   - Customized integrations for specific AWS services

   - Provide special functionality in a workflow context

Here's an example of an HTTP API call in a Step Functions Task state:        
{
  "Type": "Task",
  "Resource": "arn:aws:states:::http:invoke",
  "Parameters": {
    "ApiEndpoint": "https://api.example.com/v1/data",
    "Method": "GET",
    "Authentication": {
      "ConnectionArn": "arn:aws:events:us-east-1:123456789012:connection/example-connection"
    }
  }
}
In this example, the Step Function makes an HTTP GET request to the specified API endpoint using the authentication details provided in the EventBridge connection.

When choosing between Step Functions and Lambda, consider factors such as workflow complexity, execution duration, and the need for visual workflow design. Step Functions excel at orchestrating complex workflows and managing long-running processes, while Lambda is ideal for short, event-driven functions.

EventBridge Pipes is a feature of Amazon EventBridge that allows you to create point-to-point integrations between event producers and consumers. Here's an explanation of how they work and their associated costs:

## How EventBridge Pipes Work

EventBridge Pipes consist of four main components:

1. Source: This is where events originate. Supported sources include SQS queues, DynamoDB streams, Kinesis streams, Amazon MQ, and Apache Kafka[1][2].

2. Optional Filtering: You can apply filters to process only a subset of events, reducing costs and unnecessary processing[2].

3. Optional Enrichment: This step allows you to enhance the event data before sending it to the target. You can use Lambda functions, Step Functions, or API Gateway for enrichment[1][2].

4. Target: This is where the processed events are sent. EventBridge Pipes supports 15 different targets, including Lambda functions, Step Functions, Kinesis streams, and more[4].

The flow works as follows:

1. Events enter the pipe from the source.

2. If configured, events pass through a filtering stage.

3. If set up, events go through an enrichment process.

4. Finally, events are sent to the specified target.

## Cost of Using EventBridge Pipes

The pricing for EventBridge Pipes is based on the number of events processed and the optional enrichment invocations. As of November 2024, the pricing structure is as follows:

1. Event Processing: $0.40 per million events processed

2. Enrichment Invocations: $0.20 per million enrichment invocations

Key points about pricing:

- You're only charged for events that pass through the filtering stage, potentially saving costs on unwanted events[2].

- There are no upfront costs or minimum fees.

- You pay only for what you use.

- Costs may vary slightly by AWS region.

It's important to note that while EventBridge Pipes itself has these specific costs, you'll also need to consider the costs of the source and target services you're using (e.g., SQS, Lambda, Step Functions).

## Cost Optimization

To optimize costs when using EventBridge Pipes:

1. Use effective filtering to reduce the number of processed events.

2. Consider batching events where possible to reduce the number of enrichment invocations.

3. Choose the appropriate source and target services based on your use case and budget.

EventBridge Pipes can be a cost-effective solution for many integration scenarios, especially when compared to the alternative of using Lambda functions as intermediaries, which would incur both Lambda execution costs and potential costs for increased code maintenance[4].        

I can paste this into a document which I upload to the Project Knowledge so Claude can leverage it.

There's still a need for some give-and-take with Claude about how to use the information:

One aspect of the architecture is the difference between development conditions and production conditions. As a developer, it's a lot more efficient if I can stub out the AWS environment - "mock" it, as it were - and do everything locally, which is a big part of this plan. The next level of complexity is actually using the AWS environment with the real queuing mechanism, but again I want a simple first step with a regular lambda function before layering on the AWS EventBridge workflow infrastructure, to make basic troubleshooting possible and also escalate the solution's difficulty progressively rather than jumping to the full thing right off the bat. I'm an optimist - but not that much!
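The local stand-in doesn't need to be fancy. Here's a minimal sketch of the kind of in-memory emulator I have in mind (the names echo the plan below; the details are assumptions):

```typescript
// Sketch of an in-memory development stand-in for the AWS queue + state store.
// Same surface as the real adapters so application code doesn't change.
type JobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";
interface Job { id: string; payload: unknown; status: JobStatus; result?: unknown }

export class LocalQueueService {
  private jobs = new Map<string, Job>();
  private queue: string[] = [];

  submitJob(payload: unknown): string {
    const id = `job-${this.jobs.size + 1}`;
    this.jobs.set(id, { id, payload, status: "queued" });
    this.queue.push(id);
    return id;
  }

  cancelJob(id: string): void {
    const job = this.jobs.get(id);
    if (job && job.status === "queued") job.status = "cancelled";
  }

  getStatus(id: string): JobStatus | undefined {
    return this.jobs.get(id)?.status;
  }

  // One "worker pass": dequeue, skip cancelled jobs, run the processor.
  async processNext(
    process: (payload: unknown) => Promise<unknown> = async (p) => p
  ): Promise<void> {
    const id = this.queue.shift();
    if (!id) return;
    const job = this.jobs.get(id)!;
    if (job.status === "cancelled") return; // cancelled before pickup: discard
    job.status = "running";
    try {
      job.result = await process(job.payload);
      job.status = "completed";
    } catch (err) {
      job.status = "failed";
      job.result = String(err);
    }
  }
}
```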


Questions, questions and more questions

I have a little humor at Claude's expense by asking it to pretend to be an AI Coding Assistant! In any case, we're back to having it check that it has everything it needs in the latest documentation:

  • API Call Pattern
    - Is there a specific API call pattern we need to support? No, we can define it as part of the detailed design.
    - Do we need to handle multiple API calls in sequence? No.
    - Do we need to handle API call responses to determine next steps? No.
    - Do we need request/response validation? Only minimal.
  • Step Function State Machine
    - What specific states do we need in the state machine? Whatever the detailed design deems necessary to perform the service.
    - How do we handle API timeouts in the state machine? We proceed with standard error handling based on options: discard the message, log the error, and if there's an error handler callback provided, call it (see the sketch after this list).
    - Should cancellation trigger state machine termination or a specific cancellation path? Like for the lambda: if the job is cancelled by the time it is dequeued, it gets discarded. If it is running when cancelled, it is up to the application callback service to periodically check the state of the job.
    - How do we map SQS message attributes to Step Function input? No particular requirement, whatever is imposed by the platform.
  • Development Mode Transition
    - What debugging information needs to be preserved when moving from development to production? No requirement.
    - Is there a requirement to run development and production modes side by side? No.
    - Do we need to support switching individual queues between modes? We should have it handled either by a lambda or by EventBridge, not both at the same time.
  • Callback Handling
    - How should the Step Function handle callbacks to Netlify/Vercel? Like the lambda, make an API call, but using the Step Function library.
    - Should the Step Function retry failed callbacks? No.
    - Do we need to maintain callback order? No.
  • State Management
    - What specific job metadata needs to be maintained beyond the basic status? Nothing in particular - whatever is needed to support the functionality.
    - Should the Step Function maintain a history of state transitions? If it's helpful, but it should get purged short-term.
    - Do we need to expose the internal Step Function state to external systems? No, just whether the job is cancelled or not.
  • EventBridge Pipe Filtering
    - What are the filtering requirements? None in particular.
    - Do we need to support dynamic filter updates?
    - Should filter changes require redeployment?
  • Error Scenarios
    - What happens if the Step Function fails mid-execution? Already discussed.
    - How do we handle partial completions? Discarded, and follow error handling.
    - What retry strategies should be implemented for different error types? No retries.
  • Local Development
    - Do we need to provide Step Function emulation locally? No, the functionality should be equivalent to a lambda; we just want it to cost less in production.
    - Should the development mode exactly match production behavior? As a black box.
    - What debugging features are required in development mode? Nothing special, just tracing.
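Those error-handling options, spelled out as code (a sketch; the option names are mine):

```typescript
// Sketch of the error-handling policy recorded above: on failure, discard the
// message, log the error, and invoke an error-handler callback if one was given.
interface ErrorPolicy {
  logErrors: boolean;
  onError?: (jobId: string, err: unknown) => Promise<void>; // optional app callback
}

export async function handleFailure(jobId: string, err: unknown, policy: ErrorPolicy): Promise<void> {
  if (policy.logErrors) {
    console.error(`Job ${jobId} failed:`, err);
  }
  if (policy.onError) {
    await policy.onError(jobId, err); // the application decides what to do next
  }
  // No retries (per the answers above): the message is simply not re-queued.
}
```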

Get Back, Jack, Do it again...

By now you all know how the chorus goes, and we ask the same question again.

Now I'm starting to see this is drifting into confabulation territory, so I challenge it as to whether we really need this right now or whether it can be left for the detailed design/coding stage:

Kind of funny: since a recent system update, Claude has taken to starting its answers with "Ah," whenever there is something to correct, but I guess being caught out for such an egregious waste of GPU causes an apology to pop up out of hiding. In any case, let's try to generate the final final plan!

Here it is!

# Queue Framework Implementation Plan - Final

## Part 1: Core Framework and Local Development

### Phase 1.1: Framework Foundation
1. Create core types and interfaces:
   ```typescript
   JobConfig, JobStatus, WorkItem, etc.
   ```
   Verifiable by:
   - Creating test job configurations and validating all required fields
   - Attempting invalid configurations and observing type errors
   - Testing state transitions through all possible job statuses
   - Running a test suite that covers all type constraints
   A test program will demonstrate type system effectiveness.

2. Create in-memory queue emulator:
   ```typescript
   LocalQueueService, LocalStateStore
   ```
   Verifiable by:
   - Enqueueing test jobs and observing queue state
   - Verifying FIFO behavior with multiple jobs
   - Confirming job retention and retrieval
   - Testing concurrent access patterns
   Demonstrated through CLI tool showing queue operations.

3. Implement job lifecycle management:
   ```typescript
   Job state transitions, metadata tracking
   ```
   Verifiable by:
   - Creating job and tracking through all states
   - Testing cancellation flows
   - Verifying error state handling
   - Monitoring state consistency
   Interactive test program will show complete lifecycle.

### Phase 1.2: Local Development Environment
4. Create local worker framework:
   ```typescript
   Worker loop, job processing
   ```
   Verifiable by:
   - Starting worker and processing test jobs
   - Observing job pickup and execution
   - Testing worker error handling
   - Verifying worker shutdown
   Demonstrated via test worker processing sample jobs.

5. Implement local callback mechanism:
   ```typescript
   HTTP callback server, result handling
   ```
   Verifiable by:
   - Setting up local HTTP endpoint
   - Receiving job completion callbacks
   - Testing error callbacks
   - Verifying payload handling
   Complete local processing loop observable through logs.

6. Add development test harness:
   ```typescript
   Test scenarios, verification tools
   ```
   Verifiable by:
   - Running standard job scenarios
   - Testing error conditions
   - Verifying cancellation
   - Checking state consistency
   Test suite provides confidence in core functionality.

## Part 2: AWS Implementation

### Phase 2.1: AWS Infrastructure
7. Create Terraform templates:
   ```hcl
   SQS, DynamoDB, IAM roles
   ```
   Verifiable by:
   - Successful infrastructure deployment
   - Validation of all resources
   - Security policy verification
   - Cross-referencing with requirements
   Clean deployment to empty AWS account proves completeness.

8. Implement AWS queue adapter:
   ```typescript
   SQS integration, message handling
   ```
   Verifiable by:
   - Submitting jobs to SQS queue
   - Verifying message format
   - Testing queue attributes
   - Checking message retention
   End-to-end SQS operation with test messages.

9. Create AWS state management:
   ```typescript
   DynamoDB operations, state tracking
   ```
   Verifiable by:
   - Creating and updating job states
   - Testing concurrent updates
   - Verifying state consistency
   - Checking data retention
   Complete state management demonstration.

### Phase 2.2: AWS Processing
10. Implement AWS Lambda worker:
    ```typescript
    SQS trigger, job processing
    ```
    Verifiable by:
    - Deploying test worker
    - Processing real queue messages
    - Handling timeouts
    - Managing concurrency
    Live worker processing demonstration.

11. Create callback mechanism:
    ```typescript
    Lambda function for testing callbacks
    ```
    Verifiable by:
    - Setting up test Lambda endpoint
    - Receiving worker callbacks
    - Testing error scenarios
    - Verifying payload handling
    Complete callback cycle demonstration.

12. Implement development callback testing:
    ```typescript
    ngrok setup, local endpoints
    ```
    Verifiable by:
    - Exposing local endpoint via ngrok
    - Receiving AWS Lambda callbacks
    - Testing local processing
    - Verifying end-to-end flow
    Development workflow demonstration.

### Phase 2.3: AWS Integration
13. Add AWS monitoring:
    ```typescript
    CloudWatch integration, logging
    ```
    Verifiable by:
    - Viewing CloudWatch logs
    - Checking metric collection
    - Testing log queries
    - Verifying alert conditions
    Monitoring dashboard demonstration.

14. Implement error handling:
    ```typescript
    Error states, DLQ
    ```
    Verifiable by:
    - Testing job failures
    - Verifying DLQ behavior
    - Checking error states
    - Testing manual intervention
    Error handling workflow demonstration.

## Part 3: Netlify Example Integration

### Phase 3.1: Netlify Example App
15. Create basic Netlify application:
    ```typescript
    Job submission UI, status tracking
    ```
    Verifiable by:
    - Submitting test jobs
    - Viewing job status
    - Testing cancellation
    - Handling errors
    Complete example application demo.

16. Implement callback handling:
    ```typescript
    Netlify function endpoints
    ```
    Verifiable by:
    - Receiving AWS callbacks
    - Updating UI with results
    - Handling errors
    - Testing timeouts
    End-to-end job lifecycle demo.

17. Add security implementation:
    ```typescript
    Authentication, encryption
    ```
    Verifiable by:
    - Testing secure callbacks
    - Verifying token handling
    - Checking encryption
    - Testing invalid access
    Security features demonstration.

## Part 4: Vercel Example Integration

### Phase 4.1: Vercel Example App
18. Create basic Vercel application:
    ```typescript
    Job submission UI, status tracking
    ```
    Verifiable by:
    - Submitting test jobs
    - Viewing job status
    - Testing cancellation
    - Handling errors
    Complete example application demo.

19. Implement callback handling:
    ```typescript
    Vercel API routes
    ```
    Verifiable by:
    - Receiving AWS callbacks
    - Updating UI with results
    - Handling errors
    - Testing timeouts
    End-to-end job lifecycle demo.

20. Add security implementation:
    ```typescript
    Authentication, encryption
    ```
    Verifiable by:
    - Testing secure callbacks
    - Verifying token handling
    - Checking encryption
    - Testing invalid access
    Security features demonstration.

## Part 5: Framework Completion

### Phase 5.1: Package and Documentation
21. Create npm helper package:
    ```typescript
    Types, utilities, integration helpers
    ```
    Verifiable by:
    - Installing in test project
    - Testing all utilities
    - Verifying type definitions
    - Checking documentation
    Package usage demonstration.

22. Create framework documentation:
    ```markdown
    Usage guides, API docs, examples
    ```
    Verifiable by:
    - Following setup guide
    - Testing code examples
    - Validating API docs
    - Checking troubleshooting
    Documentation completeness verification.

### Phase 5.2: Testing and Validation
23. Create integration test suite:
    ```typescript
    Cross-platform tests
    ```
    Verifiable by:
    - Running full test suite
    - Testing all platforms
    - Verifying error handling
    - Checking performance
    Complete test coverage demonstration.

24. Implement load testing:
    ```typescript
    Concurrent processing, limits
    ```
    Verifiable by:
    - Testing concurrent jobs
    - Checking rate limits
    - Verifying throughput
    - Testing recovery
    Load handling demonstration.

25. Security validation:
    ```typescript
    Penetration testing, security review
    ```
    Verifiable by:
    - Testing all security features
    - Attempting unauthorized access
    - Checking encryption
    - Verifying token handling
    Security robustness demonstration.

### Phase 5.3: Production Readiness
26. Create deployment templates:
    ```hcl
    Production Terraform, configuration
    ```
    Verifiable by:
    - Clean production deployment
    - Verifying all resources
    - Testing scaling
    - Checking monitoring
    Production readiness demonstration.

27. Implement logging and diagnostics:
    ```typescript
    Production logging, troubleshooting
    ```
    Verifiable by:
    - Testing log collection
    - Checking diagnostics
    - Verifying traces
    - Testing problem resolution
    Operational readiness demonstration.

28. Create example applications:
    ```typescript
    Production examples, patterns
    ```
    Verifiable by:
    - Building example apps
    - Testing integrations
    - Verifying patterns
    - Checking documentation
    Complete example suite demonstration.

29. Performance optimization:
    ```typescript
    Tuning, scaling tests
    ```
    Verifiable by:
    - Testing throughput
    - Checking latency
    - Verifying scaling
    - Measuring costs
    Performance characteristics demonstration.

30. Final system validation:
    ```typescript
    Complete system test
    ```
    Verifiable by:
    - Full system testing
    - Cross-platform verification
    - Security validation
    - Documentation review
    Production deployment sign-off.

## Implementation Notes

1. Each step must be fully verifiable before proceeding
2. Development environment prioritizes ease of testing
3. AWS implementation forms the core processing engine
4. Platform examples demonstrate integration patterns
5. Security is integrated throughout all phases
        


The Verdict

Is this document an architecture? A design? What is software architecture anyway?

Let's just say it's a darn good technical summary and plan to create this Job Queueing framework, and those of my readers that are former colleagues know that I've had plenty of Job Queuing in my life so I ought to know ;)

I think a case can be made there is an advantage to using the latest LLMs in preparing a technical plan and tying together many different aspects that are important for the project.

We're definitely not at a point where the AI can do this on its own, but having experienced it, I would be very reluctant to create such a document without AI assistance.

Also this is structured in a way that is suitable for AI-assisted coding next, which can be a bit tricky otherwise.

I still think AI is just Apparent Intelligence and what it does is a simulation. But imagine a video game where you walk into a building, take the elevator, get out at the floor of a software company, walk into an office, sit down with an NPC programmer, and start talking about your project, asking it to type some documents and code snippets to round it up, doing some virtual whiteboarding, then asking it to send you the files - and finding the files in your real email! Well that's what we have, except for the office visuals.

So it's just a simulation, but it can simulate some useful stuff and let you use it.

F1 teams rely on simulators driven by professional drivers to work out strategies and tactics for each circuit they go to, using highly accurate measurements from the circuits and very precise models. Just because it's a simulation doesn't mean it's not useful!




As someone who writes a lot of code, Martin Béchard finds it incredibly useful to develop technical ideas with the help of AI and loves to help clients with finding great technical solutions using the most effective approaches.



