MVP of 'Athena Bot' built on AWS Serverless + Python + Flask + HuggingFace + ChatGPT 3.5
Athena - The Cyber Boardroom bot in action

Following on from my first Hugging Face bot (see this LinkedIn post about 'Bobby Tables'), I've been doing quite a lot of research on how to create an end-to-end solution on top of ChatGPT, i.e. an actual real-world scenario.

I think I was able to create something interesting that will be useful to a very specific target audience :)

Athena the chat bot

Please meet Athena from “The Cyber Boardroom” (CBR), who is designed to provide cyber security advice and guidance to Board Members, NEDs, future Board Members and people presenting to Boards.

More to come over the following weeks on this exciting initiative. The mission is to deliver a one-stop shop that boards of directors can rely on for their cyber readiness.

This is a complete serverless solution that runs on AWS and Hugging Face, with a monthly cost of about $2 (the majority of which comes from the AWS DNS domain-name registration).

You can try it out at https://www.thecyberboardroom.com/

The login/Account creation workflow

Clicking on 'Login', you will see the AWS Cognito sign-in page:

And if you click on 'Sign up', you will see this page (also Cognito):

As you can see above, this page has a really nasty UX bug, where all the 'password validation' rules are shown as soon as you start typing the username (a good example of what not to do, i.e. a good example of bad 'security usability').

This is a fully managed AWS Cognito workflow, which was very easy to set up and even includes email verification (I could have added 2FA and OAuth logins, but that felt a bit much :) ).

Btw, if you don't want to create an account but still want to give Athena a go, you can go directly to the Hugging Face page that hosts the ChatGPT bot at https://huggingface.co/spaces/the-cbr/cbr-hf-gradio

Using Athena

After logging in you will see the main UI, which highlights the two main sections: Athena (the bot) and Cyber Security content:


The main attraction is of course the Athena bot, which can be accessed via the 'Go' link on the first section/box, via the left hand side menu, or directly at https://www.thecyberboardroom.com/athena

To kickstart the conversation with Athena, I’ve added the nice first question of “Hi, Good morning, who are you, and what do you do?”.

You just need to click on the orange "Submit" button, and you should get an answer like the one below.


From here you can basically ask any question you like, and (if all goes according to plan) Athena should always try to respond from a Cyber Security and Board-advisor point of view.

To kickstart your journey into Athena's world, here are some good questions to ask:

  • I work for the Finance industry, what should I care about?
  • What Questions should I ask my CISO?
  • What are the regulations that I should be aware of, and that we should be doing something about?
  • What are my legal obligations?
  • What is the best way to learn more about Cyber Security?
  • Why does Cyber Security matter?
  • Can you give me some examples of what happened to companies that did not invest in Cyber Security?
  • How should we tackle the issue of Supply Chain Cyber Security?
  • Is Cyber Security just a cost to normal business operations, or can it also be a business opportunity and enabler?
  • If a company doesn’t have a mature Cyber Security team and program, why is investing in effective incident detection and response one of the best ROI things to do?


  • What is the average % of the IT budget that companies in the Finance industry should be spending on Cyber Security?

10% to 15% would certainly be nice :)
PLEASE: share some of your threads with me, and what you thought of them (so that I can better understand what works and what doesn’t work)
NOTE: at the moment Athena is using GPT 3.5 (namely gpt-3.5-turbo with temperature=1), so although you should receive some really cool and impressive answers, it will become even better once I wire in GPT-4 and other LLMs like Claude 2 or Llama 2

Ok, so hopefully you had some fun talking with Athena, but what else is there?

Why don’t you ask why Athena is called Athena?

What about asking if Athena has any siblings?

And (only relevant if you have seen Disney’s Encanto), ask Athena if she has any other siblings :)

... or why those names?


Finally, ask Athena which version she is on at the moment.


The master prompt

I hope that by now you have experienced Athena, and have seen the power of a really useful and friendly bot :)

But how does it work?

How much did I have to do to get ChatGPT to behave like this?

It turns out, not much at all :)

Here is the only prompt that is being used to create Athena:

The Prompt that creates Athena (and the other two bots)

Impressive isn’t it?

Those couple of paragraphs were all that was needed to make Athena behave like that. Note how the whole "siblings narrative" and "back story" were created completely by ChatGPT.

From a technical point of view, all we need to do is add that prompt as a 'system' prompt on every request and leverage Gradio's ChatInterface block.

Here is all the code that creates Athena's UI:

Here is the code that makes the request to the OpenAI API and keeps track of the history (btw, can you spot a major problem with this code, and what will eventually happen?):
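The code itself isn't reproduced here, but the general shape of this pattern looks something like the sketch below. All names (the function, the prompt text) are my assumptions for illustration, not the actual CBR source; the real Athena prompt is shown above.

```python
# A sketch of system-prompt + history handling (names and prompt text are
# illustrative assumptions, not the actual CBR source code)

ATHENA_SYSTEM_PROMPT = "You are Athena, a cyber security advisor for Board members..."

def build_messages(history, user_message):
    # the 'master prompt' is added as the system prompt on EVERY request
    messages = [{"role": "system", "content": ATHENA_SYSTEM_PROMPT}]
    for user_turn, bot_turn in history:               # replay the full chat history
        messages.append({"role": "user",      "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": user_message})
    return messages

# the actual call would then be along the lines of:
#   response = openai.ChatCompletion.create(model="gpt-3.5-turbo",
#                                           temperature=1,
#                                           messages=build_messages(history, message))
```

Note that, in this shape, the full history is resent with every request, which is relevant to the question above.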

For more details check out Gradio (which, btw, is an awesome API) and this article on how to build a GPT chatbot with Gradio.

Athena's front end

Now ... the eagle-eyed amongst you might have noticed that Athena's UI looks very similar to one of Flask's Open-Source Seed Projects!

And they would be right, since 90% of the current UI is from the Flask Material Lite theme.

Which you can see in action here :

The main change I made was actually to remove ALL database support and make it work in AWS Lambda (i.e. serverless).

The key is the use of serverless-wsgi, which provides the ability to run a Flask website inside a Lambda function!

This is absolutely awesome and insanely powerful!!!!

Anybody who has dealt with the problems of maintaining and scaling web servers like IIS, Apache, Node etc.. will really appreciate the awesomeness of running a highly scalable website 'hosted' inside a serverless environment (with insanely fast response, and zero costs when no web requests are made by users).

Can you tell how excited I am about this? (Btw, you can do the same thing to easily create serverless REST/OpenAPI services using FastAPI, but that is a topic for another post.)

Anyway, here is the main lambda handler/entry point, where you can see that it basically just passes the lambda events param (with all the request data) into the serverless-wsgi handle_request method, which returns the data nicely formatted for API Gateway or Lambda Function URLs:
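In outline, the handler really is just a couple of lines (shown as comments below, since the module/app names are my assumptions). What serverless-wsgi does under the hood is translate the Lambda event into a WSGI environ before invoking the Flask app; here is a simplified, stdlib-only illustration of that translation, using the Lambda Function URL event field names:

```python
import io

# The real handler is essentially:
#     import serverless_wsgi
#     def run(event, context):                               # lambda entry point
#         return serverless_wsgi.handle_request(app, event, context)
#
# A simplified illustration of the event -> WSGI environ translation the
# library performs (not the library's actual code):

def event_to_environ(event):
    http = event.get("requestContext", {}).get("http", {})
    body = (event.get("body") or "").encode()
    environ = {
        "REQUEST_METHOD":  http.get("method", "GET"),
        "PATH_INFO":       event.get("rawPath", "/"),
        "QUERY_STRING":    event.get("rawQueryString", ""),
        "CONTENT_LENGTH":  str(len(body)),
        "wsgi.input":      io.BytesIO(body),     # request body as a stream
        "wsgi.url_scheme": "https",
    }
    for name, value in (event.get("headers") or {}).items():
        environ["HTTP_" + name.upper().replace("-", "_")] = value
    return environ
```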

Here is the run.app code:

... which calls the Flask_Site().app() method:

... which calls the create_app method, which is the one that actually creates and configures the Flask web server:
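For readers unfamiliar with the pattern, a minimal sketch of such an app factory looks like this (the route and version string are illustrative assumptions, not the actual CBR code — in the real site the version comes from the version file discussed later):

```python
from flask import Flask

def create_app():
    # the 'app factory' pattern: create and configure the Flask app in one place
    app = Flask(__name__)

    @app.route("/version")
    def version():
        # illustrative only; the real site reads this from a version file
        return "v0.34.0"

    return app

app = create_app()   # this is the WSGI app object handed to serverless-wsgi
```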

Not using API Gateway

For the first couple of versions (mainly due to my past experience in AWS), I ended up using the AWS API Gateway, which was OK(ish) but added quite a bit of overhead and complexity (for not much value). But then, while 'working' with ChatGPT on this project, I found out that AWS has since released Lambda Function URLs, which are exactly what I was looking for :)

They are easy to set up, and they expose an endpoint like this one: https://qtgpfi7dsxmxadvrfwid2zskwe0nwisj.lambda-url.eu-west-2.on.aws/

Of course, such a URL/domain is not very user-friendly, so using Route53 I added an A record:

That points to this CloudFront distribution :

That is the one that points to the lambda function (instead of the API Gateway):

Pretty cool setup isn't it? :)

How much does it cost?

OK, although it is pretty awesome that we can run a whole website on top of the highly scalable AWS serverless infrastructure (or one of the other Cloud providers), the important question is: how much does it cost?

After a couple of experiments where I did fall for the AWS vortex of services that 'sound serverless' but have associated costs just for 'being turned on' (another topic for a separate post), the final solution, based on CloudFront, S3, Lambda Function URLs, CloudWatch, Cognito and Route53, currently has a highly impressive weekly cost of $0.05. Ok, the site doesn't have a lot of traffic, but these AWS services are all designed to scale in a highly cost-effective way.

What about Hugging Face?

Since the Athena bot, which is the part running in Hugging Face's serverless environment, is using the free "CPU basic" setup with no persistent storage:

The current Hugging Face cost is actually $0

It makes sense that the Hugging Face costs are really low, since it is really not doing much: it is basically hosting the Gradio code and proxying the requests to OpenAI's APIs.

So, for the final piece of the cost puzzle, how much is OpenAI costing so far?

Taking into account that the site doesn't have a lot of usage, so far in September I've spent $0.20 on ChatGPT API calls.

All put together we have a really nice serverless solution that is running (without a lot of traffic) at less than $1 per month. The biggest cost so far has been the $15 yearly cost for the DNS registration.

Actually, there is one more piece of the puzzle worth mentioning.

I also modified the original Flask code, to use S3 as the provider for all static assets:

This means that we can get a page loaded and set up in the user's browser in about 158ms (which is pretty good), with Flask only handling/processing the main HTML pages or API calls (the rest comes from CloudFront+S3):

Notice how, in the list of requests/resources loaded (shown above), only the first request was an actual HTTP request to the live server.

All the other requests (like /assets/plugins/jquery/dist/jquery.min.js) were cached and fetched from https://static.thecyberboardroom.com, which is configured as a CloudFront distribution pointing to the S3 bucket mentioned above (i.e. it acts like a CDN).
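One simple way to wire this up in Flask is a template helper that points all asset paths at the CDN domain instead of the local /static route. This is a sketch of the idea — the helper name asset_url and the path layout are my assumptions, not the actual CBR code:

```python
from flask import Flask

CDN = "https://static.thecyberboardroom.com"   # CloudFront distribution -> S3 bucket

def create_app():
    app = Flask(__name__)

    @app.context_processor
    def cdn_helpers():
        # templates call asset_url(...) instead of url_for('static', ...),
        # so every asset request goes to the CDN rather than the Lambda
        def asset_url(path):
            return f"{CDN}/assets/{path}"
        return {"asset_url": asset_url}

    return app
```

A template would then reference assets as `{{ asset_url('plugins/jquery/dist/jquery.min.js') }}`.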

Hallucinations?

OK, so what about ChatGPT's propensity to go off piste and make stuff up (specifically in v3.5)?

Well ... that is still there. Although the 'Athena' prompt does help to keep the chats focused on Cyber Security, there are still times when it just makes stuff up.

For example, the reason I added “…and the release notes are: 'Minor back end changes, and new UI..'” to the line “…your current version is: v0.7.7…” is that, back when the prompt contained only the version number (i.e. when the system-prompt text was just "- your current version is: v0.5.7”) and I asked Athena "what is the current version and its release notes", ChatGPT produced the gem of an answer you can read below — a massive hallucination (i.e. a lie), since ChatGPT completely made up the list of features in that release:


Btw, in a weird way, as also seen in this LinkedIn post (that I shared a while back), these LLM bots (like ChatGPT or Claude 2) somehow keep making up titles and places that I 'supposedly' work at (or have worked at :) ).

I have no idea which company this is, and I definitely never worked there.

Changing Agent personality via Prompt changes

Now, to see the power of prompt engineering, more specifically of the simple prompt used to 'create Athena', let's see what happens if we change it a bit.

I was showing the bot to a Portuguese friend, and we made two changes to the original prompt:

  • The bot name, which was set to 'Camoes' (a great Portuguese poet that we all had to study at school, though very few of us actually understood what he was talking about :) )
  • The line highlighted below, which asks the bot to be "highly sarcastic and jaded about Cyber Security"

Now maybe it is just me, but I find 'Camoes' answer absolutely hilarious, and I think there are lots of us in the Cyber Security field who will be able to relate to it at a very deep level :)

CI Pipeline, Tagging and Code Coverage

One of the most important components of any project (no matter how small or large) is an effective, easy-to-use and automated CI (Continuous Integration) pipeline AND versioning.

Anybody who has worked with me on a development project, knows that I'm very big on using versions (i.e. Git Tags) to track all Pull Requests and merges into Dev or Main branches.

I kinda follow a variation inspired by the highly influential A successful Git branching model blog post, which visualised it like this:

Here is my approach:

  • There is a main branch which holds the code that is (or about to be) pushed into PROD (in this case, the creation/update of the Lambda function).
  • There is a dev branch, which should always be in a position to be merged into main (as long as all unit and integration tests are passing)
  • When needed, create feature/experiment branches, that when ready are merged into dev

What holds all this together, is the following (automated) Git Tagging strategy that is based on a version numbering of: v{RELEASE}.{MAIN}.{DEV} (note we are still on the first release so all tags start with "v0."):

  • Any merge into dev, where all tests pass, will increase by 1 the DEV number , i.e v{RELEASE}.{MAIN}.{DEV+1}
  • Any merge into main, where all tests pass, will increase by 1 the MAIN number and reset the DEV number to 0, i.e. v{RELEASE}.{MAIN+1}.0
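The tagging rule above can be captured in a few lines of Python (a sketch of the logic only — the real implementation lives in the GitHub workflows):

```python
def next_tag(current_tag, branch):
    """Compute the next tag under the v{RELEASE}.{MAIN}.{DEV} scheme."""
    release, main, dev = (int(part) for part in current_tag.lstrip("v").split("."))
    if branch == "dev":
        return f"v{release}.{main}.{dev + 1}"     # bump DEV
    if branch == "main":
        return f"v{release}.{main + 1}.0"         # bump MAIN, reset DEV to .0
    raise ValueError(f"unexpected branch: {branch}")
```

For example, `next_tag('v0.33.9', 'dev')` gives `v0.33.10`, and `next_tag('v0.33.10', 'main')` gives `v0.34.0`.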

Let's look at a practical example.

In the commit tree below, we have made some changes locally that were pushed into the upstream dev branch (note that the last tag in this repo is v0.33.9):

The push into the dev branch triggered the following CI pipeline (built using GitHub workflows and actions):

Since these GH actions are triggered using workflow_run events, GitHub (unless I'm missing something obvious) does not provide a nice visualisation of what is going on.

So here is the sequence of events:

  • on every push to dev (or merged Pull Request), the Run Unit Tests workflow is executed (which will run the unit tests and push the results to CodeCov)
  • if that completes successfully the Increment Tag - Dev branch workflow will run
  • when that finishes, the Deploy Lambda - DEV (CD) workflow is executed
  • and when that is completed (meaning that we now have the latest code changes deployed into a DEV version of the main Lambda function), the Run Integration Tests (DEV) workflow is executed (this runs the typical end-to-end integration tests, that expects a fully working site/api to be on the other side)
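Since the chaining is done with workflow_run, each workflow in the sequence has a trigger along these lines (a sketch — the actual workflow files are in the repo, and the step details are omitted here):

```yaml
name: Increment Tag - Dev branch
on:
  workflow_run:
    workflows: ["Run Unit Tests"]        # run after this workflow completes
    types: [completed]

jobs:
  increment-tag:
    # only proceed if the upstream workflow actually passed
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ... bump the tag, update the README badge and the version file ...
```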

Now, if we look at our Git commit view, we will notice that there is a new commit AND a new tag in origin/dev (i.e. in GitHub), which is now one commit ahead of my local dev branch.

In addition to the new tag number (which, as expected, was automatically increased from v0.33.9 to v0.33.10), these code changes were also made:

This change is important, since not only does it provide a nice place/commit to add the new tag number, it also updates the main README.md code-coverage badge, and (very importantly) updates the version file that exists at the root of the source code.

This version file is how the website (i.e. the Flask server) always 'knows' which version it is running, something you can see in the footer of every page:

or at https://www.thecyberboardroom.com/version endpoint

Btw, one of the integration tests executed by the Run Integration Tests (DEV) workflow is to check that this URL (on the DEV deployment) matches the current version in the source code (which the integration test has access to).

We can also confirm this by opening up the DEV server and noticing (at the bottom of the login page for example) that we are indeed on version v0.33.10:

Ok, so how do we deploy this dev branch into the main branch (and into the PROD Lambda function)?

I have a little bash script called git-merge-main.sh :

Which basically:

  • syncs dev and main with the latest commits (from GitHub)
  • merges dev into main
  • pushes main into GH, using the --no-ff strategy (no-ff = no fast-forward)
  • merges back into dev , the commits just made to main

Btw, this is exactly what happens when we go through a Pull Request workflow, but since the main devs on this project are me and ChatGPT, and ChatGPT (for now) does not have access to the repo, I kinda find it easier to just run this script :)

Here is what the repo looks like, after I executed git-merge-main.sh and the main CI workflow has been completed:

Note how we now have a new v0.34.0 tag (from v0.33.10), which is an increase of the {MAIN} number and a reset of the {DEV} number (a main release should always have a .0 {DEV} version).

So... are the changes live on the website?

Actually, no: pushing to the live website has a number of practical implications (and risks). In this case, if we look at the GitHub Actions, we will only see two executions: one for the unit tests and one for the tag increase.

For now, the way the main branch is released into production (which in practical terms, means updating the lambda function), is by running the Deploy Lambda - PROD (CI) workflow manually:

Which can be done on the GitHub UI:

And once that workflow is completed, the lambda with the updated code has been deployed:

With the new version (v0.34.0) being live on the https://www.thecyberboardroom.com/ website :

I cannot stress enough the importance of:

  • the tags being added automatically on dev and main commits
  • the dev-to-main tag workflow: v{RELEASE}.{MAIN}.{DEV}

But that is also a topic for another post :)

For reference, here is the tree of GitHub actions and workflows I reused in this project, and an example of one of the workflows that is triggered as part of the 'push to dev CI pipeline' (you can ignore the items in the 'workflows_dev' folder, since they are not picked up by GitHub):

Actually, technically, we can say that the 'push to dev' is a CD (Continuous Deployment) pipeline and the 'push to main' is a CI (Continuous Integration) workflow.

The Cyber Boardroom logo

Btw, what do you think of the logo?

I'm married to an amazing designer, who just about had a heart attack when she saw an AI-generated logo for 'The Cyber Boardroom' :) , so I kindly asked her to have a go at the logo, and she created these 3 variations (which are all pretty cool):

Of which the 'seat at the table' one (the 2nd) was the winner :)

Leveraging the OWASP-SBot project

If you are wondering how I had the time to write all the code required to manage and deploy to AWS (and other coding helpers), my super power was the https://github.com/owasp-sbot APIs that I have been coding and maintaining for the last 5 years:

Namely the https://github.com/owasp-sbot/OSBot-AWS , which dramatically optimises, simplifies and speeds up the use of boto3 (i.e. the AWS Python API):

If you haven't used it, definitely check it out, since it really makes a massive difference.

Using GitHub Projects to manage issues/tasks

Also very important in an effective development workflow is a solid and effective bug-tracking system.

So (after getting some help from my daughter, who really needed to learn GitHub, version control and markdown :) ), I ended up with this nice Kanban board, all created using GitHub Projects (which, btw, have become really feature-rich and powerful).

Note the use of labels to provide good metadata and to define priorities.

The last (but not least) topic in this section on CI pipelines is Code Coverage and the use of https://about.codecov.io/ .

At the moment code coverage stands at 85.37% :

This is not great, since ideally the number would be close to 100%.

But, that said, at the moment there is some code in the main codebase that is literally impossible to trace with Python's coverage.py. That code uses Python's sys.settrace(self.trace_calls), which prevents coverage.py from capturing the execution flow.

This means that the real number is going to be higher than 85.37% :)

Here is CodeCov breakdown of that 85.37% number:


Here is a nice sunburst view of the current code coverage in the 'apps' folder:

Using ChatGPT to write release notes

Before I get into some more details on the power of ChatGPT (and GitHub Copilot), one of the coolest experiments that I did (and still need to wire up automatically into the CI pipeline) was to use ChatGPT to process the completely 'unreadable' Git diffs between two releases, in order to create really nice and well-formatted release notes.

For example, here is a small section of what the Git diff looks like between the v0.28.0 and v0.30.0 releases (which, btw, I can't really read, process and understand):

After feeding that 'raw diff' to ChatGPT (with a prompt focused on 'writing readable release notes' plus literally the full contents of that diff), here is the technical analysis:


And here is the 'business/product-owner' analysis:

Both are pretty much spot on, and way better than what I would write if asked to do so :)
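The approach can be sketched as a simple prompt builder; the wording of the instruction below is my approximation, not the actual prompt used:

```python
import subprocess

def release_notes_prompt(diff_text, audience="technical"):
    # wrap the raw diff in an instruction asking for readable release notes
    instruction = (f"You are writing {audience} release notes. "
                   "Summarise the following raw git diff into clear, "
                   "well formatted release notes:\n\n")
    return instruction + diff_text

def diff_between(tag_a, tag_b):
    # the raw (and fairly unreadable) diff between two release tags
    return subprocess.check_output(["git", "diff", f"{tag_a}..{tag_b}"], text=True)
```

Running the same diff through the prompt twice, once with `audience="technical"` and once with `audience="business/product-owner"`, produces the two analyses above.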

Honorary co-worker award goes to: ChatGPT

If I was already impressed with ChatGPT before I started, after really working together with it on this project I have to say that I'm massively amazed by the workflows, conversations and even debates that I had with it!

I recently posted here on LinkedIn the following two examples of the kind of threads I have been having with ChatGPT 4.0 (which is WAY better than 3.5 for this kind of conversation):

What is very important here is that ChatGPT is literally acting as a co-pilot/dev-colleague, who is highly knowledgeable in lots of areas, and really allows me to be super productive.

Now will ChatGPT produce 100% usable and correct code? No!

Depending on the ask and the task, it can be quite close, but what is really important is that it is "code in context", and code that I can easily change and improve.

Most of the time the result works out of the box, but I did have a couple of moments where I was operating at the limits of ChatGPT's ability to apply complex logic to the requirements, and it kept making a number of mistakes. As in a workflow with a real person (i.e. not a bot), the path forward was to break the problem into small components, work on each separately, and evolve the solution into something that worked.

What I will say is that the final result was way better than something I could have done with the 'help' of Google or Stack Overflow (and way faster).

I really need to expand on some of the workflows, threads and conversations I had with ChatGPT in this project, since some have been quite spectacular.

GitHub Copilot was still very powerful and useful (and I highly recommend it), but I found it WAY more useful to workshop the code and logic with ChatGPT.

'Missing in action' awards go to: Google and StackOverflow

Although I did not use ChatGPT 100% of the time (there are still some gaps, namely around the cut-off point of its dataset), the reality is that I barely used Google or StackOverflow (especially when compared with similar dev projects I did in the past).

The quality and effectiveness of using ChatGPT is crazy, especially when compared with Google's highly inefficient workflow (which demands a much higher level of brain context-switching).

See this LinkedIn article I published, Practical example of using ChatGPT instead of Google, for a more in-depth analysis of how primitive and ineffective Google's "legacy" search is (i.e. the one not powered by an LLM).

Wrapping up

If you made it this far, thanks for reading, and I would really appreciate your feedback on this article and on The Cyber Boardroom :)

One more thing ....

If you haven't already, for all sorts of Cyber Security and AI topics, check out the schedule for the next virtual Open Security Summit (Oct 16th to 20th) which is looking REALLY good:

All sessions are free to attend and participate in, with the videos of all sessions posted online soon after each session completes.
