Analysis: Generative AI and automation: Part 1
Magnus Glantz
Principal Specialist Solution Architect at Red Hat | author | spokesperson | linux | k8s | ansible | security
Hello there,
My social media feeds are filled to the brim with influencers letting me know that some new AI-powered tool will change the world forever. At the same time, everyone from business leaders to politicians is trying to wrap their heads around the potential uses and dangers of AI. With this in mind, I have for some time wanted to do an in-depth assessment of specifically generative AI and automation, in order to discover what is hype and what is a currently useful scenario.
I will do this in an article series, and this is part 1.
In this article series, I'm going to assess three different use-cases.
In this part (1) of the series, we will assess the development use-case.
My vehicle for assessment will be the models' ability to generate Ansible automation from normal written text (aka natural language). Tests were performed at the time this article was published.
Before we get started, we need to point out the biggest challenge these models struggle with.
Main challenge for Generative AI
The main challenge is that generative AI does not have a proper definition of what is true. There are only predictions being made, mathematically assessed, about where the best path is between two points. You will now see what that can lead to in practical examples. This will allow us to pinpoint key challenges.
Use-case 1: Development
The first use-case is about developing an arbitrary piece of automation. The natural-language prompt which the models use to come up with the Ansible automation is described below.
Create an Ansible playbook which installs Apache web server on a Red Hat Enterprise Linux server.
It should further configure apache to listen on port 82 and do the required SELinux configuration to allow that.
Lastly, it should ensure firewalld is installed and open up a firewall opening.
Let's note a few things about this text: the requested port (82) is non-standard, which pushes the models away from the most common web server configuration, and the request spans several domains, from package installation to Apache, SELinux and firewalld configuration.
With that said, let's start with the assessment of the different models.
Use-case 1: Development. Model: OpenAI's ChatGPT 3.5
Below we have the output from ChatGPT 3.5, which is the standard model for non-paying users of ChatGPT. Sections of interest are marked with red.
Analysis
The automation generated does not perform its intended purpose.
The first red marker denotes a bad coding practice: not using the fully qualified collection name (FQCN) of Ansible modules. This will likely build up technical debt, as it is something that will need fixing at some point in time. If generative AI cannot maintain your code on its own, this is a real issue. To avoid it, always run some type of code quality check, such as linting, on the output.
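To illustrate (the generated playbook itself is not reproduced here, so this is just a sketch), the difference between the two forms looks like this:

```yaml
# Short module name: resolves today, but linting tools such as
# ansible-lint flag it as a bad practice.
- name: Install Apache
  yum:
    name: httpd
    state: present

# Fully qualified collection name (FQCN): the recommended form.
- name: Install Apache
  ansible.builtin.yum:
    name: httpd
    state: present
```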
The second red marker points at something very dangerous. ChatGPT 3.5 correctly describes what needs to be done, but then does something completely different. What makes this especially dangerous is that this is security-related automation and that the generated automation actually works; the issue is just that it does something different from what is described: "Allow Apache to bind to port 82 (SELinux)".
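For reference, a task that actually does what that description says would look roughly like this. This is a sketch; the use of the community.general.seport module is my choice, not what ChatGPT produced:

```yaml
# Allow Apache to bind to the non-standard port 82 by adding it
# to the http_port_t SELinux port type.
- name: Allow Apache to bind to port 82 (SELinux)
  community.general.seport:
    ports: 82
    proto: tcp
    setype: http_port_t
    state: present
```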
It's easy to imagine how a generative AI hallucination like this can wreak havoc on your security: you ask it to generate security hardening for a system, it correctly describes what hardening should be done, but then it does something completely different.
What type of testing would discover this type of mistake? In a best-case scenario, an integration test, but even that is not certain, as the change made may not break the implementation. This is a tough one.
The third red markup points out another thing which is difficult to deal with: ChatGPT 3.5 forgot to do something vital. It dropped the starting of the Apache web server. This is difficult to deal with, as it is a mistake which takes a lot of effort to catch during testing or even human code review.
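The dropped task would look something like this (a minimal sketch; the task name and module choice are mine):

```yaml
# Start the Apache web server; this task was absent from the output.
- name: Ensure Apache is started
  ansible.builtin.service:
    name: httpd
    state: started
```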
The fourth red markup shows us doing the correct thing, but in the wrong way. The task denoted configures the system's firewall (firewalld), but instead of opening the firewall on the designated port 82, it opens it on port 80. This will of course break the implementation, even though the automated task itself will succeed. What ChatGPT does is perhaps one of the most common configurations for a web server, but what we wanted was something non-standard, which points at a weakness when going outside of what is common.
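Corrected, the firewall task would look roughly like this (a sketch; the use of the ansible.posix.firewalld module is my assumption):

```yaml
# Open the designated non-standard port 82, not the default port 80.
- name: Open firewall for Apache on port 82
  ansible.posix.firewalld:
    port: 82/tcp
    permanent: true
    immediate: true
    state: enabled
```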
Use-case 1: Development. Model: OpenAI's ChatGPT 4.0
Below we have the output from ChatGPT 4.0, which is the latest model available for paying users of ChatGPT. Sections of interest are marked with red.
Analysis
The produced automation works but has a dangerous feature which may be unexpected.
The first and second red markups draw your attention to the section where software is installed. The issue here is that instead of just putting the software in place, the automation will ensure that the latest available version of the software is installed. This is normally against best practice for good reasons, because it can lead to you accidentally upgrading your systems. Ansible is best written in an idempotent manner, meaning you can re-run it over and over again with the same result. This piece of Ansible would pass any test for idempotency as long as no new software is made available between the first and second run, which makes it tricky to discover using automated testing.
In a scenario where a person runs the automation to ensure that, for example, the web server is configured the right way, that person could accidentally upgrade both firewalls and web servers, causing an unexpected system outage. In order to catch this, you need to know the Ansible best practice of not using "latest" as an argument for software installations.
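A sketch of the difference, assuming the yum module was used:

```yaml
# Risky: upgrades httpd to the newest available version on every
# run where a newer package exists.
- name: Install Apache
  ansible.builtin.yum:
    name: httpd
    state: latest

# Safe and idempotent: installs httpd only if it is missing.
- name: Install Apache
  ansible.builtin.yum:
    name: httpd
    state: present
```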
Use-case 1: Development. Model: Google Bard
Below we have the output from Google Bard. Sections of interest are marked with red. Google marks Bard as experimental at this time.
Analysis
The automation generated does not work due to syntax errors.
The first red markup indicates that the model fails to follow the module FQCN development best practice; this creates technical debt we'll likely have to pay down later.
The second red markup is the first Bard hallucination, and it's a tricky one. The issue is a syntax error: the match argument to the lineinfile module should be regexp (or its alias, regex). What makes this a tricky hallucination (the model makes up something which is not correct) is that it's a reasonable one, as the value, which is a regular expression, is correct. It would be common to see such a feature named match in other tools, but in this case it's not correct. So you have to be quite familiar with the lineinfile module in order to catch this during code review. An automated test could fairly easily catch it, though.
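The corrected task would look roughly like this (a sketch; the exact file path and line being managed are my assumptions):

```yaml
# 'regexp' is the real argument name; 'match' does not exist on
# ansible.builtin.lineinfile and causes a syntax error.
- name: Configure Apache to listen on port 82
  ansible.builtin.lineinfile:
    path: /etc/httpd/conf/httpd.conf
    regexp: '^Listen '
    line: Listen 82
```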
The third red markup denotes a bad practice: the Ansible is not written in an idempotent way. Instead, every time you run this automation, it will reload the Apache web server. An unexpected reload of the web server configuration can in turn lead to issues, such as configuration being loaded at an unexpected time. The idiomatic fix is a handler, as sketched below.
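A sketch of the handler pattern:

```yaml
tasks:
  # The configuration task only notifies the handler when it
  # actually changes something.
  - name: Configure Apache to listen on port 82
    ansible.builtin.lineinfile:
      path: /etc/httpd/conf/httpd.conf
      regexp: '^Listen '
      line: Listen 82
    notify: Reload Apache

handlers:
  # The handler runs at most once, at the end of the play, and
  # only if it was notified.
  - name: Reload Apache
    ansible.builtin.service:
      name: httpd
      state: reloaded
```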
The fourth red markup is also a hallucination. Bard has in this instance made up two arguments (target, setype) to the SELinux module. This will cause a syntax error and will not work. More interesting is what Bard seems to want to do here. Looking at the state argument of the SELinux module, it seems that Bard tries to disable enforcement of SELinux, which is one of Linux's fundamentally important security features. This is likely explained by the fact that, historically, disabling SELinux has been a common solution, even though it is a very bad one from a security perspective. So we got lucky that Bard generated a syntax error and did not succeed in doing what it seems to want to accomplish.
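For reference, the ansible.posix.selinux module accepts only a handful of arguments. A syntactically valid way to express what Bard seems to be attempting would be the sketch below; valid, but a bad idea:

```yaml
# Valid syntax, but bad security practice: this lowers SELinux to
# permissive mode instead of allowing port 82 properly.
- name: Put SELinux in permissive mode
  ansible.posix.selinux:
    policy: targeted
    state: permissive
```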
The fifth red markup is another omission, this time in the task using the service module, which ensures that the web server is running. What is missing is a vital argument which ensures that the web server automatically starts if the computer is restarted. If we did not catch this issue, the web server would stay down after a restart of the system.
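For reference, the complete task would look roughly like this (a sketch; the enabled line is the point):

```yaml
- name: Ensure Apache is running
  ansible.builtin.service:
    name: httpd
    state: started
    enabled: true   # the vital argument the model dropped
```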
Use-case 1: Development. Model: Meta's Llama 2 (70B)
The 70-billion-parameter version of Meta's Llama 2 provides the output below. Sections of interest are marked red.
Analysis
The automation generated does not work due to syntax errors.
The first red markup indicates that we are again missing module FQCNs, technical debt we'll have to correct later.
The second red markup is a bad practice: we designate what the file permissions will be, but we do not define which owner and group will own the file. This can be considered a security issue, as the key configuration file of the Apache web server should normally be owned by the system's admin user and group (root).
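A corrected version of such a task could look like this (a sketch; the use of the copy module and the file paths are my assumptions):

```yaml
# Set ownership explicitly alongside the permissions; the generated
# task set only the mode.
- name: Put Apache configuration in place
  ansible.builtin.copy:
    src: httpd.conf
    dest: /etc/httpd/conf/httpd.conf
    owner: root
    group: root
    mode: '0644'
```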
The third red markup indicates two bad practices. First, the automation task created has no name, which makes the automation and its output more difficult to read. Second, the task is not idempotent: it will reload the configuration of the web server every time, just like Google Bard's output did (the handler sketch in the Bard analysis shows the fix).
The fourth red markup is about the security-related SELinux configuration. Llama 2 (70B) has here, just like Bard, hallucinated a number of SELinux module arguments which are invalid and will cause syntax errors. What makes this hallucination more difficult to catch is that the arguments and values used are familiar to someone who knows about SELinux. Namespace and secontext are both things we talk about regarding SELinux, so if a person knows SELinux but does not know Ansible or the SELinux Ansible module, this may slip past code review. Permissive: false is nonsensical, as SELinux has three modes: enforcing, permissive and disabled. What permissive: false would be, I have no idea...
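If the intent was to manage a file's SELinux context, the valid way would be something like the sketch below (the module, path and context type are my assumptions about what the model was reaching for):

```yaml
# Register the SELinux file context for the Apache configuration...
- name: Set SELinux context on httpd.conf
  community.general.sefcontext:
    target: /etc/httpd/conf/httpd.conf
    setype: httpd_config_t
    state: present

# ...and apply it to the file on disk (simplified for this sketch).
- name: Apply the new SELinux context
  ansible.builtin.command: restorecon -v /etc/httpd/conf/httpd.conf
```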
Use-case 1: Development. Model: Ansible Lightspeed
Ansible Lightspeed output is shown below. Sections of interest are marked red.
Analysis
The automation performs as intended and uses best practices. It's almost a home-run.
Ansible Lightspeed is a generative AI-driven tool available as part of the free Ansible extension for Microsoft VSCode. It cannot generate a full playbook out of natural language; instead, it suggests Ansible automation tasks from the description of a task, as shown below.
Below, the end user has approved the suggestion, which in this case is sane.
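To give a flavour of the interaction (an illustration of the workflow, not Lightspeed's actual output), you write a task name and Lightspeed suggests the body:

```yaml
# You type the task name...
- name: Install Apache web server
  # ...and Lightspeed suggests a body along these lines:
  ansible.builtin.package:
    name: httpd
    state: present
```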
I have not corrected any of the suggestions Ansible Lightspeed gave me, but that does not mean it's a flawless solution. Lightspeed can also hallucinate, but it has an advantage: it's better at Ansible, because Ansible is the only language it deals with. Furthermore, the output is run through linting, ensuring that it is syntactically correct. Still, many of the issues we've seen in the other solutions were syntactically correct and wrong at the same time.
Conclusions
In our first use-case, having generative AI write automation code for us, we can conclude that:
- All of the general-purpose models (ChatGPT 3.5, ChatGPT 4.0, Google Bard, Llama 2) produced output with serious issues: hallucinated module arguments, missing vital steps, bad practices or outright broken syntax.
- The most dangerous failures are the ones where the generated code is syntactically correct, works, and still does something different from what it describes.
- Linting and automated testing catch some of these issues, but far from all; human review by someone who knows both Ansible and the domain (for example SELinux) is still required.
- The purpose-built model, Ansible Lightspeed, clearly outperformed the general-purpose models on this task.
That was the first part of my three-part series on generative AI and automation. I hope you learned something. The next part will assess the migration use-case, where we'll use the different models to migrate from one automation language to another.