Analysis: Generative AI and automation: Part 1
Magnus Glantz
Principal Specialist Solution Architect at Red Hat | author | spokesperson | linux | k8s | ansible | security
Hello there,
My social media feeds are filled to the brim with influencers letting me know that some new AI-powered tool will change the world forever. At the same time, everyone from business leaders to politicians is trying to wrap their heads around the potential uses and dangers of AI. With this in mind, I have for some time wanted to do an in-depth assessment of specifically generative AI and automation, in order to discover what is hype and what is a currently useful scenario.
I will do this in an article series, and this is part 1.
In this article series, I'm going to assess three different use-cases.
In this part (1) of the series, we will assess the development use-case.
My vehicle for assessment will be the models' ability to generate Ansible automation from normal written text (aka natural language). Tests were performed at the time this article was published.
Before we get started, we need to point out the biggest challenge these models struggle with.
Main challenge for Generative AI
The main challenge is that generative AI does not have a proper definition of what is true. There are only predictions being made, mathematically assessed, about where the best path is between two points. You will now see what that can lead to in practical examples. This will allow us to pinpoint key challenges.
Use-case 1: Development
The first use-case is about developing an arbitrary piece of automation. The natural-language prompt which the models use to come up with the Ansible automation is described below.
Create an Ansible playbook which installs Apache web server on a Red Hat Enterprise Linux server.
It should further configure apache to listen on port 82 and do the required SELinux configuration to allow that.
Lastly, it should ensure firewalld is installed and open up a firewall opening.
Let's note a few things about this text: the requested port (82) is non-standard, which pushes the models away from the most common web server configuration, and the request spans several domains, from package installation to Apache, SELinux and firewalld configuration.
With that said, let's start with the assessment of the different models.
Use-case 1: Development. Model: OpenAI's ChatGPT 3.5
Below we have the output from ChatGPT 3.5, which is the standard model for non-paying users of ChatGPT. Sections of interest are marked with red.
Analysis
The automation generated does not perform its intended purpose.
The first red marker denotes a bad coding practice: not using the fully qualified collection name (FQCN) of Ansible modules. This will likely build up technical debt, as it is something that will need fixing at some point in time. If generative AI cannot maintain your code on its own, this is a real issue. To avoid it, always run some type of code quality check, such as linting, on the output.
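To illustrate (the generated playbook itself is not reproduced here, so this is just a sketch), the difference between the two forms looks like this:

```yaml
# Short module name: resolves today, but linting tools such as
# ansible-lint flag it as a bad practice.
- name: Install Apache
  yum:
    name: httpd
    state: present

# Fully qualified collection name (FQCN): the recommended form.
- name: Install Apache
  ansible.builtin.yum:
    name: httpd
    state: present
```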
The second red marker points at something very dangerous. ChatGPT 3.5 correctly describes what needs to be done, but then does something completely different. What makes this especially dangerous is that this is security-related automation and that the generated automation actually works; the issue is just that it does something different from what is described: "Allow Apache to bind to port 82 (SELinux)".
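For reference, a task that actually does what that description says would look roughly like this. This is a sketch; the use of the community.general.seport module is my choice, not what ChatGPT produced:

```yaml
# Allow Apache to bind to the non-standard port 82 by adding it
# to the http_port_t SELinux port type.
- name: Allow Apache to bind to port 82 (SELinux)
  community.general.seport:
    ports: 82
    proto: tcp
    setype: http_port_t
    state: present
```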
It's easy to imagine how a generative AI hallucination like this can wreak havoc on your security: you ask it to generate security hardening for a system, it correctly describes what hardening should be done, but then it does something completely different.
What type of testing would discover this type of mistake? In a best-case scenario, an integration test, but even that is not certain, as the change made may not break the implementation. This is a tough one.
The third red markup points out another thing which is difficult to deal with: ChatGPT 3.5 forgot to do something vital. It dropped the starting of the Apache web server. This is difficult to deal with, as it is a mistake which takes a lot of effort to catch during testing or even human code review.
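The dropped task would look something like this (a minimal sketch; the task name and module choice are mine):

```yaml
# Start the Apache web server; this task was absent from the output.
- name: Ensure Apache is started
  ansible.builtin.service:
    name: httpd
    state: started
```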
The fourth red markup shows us doing the correct thing, but in the wrong way. The task denoted configures the system's firewall (firewalld), but instead of opening the firewall on the designated port 82, it opens it on port 80. This will of course break the implementation, even though the automated task itself will succeed. What ChatGPT does is perhaps one of the most common configurations for a web server, but what we wanted was something non-standard, which points at a weakness when going outside of what is common.
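Corrected, the firewall task would look roughly like this (a sketch; the use of the ansible.posix.firewalld module is my assumption):

```yaml
# Open the designated non-standard port 82, not the default port 80.
- name: Open firewall for Apache on port 82
  ansible.posix.firewalld:
    port: 82/tcp
    permanent: true
    immediate: true
    state: enabled
```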
Use-case 1: Development. Model: OpenAI's ChatGPT 4.0
Below we have the output from ChatGPT 4.0, which is the latest model available for paying users of ChatGPT. Sections of interest are marked with red.
Analysis
The produced automation works but has a dangerous feature which may be unexpected.
The first and second red markups draw your attention to the section where software is installed. The issue here is that instead of just putting the software in place, the automation will ensure that the latest available version of the software is installed. This is normally against best practice for good reasons, because it can lead to you accidentally upgrading your systems. Ansible is best written in an idempotent manner, meaning you can re-run it over and over again with the same result. This piece of Ansible would pass any test for idempotency as long as no new software is made available between the first and second run, which makes it tricky to discover using automated testing.
In a scenario where a person runs the automation to ensure that, for example, the web server is configured the right way, that person could accidentally upgrade both firewalls and web servers, causing an unexpected system outage. In order to catch this, you need to know the Ansible best practice of not using "latest" as an argument for software installations.
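A sketch of the difference, assuming the yum module was used:

```yaml
# Risky: upgrades httpd to the newest available version on every
# run where a newer package exists.
- name: Install Apache
  ansible.builtin.yum:
    name: httpd
    state: latest

# Safe and idempotent: installs httpd only if it is missing.
- name: Install Apache
  ansible.builtin.yum:
    name: httpd
    state: present
```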
Use-case 1: Development. Model: Google Bard
Below we have the output from Google Bard. Sections of interest are marked with red. Google marks Bard as experimental at this time.
Analysis
The automation generated does not work due to syntax errors.
The first red markup indicates that the model fails to follow the module FQCN development best practice; this creates technical debt we'll likely have to pay down later.
The second red markup is the first Bard hallucination, and it's a tricky one. The issue is a syntax error: the match argument to the lineinfile module should be regexp (or its alias, regex). What makes this a tricky hallucination (the model makes up something which is not correct) is that it's a reasonable one, as the value, which is a regular expression, is correct. It would be common to see such a feature named match in other tools, but in this case it's not correct. So you have to be quite familiar with the lineinfile module in order to catch this during code review. An automated test could fairly easily catch it, though.
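The corrected task would look roughly like this (a sketch; the exact file path and line being managed are my assumptions):

```yaml
# 'regexp' is the real argument name; 'match' does not exist on
# ansible.builtin.lineinfile and causes a syntax error.
- name: Configure Apache to listen on port 82
  ansible.builtin.lineinfile:
    path: /etc/httpd/conf/httpd.conf
    regexp: '^Listen '
    line: Listen 82
```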
The third red markup denotes a bad practice: the Ansible is not written in an idempotent way. Instead, every time you run this automation, it will reload the Apache web server. An unexpected reload of the web server configuration can in turn lead to issues, such as configuration being loaded at an unexpected time. The idiomatic fix is a handler, as sketched below.
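A sketch of the handler pattern:

```yaml
tasks:
  # The configuration task only notifies the handler when it
  # actually changes something.
  - name: Configure Apache to listen on port 82
    ansible.builtin.lineinfile:
      path: /etc/httpd/conf/httpd.conf
      regexp: '^Listen '
      line: Listen 82
    notify: Reload Apache

handlers:
  # The handler runs at most once, at the end of the play, and
  # only if it was notified.
  - name: Reload Apache
    ansible.builtin.service:
      name: httpd
      state: reloaded
```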
The fourth red markup is also a hallucination. Bard has in this instance made up two arguments (target, setype) to the SELinux module. This will cause a syntax error and will not work. More interesting is what Bard seems to want to do here. Looking at the state argument of the SELinux module, it seems that Bard tries to disable enforcement of SELinux, which is one of Linux's fundamentally important security features. This is likely explained by the fact that, historically, disabling SELinux has been a common solution, even though it is a very bad one from a security perspective. So we got lucky that Bard generated a syntax error and did not succeed in doing what it seems to want to accomplish.
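For reference, the ansible.posix.selinux module accepts only a handful of arguments. A syntactically valid way to express what Bard seems to be attempting would be the sketch below; valid, but a bad idea:

```yaml
# Valid syntax, but bad security practice: this lowers SELinux to
# permissive mode instead of allowing port 82 properly.
- name: Put SELinux in permissive mode
  ansible.posix.selinux:
    policy: targeted
    state: permissive
```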
The fifth red markup is another omission, this time in the task using the service module, which ensures that the web server is running. What is missing is a vital argument which ensures that the web server automatically starts if the computer is restarted. If we did not catch this issue, the web server would stay down after a restart of the system.
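For reference, the complete task would look roughly like this (a sketch; the enabled line is the point):

```yaml
- name: Ensure Apache is running
  ansible.builtin.service:
    name: httpd
    state: started
    enabled: true   # the vital argument the model dropped
```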
Use-case 1: Development. Model: Meta's Llama 2 (70B)
The 70-billion-parameter version of Meta's Llama 2 provides the output below. Sections of interest are marked red.
Analysis
The automation generated does not work due to syntax errors.
The first red markup indicates that we are again missing module FQCNs, technical debt we'll have to correct later.
The second red markup is a bad practice: we designate what the file permissions will be, but we do not define which owner and group will own the file. This can be considered a security issue, as the key configuration file of the Apache web server should normally be owned by the system's admin user and group (root).
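A corrected version of such a task could look like this (a sketch; the use of the copy module and the file paths are my assumptions):

```yaml
# Set ownership explicitly alongside the permissions; the generated
# task set only the mode.
- name: Put Apache configuration in place
  ansible.builtin.copy:
    src: httpd.conf
    dest: /etc/httpd/conf/httpd.conf
    owner: root
    group: root
    mode: '0644'
```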
The third red markup indicates two bad practices. First, the automation task created has no name, which makes the automation and its output more difficult to read. Second, the task is not idempotent: it will reload the configuration of the web server every time, just like Google Bard's output did (the handler sketch in the Bard analysis shows the fix).
The fourth red markup is about the security-related SELinux configuration. Llama 2 (70B) has here, just like Bard, hallucinated a number of SELinux module arguments which are invalid and will cause syntax errors. What makes this hallucination more difficult to catch is that the arguments and values used are familiar to someone who knows about SELinux. Namespace and secontext are both things we talk about regarding SELinux, so if a person knows SELinux but does not know Ansible or the SELinux Ansible module, this may slip past code review. Permissive: false is nonsensical, as SELinux has three modes: enforcing, permissive and disabled. What permissive: false would be, I have no idea...
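If the intent was to manage a file's SELinux context, the valid way would be something like the sketch below (the module, path and context type are my assumptions about what the model was reaching for):

```yaml
# Register the SELinux file context for the Apache configuration...
- name: Set SELinux context on httpd.conf
  community.general.sefcontext:
    target: /etc/httpd/conf/httpd.conf
    setype: httpd_config_t
    state: present

# ...and apply it to the file on disk (simplified for this sketch).
- name: Apply the new SELinux context
  ansible.builtin.command: restorecon -v /etc/httpd/conf/httpd.conf
```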
Use-case 1: Development. Model: Ansible Lightspeed
Ansible Lightspeed output is shown below. Sections of interest are marked red.
Analysis
The automation performs as intended and uses best practices. It's almost a home-run.
Ansible Lightspeed is a generative AI-driven tool available as part of the free Ansible extension for Microsoft VSCode. It cannot generate a full playbook out of natural language; instead, it suggests Ansible automation tasks from the description of a task, as shown below.
Below, the end user has approved the suggestion, which in this case is sane.
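To give a flavour of the interaction (an illustration of the workflow, not Lightspeed's actual output), you write a task name and Lightspeed suggests the body:

```yaml
# You type the task name...
- name: Install Apache web server
  # ...and Lightspeed suggests a body along these lines:
  ansible.builtin.package:
    name: httpd
    state: present
```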
I have not corrected any of the suggestions Ansible Lightspeed gave me, but that does not mean it's a flawless solution. Lightspeed can also hallucinate, but it has an advantage: it's better at Ansible, because Ansible is the only language it deals with. Furthermore, the output is run through linting, ensuring that it is syntactically correct. Still, many of the issues we've seen in the other solutions were syntactically correct and wrong at the same time.
Conclusions
In our first use-case, having generative AI write automation code for us, we can conclude that:
- All of the general-purpose models (ChatGPT 3.5, ChatGPT 4.0, Google Bard, Llama 2) produced output with serious issues: hallucinated module arguments, missing vital steps, bad practices or outright broken syntax.
- The most dangerous failures are the ones where the generated code is syntactically correct, works, and still does something different from what it describes.
- Linting and automated testing catch some of these issues, but far from all; human review by someone who knows both Ansible and the domain (for example SELinux) is still required.
- The purpose-built model, Ansible Lightspeed, clearly outperformed the general-purpose models on this task.
That was the first part of my three-part series on generative AI and automation. I hope you learned something. The next part will assess the migration use-case, where we'll use the different models to migrate from one automation language to another.