Rubberducking iptables with ChatGPT

Dan Morgan

Founder/Architect @ MORGANGRAPHICS | Scalable Web Applications

Published: June 26, 2023

TL;DR - ChatGPT's confidently incorrect answers, a known issue with Large Language Models (LLMs) called hallucinations, make it unsuitable for many applications without human oversight, intervention, and verification. Jump to the conclusion if you don't want to get into the gory details of iptables.

In recent months you may have heard quite a lot about ChatGPT (and its brethren) claiming to be, and do, everything. You may have heard tales of Artificial Intelligence (AI) replacing certain segments of jobs like administrative or paralegal work, of developers asking it to spit out code for APIs, or of it potentially even replacing developers altogether.

My exploration with ChatGPT was an attempt to emulate the mentor/mentee relationship often found in software development. This type of relationship often leads to a programming exercise called rubberducking, or Rubber Duck Debugging: breaking a problem down into its most basic parts in an attempt to simplify an approach or solution to the issue.

I wanted to work with ChatGPT to suss out some ideas for another article/how-to I am writing. Specifically, I wanted to work on a technology that is objectively difficult to understand: iptables (a Linux kernel firewall).

Background

To provide a little context: after reading some poor advice on Stack Overflow, I began writing a how-to guide meant to streamline getting an app safely and securely from a development machine into a production environment.

In addition to the technologies to develop an app, there can be dozens of other technologies and/or concepts a developer may have to learn just to be able to get the app in the cloud.

One of the technologies not listed above, and objectively a less understood technology for any developer, is iptables. It can be a complicated concept to understand, especially for newcomers.

What needs to be accomplished with iptables?

We need to lock down the virtual machine the app is running on to make it as secure as possible. To do this, we want to:

Only allow traffic from foo.com on port 80 and block everything else
Route traffic from foo.com on port 80 to port 3000 where the app is hosted

What do I bring to the table?

The general concepts and ideas behind securing a virtual machine, networking, data flow, Network Address Translation (NAT), and Domain Name System (DNS)

foo.com is behind a load balancer. This will be important later on.

Some basics of iptables syntax

Now what?

A very, very quick, introductory primer on the concept of iptables

Internet traffic can flow through one of two paths, then through a series of gates.

Path 1: A => B => C => D => Z
Path 2: A => E => F => G => Z

Gates A and Z just indicate that traffic starts and ends at the same place regardless of path.

Each gate does something different, such as allowing, disallowing, filtering, or redirecting traffic to other parts of the network, and each gate has two doors: an IN door and an OUT door. The gating logic follows two basic rules:

Gates can only ever be in the same order as above.
You may omit one or more gates, e.g. B => C => Z. However, reordering the gates, e.g. A => C => B, is not allowed.
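
The two ordering rules amount to a subsequence check. Here is a minimal sketch in POSIX shell, where `is_valid_route` is a hypothetical helper and gates are just space-separated letters. It models the analogy above, not iptables itself.

```shell
# is_valid_route ROUTE PATH
# Succeeds when ROUTE visits gates in the same relative order as PATH.
# Gates may be skipped, but never reordered.
is_valid_route() {
  route=$1
  remaining=$2
  for gate in $route; do
    found=0
    # Consume gates from the front of the path until this gate turns up.
    while [ -n "$remaining" ]; do
      first=${remaining%% *}
      case $remaining in
        *" "*) remaining=${remaining#* } ;;
        *) remaining= ;;
      esac
      if [ "$first" = "$gate" ]; then
        found=1
        break
      fi
    done
    # Ran out of path: the gate is unknown or out of order.
    [ "$found" -eq 1 ] || return 1
  done
  return 0
}

PATH_ONE="A B C D Z"
is_valid_route "B C Z" "$PATH_ONE" && echo "B C Z: allowed"
is_valid_route "A C B" "$PATH_ONE" || echo "A C B: not allowed"
```

Skipping D in the first call is fine. The second call fails because B would have to come after C.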

Initial interaction with ChatGPT

Below are two iptables filters that ChatGPT made based on the goals above:

sudo iptables -A INPUT -p tcp --dport 80 -m string --string "Host: foo.com" --algo bm -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 80 -j DROP

The filters above may be very confusing to many of you not familiar with the intricacies of iptables filter syntax. So just what is going on here?

The first filter checks whether incoming traffic is coming from foo.com.
The second filter will drop all other incoming traffic

If you are familiar with iptables syntax, you might notice that the second rule drops every TCP packet to port 80 that the first rule did not accept. That includes the packets of the TCP handshake, which carry no payload and so can never contain the Host string, so we're already off to a rough start. More on this to follow.
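
To see why this pair of rules is a rough start, it can help to model it as a first-match list. The sketch below is a toy model in plain shell, not iptables itself. `verdict` is a hypothetical helper, and it assumes a port-80 packet is described only by its payload. A TCP handshake packet carries no payload, so it can never match the Host string and is dropped before any HTTP request can be sent.

```shell
# verdict PAYLOAD: toy first-match model of the two port-80 rules above.
verdict() {
  case $1 in
    *"Host: foo.com"*) echo ACCEPT ;;   # rule 1: string match on the payload
    *) echo DROP ;;                     # rule 2: everything else on port 80
  esac
}

echo "SYN (empty payload): $(verdict '')"
echo "HTTP request:        $(verdict 'GET / HTTP/1.1
Host: foo.com')"
```

The empty-payload SYN gets DROP, so in this model the connection that would carry the matching request is never established.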

What ChatGPT does surprisingly well is summarize what is going on in the above iptables filters in more natural language; it even points out a potential security issue in one of the filters:

ChatGPT:

The first rule uses the -m string option to match packets that contain the string "Host: foo.com" in the TCP payload. The --algo bm option specifies the Boyer-Moore algorithm to be used for the string search. If a packet contains this string, it is accepted and allowed through the firewall. If not, it is dropped by the second rule.

Note that string matching can be resource-intensive, so it may impact performance if you have a high volume of traffic. Additionally, this method may not be foolproof if an attacker spoofs the Host header in their requests.

Interestingly, ChatGPT recognizes when a filter is invalid and offers suggestions for filters that are valid.

Is this a valid iptables rule?

sudo iptables --insert INPUT --protocol tcp --destination-port 80 -m string --string "foo.com" --algo bm -j ACCEPT -j REDIRECT --to-port 3000

ChatGPT:

No, the rule you provided is not valid. There are a couple of issues with the syntax. Here's the corrected version:

sudo iptables -A INPUT -p tcp --destination-port 80 -m string --string "foo.com" --algo bm -j ACCEPT 
sudo iptables -A INPUT -p tcp --destination-port 80 -j REDIRECT --to-port 3000

Response edited for brevity.

Creating the correct set of filters for the how-to is like generating a travel itinerary. We have a list of all the places we'd like to visit but we're missing a map and train schedule to make the most efficient use of our time and money.

After mashing together a couple of things that might work, I asked:

What is this rule doing?

sudo iptables --insert INPUT --protocol tcp --destination-port 80 -m string --string "foo.com" --algo bm --jump REDIRECT --to-port 3000

ChatGPT:

The rule you provided is inserting a new iptables rule into the INPUT chain that handles incoming TCP traffic on port 80. Here's a breakdown of what the rule does:

sudo iptables -I INPUT -p tcp --destination-port 80 -m string --string "foo.com" --algo bm -j REDIRECT --to-port 3000

-I INPUT: Insert the rule at the beginning of the INPUT chain.

-p tcp: Match TCP packets.

--destination-port 80: Match packets with a destination port of 80.

-m string --string "foo.com" --algo bm: Use string matching to search for the string "foo.com" in the packet payload using the Boyer-Moore algorithm. This matches packets containing this specific string.

-j REDIRECT --to-port 3000: If a packet matches the previous conditions, redirect it (-j REDIRECT) to port 3000 (--to-port 3000).

In summary, this rule allows incoming TCP traffic on port 80 that contains the string "foo.com" in the payload to be redirected to port 3000. It is commonly used for scenarios where you want to redirect traffic from a specific domain to a different port for further processing or handling.

ChatGPT's response came as a surprise. It was such an authoritative answer, and it sounded perfectly reasonable. Only, ChatGPT's answer is confidently incorrect (referred to as a hallucination). The -j REDIRECT target is only allowed at gate A, and we're working at gate B. When I tested the rule, it produced the following error:

Redirect target is only valid at gate A
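
For reference, the iptables-extensions man page states that REDIRECT is only valid in the nat table, in the PREROUTING and OUTPUT chains (and user-defined chains called only from those). Reading gate A as the nat table and gate B as the filter table is an assumption based on the chain names in the quoted replies. Under that assumption, the check that rejected the rule can be sketched as follows; `redirect_allowed` is a hypothetical helper, not part of iptables.

```shell
# redirect_allowed TABLE CHAIN
# Succeeds when the REDIRECT target may be used in that built-in chain.
# User-defined chains called from PREROUTING/OUTPUT are not modeled here.
redirect_allowed() {
  [ "$1" = nat ] || return 1
  case $2 in
    PREROUTING|OUTPUT) return 0 ;;
    *) return 1 ;;
  esac
}

redirect_allowed filter INPUT || echo "filter/INPUT: REDIRECT rejected"
redirect_allowed nat PREROUTING && echo "nat/PREROUTING: REDIRECT accepted"
```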

ChatGPT is quick to apologize and offer another solution:

ChatGPT:

Apologies for the confusion. You are correct. The --to-port option with the REDIRECT target is only valid at gate A, not at gate B (where the INPUT chain resides).

If you want to redirect incoming TCP traffic on port 80 to port 3000, you can achieve it using the DNAT target at gate A. Here's the corrected rule:

sudo iptables -t nat -A PREROUTING -p tcp --destination-port 80 -j DNAT --to-destination :3000

This rule adds a DNAT (Destination NAT) rule to the PREROUTING chain of gate A. It matches incoming TCP packets with a destination port of 80 and changes the destination address to the local IP address and port 3000. This effectively redirects the traffic to port 3000 on the local machine.

Unfortunately, ChatGPT's suggestion is situated at gate A again. What is wrong with gate A? Based on what we already know about iptables, the types of filters allowed at gate A make it impossible for any filter to inspect incoming traffic for foo.com; that inspection can only happen at gate B.

Would the filter at gate A bypass the iptables filter for gate B?

ChatGPT:

Yes, you're right. If you have a PREROUTING rule at gate A that redirects incoming traffic on port 80 to port 3000, it will bypass gate B, including the INPUT rule you mentioned:

sudo iptables --insert INPUT --protocol tcp --destination-port 80 -m string --string "foo.com" --algo bm --jump ACCEPT

The PREROUTING chain at gate A is processed before the INPUT chain at gate B. So, if a packet is redirected to port 3000 at gate A, it won't reach gate B for further processing, including the ACCEPT rule.

If you want to combine both the string matching and port redirection, you could modify the rule as follows:

sudo iptables -t nat -A PREROUTING -p tcp --destination-port 80 -m string --string "foo.com" --algo bm -j REDIRECT --to-ports 3000

Response edited for brevity.

Another authoritative answer, and still confidently incorrect. Because of how gate A works, it has not inspected any traffic at that point, so it cannot match on foo.com the way ChatGPT is suggesting. (In iptables terms, rules in the nat table are only consulted for the first packet of a connection, and a TCP SYN carries no HTTP payload to search.)

PREROUTING packets have not been processed at gate A, how can you match a string?

ChatGPT:

Response edited for brevity.

sudo iptables -t mangle -A PREROUTING -p tcp --dport 80 ! -s foo.com -j MARK --set-mark 1 
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -m mark --mark 1 -j REDIRECT --to-port 3000

Let's break down the rules:

The first rule in the mangle table marks all TCP packets with a destination port of 80 (--dport 80) that are not from foo.com (! -s foo.com). The packets are marked with the value 1 using the MARK target (-j MARK --set-mark 1).

The second rule in the nat table checks for packets in the PREROUTING chain that are destined for port 80 (--dport 80) and have the mark value of 1 (-m mark --mark 1). These packets are then redirected to port 3000 using the REDIRECT target (-j REDIRECT --to-port 3000).

These rules ensure that TCP traffic on port 80, except for traffic from foo.com, is marked and then redirected to port 3000.

Another authoritative answer. However, there are a few things to be concerned about based on what I already know about iptables:

We should be working in gate B and ChatGPT is still stuck in gate A. With the way the filter is written, all traffic bypasses gate B altogether.
The -s option does a DNS lookup, which the iptables documentation advises against ("please note that specifying any name to be resolved with a remote query such as DNS is a really bad idea"). -s resolves the name and then stores the filter as an IP address. Do you remember above where foo.com is behind a load balancer? This rule would only ever work if the traffic was coming from the IP address the DNS lookup originally resolved to. I'd have to run the rule multiple times to get all the possible IP addresses, and that list would likely be out of date very quickly, so using -s is a no-go.
ChatGPT added some logic complexity to the filter. While it might be valid, it is not ideal for my needs: if foo.com adds a second domain like dev-foo.com, I'd have to duplicate the filters to accommodate it.
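
To make the load-balancer problem concrete: since -s ends up stored as an IP address, covering every address behind foo.com means one rule per address. The sketch below is a minimal dry run, assuming a hypothetical helper `emit_accept_rules` and two placeholder addresses from the 203.0.113.0/24 documentation range. It only prints commands and never touches the firewall.

```shell
# emit_accept_rules PORT
# Reads one source IP per line on stdin and prints an ACCEPT rule for each.
emit_accept_rules() {
  port=$1
  while IFS= read -r ip; do
    [ -n "$ip" ] || continue   # skip blank lines
    printf 'iptables -A INPUT -p tcp --dport %s -s %s -j ACCEPT\n' "$port" "$ip"
  done
}

# Placeholder addresses standing in for whatever the load balancer uses today.
printf '%s\n' 203.0.113.10 203.0.113.11 | emit_accept_rules 80
```

Every time the load balancer's address pool changes, this list has to be regenerated and the old rules removed, which is the maintenance burden described above.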

Doesn't -s require a DNS lookup?

ChatGPT:

Response edited for brevity.

sudo iptables -A INPUT -p tcp --dport 80 -m string --string "Host: foo.com" --algo bm -j ACCEPT 
sudo iptables -A INPUT -p tcp --dport 80 -j DROP 
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination :3000

Let's go through each rule:

The first rule matches incoming TCP packets on port 80 that contain the string "Host: foo.com" in the payload (HTTP Host header) using string matching. If a packet matches, it is accepted (-j ACCEPT). Modify foo.com to the desired domain name.

The second rule drops (-j DROP) all other incoming TCP packets on port 80 that did not match the first rule. This effectively blocks those packets.

The third rule in the nat table's PREROUTING chain redirects incoming TCP packets on port 80 to port 3000 using DNAT (Destination NAT). This rule applies to all traffic, regardless of the source, since it is in the PREROUTING chain.

With these rules, incoming TCP traffic on port 80 is filtered based on the presence of the "Host: foo.com" string. Only packets with the matching string are allowed and forwarded to port 3000, while all other packets are dropped.

At this point, I realize my issues with ChatGPT are 100% related to the way I phrased the original question. Whereas a human would be able to infer what I was asking, ChatGPT cannot, and it hasn't asked additional questions to figure it out. Yes, the answer provided by ChatGPT addresses at least one part of the issue. However:

it completely misses the intent of the question
it ignores how iptables actually works
only one filter actually does what it's supposed to do

At this point I realize that ChatGPT doesn't have any context as to how network traffic flows through iptables. When I point out that the solution provided above will not work, ChatGPT offers another, very similar solution, that still will not work.

Back to the drawing board

These hallucinations were getting me nowhere. After doing a bunch of reading and looking at a number of iptables and network flow diagrams, I piece together a new recipe that should address the task at hand. I use ChatGPT to validate these assumptions.

I believe that the path must be

1. Gate A, Input Door, redirect port 80 to port 3000

2. Gate B, Input Door, allow foo.com only on port 3000

3. Gate B, Input Door, drop incoming traffic for everyone else

4. Gate B, Output Door, allow foo.com only on port 3000

5. Gate A, Output Door, redirect port 3000 to port 80

As you can see, the level of detail and complexity in my question is increasing. I have to be very specific about what I want from ChatGPT to prevent further hallucinations.

ChatGPT:

Response edited for brevity.

1. NAT PREROUTING - Redirect incoming traffic on port 80 to port 3000:

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 3000

2. FILTER INPUT - Allow incoming traffic on port 3000 from foo.com:

sudo iptables -A INPUT -p tcp --dport 3000 -s foo.com -j ACCEPT

3. FILTER INPUT - Drop incoming traffic on port 3000 from all other sources:

sudo iptables -A INPUT -p tcp --dport 3000 -j DROP

4. NAT OUTPUT - Redirect outgoing traffic on port 3000 to port 80:

sudo iptables -t nat -A OUTPUT -p tcp --dport 3000 -j REDIRECT --to-port 80

5. FILTER OUTPUT - Allow outgoing traffic on port 80 to foo.com:

sudo iptables -A OUTPUT -p tcp --dport 80 -d foo.com -j ACCEPT

These rules work as follows:

Rule 1 redirects incoming traffic on port 80 to port 3000 using NAT PREROUTING.

Rule 2 allows incoming traffic on port 3000 from foo.com using FILTER INPUT.

Rule 3 drops all other incoming traffic on port 3000 using FILTER INPUT.

Rule 4 redirects outgoing traffic on port 3000 to port 80 using NAT OUTPUT.

Rule 5 allows outgoing traffic on port 80 to foo.com using FILTER OUTPUT.

These answers appear, at first glance, to be a little more in line with the initial intent.

Do the iptables rules work?

Yes and No. But mostly No.

Rules 1 and 4 work as expected

Rules 2 and 5 use -s (or its counterpart -d), which we already know is not an option

Rule 3 is a problem. It blocks ALL TCP traffic, even the traffic we are allowing from foo.com. This is due to a subtlety in how iptables works with new and existing connections. What I didn't know at the time of asking, and what ChatGPT didn't mention, is that I need to keep track of existing connections! Otherwise I'll be kicked out of my SSH session and my scripts won't work as expected.
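
The usual way to keep track of existing connections is the conntrack match. The sketch below is a common idiom rather than the ruleset this article ends up with. The SSH port (22) and the `IPT` dry-run variable are assumptions for illustration. By default it only prints the commands; setting IPT to "sudo iptables" on a disposable VM would apply them.

```shell
# Dry run by default: print each command instead of changing the firewall.
IPT=${IPT:-echo iptables}

allow_existing_then_drop() {
  # Packets that belong to, or relate to, an already accepted connection.
  $IPT -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
  # New SSH connections (assumption: sshd listens on port 22).
  $IPT -A INPUT -p tcp --dport 22 -j ACCEPT
  # Everything else inbound.
  $IPT -A INPUT -j DROP
}

allow_existing_then_drop
```

Order matters because the first matching rule wins: the ESTABLISHED,RELATED rule has to sit above the final DROP, otherwise replies on connections that are already open get dropped too.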

So what now? What is the answer?

The answer to that question is:

There is still no substitute for doing things the old-fashioned way, at least in this particular instance. Reading, experimenting, and scripting with iptables to learn what was happening and to validate multiple scenarios was the way to go.

The answer I came up with looks very, very different from what ChatGPT was offering as a solution based on the original question. There are a lot of moving parts to keep track of, some of which are a bit of a black box and relate not to iptables itself but to third-party scripts.

# NAT
iptables -A PREROUTING --table nat --protocol tcp --destination-port 80 --jump REDIRECT --to-port 3000

# INPUT
iptables -A INPUT --protocol tcp --match tcp --match multiport --destination-ports 22,53,443 --jump ACCEPT
iptables -A INPUT --protocol tcp --destination-port 3000 --match string --algo kmp --icase --string foo.com --jump ACCEPT
iptables -A INPUT --match conntrack --ctstate NEW,ESTABLISHED --jump ACCEPT 
iptables -A INPUT --jump DROP

# OUTPUT
iptables -A OUTPUT --protocol tcp --match tcp --match multiport --destination-ports 22,53,443 --jump ACCEPT
iptables -A OUTPUT --match conntrack --ctstate NEW,ESTABLISHED --jump ACCEPT
iptables -A OUTPUT --jump DROP

# NAT
iptables -A OUTPUT --table nat --protocol tcp --source-port 3000 --jump REDIRECT --to-port 80

If you are wondering what is going on here, copy and paste this into ChatGPT and ask it to explain what is going on.

The kicker is, this answer only partially works! Yes, after all this, this isn't a fully functioning answer to the problem at hand.

humanum solutionem (the human solution)

To get some insight as to why the solution was only partially working the way I wanted, I reached out to a human who has experience in networking and iptables. One of the first questions they asked me was:

"Why are you inspecting traffic for foo.com at this level? It would certainly be easier to do it at the application layer."

They were absolutely correct! My previous background with networking was primarily focused on allow and deny lists, which was best practice back in the day and still is in many instances. However, this is not really one of those instances. I was so focused on the iptables solution that I hadn't considered anything else, and ChatGPT didn't ask me any questions about my end goals.
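
The human's point can be illustrated with a small sketch. At the application layer the Host header is already a distinct field, so it can be compared exactly instead of being searched for as a substring of a raw packet. The helper below, `host_allowed`, is hypothetical and written in plain shell only for illustration; a real app would do this check in its web framework or reverse proxy.

```shell
# host_allowed ALLOWED_HOST
# Reads raw HTTP request headers on stdin and succeeds only when the
# Host header equals ALLOWED_HOST (case-insensitive, optional :port ignored).
host_allowed() {
  allowed=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}          # strip the CR of a CRLF line ending
    [ -n "$line" ] || break     # a blank line ends the header block
    name=$(printf '%s' "${line%%:*}" | tr '[:upper:]' '[:lower:]')
    if [ "$name" = host ]; then
      value=$(printf '%s' "${line#*:}" | tr -d ' \t' | tr '[:upper:]' '[:lower:]')
      value=${value%%:*}        # drop an optional :port suffix
      [ "$value" = "$allowed" ]
      return
    fi
  done
  return 1                      # no Host header found
}

printf 'GET / HTTP/1.1\r\nHost: foo.com\r\n\r\n' | host_allowed foo.com && echo "foo.com: allowed"
printf 'GET / HTTP/1.1\r\nHost: notfoo.com\r\n\r\n' | host_allowed foo.com || echo "notfoo.com: rejected"
```

An exact comparison also rejects look-alikes such as notfoo.com, which a substring match on foo.com would accept.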

Conclusion

After working with ChatGPT for more than a month, I'd say we're a ways off from the SkyNet apocalypse where ChatGPT replaces developers outright. Even OpenAI, the company behind ChatGPT, acknowledges that it can be, and often is, wrong. Not only wrong, but confidently incorrect, which may be considered a variation on the Dunning-Kruger effect (emphasis mine). These wrong answers are referred to as hallucinations; they result from inherent issues in training Large Language Models (LLMs) and are a difficult problem to solve in AI.

ChatGPT may produce inaccurate information about people, places, or facts. - ChatGPT May 12 Version

So, how was my rubberducking experience with ChatGPT? I wouldn't do it again, at least not with something as complicated as iptables.

The Good

ChatGPT does a very good job summarizing things, even complex concepts, in an easy-to-understand way. If I had found an iptables rule out on the interwebs and didn't know what it was doing, ChatGPT would explain it in a way I'd be able to understand, and that was helpful.
This exercise did provide me with enough information to accomplish one of the goals
ChatGPT did point out one potential security issue early on
ChatGPT can be a tool in the developer's toolbox
ChatGPT is polite when wrong

The Bad

ChatGPT's confidently incorrect answers made this entire process considerably harder than it needed to be. These kinds of answers are a variation on the Dunning-Kruger effect (a variation because, unlike humans, ChatGPT has no sense of self, ego, or bias to speak of).
An unfortunate result of this exercise is that I am now suspicious of anything ChatGPT provides to me
ChatGPT can and does provide code with security flaws
ChatGPT is a poor substitute for real-life, experienced humans

The Ugly

ChatGPT has a long way to go. It is good for some things like summarization, less so for others like rubberducking.
I wouldn't use ChatGPT again for this kind of exercise, at least for something as complicated as iptables. In the end, I came up with a partially working solution the old-fashioned way: learning on my own, using a variety of different sources, experimenting, testing, failing, and testing again.

Recommendations

Come to the table with some knowledge, preferably more than just an introductory knowledge of the subject matter you are looking to learn about. ChatGPT is a poor substitute for actual knowledge and real-world experience.
Don't trust and definitely verify. If you are not validating what ChatGPT is telling you, you're gonna have a bad time and may wind up looking foolish.
Try a different approach, like the Pareto Principle, a.k.a. the 80/20 rule. There are a few good articles (here, and here) that highlight what ChatGPT is very good at. Admittedly, my experience is not entirely the fault of ChatGPT. I was not asking the correct questions in the correct way to maximize my time and effectiveness with ChatGPT.

Have you used #chatgpt as a development tool? Did #chatgpt live up to your expectations? What were your revelations? Pain points? I'm interested in learning more about your experiences with #chatgpt. Reach out and comment below.

Stanley Russel

1 year ago

I found your experience with OpenAI's chatGPT very intriguing! I'm curious if you believe that using AI-driven prompt engineering could lead to an increased efficiency in managing complex programming tasks. I'd also be interested in hearing more about how you think the AI-generated "hallucinations" could shape our perception of LMLs and other automated tools. Could this be a revolutionary way of developing software? Fascinating stuff!

Mike Morgan PhD

Emerging Science & Technology Professional

1 year ago

I found the article interesting. I've been working directly with LLMs and am also intrigued by the notions of "AI Hallucinations." A couple things I've read recently. 1. "All LLMs will hallucinate" regardless of how they were trained and on what data. 2. "AI Hallucinations" are primarily driven by the 'prompt" in other words, if you intentionally want the LLM to hallucinate you can leverage the prompt do so. There has been a lot of recent effort to develop technologies to rid LLMs of AI Hallucinations. Good article!
