Rubberducking iptables with ChatGPT
TL;DR - ChatGPT's confidently incorrect answers, a known issue with Large Language Models (LLMs) called hallucinations, make it unsuitable for many applications without some human oversight, intervention, and verification. Jump to the conclusion if you don't want to get into the gory details of iptables.
In recent months you may have heard quite a lot about ChatGPT (and its brethren) supposedly being able to be, and do, everything. You may have heard tales of Artificial Intelligence (AI) replacing certain segments of jobs like administrative or paralegal work, developers asking it to spit out code for APIs, or potentially even replacing developers altogether.
My exploration with ChatGPT was an attempt to emulate a mentor/mentee relationship often found in software development. This type of relationship can often lead to a programming exercise called rubberducking or Rubber Duck Debugging: breaking a problem down into its most basic parts in an attempt to simplify an approach or solution to the issue.
I wanted to work with ChatGPT to suss out some ideas for another article/how-to I am writing. Specifically, I wanted to work on a technology that is objectively hard to understand: iptables (a Linux kernel firewall).
Background
To provide a little context, after reading some poor advice on StackOverflow, I began writing a How-To guide meant to streamline getting an app safely and securely from a development machine into a production environment.
In addition to the technologies to develop an app, there can be dozens of other technologies and/or concepts a developer may have to learn just to be able to get the app in the cloud.
One of the technologies not listed above, and objectively, a less understood technology for any developer, is iptables. Iptables can be a complicated concept to understand, especially for newcomers.
What needs to be accomplished with iptables?
We need to lock down the virtual machine the app is running on to make it as secure as possible. To do this, we want to
What do I bring to the table?
The general concepts and ideas behind securing a virtual machine, networking, data flow, Network Address Translation (NAT), and Domain Name System (DNS)
foo.com is behind a load balancer. This will be important later on.
Some basics of iptables syntax
Now what?
A very, very quick, introductory primer on the concept of iptables
Internet traffic can flow through one of two paths, then through a series of gates.
Path 1: A => B => C => D => Z
Path 2: A => E => F => G => Z
Gates A and Z just indicate that traffic starts and ends at the same place regardless of path.
The gates do different things, like allow, disallow, filter, or redirect traffic to other parts of the network and contain two types of doors per gate, an IN and OUT door. The gating logic follows two basic rules:
Initial interaction with ChatGPT
Below are two iptables filters that ChatGPT made based on the goals above:
sudo iptables -A INPUT -p tcp --dport 80 -m string --string "Host: foo.com" --algo bm -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 80 -j DROP
The filters above may be very confusing to many of you not familiar with the intricacies of iptables filter syntax. So just what is going on around here?
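If you are familiar with iptables syntax, you might notice that the second rule drops every TCP packet on port 80 that the first rule did not accept. That includes the opening handshake packets, which carry no payload and so can never contain the Host string, meaning even a request for foo.com cannot establish a connection. We're already off to a rough start. More on this to follow.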
What ChatGPT does surprisingly well is summarize what is going on in the above iptables filters in more natural language. It even points out a potential security issue in one of the filters:
ChatGPT:
The first rule uses the -m string option to match packets that contain the string "Host: foo.com" in the TCP payload. The --algo bm option specifies the Boyer-Moore algorithm to be used for the string search. If a packet contains this string, it is accepted and allowed through the firewall. If not, it is dropped by the second rule.
Note that string matching can be resource-intensive, so it may impact performance if you have a high volume of traffic. Additionally, this method may not be foolproof if an attacker spoofs the Host header in their requests.
Interestingly, ChatGPT understands when a filter is incorrect and offers suggestions for filters that are
D:
Is this a valid iptables rule sudo iptables --insert INPUT --protocol tcp --destination-port 80 -m string --string "foo.com" --algo bm -j ACCEPT -j REDIRECT --to-port 3000
ChatGPT:
No, the rule you provided is not valid. There are a couple of issues with the syntax. Here's the corrected version:
sudo iptables -A INPUT -p tcp --destination-port 80 -m string --string "foo.com" --algo bm -j ACCEPT
sudo iptables -A INPUT -p tcp --destination-port 80 -j REDIRECT --to-port 3000
Response edited for brevity.
Creating the correct set of filters for the how-to is like generating a travel itinerary. We have a list of all the places we'd like to visit but we're missing a map and train schedule to make the most efficient use of our time and money.
After mashing together a couple of things that might work, I asked
D:
What is this rule doing?
sudo iptables --insert INPUT --protocol tcp --destination-port 80 -m string --string "foo.com" --algo bm --jump REDIRECT --to-port 3000
ChatGPT:
The rule you provided is inserting a new iptables rule into the INPUT chain that handles incoming TCP traffic on port 80. Here's a breakdown of what the rule does:
sudo iptables -I INPUT -p tcp --destination-port 80 -m string --string "foo.com" --algo bm -j REDIRECT --to-port 3000
-I INPUT: Insert the rule at the beginning of the INPUT chain.
-p tcp: Match TCP packets.
--destination-port 80: Match packets with a destination port of 80.
-m string --string "foo.com" --algo bm: Use string matching to search for the string "foo.com" in the packet payload using the Boyer-Moore algorithm. This matches packets containing this specific string.
-j REDIRECT --to-port 3000: If a packet matches the previous conditions, redirect it (-j REDIRECT) to port 3000 (--to-port 3000).
In summary, this rule allows incoming TCP traffic on port 80 that contains the string "foo.com" in the payload to be redirected to port 3000. It is commonly used for scenarios where you want to redirect traffic from a specific domain to a different port for further processing or handling.
ChatGPT's response to the question came as a surprise. It was such an authoritative answer to my question; after all, it sounded perfectly reasonable. Only ChatGPT's answer is confidently incorrect (referred to as a hallucination). The -j REDIRECT target is only allowed at gate A and we're working at gate B. When I tested the rule, it produced the following error.
D:
Redirect target is only valid at gate A
ChatGPT is quick to apologize and offer another solution
ChatGPT:
Apologies for the confusion. You are correct. The --to-port option with the REDIRECT target is only valid at gate A, not at gate B (where the INPUT chain resides).
If you want to redirect incoming TCP traffic on port 80 to port 3000, you can achieve it using the DNAT target gate A. Here's the corrected rule:
sudo iptables -t nat -A PREROUTING -p tcp --destination-port 80 -j DNAT --to-destination :3000
This rule adds a DNAT (Destination NAT) rule to the PREROUTING chain of gate A. It matches incoming TCP packets with a destination port of 80 and changes the destination address to the local IP address and port 3000. This effectively redirects the traffic to port 3000 on the local machine.
Unfortunately, ChatGPT's suggestion is situated at gate A again. What is wrong with gate A? Based on what we already know about iptables, the types of filters allowed at gate A make it impossible for any filter to inspect incoming traffic for foo.com. Inspection for foo.com can only happen at gate B.
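To make the gate A versus gate B constraint concrete, here is a minimal sketch in shell. It assumes, based on ChatGPT's replies above, that gate A corresponds to the PREROUTING chain of the nat table and gate B to the INPUT chain of the filter table. The function names redirect_allowed and check are hypothetical. The sketch only encodes the documented restriction that the REDIRECT target is valid in the nat table's PREROUTING and OUTPUT chains, and it never touches the firewall.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the REDIRECT target may be used in the
# given built-in table/chain pair, fails otherwise.
redirect_allowed() {
    case "$1/$2" in
        nat/PREROUTING|nat/OUTPUT) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints a one-line verdict for a table/chain pair.
check() {
    if redirect_allowed "$1" "$2"; then
        echo "$1/$2: REDIRECT allowed"
    else
        echo "$1/$2: REDIRECT rejected"
    fi
}

check nat PREROUTING   # gate A in this article (assumption)
check nat OUTPUT
check filter INPUT     # gate B in this article (assumption)
```

Running a check like this before pasting a suggested rule into a terminal is a cheap way to catch the kind of mistake shown in the exchange above.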
D:
Would the filter at gate A bypass the iptables filter for gate B?
ChatGPT:
Yes, you're right. If you have a PREROUTING rule gate A that redirects incoming traffic on port 80 to port 3000, it will bypass gate B, including the INPUT rule you mentioned:
sudo iptables --insert INPUT --protocol tcp --destination-port 80 -m string --string "foo.com" --algo bm --jump ACCEPT
The PREROUTING chain gate A is processed before the INPUT chain gate B. So, if a packet is redirected to port 3000 at gate A, it won't reach gate B for further processing, including the ACCEPT rule.
If you want to combine both the string matching and port redirection, you could modify the rule as follows:
sudo iptables -t nat -A PREROUTING -p tcp --destination-port 80 -m string --string "foo.com" --algo bm -j REDIRECT --to-ports 3000
Response edited for brevity.
Another authoritative answer to the question, only ChatGPT's answer is still confidently incorrect. Because of how gate A works, it has not processed or inspected any traffic, so it cannot inspect traffic for foo.com the way ChatGPT is suggesting.
D:
PREROUTING packets have not been processed at gate A, how can you match a string?
ChatGPT:
Response edited for brevity.
sudo iptables -t mangle -A PREROUTING -p tcp --dport 80 ! -s foo.com -j MARK --set-mark 1
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -m mark --mark 1 -j REDIRECT --to-port 3000
Let's break down the rules:
The first rule in the mangle table marks all TCP packets with a destination port of 80 (--dport 80) that are not from foo.com (! -s foo.com). The packets are marked with the value 1 using the MARK target (-j MARK --set-mark 1).
The second rule in the nat table checks for packets in the PREROUTING chain that are destined for port 80 (--dport 80) and have the mark value of 1 (-m mark --mark 1). These packets are then redirected to port 3000 using the REDIRECT target (-j REDIRECT --to-port 3000).
These rules ensure that TCP traffic on port 80, except for traffic from foo.com, is marked and then redirected to port 3000.
Another authoritative answer to the question. However, there are a few things to be concerned about based on what I already know about iptables.
D:
Doesn't -s require a DNS lookup?
ChatGPT:
Response edited for brevity.
sudo iptables -A INPUT -p tcp --dport 80 -m string --string "Host: foo.com" --algo bm -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 80 -j DROP
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination :3000
Let's go through each rule:
The first rule matches incoming TCP packets on port 80 that contain the string "Host: foo.com" in the payload (HTTP Host header) using string matching. If a packet matches, it is accepted (-j ACCEPT). Modify foo.com to the desired domain name.
The second rule drops (-j DROP) all other incoming TCP packets on port 80 that did not match the first rule. This effectively blocks those packets.
The third rule in the nat table's PREROUTING chain redirects incoming TCP packets on port 80 to port 3000 using DNAT (Destination NAT). This rule applies to all traffic, regardless of the source, since it is in the PREROUTING chain.
With these rules, incoming TCP traffic on port 80 is filtered based on the presence of the "Host: foo.com" string. Only packets with the matching string are allowed and forwarded to port 3000, while all other packets are dropped.
At this point, I realize my issues with ChatGPT are 100% related to the way I phrased the original question. Whereas a human would be able to infer what I was asking, ChatGPT cannot, and hasn't asked additional questions to figure it out. Yes, the answer provided by ChatGPT addresses at least one part of the issue, however,
At this point I realize that ChatGPT doesn't have any context as to how network traffic flows through iptables. When I point out that the solution provided above will not work, ChatGPT offers another, very similar solution, that still will not work.
Back to the drawing board
These hallucinations were getting me nowhere. After doing a bunch of reading and looking at a number of iptables and network flow diagrams, I pieced together a new recipe that should address the task at hand. I used ChatGPT to validate these assumptions.
D:
I believe that the path must be
1. Gate A, Input Door, redirect port 80 to port 3000
2. Gate B, Input Door, allow foo.com only on port 3000
3. Gate B, Input Door, drop incoming traffic for everyone else
4. Gate B, Output Door, allow foo.com only on port 3000
5. Gate A, Output Door, redirect port 3000 to port 80
As you can see, the level of detail and complexity in my question is increasing. I have to be very specific as to what I want from ChatGPT to prevent further hallucinations
ChatGPT:
Response edited for brevity.
1. NAT PREROUTING - Redirect incoming traffic on port 80 to port 3000:
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 3000
2. FILTER INPUT - Allow incoming traffic on port 3000 from foo.com:
sudo iptables -A INPUT -p tcp --dport 3000 -s foo.com -j ACCEPT
3. FILTER INPUT - Drop incoming traffic on port 3000 from all other sources:
sudo iptables -A INPUT -p tcp --dport 3000 -j DROP
4. NAT OUTPUT - Redirect outgoing traffic on port 3000 to port 80:
sudo iptables -t nat -A OUTPUT -p tcp --dport 3000 -j REDIRECT --to-port 80
5. FILTER OUTPUT - Allow outgoing traffic on port 80 to foo.com:
sudo iptables -A OUTPUT -p tcp --dport 80 -d foo.com -j ACCEPT
These rules work as follows:
Rule 1 redirects incoming traffic on port 80 to port 3000 using NAT PREROUTING.
Rule 2 allows incoming traffic on port 3000 from foo.com using FILTER INPUT.
Rule 3 drops all other incoming traffic on port 3000 using FILTER INPUT.
Rule 4 redirects outgoing traffic on port 3000 to port 80 using NAT OUTPUT.
Rule 5 allows outgoing traffic on port 80 to foo.com using FILTER OUTPUT.
These answers appear, at first glance, to be a little more in line with the initial intent.
Do the iptables rules work?
Yes and No. But mostly No.
Rules 1 and 4 work as expected
Rules 2 and 5 match on a hostname with -s and -d, which we already know is not an option
Rule 3 is a problem. It drops ALL remaining TCP traffic on port 3000, even the traffic we mean to allow for foo.com. This is due to a subtlety in how iptables works with new and existing connections. What I didn't know at the time of asking, and what ChatGPT didn't mention, is that I need to keep track of existing connections! Otherwise I'll be kicked out of my SSH session and my scripts won't work as expected.
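Here is a minimal sketch of what keeping track of existing connections can look like. It is a common pattern rather than the answer I ended up with: accept packets that belong to already tracked connections first, then accept selected new connections, and only then drop the rest. The ipt wrapper, the stateful_input_rules function, and the DRY_RUN variable are hypothetical names, and the choice of ports 22 and 3000 is an assumption based on this article. By default the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command. DRY_RUN=0 applies
# the rules for real, which needs root and can lock you out of the machine.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper so the sketch can be run safely.
ipt() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "iptables $*"
    else
        sudo iptables "$@"
    fi
}

# Hypothetical function: emits the rules in the order they must be appended.
stateful_input_rules() {
    # 1. Keep packets that belong to connections already being tracked,
    #    such as the SSH session used to administer the machine.
    ipt -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # 2. Allow new SSH connections (port 22 is an assumption).
    ipt -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT
    # 3. Allow new connections to the app port used in this article.
    ipt -A INPUT -p tcp --dport 3000 -m conntrack --ctstate NEW -j ACCEPT
    # 4. Only now drop whatever is left.
    ipt -A INPUT -j DROP
}

stateful_input_rules
```

Order matters because iptables evaluates a chain from top to bottom and stops at the first terminating match, so the catch-all DROP has to come last.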
So what now? What is the answer?
The answer to that question is:
There is still no substitute for doing things the old fashioned way, at least in this particular instance. Reading, experimentation, and scripting with iptables to learn what was happening to validate multiple scenarios was the way to go.
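For the experimentation part, a minimal sketch of the kind of inspection script that helps: it lists the chains this article touches, with rule numbers and packet counters, so you can see which rule a test request actually hit. The run and show_rules names and the DRY_RUN variable are hypothetical. The iptables flags used (-L, -n, -v, --line-numbers, -S, -t) are standard, but treat the choice of chains as an assumption about your own setup.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command, or run it with sudo.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# Hypothetical helper: show the chains discussed in this article.
show_rules() {
    # Filter table, INPUT chain, with rule numbers and packet/byte counters.
    run iptables -L INPUT -n -v --line-numbers
    # NAT table, PREROUTING chain, where a port 80 to 3000 redirect would live.
    run iptables -t nat -L PREROUTING -n -v --line-numbers
    # The filter table printed as "iptables -A ..." style commands.
    run iptables -S
}

show_rules
```

Sending one request, running the script with DRY_RUN=0, and watching which counter increased is a slow but reliable way to check a rule against what you think it does.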
The answer I came up with looks very, very different from what ChatGPT was offering as a solution based on the original question. There are a lot of moving parts to keep track of, some of which are a bit of a black box and have to do with 3rd party scripts rather than iptables itself.
# NAT
iptables -A PREROUTING --table nat --protocol tcp --destination-port 80 --jump REDIRECT --to-port 3000
# INPUT
iptables -A INPUT --protocol tcp --match tcp --match multiport --destination-ports 22,53,443 --jump ACCEPT
iptables -A INPUT --protocol tcp --destination-port 3000 --match string --algo kmp --icase --string foo.com --jump ACCEPT
iptables -A INPUT --match conntrack --ctstate NEW,ESTABLISHED --jump ACCEPT
iptables -A INPUT --jump DROP
# OUTPUT
iptables -A OUTPUT --protocol tcp --match tcp --match multiport --destination-ports 22,53,443 --jump ACCEPT
iptables -A OUTPUT --match conntrack --ctstate NEW,ESTABLISHED --jump ACCEPT
iptables -A OUTPUT --jump DROP
# NAT
iptables -A OUTPUT --table nat --protocol tcp --source-port 3000 --jump REDIRECT --to-port 80
If you are wondering what is going on here, copy and paste this into ChatGPT and ask it to explain what is going on.
The kicker is, this answer only partially works! Yes, after all this, this isn't a fully functioning answer to the problem at hand.
humanum solutionem (the human solution)
To get some insight as to why the solution was only partially working the way I wanted, I reached out to a human who has experience in networking and iptables. One of the first questions they asked me was
"Why are you inspecting traffic for foo.com at this level?" It would certainly be easier to do it at the application layer.
They were absolutely correct! My previous background with networking was primarily focused on Allow and Deny Lists. That was best practice back in the day and still is in many instances. However, this is not really one of those instances. I was so focused on the iptables solution that I hadn't considered anything else, and ChatGPT didn't ask me any questions as to my end goals.
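As a rough illustration of what checking at the application layer means, here is a minimal sketch written in shell to stay consistent with the rest of this article. In practice this logic would live in the web framework or reverse proxy in front of the app. The ALLOWED_HOST variable and the host_allowed function are hypothetical names. Unlike the iptables string match, which searches for a substring anywhere in the packet, this compares the whole Host value.

```shell
#!/bin/sh
# Hypothetical allowed host, taken from this article's example domain.
ALLOWED_HOST="foo.com"

# host_allowed "<value of the Host header>"
# Succeeds when the header names ALLOWED_HOST, ignoring case and an
# optional ":port" suffix. Fails otherwise.
host_allowed() {
    host="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    host="${host%%:*}"   # drop ":port" if present
    [ "$host" = "$ALLOWED_HOST" ]
}

# Example values: an exact match, a match with a port, and two rejects.
for h in "foo.com" "FOO.com:80" "notfoo.com" "evil.example"; do
    if host_allowed "$h"; then
        echo "$h: accept"
    else
        echo "$h: reject"
    fi
done
```

Note that "notfoo.com" is rejected here, while a substring search for "foo.com" would have let it through.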
Conclusion
After working with ChatGPT for more than a month, I'd say we're a ways off from the SkyNet apocalypse where ChatGPT will replace developers outright. Even OpenAI, the people behind ChatGPT, acknowledge that it can be, and often is, wrong. Not only wrong, but confidently incorrect, which may be considered a variation on the Dunning-Kruger effect (emphasis mine). These wrong answers are referred to as hallucinations. They are the result of inherent issues in training Large Language Models (LLMs) and remain a difficult problem to solve in AI.
"ChatGPT may produce inaccurate information about people, places, or facts." (ChatGPT May 12 Version)
So, how was my rubberducking experience with ChatGPT? I wouldn't do it again, at least not with something as complicated as iptables.
The Good
The Bad
The Ugly
Recommendations