A Simpler Way to Write Regular Expressions: Leveraging OpenAI's GPT-4 Model

Regular expressions (regex) are powerful but difficult to master. Their terse syntax and special characters are a frequent source of frustration. To ease this, we used OpenAI's GPT-4 model in a Python script that translates natural language descriptions into regex patterns.

The idea is simple yet powerful: we send the OpenAI GPT-4 model a natural language description of the pattern we want to match, and it returns a Python-compatible regex. For example, we could ask for a regex that "matches a valid email address", and the model would return something like ^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$.
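
To see what such a pattern does in practice, here is a small check using Python's built-in re module. It is only a sketch: the pattern is the one quoted above (the model's actual reply may differ), and the helper name is_email and the sample strings are made up for illustration.

```python
import re

# Pattern quoted in the text above; a real model reply may differ.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")


def is_email(text):
    """Return True if the whole string matches the email pattern."""
    return EMAIL_PATTERN.match(text) is not None


print(is_email("user@example.com"))  # True
print(is_email("not an email"))      # False
```

Note that a pattern like this checks the general shape of an address, not whether the address actually exists.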

The model can also generate a regex with capture groups based on our description. So, if we need a regex that "matches a date in the format YYYY-MM-DD and groups the year, month, and day separately", the model can handle that too!
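
As an illustration of what grouping buys us, the sketch below uses a hand-written pattern of the kind the model might plausibly return for the date prompt. The pattern and the helper name split_date are assumptions for this example, not actual model output.

```python
import re

# Hypothetical example of a grouped date pattern; not actual model output.
DATE_PATTERN = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def split_date(text):
    """Return (year, month, day) as strings, or None if the text does not match."""
    match = DATE_PATTERN.match(text)
    return match.groups() if match else None


print(split_date("2023-07-29"))  # ('2023', '07', '29')
print(split_date("29/07/2023"))  # None
```

Each pair of parentheses becomes one entry in the tuple returned by groups(). The pattern only checks the shape of the string, so a value like 2023-13-45 would still match.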

Below is the Python script which does exactly this:


import openai
import re


def run_gpt(prompt, role, model='gpt-4'):
    # Uses the legacy ChatCompletion interface (openai package versions before 1.0).
    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": role},
            {"role": "user", "content": prompt},
        ],
    )
    respuesta = response["choices"][0]["message"]["content"]
    return respuesta


def generate_regex(description, model='gpt-4'):
    prompt = f"I need a regular expression that {description}"
    regex = run_gpt(prompt, "You generate python compatible regular expressions that are directly used from within python. Your responses are 100% a python regex. Do not output anything that is not a regex string. Do not use code blocks.", model).strip()
    # Drop surrounding double quotes if the model wrapped the pattern in them.
    if len(regex) >= 2 and regex[0] == '"' and regex[-1] == '"':
        regex = regex[1:-1]
    return regex


def apply_regex(test_string, regex):
    pattern = re.compile(regex)
    match = pattern.match(test_string)
    return match is not None


def generate_regex_groups(description, model='gpt-4'):
    prompt = f"I need a regular expression that {description} and groups the year, month, and day separately"
    regex = run_gpt(prompt, "You generate python compatible regular expressions that are directly used from within python. Your responses are 100% a python regex. Do not output anything that is not a regex string. Do not use code blocks.", model).strip()
    # Drop surrounding double quotes if the model wrapped the pattern in them.
    if len(regex) >= 2 and regex[0] == '"' and regex[-1] == '"':
        regex = regex[1:-1]
    # Return the pattern unchanged: re-encoding it with unicode_escape would
    # double every backslash and break sequences such as \d.
    return regex


def apply_regex_groups(test_string, regex):
    pattern = re.compile(regex)
    match = pattern.match(test_string)
    if match is not None:
        return match.groups()
    return None


regex = generate_regex("matches a valid email address")
print(f"Generated regex: {regex}")
print(apply_regex("[email protected]", regex))
print(apply_regex("not an email", regex))


regex = generate_regex_groups("matches a date in the format YYYY-MM-DD")
print(f"Generated regex: {regex}")
print(apply_regex_groups("2023-07-29", regex))        

This script takes the struggle out of writing regular expressions, allowing you to describe the patterns you want in plain English. It's a testament to how artificial intelligence can simplify complex tasks and make our lives easier.
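
Because the model's reply is free text, it can be worth checking that it compiles and behaves as expected before relying on it. The sketch below shows one possible way to do that. The function validate_regex and its parameters are hypothetical names introduced here, not part of the original script.

```python
import re


def validate_regex(regex, should_match=(), should_not_match=()):
    """Return True if regex compiles and agrees with the given sample strings."""
    try:
        pattern = re.compile(regex)
    except re.error:
        # The reply was not a valid Python regex at all.
        return False
    if not all(pattern.match(s) for s in should_match):
        return False
    return not any(pattern.match(s) for s in should_not_match)


print(validate_regex(r"^\d{4}-\d{2}-\d{2}$", ["2023-07-29"], ["July 29"]))  # True
print(validate_regex(r"(\d{4}", ["2023"]))                                  # False
```

A handful of positive and negative samples will not prove a pattern correct, but it catches replies that are malformed or clearly off target.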

Now combine this with the Dyson decorator:

Expanding The Possibilities: Dyson and RegexGenerator Decorators

Having generated regular expressions from natural language descriptions with OpenAI's GPT-4 model, we now turn to an addition to this approach: the Dyson and RegexGenerator decorators. First introduced by Arturo 'Buanzo' Busleiman, they further bridge the gap between Python programming and AI capabilities.

Revisiting the Dyson Decorator

The Dyson decorator's purpose is to extract essential information from Python functions and use this information to interact with the OpenAI model. This decorator takes a function and its arguments, then formulates a "role" for the AI model based on the function's docstring.

In the context of regular expression generation, the Dyson decorator could be used to wrap a function that validates email addresses, dates, or other string patterns. It enhances Python functions by leveraging the power of AI, making it possible to generate complex regular expressions based on human language descriptions.
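
To make the idea concrete, here is a minimal sketch of a decorator that follows the description above. It is not the actual Dyson implementation. The name dyson, the ask_model parameter, and the fake_model stand-in are all assumptions made so the example runs without an API key.

```python
import functools


def dyson(ask_model):
    """Decorator factory: use the wrapped function's docstring as the model's role."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            role = (func.__doc__ or "").strip()
            # Describe the call so the model sees the function name and arguments.
            prompt = f"{func.__name__} called with args={args!r} kwargs={kwargs!r}"
            return ask_model(prompt, role)
        return wrapper
    return decorator


def fake_model(prompt, role):
    # Stand-in for a real model call such as run_gpt(prompt, role).
    return f"[role: {role}] {prompt}"


@dyson(fake_model)
def email_regex(description):
    """You generate Python-compatible regular expressions."""


print(email_regex("matches a valid email address"))
```

In this sketch the function body is never executed. The docstring and the call arguments are all the model receives, which is the essence of the approach described here.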

Introducing the RegexGenerator Decorator

Complementing the Dyson decorator, the RegexGenerator decorator presents a convenient way to generate Python-compatible regular expressions. It operates in concert with a helper function, generate_regex, which is responsible for translating a natural language description into a regex pattern, using the OpenAI API.

The true power of the RegexGenerator becomes apparent when it's used to decorate a function, as seen in the following example:


@RegexGenerator("matches a valid email address", "Generate a Python compatible regex")
def check_email(regex: str, email: str):
    pattern = re.compile(regex)
    return bool(pattern.match(email))

In this example, when check_email("[email protected]") is called, the RegexGenerator generates a regex that matches a valid email address. The check_email function then uses this regex to verify the email address, making the process seamless and intuitive.
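
For readers curious how such a decorator could work, the following is a rough sketch under stated assumptions, not the original RegexGenerator. The third parameter, generate, and the fake_generate stand-in are hypothetical additions that let the example run offline. In real use, that role would be played by a helper like generate_regex calling the OpenAI API.

```python
import functools
import re


def RegexGenerator(description, role, generate):
    """Decorator factory that passes a generated regex as the first argument."""
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Generate the pattern once, on first use, then reuse it.
            if "regex" not in cache:
                cache["regex"] = generate(description, role)
            return func(cache["regex"], *args, **kwargs)
        return wrapper
    return decorator


def fake_generate(description, role):
    # Stand-in for a model call; always returns the same fixed pattern.
    return r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"


@RegexGenerator("matches a valid email address",
                "Generate a Python compatible regex", fake_generate)
def check_email(regex: str, email: str):
    pattern = re.compile(regex)
    return bool(pattern.match(email))


print(check_email("user@example.com"))  # True
print(check_email("not an email"))      # False
```

Caching the pattern is a design choice in this sketch: it avoids one model call per invocation, at the cost of never refreshing the regex during the program's lifetime.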

Taking It a Step Further

The Dyson and RegexGenerator decorators showcase the exciting potential of AI in Python programming. By using these decorators, Python functions can utilize the capability of AI models to assist in programming tasks, simplifying complex operations such as generating regular expressions.

Expanding on our previous examples, we could integrate these decorators into our Python scripts, making the task of generating and applying regular expressions even simpler. With tools like these, Python programming becomes more powerful, reducing the cognitive load on developers and enhancing productivity.

So, the next time you find yourself wrestling with a particularly stubborn regular expression, remember that AI might offer a helping hand. Stay tuned for more exciting developments in this area of AI-assisted programming!






