Using ChatGPT's Code Interpreter to take a large group of emails from a To: field and generate LinkedIn profile links
Here's a real-world example, where I used ChatGPT's Code Interpreter plug-in to learn more about a large group of people on an email thread. I was invited to speak to them about AI, and I wanted to know my audience. There were about 40 people cc'd on the email, and I knew none of them. This is a transcript of the chat, with personal information redacted, as I used ChatGPT to create links to quickly find everyone's LinkedIn profile.
UPDATE: After posting this, I easily managed to fit the request into one consolidated prompt (purposefully being a bit lazy with the language). Code Interpreter did *better* with the consolidated prompt, returning almost twice as many viable URLs... from 27 links to 51 links. I'll paste the update to the bottom of the article.
First, you can review the more brute force approach I used to kick the tires.
I'm including the Python code ChatGPT generated along the way - skip over it if you're not technical. Feel free to critique it, if you know code.
Note the amount of self-reflection by ChatGPT along the way.
ME: If I give you a big pile of comma delimited emails from the to and cc fields of an email that I received, can you clean it up to only have names and email addresses? For example, most of the emails are formatted like this "REDACTED REDACTED" <[email protected]>. I'd like to get rid of the quotes and the < and >. The final result should be just name and email. I'd like one name and email per line. A few of the emails don't have proper formatting. For example, "REDACTED" <[email protected]> has the username as the name. If you run into this, just keep the username as the name. No worries.
CHATGPT: Sure, I can help you with that. Please upload the file containing the comma-delimited emails.
ME: (Pastes in a pile of emails from the TO and CC field)
CHATGPT:
import re
# Define the email strings
email_strings = [REDACTED]
# Initialize an empty list to store cleaned emails
cleaned_emails = []
# Define the regular expression pattern for extracting name and email address
pattern = r"\"(.+?)\" <(.+?)>"
# Iterate over each email string
for email_string in email_strings:
    # Extract all matches
    matches = re.findall(pattern, email_string)
    # Iterate over each match
    for match in matches:
        # Extract the name and email
        name, email = match
        # Check if the name is a valid email
        if re.match(r"[^@]+@[^@]+\.[^@]+", name):
            # If it is, use the part before the @ as the name
            name = name.split("@")[0]
        # Add the cleaned email to the list
        cleaned_emails.append(f"{name} {email}")
# Display the cleaned emails
cleaned_emails
ChatGPT: The cleaned email addresses and names from the given list are as follows:
'REDACTED, REDACTED.REDACTED [email protected]'
ChatGPT: It seems there was an issue with the cleaning process, as only one email was returned. Let me correct this.
# Reinitialize the cleaned_emails list
cleaned_emails = []
# Iterate over each email string
for email_string in email_strings:
    # Remove the "To: " and "Cc: " prefixes
    email_string = re.sub(r"^(To|Cc): ", "", email_string)
    # Split the string into individual emails
    emails = email_string.split(", ")
    # Iterate over each email
    for email in emails:
        # Extract the name and email address
        match = re.search(pattern, email)
        if match:
            # Extract the name and email
            name, email = match.groups()
            # Check if the name is a valid email
            if re.match(r"[^@]+@[^@]+\.[^@]+", name):
                # If it is, use the part before the @ as the name
                name = name.split("@")[0]
            # Add the cleaned email to the list
            cleaned_emails.append(f"{name} {email}")
        else:
            # If the email does not match the pattern, add it as is
            cleaned_emails.append(email)
# Display the cleaned emails
cleaned_emails
ChatGPT: The cleaned email addresses and names from the given list are as follows:
CHATGPT returns a long list of names and emails in two columns that I didn't paste here
ChatGPT: The emails with quotes and angle brackets ("<" and ">") are not fully cleaned. This may be due to the presence of additional names or other text in the name field. Let's clean these up as well.
# Reinitialize the cleaned_emails list
cleaned_emails = []
# Iterate over each email string
for email_string in email_strings:
    # Remove the "To: " and "Cc: " prefixes
    email_string = re.sub(r"^(To|Cc): ", "", email_string)
    # Split the string into individual emails
    emails = email_string.split(", ")
    # Iterate over each email
    for email in emails:
        # Extract the name and email address
        match = re.search(pattern, email)
        if match:
            # Extract the name and email
            name, email = match.groups()
            # Check if the name is a valid email
            if re.match(r"[^@]+@[^@]+\.[^@]+", name):
                # If it is, use the part before the @ as the name
                name = name.split("@")[0]
            else:
                # Remove any extra information in the name
                name = re.split(r", | \(", name)[0]
            # Add the cleaned email to the list
            cleaned_emails.append(f"{name} {email}")
        else:
            # If the email does not match the pattern, add it as is
            cleaned_emails.append(email)
# Display the cleaned emails
cleaned_emails
CHATGPT returns a clean list that I'm not pasting here. Upon inspection, it appears to be the same list as before. While the extra pass might have been redundant and technically inefficient, it only took a few seconds and didn't impact the workflow.
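For readers who want to critique the code: the regular expressions above are one way to pull names and addresses apart. As a minimal sketch of an alternative, Python's standard library includes email.utils.getaddresses, which parses address lists and copes with commas inside quoted names. The function name clean_recipients and the sample addresses are made up for illustration, and this is not what ChatGPT generated in the chat.

```python
import re
from email.utils import getaddresses


def clean_recipients(header_lines):
    """Return one 'Name email' string per recipient (hypothetical helper)."""
    # Drop a leading "To: " or "Cc: " label, as the transcript code does
    stripped = [re.sub(r"^(To|Cc):\s*", "", line) for line in header_lines]
    cleaned = []
    for name, address in getaddresses(stripped):
        if not address:
            # Nothing usable was parsed for this entry
            continue
        if not name or "@" in name:
            # No display name, or the display name is itself an email:
            # fall back to the part of the address before the @
            name = address.split("@")[0]
        cleaned.append(f"{name} {address}")
    return cleaned


# Made-up sample data standing in for the redacted To/Cc fields
sample = [
    'To: "Jane Doe" <jane.doe@example.com>, "jsmith" <jsmith@example.com>',
    'Cc: "Roe, Richard" <rroe@example.com>, pat@example.com',
]
for line in clean_recipients(sample):
    print(line)
```

Leaning on the standard-library parser means the quoting rules do not have to be rebuilt by hand, which is where the first regex attempt stumbled.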
--At this point I realize what I want to do with the names--
ME (starts a new chat): I'd like to paste in some data and extract the names from it. Can you help?
CHATGPT: Of course, I'd be glad to help you with that. Please go ahead and paste the data here. Also, could you specify what you mean by "names"? Are you referring to people's names, company names, or something else? Please provide as much detail as possible to ensure I can assist you effectively.
ME: People's names
CHATGPT: Alright. Please go ahead and paste the data from which you'd like me to extract the people's names.
ME: (pastes the list of names and emails from the first chat, not shown)
CHATGPT: This appears to be a list of names, followed by email addresses. The names are mostly in the format "First name Last name," although some are in the format "firstname," "first initial last name," or "firstnamelastname."
To extract the names, I will use a simple strategy: I will split each line by whitespace and consider everything before the email as the name. This won't perfectly handle every case, but it should work for the majority of the entries in your list.
Let's start with that.
# input data
data = """
REDACTED
"""
# split data into lines
lines = data.strip().split("\n")
# extract names
names = [line.split()[0:-1] for line in lines]
# convert list of words in names to single string
names = [" ".join(name) for name in names]
names
ChatGPT: Here are the names extracted from your data:
CHATGPT writes large list of names that I'm not pasting here
Please note that some entries (like "REDACTED", "REDACTED", "REDACTED", etc.) do not seem to be complete names. This might be because they are usernames or initials.
ME: Can you redo this list and remove those names that are not complete?
CHATGPT: Sure, I can refine the extraction process to only include entries that appear to be full names. I'll use the assumption that a full name must consist of at least two words. This won't be perfect, as it will exclude names that are correctly formatted but only consist of one word (like "REDACTED" in the original list), but it should help to exclude usernames and initials.
Let's proceed with this approach.
# filter out names that do not appear to be full names
full_names = [name for name in names if len(name.split()) > 1]
full_names
ChatGPT: Here are the names that appear to be complete, with at least two words:
CHATGPT writes large list of names that I'm not pasting here
This list excludes any entries with only one word, which might not be complete names.
ME: I'm going to give you a web address with a dynamic value at the end. The format is as follows:
https://www.dhirubhai.net/search/results/all/?keywords=firstname%20lastname
I'd like you to take the list of names from above and for each name, generate the web address so that there is a unique web address for each name. Put the first name from your list into the "firstname" spot and the last name from your list into the "lastname" spot. Ignore any middle names or initials.
CHATGPT:
# Function to generate LinkedIn search URLs
def generate_linkedin_url(full_name):
    first_name, last_name = full_name.split()[0], full_name.split()[-1]
    url = f"https://www.dhirubhai.net/search/results/all/?keywords={first_name}%20{last_name}"
    return url
# Generate URLs for each full name
linkedin_urls = {name: generate_linkedin_url(name) for name in full_names}
ChatGPT: Here are the LinkedIn search URLs generated for each name:
CHATGPT writes out long list of hyperlinked first and last names that each click through to their corresponding LinkedIn search URL. It's easy for me to open them all in a new tab.
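One thing the generated function does not handle is names with accents, apostrophes, or other characters that need escaping in a URL. A minimal sketch of how that could be covered with urllib.parse.quote from the standard library follows. The helper name build_search_url and the sample names are hypothetical, and the base URL is copied as-is from the prompt above.

```python
from urllib.parse import quote

# Base search URL copied verbatim from the prompt above
BASE_URL = "https://www.dhirubhai.net/search/results/all/?keywords="


def build_search_url(full_name):
    """Build a search URL from the first and last word of a name.

    Returns None when the name has fewer than two words (hypothetical helper).
    """
    parts = full_name.split()
    if len(parts) < 2:
        return None
    # Ignore middle names and initials, as the prompt asks
    keywords = f"{parts[0]} {parts[-1]}"
    # quote() percent-encodes the space as %20 and escapes non-ASCII characters
    return BASE_URL + quote(keywords)


# Made-up sample names
for name in ["Jane Q. Doe", "José O'Neil", "jsmith"]:
    print(name, "->", build_search_url(name))
```

For plain ASCII names this produces the same URLs as the transcript code; the difference only shows up on names that contain characters outside letters, digits, and a few punctuation marks.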
---
UPDATED CONSOLIDATED PROMPT EXPERIENCE
ME: I'd like to give you a big pile of comma delimited emails from the to and cc fields of an email that I received. For example, most of the emails are formatted like this "REDACTED REDACTED" <[email protected]>. A few of the emails don't have proper formatting. For example, "REDACTED" <[email protected]> has the username as the name. If you run into this, you can skip that record. No worries.
I'm going to give you a web address with a dynamic value at the end. The format is as follows:
https://www.dhirubhai.net/search/results/all/?keywords=firstname%20lastname
I'd like you to take the list of names from above and for each name, generate the web address so that there is a unique web address for each name. Put the first name from your list into the "firstname" spot and the last name from your list into the "lastname" spot. Ignore any middle names or initials.
CHATGPT: Sure, I'd be happy to help with that. Please provide the list of email addresses so I can get started.
ME: Pastes in the TO and CC field from the email
CHATGPT:
import re
# Extract names from emails
def extract_names(emails):
    pattern = r'"([^"]*)"'  # Pattern to match names between double quotes
    return re.findall(pattern, emails)
# Generate LinkedIn URLs
def generate_urls(names):
    urls = []
    for name in names:
        name_split = name.split()  # Split name into parts
        # Check if name_split has at least 2 parts (first name and last name)
        if len(name_split) >= 2:
            # URL encoding spaces as %20
            firstname = name_split[0].replace(' ', '%20')
            lastname = name_split[-1].replace(' ', '%20')
            url = f"https://www.dhirubhai.net/search/results/all/?keywords={firstname}%20{lastname}"
            urls.append(url)
    return urls
emails = """
REDACTED
"""
# Extract
CHATGPT: Here are the first 5 LinkedIn URLs generated from the email addresses you provided:
REDACTED
It appears that there's an issue with the fourth URL, as it contains an email address instead of a last name. This likely occurred because of the comma in the name string. I can modify the function to better handle such situations if you would like.
ME: yes please
CHATGPT: Here are the first 5 LinkedIn URLs generated after improving the name extraction process:
REDACTED
As you can see, the issue with the fourth URL has been resolved. The function now correctly handles names containing commas.
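The transcript does not show the revised function, so what follows is only a guess at the kind of fix involved, not ChatGPT's actual code. It is a minimal sketch that assumes the troublesome entries were display names written as "Last, First"; the helper name normalize_name and the sample names are hypothetical.

```python
def normalize_name(display_name):
    """Reorder 'Last, First' into 'First Last'; leave other names unchanged."""
    if "," in display_name:
        # Split on the first comma only: text before it is treated as the
        # last name, text after it as the first (and any middle) name
        last, _, first = display_name.partition(",")
        return f"{first.strip()} {last.strip()}".strip()
    return display_name.strip()


# Made-up sample names
for raw in ["Doe, Jane", "Doe, Jane M.", "Jane Doe"]:
    print(repr(raw), "->", repr(normalize_name(raw)))
```

Running names through a step like this before taking the first and last word keeps the "firstname" and "lastname" spots in the URL in the intended order.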
ME: There should be over 25 URLs. Can you check the work and see if you can generate all of the URLs. I'm expecting around 27 results.
CHATGPT:
# Count the total number of URLs generated
len(urls_v2)
RESULT
51
CHATGPT: I have generated a total of 51 URLs. This is more than the 27 you expected. It's possible that some entries in the original email list were split into multiple names due to the presence of commas. Please let me know how you'd like to proceed.
ME: Let's see the 51 URLs. I'll review them.
CHATGPT:
# Return all the generated URLs
urls_v2
CHATGPT: Here are the 51 LinkedIn URLs generated from the email addresses:
CHATGPT writes 51 viable LinkedIn URLs
CHATGPT: Please review these URLs and let me know if you have any questions or need further assistance.
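When a count comes back higher than expected, as the 51 did here, one quick sanity check is to see whether any URL appears more than once. The sketch below is an assumption about how that check could look, not something from the chat; unique_in_order and the sample URLs are hypothetical.

```python
def unique_in_order(urls):
    """Drop repeated URLs while keeping each first occurrence in place."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(urls))


# Made-up sample list with one repeated entry
sample_urls = [
    "https://example.com/?keywords=Jane%20Doe",
    "https://example.com/?keywords=Richard%20Roe",
    "https://example.com/?keywords=Jane%20Doe",
]
deduped = unique_in_order(sample_urls)
print(len(sample_urls), "URLs in,", len(deduped), "unique")
```

If the two counts match, the extra results are distinct names rather than repeats.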
----
Mission complete... I can easily check out all of the attendees who will be in the audience during the panel.