Mastering Sentiment Analysis with ChatGPT, OpenAI, and Python

This is Part 2 of my article series about using the ChatGPT API for automated sentiment analysis of customer product reviews.

In this guide, I'll walk you through developing an AI-driven product review & improvement analysis program that will take your company's raw product reviews and turn them into actionable recommendations for improving the product. By the end of the guide, you'll know how to integrate the ChatGPT API with Python using Google Colab.

*** Note: You can find the first part of this tutorial, which explains the how and why of sentiment analysis, on my Medium blog @courtlinholt here: Analyze Customer Product Reviews Using ChatGPT OpenAI API

Summary of Part 1 (previous tutorial)

In the previous tutorial (Part 1 link), we used Python and Google Colab to access OpenAI’s ChatGPT API to perform sentiment analysis and summarization of raw customer product reviews. In this tutorial, we will expand upon that by using the ChatGPT API to extract a list of pros and cons from the customer reviews and generate a list of prioritized product improvement suggestions. Let’s get to work!

About The Dataset

The dataset that I’ll be using for this tutorial was created by me specifically for this example. The dataset is entirely fabricated and does not represent any particular real-life product. The hypothetical product on which the reviews are based is a SaaS landing page builder app specialized for creating e-commerce pages. I have tried to include a variety of different review lengths and star ratings to simulate actual product reviews. If you want to use the code below for your own projects, simply provide a list of your customer reviews in CSV format in a file named “reviews.csv”. The CSV file should consist of two columns: one named “Number” that numbers each review and another named “Product_Review” that contains the actual review text. Here’s a screenshot of what your input file should look like.
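If you don’t yet have a reviews file, here’s a minimal sketch that generates one in the expected shape (the review text below is invented purely for illustration):

```python
import pandas as pd

# Fabricated example rows matching the expected "Number" and "Product_Review" columns
sample_reviews = pd.DataFrame({
    "Number": [1, 2, 3],
    "Product_Review": [
        "Love the drag-and-drop editor, but publishing is slow.",
        "Templates look dated and support was unresponsive.",
        "Solid landing page builder for the price. 5 stars.",
    ],
})
sample_reviews.to_csv("reviews.csv", index=False)

# Sanity check: the file should load back with exactly the two expected columns
df = pd.read_csv("reviews.csv")
assert list(df.columns) == ["Number", "Product_Review"]
```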

Machine Learning Natural Language Processing (NLP) of Customer Reviews With OpenAI

Here’s a quick breakdown of how we will be using the ChatGPT API in this tutorial:

  1. To generate a list of pros and cons from our raw customer reviews
  2. To generate a list of product improvement suggestions from the raw reviews
  3. To consolidate the list of improvement suggestions to just the Top 10
  4. To explain why ChatGPT selected these improvements as the most important
  5. To rank and prioritize the list of suggestions based on the estimated effort to complete them

Build a Sentiment Analysis System with ChatGPT OpenAI API and Python

Sentiment Analysis & Summarization Background

Part 1 of this tutorial explained the how and why of sentiment analysis with chatGPT. In case you don’t have time to read it, here’s a condensed version of the code from the first tutorial:

Install the required libraries that are not pre-installed in Google Colab

!pip install pandas openai requests
!pip install tqdm
!pip install python-docx
import pandas as pd
import openai
import requests
from tqdm import tqdm
import time
import docx

# Enter your OpenAI API private access key here. IMPORTANT - don't share your code online if it contains your access key or anyone will be able to access your openai account
openai.api_key = "<Replace the text inside the angle brackets with your OpenAI API key. Make sure to keep the quotation marks around your key>"
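As an aside, a safer pattern than pasting the key directly into your notebook is to read it from an environment variable; a quick sketch (assuming you have set OPENAI_API_KEY beforehand, e.g. in your shell or via Colab’s secrets):

```python
import os

# Read the key from the environment instead of hardcoding it in the notebook,
# then assign it to the library with: openai.api_key = api_key
api_key = os.environ.get("OPENAI_API_KEY", "")
if not api_key:
    print("Warning: OPENAI_API_KEY is not set; API calls will fail.")
```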

Using the ChatGPT OpenAI API with Python for Sentiment Analysis

# Use this code block if you ONLY want to know the sentiment for each review. This code will NOT try to summarize each review.

# Create a custom function that will call the openAI API and send your reviews data to it one review at a time
# We will use the tqdm library to create a progress tracker so we can see if there are any problems with the openAI API processing our requests
def analyze_my_review(review):
    retries = 3
    sentiment = None

    while retries > 0:
        messages = [
            {"role": "system", "content": "You are an AI language model trained to analyze and detect the sentiment of product reviews."},
            {"role": "user", "content": f"Analyze the following product review and determine if the sentiment is: positive, negative or neutral. Return only a single word, either POSITIVE, NEGATIVE or NEUTRAL: {review}"}
        ]

        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            # We only want a single word sentiment determination so we limit the results to 3 openAI tokens, which is about 1 word. 
            # If you set a higher max_tokens amount, openAI will generate a bunch of additional text for each response, which is not what we want it to do
            max_tokens=3,
            n=1,
            stop=None,
            temperature=0
        )

        response_text = completion.choices[0].message.content
        # print the sentiment for each customer review, not necessary but it's nice to see the API doing something :)
        print(response_text)

        # Sometimes, the API will be overwhelmed or just buggy so we need to check if the response from the API was an error message or one of our allowed sentiment classifications.
        # If the API returns something other than POSITIVE, NEGATIVE or NEUTRAL, we will retry that particular review that had a problem up to 3 times. This is usually enough.
        if response_text in ["POSITIVE", "NEGATIVE", "NEUTRAL"]:
            sentiment = response_text
            break
        else:
            retries -= 1
            time.sleep(0.5)
    else:
        # If all retries fail, default to NEUTRAL so the rest of the reviews can still be processed
        sentiment = "NEUTRAL"
   
    # OpenAI will limit the number of times you can access their API if you have a free account. 
    # If you are using the openAI free tier, you need to add a delay of a few seconds (e.g. 4 seconds) between API requests to avoid hitting the openai free tier API call rate limit.
    # This code will still work with an openAI free tier account but you should limit the number of reviews you want to analyze (<100 at a time) to avoid running into random API problems.

    time.sleep(0.5)

    return sentiment

# Define the input file that contains the reviews you want to analyze
input_file = "reviews.csv"
# Read the input file into a dataframe
df = pd.read_csv(input_file)

# Analyze each review using ChatGPT and save the results in a list called sentiments so we can access the results later
sentiments = []

# Here we loop through all of the reviews in our dataset and send them to the openAI API using our custom function from above
for review in tqdm(df["Product_Review"], desc="Processing reviews"):
    sentiment = analyze_my_review(review)
    sentiments.append(sentiment)

# Now let's save the openAI API results as an additional column in our original dataset
df["sentiment"] = sentiments

# Save the results to a new Excel file (not a CSV file this time so it's easier for non-python users to work with)
output_file = "reviews_analyzed_full_sentiment.xlsx"
df.to_excel(output_file, index=False)


# Let's also save the results to a new Word file just in case people prefer to use that instead of Excel
output_file = "reviews_analyzed_full_sentiment.docx"
doc = docx.Document()

# Now that the Word doc has been created, we can add a table with headers
table = doc.add_table(rows=1, cols=2)
header_cells = table.rows[0].cells
header_cells[0].text = 'Product_Review'
header_cells[1].text = 'Sentiment'

# Now we add the table content to show each review and the associated sentiment that chatGPT determined
for index, row in df.iterrows():
    row_cells = table.add_row().cells
    row_cells[0].text = str(row['Product_Review'])
    row_cells[1].text = row['sentiment']

doc.save(output_file)        

Here’s the output, a sentiment classification for each review that is either POSITIVE, NEGATIVE or NEUTRAL. For the 104 reviews in the sample dataset, the API took 1 minute and 22 seconds to analyze all of them.
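A quick note on the fixed time.sleep(0.5) delay: if you hit rate limits on larger batches, exponential backoff is a common alternative. Here is a library-agnostic sketch (with the openai 0.x library you would typically pass openai.error.RateLimitError as the exception to retry on; check the exception name for your installed version):

```python
import time

def with_backoff(fn, retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            # Wait 1s, 2s, 4s, ... (scaled by base_delay) before the next attempt
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: wrap the API call inside analyze_my_review, e.g.
# completion = with_backoff(lambda: openai.ChatCompletion.create(...),
#                           retry_on=(openai.error.RateLimitError,))
```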


How ChatGPT Can Be Used to Summarize a Review

Keep in mind we have not tried to do any fine-tuning of the language model developed by OpenAI by providing a customized dataset specific to customer review sentiment classification for a particular industry or type of business. Doing so would probably further increase the accuracy of the sentiment classification. One of the benefits of using a pre-trained AI model is that you don’t need to provide massive amounts of labeled training data because OpenAI has already done that for you.

# Use this code block if you ONLY want to summarize long reviews into short, 75 word versions. This code will NOT try to classify the sentiment of each review
def summarize_review(review):
    retries = 3
    summary = None

    while retries > 0:
        # This time, we are only summarizing the reviews, not determining the sentiment, so we change the prompt (i.e. command) for chatGPT to the following
        messages = [
            {"role": "system", "content": "You are an AI language model trained to analyze and summarize product reviews."},
            {"role": "user", "content": f"Summarize the following product review, highlighting pros and cons: {review}"}
        ]

        
        completion2 = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            # We want to limit the summaries to about 75 words (which is around 100 openAI tokens). If you want longer review summaries, increase the max_tokens amount
            max_tokens=100,
            n=1,
            stop=None,
            temperature=0.8
        )

        response_text = completion2.choices[0].message.content
        # This is optional but it's nice to see how the reviews are being summarized to make sure something isn't wrong with the input file or API results
        print(response_text)
        
        # This is our quality control check. If the API has an error and doesn't generate a summary, we will retry the review 3 times. 
        if response_text:
            summary = response_text
            break
        else:
            retries -= 1
            time.sleep(0.5)
    else:
        summary = "Summary not available."

    # OpenAI will limit the number of times you can access their API if you have a free account. 
    # If you are using the openAI free tier, you need to add a delay of a few seconds (e.g. 4 seconds) between API requests to avoid hitting the openai free tier API call rate limit.
    # This code will still work with an openAI free tier account but you should limit the number of reviews you want to analyze (<100 at a time) to avoid running into random API problems.

    time.sleep(0.5)

    return summary

# Define the input file that contains the reviews you want to analyze
input_file = "reviews.csv"
# Read the input file into a dataframe
df = pd.read_csv(input_file)

# After chatGPT summarizes the review, we save the summary to a list called summaries
summaries = []

for review in tqdm(df["Product_Review"], desc="Processing reviews"):
    summary = summarize_review(review)
    summaries.append(summary)

# Now we add the review summaries to the original input dataframe
df["summary"] = summaries

# Save the results to a new Excel file
output_file = "reviews_analyzed_full_summaries.xlsx"
df.to_excel(output_file, index=False)


# Save the results to a new Word file

output_file = "reviews_analyzed_full_summaries.docx"
doc = docx.Document()

# Add table with headers
table = doc.add_table(rows=1, cols=2)
header_cells = table.rows[0].cells
header_cells[0].text = 'Product_Review'
header_cells[1].text = 'Summary'

# Add table content
for index, row in df.iterrows():
    row_cells = table.add_row().cells
    row_cells[0].text = str(row['Product_Review'])
    row_cells[1].text = row['summary']

doc.save(output_file)        

Here is a sample output from the summarization code block. For 104 reviews, the chatGPT API took 6 minutes and 25 seconds to summarize everything.


Use the OpenAI API to Generate a List of Product Pros and Cons

Now that you’re caught up on what we covered in the first tutorial, we can move on to the new functionality. In the code below, we use the ChatGPT model to take our customer review text data and generate responses in the form of a list of product pros and cons. Although some reviews explicitly mention pros and cons, most do not, so we need to leverage ChatGPT’s artificial intelligence to infer the product’s pros and cons.

# This code block will read in the review data in blocks of 2,500 words at a time and determine the pros and cons people have mentioned in that block of text. 
# The code will then move on to the next block and extract the pros and cons from it, repeating as necessary until all of the reviews have been processed.
# This is necessary because of the limits on how much input text the ChatGPT API can handle at one time 

# Generate a list of pros and cons from all of the raw user reviews
def generate_proscons_list(text):
    word_blocks = text.split(' ')
    block_size = 2500
    blocks = [' '.join(word_blocks[i:i + block_size]) for i in range(0, len(word_blocks), block_size)]

    proscons = []

    for block in tqdm(blocks, desc="Processing blocks", unit="block"):
        messages = [
            {"role": "system", "content": "You are an AI language model trained to create a list of the most common pros and cons for products based on product review summaries."},
            {"role": "user", "content": f"Based on the following product review summaries, create a list of the most common pros and cons for the product: {block}"}
        ]

        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            # You can change the max_tokens amount to increase or decrease the length of the resulting pros and cons list. If you increase it too much, you will exceed chatGPT's limits though.
            max_tokens=300,
            n=1,
            stop=None,
            # You can adjust how "creative" (vs. literal to the original reviewer's intent) chatGPT's summary will be by adjusting this temperature value. 0.7 is usually a safe amount
            temperature=0.7
        )

        procon = completion.choices[0].message.content
        proscons.append(procon)

    # Combine the pros and cons that chatGPT found into a list 
    combined_proscons = "\n\n".join(proscons)
    return combined_proscons

# Read the reviews data from the CSV input file and then create a dataframe to hold the review data
input_file = "reviews.csv"
df = pd.read_csv(input_file)

# Combine all of the pros and cons from the various review chunks into one list 
all_reviews = "\n".join(df["Product_Review"].tolist())

# This is the call to the function we created above that will trigger the API call
summary_proscons = generate_proscons_list(all_reviews)

# Print the list of pros and cons (optional step)
print(summary_proscons)



# Save the resulting list of pros and cons to a new Excel file for further offline processing
df_proscons = pd.DataFrame()
list_proscons = []
list_proscons.append(summary_proscons)
df_proscons["pros_cons"] = list_proscons
output_file_proscons = "reviews_analyzed_full_proscons.xlsx"
df_proscons.to_excel(output_file_proscons, index=False)


# Also, save the results to a new Word file
output_file_proscons = "reviews_analyzed_full_proscons.docx"
doc = docx.Document()

# Create a table within the new Word doc with the following header: Pros & Cons
table = doc.add_table(rows=1, cols=1)
header_cells = table.rows[0].cells
header_cells[0].text = 'Pros & Cons'

# Add the results of our API call to the table
for index, row in df_proscons.iterrows():
    row_cells = table.add_row().cells
    row_cells[0].text = str(row['pros_cons'])

doc.save(output_file_proscons)        

Here is the output of the pros and cons code block. For our review dataset, the API required only 24 seconds to extract and generate this list of product pros and cons! That’s a lot faster than it would take you to read through 100+ customer reviews yourself.
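One note on the chunking inside generate_proscons_list: it splits on whitespace into fixed 2,500-word blocks, while the API's real limits are measured in tokens. A rough rule of thumb for English is about 0.75 words per token, so if you would rather target a token budget directly, a sketch like this works (the 0.75 ratio is an approximation; a tokenizer such as tiktoken gives exact counts):

```python
def chunk_by_token_budget(text, max_tokens=2500, words_per_token=0.75):
    """Split text into blocks that roughly fit a token budget.

    Uses the ~0.75 English words per token approximation; for exact counts
    you would use a tokenizer like tiktoken instead.
    """
    words = text.split()
    # ~1,875 words for a 2,500-token budget
    words_per_block = max(1, int(max_tokens * words_per_token))
    return [" ".join(words[i:i + words_per_block])
            for i in range(0, len(words), words_per_block)]

blocks = chunk_by_token_budget("word " * 4000)
print(len(blocks))  # 4,000 words -> 3 blocks of at most 1,875 words each
```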


Using ChatGPT to build a list of product improvement suggestions

Knowing how your customers feel about a product’s pros and cons is insightful and likely valuable. However, the next questions that your stakeholders are going to ask will probably be “so what?” and “what can I do with this information?” Although interesting findings can help to start a discussion about customer product feedback, we can do better than that. One way to make our analysis more actionable is to ask ChatGPT to suggest ways in which the product can be improved to increase customer satisfaction based on what it found in the customer review data.


# This code block will read the review data in blocks of 2,500 words and generate improvement suggestions from each block of review data.
# It will process each block until it has read all of the reviews, then suggest a list of product improvements based on customer feedback

def generate_improvement_suggestions(text):
    # This code splits the combined review text into blocks of 2,500 words to stay within openAI's API input limits
    word_blocks = text.split(' ')
    block_size = 2500
    blocks = [' '.join(word_blocks[i:i + block_size]) for i in range(0, len(word_blocks), block_size)]

    suggestions = []

    for block in tqdm(blocks, desc="Processing blocks", unit="block"):
        # Here we specify the role for chatGPT to assume and give it the command to suggest 10 product improvements based on the block of customer review data it just read
        messages = [
            {"role": "system", "content": "You are an AI language model trained to analyze product reviews and generate suggestions for product improvements."},
            {"role": "user", "content": f"Based on the following product reviews, suggest 10 product improvements: {block}"}
        ]

        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=300,
            n=1,
            stop=None,
            temperature=0.7
        )

        suggestion = completion.choices[0].message.content
        suggestions.append(suggestion)

    # Combine all suggestions together
    combined_suggestions = "\n\n".join(suggestions)
    return combined_suggestions

# Read the input Excel file containing user reviews and save it into a dataframe
input_file = "reviews.csv"
df = pd.read_csv(input_file)

# Combine all of the suggestions into a single text
review_improvements = "\n".join(df["Product_Review"].tolist())

# Call the custom function we created above to generate improvement suggestions from the review data
improvement_suggestions = generate_improvement_suggestions(review_improvements)

# Print the improvement suggestions (optional step, you can remove it)
print(improvement_suggestions)

# Save the resulting list of product improvement suggestions to a new Excel file
df_improvements = pd.DataFrame()
list_improvements = []
list_improvements.append(improvement_suggestions)
df_improvements["improvement_suggestions"] = list_improvements
# Define the name of the Excel file that will contain the output
output_file_improvements = "reviews_analyzed_full_improvements.xlsx"
df_improvements.to_excel(output_file_improvements, index=False)

# Also, save the improvement suggestions to a new Word file
output_file_improvements = "reviews_analyzed_full_improvements.docx"
doc = docx.Document()

# Inside of the Word doc, create a table to contain the suggestions
table = doc.add_table(rows=1, cols=1)
header_cells = table.rows[0].cells
header_cells[0].text = 'Improvement Suggestions'

# Fill the table created above with the review improvement suggestions
for index, row in df_improvements.iterrows():
    row_cells = table.add_row().cells
    row_cells[0].text = str(row['improvement_suggestions'])

doc.save(output_file_improvements)        

Here is a sample of one set of product improvements that chatGPT was able to infer from the product reviews. For 104 customer reviews, chatGPT took 30 seconds to generate this list of suggestions.



Use OpenAI to Select The Most Important Product Improvement Suggestions

Depending on the number of customer reviews in your dataset, ChatGPT may generate a long list of product improvement suggestions. To summarize these suggestions into something more manageable, we can ask ChatGPT to analyze the suggestions it created and pick out the 10 most important.


# This block of code will consolidate the list of improvement suggestions created in the previous step into a Top 10 list. 
# This is helpful in case you have many reviews and chatGPT generated a long list of improvement suggestions

def generate_improvement_suggestions_prioritized(text):
    word_blocks = text.split(' ')
    block_size = 2500
    blocks = [' '.join(word_blocks[i:i + block_size]) for i in range(0, len(word_blocks), block_size)]

    suggestions_prioritized = []

    for block in tqdm(blocks, desc="Processing blocks", unit="block"):
        # Here we tell chatGPT which role to assume and what to do, i.e. tell me the top 10 improvement suggestions to make now.
        # If you want more than 10 top suggestions, change the prompt accordingly to ask for 20, 30 etc. 
        messages = [
            {"role": "system", "content": "You are an AI language model trained to analyze product reviews and generate suggestions for product improvements."},
            {"role": "user", "content": f"Using this list of suggested improvements above {block}, tell me the top 10 improvements to make now."}
        ]

        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            # We set max_tokens to 2,000 so we don't exceed the input + output token limit of the chatGPT API. Remember, above we split the text into blocks of 2,500 words,
            # so we need to limit the output to 2,000 tokens
            max_tokens=2000,
            n=1,
            stop=None,
            temperature=0.7
        )

        consolidated = completion.choices[0].message.content
        suggestions_prioritized.append(consolidated)

    # Combine all of the Top 10 suggestions from the various blocks (in case there is more than 1) into a single text string
    combined_suggestions_prioritized = "\n\n".join(suggestions_prioritized)
    return combined_suggestions_prioritized

# This calls the custom function we defined above that will ask chatGPT to consolidate the list of improvement suggestions
improvement_suggestions_prioritized = generate_improvement_suggestions_prioritized(improvement_suggestions)

# Print the improvement suggestions (optional step)
print(improvement_suggestions_prioritized)

# Save our results to a new Excel file
df_improvements_prioritized = pd.DataFrame()
list_improvements_prioritized = []
list_improvements_prioritized.append(improvement_suggestions_prioritized)
df_improvements_prioritized["improvement_suggestions_prioritized"] = list_improvements_prioritized
# Set the name for your output Excel file here
output_file_improvements_prioritized = "reviews_analyzed_full_improvements_prioritized.xlsx"

df_improvements_prioritized.to_excel(output_file_improvements_prioritized, index=False)

# Save the results to a new Word file as well
# Set the name for the Word file here
output_file_improvements_prioritized = "reviews_analyzed_full_improvements_prioritized.docx"
doc = docx.Document()

# Add a table to the Word doc with the heading: Top Improvement Suggestions
table = doc.add_table(rows=1, cols=1)
header_cells = table.rows[0].cells
header_cells[0].text = 'Top Improvement Suggestions'

# Add the Top Improvement Suggestions to the table, one per row
for index, row in df_improvements_prioritized.iterrows():
    row_cells = table.add_row().cells
    row_cells[0].text = str(row['improvement_suggestions_prioritized'])

doc.save(output_file_improvements_prioritized)        

Below is the consolidated list of Top 10 product improvements suggested by chatGPT for our hypothetical product based on the customer feedback:


Output: Top 10 Suggested Product Recommendations For Sample Product Reviews:

Based on the collective feedback provided in the reviews, here are the top 10 improvements that could be made to the landing page builder:


1. More customization options to give users greater control over the design of their landing pages.

2. Increased number of integrations with popular marketing tools to make it easier to connect with other platforms.

3. A/B testing functionality to allow users to test different versions of their landing pages and improve conversion rates.

4. More advanced analytics to help users better understand their audience and optimize their campaigns.

5. Additional templates and design options to give users more choices and variety.

6. Improved performance optimization to further increase loading speeds and reduce bounce rates.

7. More comprehensive documentation and tutorials to help users get the most out of the platform.

8. Enhanced collaboration features to make it easier for teams to work together on landing page projects.

9. More advanced automation and personalization options to help users create more targeted and effective campaigns.

10. Integration with more payment gateways and e-commerce platforms to make it easier to sell products and services directly from landing pages.

Conversational Data Science: Asking ChatGPT To Explain Why It Responded In A Certain Way

One of the truly fascinating aspects of ChatGPT is the ability to explain why it provided a certain response. In our case, we will ask ChatGPT to explain why it selected these particular product improvements as the most important ones. As you’ll see, ChatGPT considers each product improvement in the larger context of how a product could improve customer satisfaction rather than simply counting the number of times a certain type of improvement was mentioned.
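For contrast, the "simply counting mentions" baseline that ChatGPT improves upon might look like the sketch below (the keyword list and reviews are purely illustrative, not drawn from the actual dataset):

```python
from collections import Counter
import re

def keyword_mention_counts(reviews, keywords):
    """Count how many reviews mention each keyword (case-insensitive prefix match)."""
    counts = Counter()
    for review in reviews:
        text = review.lower()
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw), text):
                counts[kw] += 1
    return counts

# Invented keywords and reviews for illustration only
keywords = ["template", "integration", "analytics", "speed", "support"]
reviews = ["Love the templates but need more integrations.",
           "Support is slow and analytics are basic.",
           "Templates are great, speed could improve."]
print(keyword_mention_counts(reviews, keywords))
```

A frequency table like this tells you what customers talk about most, but not which changes would matter most; that is the gap ChatGPT's contextual reasoning fills.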

# This code block asks chatGPT to explain the reasoning behind why it selected these Top 10 improvement suggestions as the most important ones.
# Each of the top selections is explained in a few sentences for better understanding 

def generate_improvement_suggestions_prioritized_explained(text):
    word_blocks = text.split(' ')
    block_size = 2500
    blocks = [' '.join(word_blocks[i:i + block_size]) for i in range(0, len(word_blocks), block_size)]

    suggestions_prioritized_explained = []

    for block in tqdm(blocks, desc="Processing blocks", unit="block"):
        # Here we set the role for chatGPT and tell it to use the list of prioritized Top 10 suggestions it made in the previous code block.
        # Then we instruct chatGPT to explain why it selected each of the Top 10 improvements from above
        messages = [
            {"role": "system", "content": "You are an AI language model trained to analyze product reviews and generate suggestions for product improvements."},
            {"role": "user", "content": f"Using this list of most important suggested improvements above {block}, explain why each of the suggested improvements was selected."}
        ]

        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=3500,
            n=1,
            stop=None,
            temperature=0.7
        )

        consolidated_explained = completion.choices[0].message.content
        suggestions_prioritized_explained.append(consolidated_explained)

    # Combine all of the explanations into one list
    combined_suggestions_prioritized_explained = "\n\n".join(suggestions_prioritized_explained)
    return combined_suggestions_prioritized_explained

# This calls the custom function we created above and asks chatGPT to explain why it selected each of the Top 10 improvement suggestions
improvement_suggestions_prioritized_explained = generate_improvement_suggestions_prioritized_explained(improvement_suggestions_prioritized)

# Print the improvement suggestions explanations (optional step)
print(improvement_suggestions_prioritized_explained)

# Save the results to a new Excel file
df_improvements_explained = pd.DataFrame()
list_improvements_explained = []
list_improvements_explained.append(improvement_suggestions_prioritized_explained)
df_improvements_explained["improvements_explained"] = list_improvements_explained
output_file_improvements_explained = "reviews_analyzed_full_improvements_explained.xlsx"
df_improvements_explained.to_excel(output_file_improvements_explained, index=False)

# Save the results to a new Word file
output_file_improvements_explained = "reviews_analyzed_full_improvements_explained.docx"
doc = docx.Document()

# Add a table to the Word doc with the header Improvement Suggestions Explained
table = doc.add_table(rows=1, cols=1)
header_cells = table.rows[0].cells
header_cells[0].text = 'Improvement Suggestions Explained'

# Add our explanation results to the table we just created
for index, row in df_improvements_explained.iterrows():
    row_cells = table.add_row().cells
    row_cells[0].text = str(row['improvements_explained'])

doc.save(output_file_improvements_explained)        

Why did chatGPT choose these particular improvement suggestions rather than others? Let’s ask it!



The Power of ChatGPT AI to Understand And Reason

For the final part of this tutorial, we’ll ask ChatGPT to rank the Top 10 list of product improvement suggestions for us based on the estimated difficulty of implementing each suggestion. This is important for any company with limited resources and serves as a reasonableness check on ChatGPT’s improvement suggestions.

# This code block takes each of the 10 improvement suggestions chatGPT generated and ranks them by estimated time to complete, with the fastest listed first.
def generate_improvement_suggestions_effort_required(text):
    word_blocks = text.split(' ')
    block_size = 2500
    blocks = [' '.join(word_blocks[i:i + block_size]) for i in range(0, len(word_blocks), block_size)]

    suggestions_effort_required = []

    for block in tqdm(blocks, desc="Processing blocks", unit="block"):
        # Here we tell chatGPT to use the Top 10 suggestions list and rank it in order of time to complete each improvement and explain why the ranking was chosen. 
        # The shortest time to complete is listed first.
        messages = [
            {"role": "system", "content": "You are an AI language model trained to analyze product reviews and generate suggestions for product improvements."},
            {"role": "user", "content": f"Using this list of most important suggested improvements above {block}, rank them in terms of time to complete, starting with the fastest one and explain your reasoning."}
        ]

        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            max_tokens=3000,
            n=1,
            stop=None,
            temperature=0.7
        )

        effort_required = completion.choices[0].message.content
        suggestions_effort_required.append(effort_required)

    # Combine all rank suggestions into one list
    combined_suggestions_effort_required = "\n\n".join(suggestions_effort_required)
    return combined_suggestions_effort_required

# This calls the custom function we created above and passes the Top 10 improvement suggestions list to the function
improvement_suggestions_effort_required = generate_improvement_suggestions_effort_required(improvement_suggestions_prioritized)

# Print the improvement suggestions (optional step)
print(improvement_suggestions_effort_required)

# Save the ranked results to a new Excel file
df_improvements_effort = pd.DataFrame()
list_improvements_effort = []
list_improvements_effort.append(improvement_suggestions_effort_required)
df_improvements_effort["improvement_effort"] = list_improvements_effort
output_file_improvements_effort = "reviews_analyzed_full_improvements_effort.xlsx"
df_improvements_effort.to_excel(output_file_improvements_effort, index=False)

# Also, save the ranked results to a new Word file
output_file_improvements_effort = "reviews_analyzed_full_improvements_effort.docx"
doc = docx.Document()

# Add a table to the Word doc with the heading: Improvement Suggestions Effort Required (Ranked Least to Most Effort Required)
table = doc.add_table(rows=1, cols=1)
header_cells = table.rows[0].cells
header_cells[0].text = 'Improvement Suggestions Effort Required (Ranked Least to Most Effort Required)'

# Populate the table with the ranked improvement suggestions
for index, row in df_improvements_effort.iterrows():
    row_cells = table.add_row().cells
    row_cells[0].text = str(row['improvement_effort'])

doc.save(output_file_improvements_effort)        

As the final step of our analysis in this tutorial, let’s ask chatGPT to estimate the amount of difficulty (i.e. effort) that would be required to implement the suggested improvements, ranked from easiest to hardest. This would be helpful for a marketing manager or product manager to understand what can realistically be accomplished in a given amount of time.



Conclusion

In this tutorial, we performed a variety of natural language processing tasks that would previously have required a decent amount of Python coding and data science experience. However, thanks to OpenAI's ChatGPT API, we can accomplish all of this in a few blocks of reusable code. You should now feel confident using the ChatGPT API to perform the following tasks on your company's own product reviews:


  1. Generating a list of pros and cons from our raw customer reviews
  2. Generating a list of product improvement suggestions from the raw reviews
  3. Consolidating the list of improvement suggestions to just the Top 10
  4. Asking ChatGPT to explain why it selected these improvements as the most important
  5. Asking ChatGPT to rank and prioritize the list of suggestions based on the estimated effort to complete them

In the next part of this tutorial, we will use ChatGPT to help us create a summary presentation for senior management to explain our findings and recommendations. If you like this tutorial, make sure to follow me for all future articles and projects!

Here are some other articles you may like:

  1. Practical Examples of AI and Machine Learning in Business
  2. 5 Reasons Why Business Data Science Projects Fail
  3. Practical Application of Logistic Regression for Business

I’m happy to answer any questions you have. If you enjoyed this article, follow me on Medium for more ideas on how to apply data science to solve real business challenges.

Eitan Rosenfelder

Statistics Masters degree student at Hebrew University of Jerusalem. Working on a thesis in Machine Learning and Data Science. With a Bachelors in Statistics, Economics and emphasis in Data Science


Looks awesome, thanks for sharing. I have many reviews (a few million of one sentence reviews ) the sleep is problematic, do you or someone else here have an idea how to deal with it? By the way my data is financial.

Đạt Nguyễn

Treasury Management Specialist at Nguyen Hoang Group


It's so interesting!


More articles by Courtlin Holt-Nguyen
