Comprehensive Guide to ChatGPT Conversation Analysis - A quick 5-Minute Exercise

Comprehensive Guide to ChatGPT Conversation Analysis - A quick 5-Minute Exercise

Introduction

Analyzing your conversations with ChatGPT can offer fascinating insights into your chat patterns, topics of interest, and engagement habits. This process involves extracting raw data from your ChatGPT sessions and processing it with Python on your on system to derive meaningful statistics. Find my stats at the end.


If you have already Phyton ready this will take you not more then 5 minutes to get interesting insights about your ChatGPT usage.


Privacy Note:

Always keep privacy in mind. If your chat data contains personal or sensitive information, ensure that your analysis respects privacy considerations and that the data is securely handled.

These analyses could provide both valuable insights into your interaction styles and preferences, as well as a bit of fun in looking back at your conversational journey with ChatGPT.



For me Chat-GPT has become an co-pilot on different projects & tasks.

It also assisted me to make this analysis. It provided the phyton code below that helped to analyze the conversations raw data.


Step 1: Obtaining your Raw Data

  1. Access Chat Data: You can download your ChatGPT conversations from your GPT profile.
  2. Export Data: Under "Settings & Beta" go to "Data controls" and click "Export" button.
  3. File Format: The data you will download is zipped. Don't worry it has a wired long cryptic name. You need to extract and will find different JSON files, which is ideal for analysis in Python. You onls need to use the conversations.json for the analysis

Step 2: Setting Up Python Environment

  1. Install Python: Ensure Python is installed on your system. It can be downloaded from python.org.
  2. If pip is not installed (you'll know because the command below will return an error), you can install it by: Downloading get-pip.py from https://bootstrap.pypa.io/get-pip.py. In your CLI, navigate to the folder where get-pip.py is downloaded. Run python get-pip.py.
  3. Open Command Prompt (Windows) or Terminal (Mac/Linux):On Windows, you can press Win + R, type cmd, and press Enter.On a Mac, press Cmd + Space, type Terminal, and press Enter.
  4. Check Python Version:Type python --version and press Enter.If Python is installed correctly, you should see the version number.
  5. Install Libraries: Use pip to install necessary Python libraries:'pip install pandas nltk'
  6. Optional - Jupyter Notebook: For a user-friendly analysis experience, install Jupyter Notebook:'pip install notebook'

Step 3: Initial Analysis with Python

Use the following Python script as your starting point. This script calculates the total number of main and sub-chats, total messages, 'As an AI' warnings, thank yous, and an estimated total duration of your chats:

import json

def load_data(file_path):
    with open(file_path, 'r') as file:
        data = json.load(file)
    return data

def parse_conversations(data, avg_time_per_message=30):
    stats = {
        'total_main_chats': 0,
        'total_sub_chats': 0,
        'total_messages': 0,
        'as_ai_warnings': 0,
        'thank_yous': 0,
        'estimated_total_duration': 0  # in seconds
    }

    for conversation in data:
        mapping = conversation.get('mapping', {})
        for conv_id, conv_data in mapping.items():
            message = conv_data.get('message')
            if message:
                num_messages = len(conv_data.get('children', []))
                stats['total_messages'] += num_messages
                content = message.get('content', {}).get('parts', [])
                if content:
                    text = extract_text(content)
                    stats['as_ai_warnings'] += text.count('as an AI')
                    stats['thank_yous'] += text.lower().count('thank you')

                # Identify main chats
                parent_id = conv_data.get('parent')
                is_main_chat = not parent_id or mapping.get(parent_id, {}).get('parent') is None
                if is_main_chat:
                    stats['total_main_chats'] += 1
                else:
                    stats['total_sub_chats'] += 1

                # Estimate duration for each message
                stats['estimated_total_duration'] += num_messages * avg_time_per_message

    return stats

def extract_text(content_parts):
    text_parts = []
    for part in content_parts:
        if isinstance(part, str):
            text_parts.append(part)
        elif isinstance(part, dict) and 'text' in part:
            text_parts.append(part['text'])
    return " ".join(text_parts)

def main():
    file_path = 'path_to_your_json_file.json'  # Replace with the actual file path
    data = load_data(file_path)
    stats = parse_conversations(data)

    # Convert estimated total duration from seconds to hours
    stats['estimated_total_duration'] = stats['estimated_total_duration'] / 3600

    print(f"Total Main Chats: {stats['total_main_chats']}")
    print(f"Total Sub Chats: {stats['total_sub_chats']}")
    print(f"Total Messages: {stats['total_messages']}")
    print(f"'As an AI' Warnings: {stats['as_ai_warnings']}")
    print(f"Thank Yous: {stats['thank_yous']}")
    print(f"Estimated Total Duration (hours): {stats['estimated_total_duration']}")

if __name__ == "__main__":
    main()
        

Replace 'path_to_your_json_file.json' with the actual path to your JSON file.

Step 4: Further Analysis

Once you have an initial understanding of your chat data, you can extend your analysis to explore the most active hours, average message length, and common themes.

import json
from datetime import datetime
import pandas as pd
from collections import Counter
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

def load_data(file_path):
    with open(file_path, 'r') as file:
        data = json.load(file)
    return data

def analyze_data(data):
    timestamps = []
    message_lengths = []
    words = []

    for conversation in data:
        mapping = conversation.get('mapping', {})
        for conv_id, conv_data in mapping.items():
            message = conv_data.get('message')
            if message and message.get('author', {}).get('role') == 'user':
                content = message.get('content', {}).get('parts', [])
                if content:
                    text = " ".join([part for part in content if isinstance(part, str)])
                    message_lengths.append(len(text.split()))
                    words.extend(text.lower().split())
                create_time = message.get('create_time')
                if create_time:
                    timestamps.append(datetime.fromtimestamp(create_time))

    return timestamps, message_lengths, words

def main():
    file_path = 'path_to_your_json_file.json'  # Replace with your file path
    data = load_data(file_path)
    timestamps, message_lengths, words = analyze_data(data)

    times_df = pd.DataFrame({'timestamp': timestamps})
    times_df['hour'] = times_df['timestamp'].dt.hour
    most_active_hour = times_df['hour'].mode()[0]
    avg_message_length = sum(message_lengths) / len(message_lengths)

    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word not in stop_words and word.isalpha()]
    word_freq = Counter(filtered_words)
    most_common_words = word_freq.most_common(10)

    print(f"Most Active Hour: {most_active_hour}")
    print(f"Average Message Length: {avg_message_length} words")
    print("Most Common Words:", most_common_words)

if __name__ == "__main__":
    main()
        

Replace 'path_to_your_json_file.json' with the path to your JSON file containing the chat data.

Conclusion

This guide offers a structured approach to analyzing your ChatGPT conversation data. Starting from extracting raw data to performing initial analysis with Python, you can gain valuable insights into your interaction with AI.

You can also go further and analyze e.g. an hourly activity heatmap showing when you are most active in your chats, a histogram of message lengths to visualize the distribution of your message lengths or a bar chart displaying the most common words used in your chats. For additional insights you will need to change the code. GPT will help you with that ;-)


My Results:

I use Chat-GPT for a really long time since Autum 2022. I had the pleasure to early test and also see the platform adapting and evolving. Naturally it hooked me to try out much more and to use it more often to assist me on my different projects & tasks.


  1. Total Main Chats: I had 190 main chat sessions. These are individual sessions or conversations I started that deal with a specific topic or task.
  2. Total Sub Chats: Within these main chats, there are 5705 sub-chats. These include the follow-up messages or interactions within each main chat session.
  3. Total Messages: The total count of messages is 5705, which aligns with the number of sub-chats. This means each sub-chat corresponds to a single message.
  4. 'As an AI' Warnings: I have encountered 12 instances where the system provided a warning or clarification about its AI nature. This is relatively low, suggesting that most of my interactions were straightforward in terms of AI capabilities and limitations.
  5. Thank Yous: I have thanked ChatGPT 196 times. I am of course polite and appreciative.
  6. Estimated Total Duration: I have spent about a minmum of 47.54 hours chatting. This estimation is based on an average duration per message, which was set to 30 seconds. Depending on your actual interaction speed, the real total duration could be higher or lower. I assume my value is to low.
  7. Most Active Hour: 11 PM This suggests that I am most active in my interactions with ChatGPT during the late-night hours (11 PM). When kids sleep and it is quiet it is very effective to engage with Chat-GPT.
  8. Average Message Length: 94.9 words An average message length of approximately 95 words indicates a tendency towards detailed and comprehensive queries or responses. This suggests that my use of ChatGPT involves thorough explanations or complex queries, which is quite common in professional or technical contexts.
  9. Most Common Words: The list of most common words provides a window into the main topics and themes of my conversations: 'Data' and 'Marketing' being at the top mirrors my strong focus on marketing data or data-driven marketing strategies. 'User', 'Martech', 'Customer': These words align well with a focus on marketing technology (MarTech) and customer/user-oriented discussions.

Interpretation :

  • The usage pattern shows a healthy level of engagement with the ChatGPT system, indicated by a substantial number of main chats and messages.
  • The 'As an AI' warnings and 'Thank Yous' give a glimpse into the nature of my interactions, suggesting a mix of queries and courteous exchanges.
  • The estimated duration of nearly 47.5 hours indicates significant time spent interacting with ChatGPT, suggesting either extensive sessions or high frequency of use.
  • My activity peak at 11 PM point to after-hours research or work, suggesting a high level of dedication or interest in the subjects you're discussing with ChatGPT.
  • The length of my messages and the presence of technical and marketing-related terms indicate a professional use of the tool, likely exploring complex concepts or strategies in marketing technology.


Note:

Remember, the estimated total duration is based on a preset average time per message. If your actual message interaction time differs from this average, the total estimated hours might vary.


Like this? Follow me.

I'm Stephan G?tze - a?MarTech HERO?and I help companies to "Unlock Marketing" providing strategies for Modern Marketing Leaders in the Data-Driven Age.

Click my name + Follow + ??



Udo Kiel

????Vom Arbeitswissenschaftler zum Wissenschaftskommunikator: Gemeinsam für eine sichtbarere Forschungswelt

11 个月

Wow, this sounds fascinating! Can't wait to uncover the hidden patterns in my AI conversations. ??

Woodley B. Preucil, CFA

Senior Managing Director

11 个月

Stephan Goetze Fascinating read. Thank you for sharing

要查看或添加评论,请登录

Stephan Goetze的更多文章

社区洞察

其他会员也浏览了