Comprehensive Guide to ChatGPT Conversation Analysis - A quick 5-Minute Exercise
Introduction
Analyzing your conversations with ChatGPT can offer fascinating insights into your chat patterns, topics of interest, and engagement habits. The process involves extracting the raw data from your ChatGPT sessions and processing it with Python on your own system to derive meaningful statistics. You can find my own stats at the end.
If you already have Python set up, this will take you no more than 5 minutes to get interesting insights into your ChatGPT usage.
Privacy Note:
Always keep privacy in mind. If your chat data contains personal or sensitive information, ensure that your analysis respects privacy considerations and that the data is securely handled.
These analyses can provide valuable insights into your interaction styles and preferences, as well as a bit of fun in looking back at your conversational journey with ChatGPT.
For me, ChatGPT has become a co-pilot on various projects and tasks.
It also assisted me in making this analysis: it provided the Python code below that helped analyze the raw conversation data.
Step 1: Obtaining your Raw Data
In ChatGPT, open Settings, go to Data controls, and choose Export data. You will receive an email with a download link; the archive contains a conversations.json file with your full chat history, which is the file the scripts below work on.
Step 2: Setting Up Python Environment
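As a minimal setup sketch (assuming Python 3 and pip are already installed on your system), the following prepares everything the two scripts below need:

```shell
# Verify a Python 3 interpreter is available
python3 --version

# Install the third-party libraries used in Step 4
# (the Step 3 script needs only the standard library)
python3 -m pip install --quiet pandas nltk
```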
Step 3: Initial Analysis with Python
Use the following Python script as your starting point. This script calculates the total number of main and sub-chats, total messages, 'As an AI' warnings, thank yous, and an estimated total duration of your chats:
import json

def load_data(file_path):
    with open(file_path, 'r') as file:
        data = json.load(file)
    return data

def parse_conversations(data, avg_time_per_message=30):
    stats = {
        'total_main_chats': 0,
        'total_sub_chats': 0,
        'total_messages': 0,
        'as_ai_warnings': 0,
        'thank_yous': 0,
        'estimated_total_duration': 0  # in seconds
    }
    for conversation in data:
        mapping = conversation.get('mapping', {})
        for conv_id, conv_data in mapping.items():
            message = conv_data.get('message')
            if message:
                num_messages = len(conv_data.get('children', []))
                stats['total_messages'] += num_messages
                content = message.get('content', {}).get('parts', [])
                if content:
                    text = extract_text(content)
                    # Count both phrases case-insensitively
                    stats['as_ai_warnings'] += text.lower().count('as an ai')
                    stats['thank_yous'] += text.lower().count('thank you')
                # A node counts as a main chat if it has no parent,
                # or if its parent is the root of the conversation tree
                parent_id = conv_data.get('parent')
                is_main_chat = not parent_id or mapping.get(parent_id, {}).get('parent') is None
                if is_main_chat:
                    stats['total_main_chats'] += 1
                else:
                    stats['total_sub_chats'] += 1
                # Estimate duration for each message
                stats['estimated_total_duration'] += num_messages * avg_time_per_message
    return stats

def extract_text(content_parts):
    # Message parts can be plain strings or dicts with a 'text' key
    text_parts = []
    for part in content_parts:
        if isinstance(part, str):
            text_parts.append(part)
        elif isinstance(part, dict) and 'text' in part:
            text_parts.append(part['text'])
    return " ".join(text_parts)

def main():
    file_path = 'path_to_your_json_file.json'  # Replace with the actual file path
    data = load_data(file_path)
    stats = parse_conversations(data)
    # Convert estimated total duration from seconds to hours
    stats['estimated_total_duration'] = stats['estimated_total_duration'] / 3600
    print(f"Total Main Chats: {stats['total_main_chats']}")
    print(f"Total Sub Chats: {stats['total_sub_chats']}")
    print(f"Total Messages: {stats['total_messages']}")
    print(f"'As an AI' Warnings: {stats['as_ai_warnings']}")
    print(f"Thank Yous: {stats['thank_yous']}")
    print(f"Estimated Total Duration (hours): {stats['estimated_total_duration']}")

if __name__ == "__main__":
    main()
Replace 'path_to_your_json_file.json' with the actual path to your JSON file.
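If you want to sanity-check the counting logic before pointing the script at your real export, you can run a condensed version of it against a tiny hand-made structure that mimics the export format (the node IDs and message texts below are invented for illustration):

```python
# A tiny fake export with one conversation; the structure mirrors
# the real conversations.json, but all ids and texts are made up
sample = [
    {
        "mapping": {
            "root": {"message": None, "parent": None, "children": ["a"]},
            "a": {
                "message": {"content": {"parts": ["Thank you, as an AI test"]}},
                "parent": "root",
                "children": ["b"],
            },
            "b": {
                "message": {"content": {"parts": ["thank you again"]}},
                "parent": "a",
                "children": [],
            },
        }
    }
]

# Condensed form of the counting loop from the script above
stats = {"total_messages": 0, "thank_yous": 0}
for conversation in sample:
    for node in conversation["mapping"].values():
        message = node.get("message")
        if message:
            stats["total_messages"] += len(node.get("children", []))
            text = " ".join(p for p in message["content"]["parts"] if isinstance(p, str))
            stats["thank_yous"] += text.lower().count("thank you")

print(stats)  # → {'total_messages': 1, 'thank_yous': 2}
```

The same walk over `mapping` is what the full script performs for every conversation in your export.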
Step 4: Further Analysis
Once you have an initial understanding of your chat data, you can extend your analysis to explore the most active hours, average message length, and common themes.
import json
from datetime import datetime
import pandas as pd
from collections import Counter
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')

def load_data(file_path):
    with open(file_path, 'r') as file:
        data = json.load(file)
    return data

def analyze_data(data):
    timestamps = []
    message_lengths = []
    words = []
    for conversation in data:
        mapping = conversation.get('mapping', {})
        for conv_id, conv_data in mapping.items():
            message = conv_data.get('message')
            # Only consider messages you wrote, not the assistant's replies
            if message and message.get('author', {}).get('role') == 'user':
                content = message.get('content', {}).get('parts', [])
                if content:
                    text = " ".join([part for part in content if isinstance(part, str)])
                    message_lengths.append(len(text.split()))
                    words.extend(text.lower().split())
                create_time = message.get('create_time')
                if create_time:
                    timestamps.append(datetime.fromtimestamp(create_time))
    return timestamps, message_lengths, words

def main():
    file_path = 'path_to_your_json_file.json'  # Replace with your file path
    data = load_data(file_path)
    timestamps, message_lengths, words = analyze_data(data)

    times_df = pd.DataFrame({'timestamp': timestamps})
    times_df['hour'] = times_df['timestamp'].dt.hour
    most_active_hour = times_df['hour'].mode()[0]

    avg_message_length = sum(message_lengths) / len(message_lengths)

    # Drop English stop words and non-alphabetic tokens before counting
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word not in stop_words and word.isalpha()]
    word_freq = Counter(filtered_words)
    most_common_words = word_freq.most_common(10)

    print(f"Most Active Hour: {most_active_hour}")
    print(f"Average Message Length: {avg_message_length:.1f} words")
    print("Most Common Words:", most_common_words)

if __name__ == "__main__":
    main()
Replace 'path_to_your_json_file.json' with the path to your JSON file containing the chat data.
Conclusion
This guide offers a structured approach to analyzing your ChatGPT conversation data. Starting from extracting raw data to performing initial analysis with Python, you can gain valuable insights into your interaction with AI.
You can also go further: for example, an hourly activity heatmap showing when you are most active in your chats, a histogram of message lengths to visualize their distribution, or a bar chart of the most common words used in your chats. For these additional insights you will need to extend the code; GPT will help you with that ;-)
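As one small example of such an extension, a quick text-based view of hourly activity needs no plotting library at all. This sketch assumes you already have the timestamps list produced by the Step 4 script; the datetimes below are invented stand-ins:

```python
from collections import Counter
from datetime import datetime

# Stand-in for the timestamps list returned by analyze_data();
# these values are invented for illustration
timestamps = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 2, 14, 12),
    datetime(2024, 1, 3, 9, 55),
]

# Tally messages per hour of day and draw a simple text bar chart
hour_counts = Counter(ts.hour for ts in timestamps)
for hour in sorted(hour_counts):
    print(f"{hour:02d}:00  {'#' * hour_counts[hour]}")
```

Swapping the `#` bars for a matplotlib bar chart or a day-by-hour heatmap is a natural next step once the tally works.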
My Results:
I have been using ChatGPT for a long time, since autumn 2022. I had the pleasure of testing it early and watching the platform adapt and evolve. Naturally, that hooked me into trying out much more and using it more often to assist me on my various projects and tasks.
Interpretation:
Note:
Remember, the estimated total duration is based on a preset average time per message. If your actual message interaction time differs from this average, the total estimated hours might vary.
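To make that note concrete, here is the arithmetic with an invented message count: at 30 seconds per message, 1,000 messages come out to roughly 8.3 hours, and assuming 60 seconds per message doubles the estimate:

```python
# Hypothetical message count; replace with your own total
total_messages = 1000

# Compare two assumed per-message durations (in seconds)
for avg_seconds in (30, 60):
    hours = total_messages * avg_seconds / 3600
    print(f"{avg_seconds}s per message -> {hours:.1f} hours")
```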
Like this? Follow me.
I'm Stephan Götze, a MarTech HERO, and I help companies to "Unlock Marketing", providing strategies for Modern Marketing Leaders in the Data-Driven Age.
Click my name + Follow.