Harnessing the Potential of Noteable and ChatGPT, Part 2
Hello and welcome to the second part of our journey on verbatim analysis using Noteable and ChatGPT. In our previous article, we scraped the data and performed some high-level Latent Dirichlet Allocation (LDA) analysis. However, we encountered some issues with the notebook. This time, we are taking a different approach. We will read the dataset uploaded to the project, declare a dataframe (df), and proceed with a more detailed verbatim analysis.
First, I scraped the comments from Roblox using the code provided by ChatGPT in the first notebook. For ChatGPT to recognize the file path, you need to specify the project link and the file name in the prompt. Since this approach focuses on the cohesiveness of the topics rather than on visuals, I decided to modify the df first.
ChatGPT successfully created the required columns. Once again, I simulated the Net Promoter Score (NPS) from the review score, and I created a cleaned column by preprocessing the verbatim with Gensim to remove stop words and other text that does not help the analysis. With this column, I was able to fit the LDA model.
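The score-to-NPS step can be sketched as below. This is a minimal illustration, not the notebook's actual code: the cut-offs (5 = Promoter, 4 = Passive, 1 to 3 = Detractor) and the function names are assumptions, since the article does not state the mapping that was used.

```python
from typing import List


def nps_category(score: int) -> str:
    """Map a 1-5 star rating to a simulated NPS bucket.

    The cut-offs are an assumption for illustration only.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score == 5:
        return "Promoter"
    if score == 4:
        return "Passive"
    return "Detractor"


def simulated_nps(scores: List[int]) -> float:
    """Return % promoters minus % detractors for a list of ratings."""
    if not scores:
        raise ValueError("scores must not be empty")
    categories = [nps_category(s) for s in scores]
    promoters = categories.count("Promoter")
    detractors = categories.count("Detractor")
    return 100 * (promoters - detractors) / len(categories)


# Hypothetical ratings, not taken from the Roblox dataset.
example_scores = [5, 5, 4, 1]
print([nps_category(s) for s in example_scores])
print(simulated_nps(example_scores))
```

In a dataframe, the same function could be applied to the score column to produce the NPS category column.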
While setting up the LDA model, I asked ChatGPT to run a coherence test to determine the recommended number of topics for the corpus. I added a conditional statement: if the result is 2, proceed with 4 topics; otherwise, create the number of topics recommended by the test.
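The conditional rule described above can be sketched as a small helper. It assumes the coherence test yields one score per candidate topic count (for example from Gensim's CoherenceModel) and that the highest score wins; the function name and the example scores are hypothetical.

```python
from typing import Dict


def choose_num_topics(coherence_by_k: Dict[int, float]) -> int:
    """Pick the topic count with the highest coherence score.

    Applies the rule from the text: if the test recommends 2 topics,
    fall back to 4; otherwise use the recommended number.
    """
    if not coherence_by_k:
        raise ValueError("coherence_by_k must not be empty")
    recommended = max(coherence_by_k, key=coherence_by_k.get)
    return 4 if recommended == 2 else recommended


# Hypothetical coherence scores, for illustration only.
scores = {2: 0.31, 4: 0.35, 8: 0.38, 12: 0.42}
print(choose_num_topics(scores))
```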
The test recommended 12 topics. So, we proceeded to create a new column of high-level categories that grouped these 12 topics. ChatGPT suggested using 5 high-level categories, and I accepted the suggestion. Grouping the data into fewer categories made it easier to identify high-level insights. Game experience turned out to be the top topic among the detractors. However, I noticed that some comments gave a score of 5 but still mentioned glitches or opportunities for improvement. This led me to wonder about the sentiment of these comments.
As the next step, we performed sentiment analysis on the verbatim using TextBlob and categorized the sentiments into three common categories. This allowed me to compare how well the sentiment aligned with the scores given to the app. I successfully identified comments that gave a positive score but had a negative sentiment. These customers mostly talked about lag, crashes, and issues after updates, and they were asking the developer to fix the problems. Another common theme was in-game purchases. Interestingly, people were also discussing their bad experiences with the chat feature and with other users.
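The sentiment-versus-score comparison can be sketched as follows. It assumes each comment already has a polarity between -1 and 1 (as TextBlob's sentiment.polarity provides) and that the three categories are Positive, Neutral, and Negative. The threshold of 0.05, the cut-off for a "positive" score, and the sample rows are all assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def sentiment_label(polarity: float, threshold: float = 0.05) -> str:
    """Bucket a polarity value in [-1, 1] into three categories.

    The threshold is an assumed value, not one stated in the article.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def find_mismatches(
    rows: Iterable[Tuple[int, float, str]], min_positive_score: int = 4
) -> List[str]:
    """Return comments with a high score but a negative sentiment.

    Each row is (score, polarity, comment text).
    """
    return [
        text
        for score, polarity, text in rows
        if score >= min_positive_score and sentiment_label(polarity) == "Negative"
    ]


# Hypothetical rows, not taken from the scraped reviews.
sample = [
    (5, -0.4, "Terrible lag since the update, please fix"),
    (5, 0.6, "Great game"),
    (1, -0.7, "Crashes constantly"),
]
print(find_mismatches(sample))
```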
But was that all? I wasn't satisfied with these results, even though they provided a high-level overview. Therefore, I decided to check the semantic distance between the verbatims using a scatter plot instead of pyLDAvis. The first thing I noticed, due to LDA's nature, was how much the verbatims overlapped. The 12 clusters provided by LDA were scattered, and the semantic split was not evident. So, I thought, why not try a different algorithm for comparison? We proceeded with Non-Negative Matrix Factorization (NMF). For this, we created 5 topics and compared each verbatim's NMF topic with its high-level grouping from LDA. This comparison yielded only a 14% match.
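One way to compute such a match percentage is sketched below. The article does not say how the 14% was calculated, so this assumes the NMF topics have already been mapped to the same category names as the LDA grouping and simply counts the rows where both labels agree. The function name and labels are hypothetical.

```python
from typing import Sequence


def match_rate(lda_labels: Sequence[str], nmf_labels: Sequence[str]) -> float:
    """Return the percentage of rows where both models assign the same label."""
    if len(lda_labels) != len(nmf_labels):
        raise ValueError("both label sequences must have the same length")
    if not lda_labels:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(lda_labels, nmf_labels))
    return 100 * matches / len(lda_labels)


# Hypothetical labels, for illustration only.
lda = ["Game experience", "Purchases", "Chat", "Updates"]
nmf = ["Game experience", "Chat", "Purchases", "Game experience"]
print(match_rate(lda, nmf))
```

A low value does not necessarily mean one model is wrong; it may only show that the two algorithms split the corpus along different lines.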
After visualizing the NMF results in a scatter plot, I noticed that the clusters were well separated with much less overlap, which is acceptable for an unsupervised machine learning algorithm. Here are the topics identified:
To dig into further detail and test the capabilities of Noteable, I proceeded with an LDA for each topic. However, Noteable encountered some issues. Right after the command, ChatGPT overwrote the file for some unknown reason, which wiped out the columns we had created and reverted the df to its original state.
Seeing the errors and the loss of the created columns didn't shock me. I knew the steps required to quickly get back to the same stage. But ChatGPT started to get confused, to the point where it even created a new notebook without informing me. The most peculiar thing was that it overwrote the df with one of those free ML datasets containing petal information for no reason at all. This is when I decided to stop and continue this journey in a third part.
As we conclude this article, our journey through verbatim analysis using Noteable and ChatGPT has provided valuable insights. We explored the verbatim, created conditional columns, performed sentiment analysis, and compared sentiment alignment with NPS categories. With the help of LDA and NMF models, we gained high-level insights and validated our findings. Although we encountered some challenges with ChatGPT's erratic behavior, our enthusiasm remains high for the upcoming third part. In the next installment, we will interpret the results and apply filters to the dataset. Our score of 9/10 reflects the promising nature of this exercise, despite being in the testing stage. We eagerly await the final attempt, which promises further progress and learning. Stay tuned!
Thanks for sharing!