How to easily classify customer feedback w/ Machine Learning

We collect customer feedback all the time. If your traffic runs into the millions, you end up with tens of thousands of comments to read through, and it is a challenge to efficiently identify their key themes. One way to do this is to bucket the comments into themes of concern and prioritize based on which concern affects the most customers. This can be accomplished easily with text classification techniques using the scikit-learn package in Python. Let me describe how we went about it.

You can employ a semi-supervised learning technique here. That means we train a model on a training dataset containing customer comments/concerns and the theme each comment should be associated with. A small sample of comments, say about 100, needs to be tagged manually. The themes could be bad.site.experience, slow.speeds, incorrect.data, spam, unsubscribe, etc.
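
To make the tagging concrete, here is a minimal sketch of what the hand-tagged training set might look like. The column names 'comment' and 'theme' and the example rows are illustrative assumptions, not the actual data.

import pandas as pd

# Hand-tagged sample: each row is a customer comment plus the theme a human assigned
train_df = pd.DataFrame({
    'comment': [
        "The page takes forever to load on my phone",
        "Please stop emailing me, I want off this list",
        "My account balance is showing the wrong number",
    ],
    'theme': ['slow.speeds', 'unsubscribe', 'incorrect.data'],
})

X_train = train_df['comment']   # raw text that goes into the classifier
y_train = train_df['theme']     # the manually assigned theme labels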

Once the 100 comments are tagged, we train a Machine Learning model using the SVM (Support Vector Machine) technique. It's described in detail on the scikit-learn page here. Once trained, this model can tag any given comment. 100 is a rather small dataset to learn from, so we expand it by bootstrapping to about 250 comments. (We use "bootstrapping" loosely here to mean letting the model fill in the missing labels, sometimes called self-training or pseudo-labeling.) To bootstrap, we run the model on 250 comments, 100 of which we already tagged, and end up with 250 tagged comments. Not all the tags will be correct, but this is a quick way to grow the sample dataset.
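
A rough sketch of that bootstrapping step is below, assuming text_clf is the scikit-learn pipeline shown further down (already fitted on the 100 hand-tagged comments) and untagged_comments is a DataFrame holding the remaining ~150 untagged ones; both names are illustrative assumptions.

import pandas as pd

# Let the fitted model tag the comments we have not labeled by hand
pseudo_labels = text_clf.predict(untagged_comments['comment'])

# Combine the 100 manual tags with the ~150 model-generated tags
expanded_df = pd.concat([
    train_df,
    pd.DataFrame({'comment': untagged_comments['comment'],
                  'theme': pseudo_labels}),
], ignore_index=True)

# The model-generated tags will be noisy, so spot-check and fix obvious
# mistakes before retraining on the expanded ~250-comment set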

The 250-comment dataset then serves as training and test data (an 80:20 split) to improve the model. Once you are satisfied with the accuracy, you can run the fitted model on a much larger dataset, such as six months of comments. We got about 65% accuracy; for 10 categories, a random pick would be 10%, so this is roughly 6.5 times better. A larger training set (in the thousands) would improve accuracy, but we did not need a highly accurate model for this exercise. We picked a couple of categories and dug deeper into specifics. For example, if feature.request is a big theme, we dig into which types of feature requests come up most often. This makes the results more actionable for product teams to pursue and build their pipeline around.
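
A minimal sketch of the 80:20 split and accuracy check, reusing the expanded_df and text_clf names from the sketches above (both are assumed names, not the original code):

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 20% of the ~250 labeled comments as a test set
X_train, X_test, y_train, y_test = train_test_split(
    expanded_df['comment'], expanded_df['theme'],
    test_size=0.2, random_state=42)

# Retrain on the 80% split and measure accuracy on the held-out 20%
text_clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, text_clf.predict(X_test)))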

For the full code, check it out here

The core ML code is ...
# SVM - Code for classification: bag-of-words counts -> TF-IDF weighting
# -> linear classifier trained with hinge loss (a linear SVM) via SGD
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
import pandas as pd

text_clf = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', SGDClassifier(loss='hinge', penalty='l2',
                                           alpha=1e-3, random_state=42,
                                           max_iter=5, tol=None))])
# Fit the model on the labeled comments (X_train: comment text, y_train: theme labels)
text_clf.fit(X_train, y_train)

# Load the six-month data on which to predict
data_all = pd.read_csv("/Users/.../six_months_data.csv")
# Predict - Classify: pass the text column to the pipeline
# (the 'comment' column name is an assumption about the CSV layout)
predicted = text_clf.predict(data_all['comment'])
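
Once the predictions are in hand, the bucketing and prioritization described above is just a matter of counting comments per theme. A minimal sketch, reusing the data_all and predicted variables from the snippet above and assuming a 'comment' column:

# Attach the predicted theme to each comment and count comments per theme
data_all['theme'] = predicted
print(data_all['theme'].value_counts())

# Drill into one big theme, e.g. feature.request, to read the specifics
feature_requests = data_all[data_all['theme'] == 'feature.request']
print(feature_requests['comment'].head(20))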