
Practical Guide on Text Mining and Feature Engineering in R

Manish Saraswat

Senior Machine Learning Engineer

Published: April 10, 2017

The ability to work with text data is one of the important skills a data scientist must possess. With the advent of social media, forums, review sites, and web page crawlers, companies now have access to massive amounts of behavioural data about their customers.

Yes, companies have more textual data than numerical data. No doubt, this data will be messy. But beneath the mess lies a rich source of information and insights that can help companies grow their businesses.

That is why natural language processing (NLP), also known as text mining, is growing rapidly as a technique and is used extensively by data scientists.

In this tutorial, you'll learn about text mining from scratch. We'll follow a stepwise approach to understand text mining concepts. Later, we'll work on a current Kaggle competition data set to gain practical experience, followed by two practice exercises.

For this tutorial, the programming language used is R. However, the techniques explained below can be implemented in any programming language.

Make sure you've finished the regular expression tutorial before starting with text mining.

What are Regular Expressions? When do you use them?
What is String Manipulation?
List of String Manipulation Functions
List of Regular Expression Commands
Metacharacters
Sequences
Quantifiers
Character Classes
POSIX Character Classes
Practice Examples on Regular Expressions

Read Tutorial

Feel free to drop your suggestions, experience or any new technique you've used while dealing with string variables in a data set.

