
NLP for SPAM detection (preview of the next post)

Pablo Casas

Co-founder @ Edvai: Hyper-personalized learning with AI

Published: December 13, 2019

Lately I've been working on NLP, so I decided to write about it.

I'm finishing a series of two blog posts about the incredible ULMFiT from fastai.

It will also be my first post in Python, and I'm writing it in collaboration with Pablo Zivic.

First, the intuition behind it

Imagine you are a lawyer who wants to study medicine. Although it is a huge change, you already know how to speak English, how to put a meaningful text together, and the rules of the language.

So when you jump into medicine, you don't have to learn from scratch that the word "They" is followed by "were" (not "was").

You only need to learn the particularities of the new domain (medicine).

But what is ULMFiT?

ULMFiT stands for Universal Language Model Fine-tuning, and it is implemented in the `fastai` Python library.

It was developed by Jeremy Howard and Sebastian Ruder.

Check the paper: https://arxiv.org/pdf/1801.06146.pdf

ULMFiT comes with a network that was pretrained on WikiText-103, a corpus of about 103 million words taken from Wikipedia articles. So it already knows how to speak "neutral" English.

Why is it useful?

It saves us time when creating an NLP project. Thanks to the **transfer learning** technique, we only need to **fine-tune** the network on our data. In other words, it learns the words of the domain.

Especially handy if we don't have lots of data.
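
The pretrain-then-fine-tune idea can be illustrated with a toy model. The sketch below is not ULMFiT and does not use `fastai`: it swaps the neural network for a plain bigram counter, and the two tiny corpora and the names `BigramModel`, `general_text`, and `sms_text` are invented for illustration. It only shows that fine-tuning adds domain knowledge on top of what was already learned, instead of starting from scratch.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Toy language model: counts which word follows which."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, text):
        # Update the counts in place, so calling train() again on new text
        # keeps everything learned earlier (the "fine-tuning" step here).
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, word):
        # Most frequent follower of `word`, or None if the word is unknown.
        followers = self.counts.get(word.lower())
        return followers.most_common(1)[0][0] if followers else None


# Hypothetical stand-ins for the large general corpus and the small domain corpus.
general_text = "they were happy . they were late . she was here . they were home"
sms_text = "free prize call now . claim your free prize today . free prize inside"

model = BigramModel()
model.train(general_text)        # "pretraining" on general language
before = model.predict("free")   # the domain word is unknown at this point

model.train(sms_text)            # "fine-tuning" on the domain text
print(before)                    # None
print(model.predict("free"))     # prize
print(model.predict("they"))     # were (general knowledge is kept)
```

A real language model adjusts network weights rather than counts, but the workflow is the same: train once on general text, then continue training on the small domain dataset.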

Next post: SPAM detection

Based on a public SMS dataset labeled as SPAM / NOT SPAM, we will build a classifier to spot spam messages.

To this end, we will create a language model (what we've talked about above) and a classification model. It will be built on Google Colab so you can play and learn (just as I did with other projects!).

A real example from the post!

The following sentences were completed randomly (based on an input string):

Did you notice that, for some sentences, the language seems to be from an SMS text?
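
Completing a sentence from an input string boils down to repeatedly sampling the next word from the model. Here is a minimal sketch of that loop, using a hand-written next-word table rather than the post's `fastai` model. The table `NEXT_WORDS`, the function `complete`, and the seed value are assumptions made for illustration.

```python
import random

# Hypothetical next-word table; a trained language model would supply these.
NEXT_WORDS = {
    "call": ["me", "now"],
    "me": ["now", "later"],
    "now": ["to", "!"],
    "to": ["claim", "win"],
    "claim": ["your"],
    "your": ["prize"],
}


def complete(prompt, n_words, seed=None):
    """Extend `prompt` by sampling up to `n_words` next words at random."""
    rng = random.Random(seed)  # a fixed seed makes the sampling repeatable
    words = prompt.lower().split()
    if not words:  # nothing to continue from
        return ""
    for _ in range(n_words):
        candidates = NEXT_WORDS.get(words[-1])
        if not candidates:  # no known continuation: stop early
            break
        words.append(rng.choice(candidates))
    return " ".join(words)


print(complete("Call", n_words=5, seed=42))
```

Because the next word is drawn at random, different seeds can produce different completions for the same input string.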

That's all for now!

Pablo Casas

Co-founder @ Edvai: Hyper-personalized learning with AI

5 years ago

It's done! https://blog.datascienceheroes.com/spam-detection-using-fastai-ulmfit-part-1-language-model/

Hernán Escudero

Founder & CCO @ deployr / Sociologist and Machine Learning Engineer

5 years ago

Looking forward to the full post! Seems like you got entangled in Python's web as well :p


