NLP for SPAM detection (preview of the next post)
Lately I've been working on NLP, so I decided to write about it.
I'm finishing a series of two blog posts about the incredible ULMFiT from fastai.
At the same time, it will be my first post in Python, and I'm writing it in collaboration with Pablo Zivic.
First, the intuition behind it
Imagine you are a lawyer who wants to study medicine. Although it is a huge change, you already speak English: you know its grammar and how to put meaning into a text.
So when you jump into medicine, you don't have to learn from scratch that after the word "They" comes the word "were" (not "was").
You only need to learn the particularities of the new domain (medicine).
But what is ULMFiT?
ULMFiT stands for Universal Language Model Fine-tuning, and it is implemented in the `fastai` Python library.
It was developed by Jeremy Howard and Sebastian Ruder.
Check the paper: https://arxiv.org/pdf/1801.06146.pdf
ULMFiT comes with a network that was pre-trained on WikiText-103, a corpus of about 103 million words drawn from Wikipedia articles. So it already knows how to speak "neutral" English.
Why is it useful?
It saves us time when creating an NLP project. Thanks to the **transfer learning** technique, we only need to **fine-tune** the network on our data. In other words, it learns the words of our domain.
This is especially handy if we don't have lots of data.
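To make the pre-train / fine-tune idea concrete, here is a minimal sketch in plain Python. It is not ULMFiT and does not use `fastai`: a toy bigram counter stands in for the neural network, and both corpora are made up for illustration. The only point is that a second round of training on a small domain corpus adds domain knowledge without throwing away what was learned first.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Toy count-based language model: a stand-in for the neural network, not ULMFiT."""

    def __init__(self):
        # counts[prev][nxt] = how many times `nxt` followed `prev`
        self.counts = defaultdict(Counter)

    def train(self, sentences):
        # Calling train() again keeps the earlier counts; that is the "fine-tuning" step here.
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1

    def most_likely_next(self, word):
        followers = self.counts[word.lower()]
        return followers.most_common(1)[0][0] if followers else None


# Made-up "general English" corpus (plays the role of Wikipedia).
general_corpus = [
    "they were happy with the results",
    "they were late for the meeting",
    "you have a good idea",
]

# Made-up small domain corpus (plays the role of the SMS data).
sms_corpus = [
    "you have won a free prize",
    "claim your free prize now",
]

model = BigramModel()
model.train(general_corpus)            # "pre-training"
before = model.most_likely_next("free")

model.train(sms_corpus)                # "fine-tuning" on the domain
after = model.most_likely_next("free")

print(before)                          # None: "free" never appeared in the general corpus
print(after)                           # prize
print(model.most_likely_next("they"))  # were: the general knowledge is kept
```

A real language model generalizes far beyond exact word pairs, but the workflow has the same shape: start from general knowledge, then adapt it with a small amount of domain text.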
Next post: SPAM detection
Based on a public SMS database whose messages were labeled as SPAM / NOT SPAM, we will build a classifier to spot spam messages.
To this end, we will create a language model (what we've talked about above) and a classification model. Everything will be built on Google Colab so you can play-&-learn (just as I did with other projects!).
A real example from the post!
The following sentences were completed randomly (based on an input string):
Did you notice that, for some sentences, the language seems to be from an SMS text?
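Completing a sentence from an input string means asking the language model for a next word, appending it, and repeating. The sketch below imitates that loop with a toy bigram table instead of the real ULMFiT network. The training sentences, the function names `build_bigrams` and `complete`, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_bigrams(sentences):
    """Map each word to the list of words seen right after it."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev].append(nxt)
    return followers


def complete(prompt, followers, max_new_words=5, seed=0):
    """Extend a non-empty `prompt` by sampling one word at a time."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    words = prompt.lower().split()
    for _ in range(max_new_words):
        candidates = followers.get(words[-1])
        if not candidates:  # no known continuation: stop early
            break
        words.append(rng.choice(candidates))
    return " ".join(words)


# Made-up SMS-like training sentences, for illustration only.
corpus = [
    "call me when you get home",
    "call now to claim your prize",
    "text me when you are free",
]

followers = build_bigrams(corpus)
print(complete("call", followers))
```

Because the table was built from SMS-like text, every completion can only sound like an SMS. The same effect, at a much larger scale, is why a fine-tuned model's output picks up the style of its training data.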
That's all for now!
Co-founder @ Edvai: Hyper-personalized learning with AI
5 years ago: It's done! https://blog.datascienceheroes.com/spam-detection-using-fastai-ulmfit-part-1-language-model/
Founder & CCO @ deployr / Sociologist and Machine Learning Engineer
5 years ago: Looking forward to the full post! Seems like you got entangled in Python's web as well :p