Steps to Basic NLP Preprocessing
NLP (natural language processing) is used to understand and interpret human language so that it can be processed further. Preprocessing is the part that converts raw text into numbers a model can work with.
1. Sentence Segmentation: It breaks a paragraph into separate sentences. E.g. 'I love my country. It has lots of religions and festivals. India is among the fastest developing countries.' Broken into sentences, this gives:
i) I love my country
ii) It has lots of religions and festivals
iii) India is among the fastest developing countries
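A minimal sketch of sentence segmentation in Python, assuming a simple rule: a sentence ends at '.', '!' or '?' followed by whitespace. The function name `split_sentences` is hypothetical. This rule wrongly splits abbreviations such as 'Dr. Smith'; libraries such as NLTK (`nltk.sent_tokenize`) use a trained model to handle such cases.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive rule-based sentence splitter (illustrative, not production-ready)."""
    # Split on whitespace that follows '.', '!' or '?'; the lookbehind keeps
    # the punctuation attached to the sentence it ends.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

paragraph = ("I love my country. It has lots of religions and festivals. "
             "India is among the fastest developing countries.")
for i, sentence in enumerate(split_sentences(paragraph), start=1):
    print(i, sentence)
```

Unlike the list above, this sketch keeps the final period on each sentence.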
2. Tokenization: It breaks each sentence into separate words (tokens), e.g. 'I', 'love', 'my', 'country'.
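A minimal tokenization sketch, assuming a word is any run of letters, digits or underscores, with punctuation dropped. `tokenize` is a hypothetical name; real tokenizers (for example `nltk.word_tokenize`) also deal with contractions, hyphens and similar cases.

```python
import re

def tokenize(sentence: str) -> list[str]:
    """Return the word tokens of a sentence (illustrative sketch)."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is skipped.
    return re.findall(r"\w+", sentence)

print(tokenize("I love my country."))  # ['I', 'love', 'my', 'country']
```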
3. Stemming: It normalizes a word to its root form (stem), typically by stripping suffixes. E.g. 'playing', 'played' and 'plays' all have the root word 'play'.
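To show the idea, here is a toy suffix-stripping stemmer. It is NOT the Porter algorithm or any library's stemmer: the rule list `RULES` and the function `toy_stem` are hypothetical and only cover a few common English endings. In practice a library stemmer such as NLTK's `PorterStemmer` is typically used.

```python
# Toy stemmer for illustration only (not Porter). Each rule is (suffix, replacement),
# tried in order; the first matching suffix is replaced.
RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))

def toy_stem(word: str, min_stem: int = 3) -> str:
    """Apply the first matching rule, but only if at least `min_stem` letters remain."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)] + replacement
    return word

print([toy_stem(w) for w in ["playing", "played", "plays"]])  # ['play', 'play', 'play']
```

The `min_stem` guard keeps short words like 'is' intact. Crude rules like these also produce non-words (e.g. 'running' becomes 'runn'), which is why real stemmers use more careful rule sets.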
4. Stopword Removal: It removes common words that carry little meaning on their own, such as 'is', 'I', 'that', 'then', etc.
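A minimal stopword-removal sketch. The `STOPWORDS` set below is a hypothetical, hand-picked list for illustration; NLTK, for example, ships a fuller English list via `nltk.corpus.stopwords.words('english')` once the `stopwords` corpus has been downloaded.

```python
# Hypothetical, hand-picked stopword list (lowercase) for illustration only.
STOPWORDS = {"i", "is", "my", "it", "has", "of", "and", "the", "that", "then"}

def remove_stopwords(tokens: list[str]) -> list[str]:
    """Keep only tokens whose lowercase form is not a stopword."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["I", "love", "my", "country"]))  # ['love', 'country']
```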
5. Vectorization: Apply a technique to convert the text into numbers. There are different techniques:
i) Bag of Words
ii) TF-IDF Vectorizer
iii) Word2Vec
iv) GloVe model
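Bag of Words and TF-IDF are simple enough to sketch in plain Python. This sketch assumes the documents are already tokenized and uses one common TF-IDF variant: tf = count / document length and idf = log(N / df), where N is the number of documents and df the number of documents containing the word. Library implementations differ; scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf and L2-normalizes each row by default, so its numbers will not match these. Word2Vec and GloVe learn dense vectors from large corpora and are not sketched here.

```python
import math
from collections import Counter

def build_vocab(docs: list[list[str]]) -> list[str]:
    """Sorted list of every distinct token across all documents."""
    return sorted({tok for doc in docs for tok in doc})

def bag_of_words(docs: list[list[str]], vocab: list[str]) -> list[list[int]]:
    """One row per document holding the raw count of each vocabulary word."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[w] for w in vocab])
    return rows

def tf_idf(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """One row per document: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears at least once.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # avoid dividing by zero for an empty document
        rows.append([counts[w] / length * math.log(n_docs / df[w]) for w in vocab])
    return rows

docs = [["love", "country"], ["love", "festival"]]
vocab = build_vocab(docs)
print(vocab)                      # ['country', 'festival', 'love']
print(bag_of_words(docs, vocab))  # [[1, 0, 1], [0, 1, 1]]
print(tf_idf(docs, vocab))
```

Here 'love' appears in every document, so its idf is log(2/2) = 0 and its TF-IDF weight is 0 in both rows, while 'country' and 'festival' each get 0.5 * log 2 ≈ 0.35 in the one document that contains them. This is how TF-IDF down-weights words that do not help tell documents apart.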
6. After performing all the above steps, the text has been converted into numbers and is ready for a model to be applied to it.
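Putting the steps together, a minimal end-to-end sketch could look like the following. It makes several simplifying assumptions: regex-based sentence splitting and tokenization, a hypothetical hand-picked stopword list, a toy stemmer rather than a real one, and Bag of Words for the vectorization step. The names `preprocess_to_matrix`, `toy_stem`, `STOPWORDS` and `RULES` are hypothetical. The result is a list of equal-length count vectors, one per sentence, which is the numeric shape most models expect as input.

```python
import re
from collections import Counter

# Hypothetical stopword list and toy stemming rules, for illustration only.
STOPWORDS = {"i", "is", "my", "it", "has", "of", "and", "the", "among"}
RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))

def toy_stem(word: str) -> str:
    """Replace the first matching suffix if at least 3 letters remain (not Porter)."""
    for suffix, replacement in RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word

def preprocess_to_matrix(paragraph: str) -> tuple[list[str], list[list[int]]]:
    """Sentences -> tokens -> stopwords removed -> stems -> Bag of Words counts."""
    docs = []
    for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):  # segmentation
        tokens = [t.lower() for t in re.findall(r"\w+", sentence)]   # tokenization
        # Stopwords are removed before stemming so the list matches unstemmed words.
        tokens = [t for t in tokens if t not in STOPWORDS]
        docs.append([toy_stem(t) for t in tokens])                   # stemming
    vocab = sorted({t for doc in docs for t in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[w] for w in vocab] for c in counts]                 # vectorization
    return vocab, matrix

vocab, X = preprocess_to_matrix(
    "I love my country. It has lots of religions and festivals. "
    "India is among the fastest developing countries."
)
print(vocab)
for row in X:
    print(row)
```

Note that 'country' and 'countries' end up in the same column after stemming, so the first and third sentences share a feature.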