
Data Preprocessing using NLTK for NLP

Sayali Shinde

SSR Software Engineer | Globant | React Native

Published: March 31, 2019

What is the Python NLTK package?

Natural Language Toolkit (NLTK) is a Python library for writing programs that work with human (natural) language data. It provides user-friendly interfaces to datasets such as text corpora and lexical resources. The library can perform operations such as tokenization, stemming, classification, tagging, and semantic reasoning. At the time of writing, the latest version was NLTK 3.3. It is free and open source, and it is available for Windows, Mac OS, and Linux.

Tokenization: Tokenization is the process of splitting text into smaller pieces called tokens. Words, numbers, punctuation marks, and other units can each be treated as a single token. Here the split() function is used to split the input text into tokens.
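A plain split() only breaks on whitespace, so punctuation stays attached to words. The minimal sketch below, which assumes NLTK is installed, compares str.split() with NLTK's rule-based `TreebankWordTokenizer`, which needs no extra data download; the sample sentence is made up. NLTK's `word_tokenize` produces similar output, but it first requires the Punkt model to be fetched with `nltk.download`.

```python
from nltk.tokenize import TreebankWordTokenizer

# Made-up sample sentence containing a contraction and punctuation.
text = "NLTK isn't hard: try it today."

# Whitespace splitting keeps punctuation glued to the neighbouring word.
split_tokens = text.split()
print(split_tokens)  # ['NLTK', "isn't", 'hard:', 'try', 'it', 'today.']

# The Treebank tokenizer separates punctuation and splits contractions
# such as "isn't" into "is" + "n't"; it is purely rule-based.
treebank_tokens = TreebankWordTokenizer().tokenize(text)
print(treebank_tokens)
```

Separating "hard:" into "hard" and ":" matters later on, because stemmers, lemmatizers, and taggers expect clean word tokens.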

Stemming: Stemming is the process of reducing words to their word stem, base, or root form, for example friendship → friend and books → book. Here we use two main algorithms: the Porter stemming algorithm, which removes common morphological and inflectional endings from words, and the Lancaster stemming algorithm, which is more aggressive.
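A minimal comparison of the two stemmers, assuming NLTK is installed; `PorterStemmer` and `LancasterStemmer` are both available from `nltk.stem` and need no data downloads. The word list is arbitrary. "cement", for instance, is left intact by Porter but cut down to "cem" by Lancaster, which shows how much more aggressive Lancaster is.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Arbitrary sample words used to compare the two algorithms side by side.
words = ["books", "running", "maximum", "cement"]
for word in words:
    print(f"{word:10} porter={porter.stem(word):10} lancaster={lancaster.stem(word)}")
```

Note that a stem does not have to be a real word ("cem" is not one); that is the main difference from lemmatization.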

Lemmatization: Lemmatization also reduces inflected words to a root form, but unlike stemming it ensures that the root word is a valid word of the language. In lemmatization, this root word is called the lemma.
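To make the difference from stemming concrete, here is a toy, dictionary-backed lemmatizer in plain Python. It is not NLTK code: the tiny `LEXICON`, `EXCEPTIONS`, and `SUFFIX_RULES` tables and the `lemmatize` helper are hypothetical stand-ins for a real dictionary. In NLTK you would instead call `WordNetLemmatizer().lemmatize(word, pos="v")` from `nltk.stem`, after downloading the WordNet corpus with `nltk.download("wordnet")`; it follows a similar pattern of exception lists plus suffix rules whose results are checked against the dictionary.

```python
# Hypothetical miniature lexicon standing in for a real dictionary such as WordNet.
LEXICON = {
    "n": {"book", "friend", "mouse"},
    "v": {"be", "run", "study", "walk"},
    "a": {"good"},
}

# Hypothetical irregular forms that no suffix rule can handle.
EXCEPTIONS = {
    "n": {"mice": "mouse"},
    "v": {"is": "be", "was": "be", "ran": "run"},
    "a": {"better": "good", "best": "good"},
}

# Hypothetical (suffix, replacement) rules, tried in order for each part of speech.
SUFFIX_RULES = {
    "n": [("ies", "y"), ("s", "")],
    "v": [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")],
    "a": [("er", ""), ("est", "")],
}


def lemmatize(word: str, pos: str = "n") -> str:
    """Return the lemma of `word` for part of speech `pos` ("n", "v" or "a")."""
    word = word.lower()
    # Irregular forms are looked up directly.
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    # Words that are already dictionary entries are their own lemma.
    if word in LEXICON[pos]:
        return word
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Only accept candidates that are real words in the lexicon.
            if candidate in LEXICON[pos]:
                return candidate
    # Unknown word: leave it unchanged rather than produce a non-word.
    return word


print(lemmatize("mice"), lemmatize("studies", "v"), lemmatize("better", "a"))
```

Here `lemmatize("friendship")` stays "friendship": lemmatization removes inflections such as plural "-s", not derivational suffixes such as "-ship".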

Part-of-speech tagging: Part-of-speech tagging assigns a part of speech (such as noun, verb, or adjective) to each word of a given text, based on both the word's definition and its context.
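NLTK's ready-made tagger is `nltk.pos_tag(tokens)`, but it relies on a pretrained model that must first be fetched with `nltk.download`. To show the role of context without any downloads, the sketch below swaps in NLTK's simpler n-gram taggers, `UnigramTagger` and `BigramTagger`, trained on three hypothetical hand-tagged sentences using Penn Treebank tags. The unigram tagger always gives a word its most frequent training tag; the bigram tagger also looks at the previous tag, so "book" becomes a verb after "I" and a noun after "the".

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Hypothetical hand-tagged training sentences (Penn Treebank tags).
train_sents = [
    [("I", "PRP"), ("book", "VBP"), ("a", "DT"), ("flight", "NN")],
    [("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("good", "JJ")],
    [("a", "DT"), ("book", "NN"), ("is", "VBZ"), ("on", "IN"),
     ("the", "DT"), ("table", "NN")],
]

# Back-off chain: bigram -> unigram -> default tag "NN" for unseen words.
default_tagger = DefaultTagger("NN")
unigram_tagger = UnigramTagger(train_sents, backoff=default_tagger)
bigram_tagger = BigramTagger(train_sents, backoff=unigram_tagger)

# The previous tag (PRP vs DT) decides how "book" is tagged.
verb_sentence = bigram_tagger.tag(["I", "book", "a", "table"])
noun_sentence = bigram_tagger.tag(["the", "book", "is", "good"])
print(verb_sentence)  # [('I', 'PRP'), ('book', 'VBP'), ('a', 'DT'), ('table', 'NN')]
print(noun_sentence)  # [('the', 'DT'), ('book', 'NN'), ('is', 'VBZ'), ('good', 'JJ')]

# Without context, the unigram tagger always picks the most frequent tag.
print(unigram_tagger.tag(["I", "book"]))  # [('I', 'PRP'), ('book', 'NN')]
```

Three sentences are far too few for real use; the point is only the back-off design, where each tagger defers to a simpler one when it has not seen a context.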

CONCLUSION: In this article we described the main steps of data preprocessing for NLP using NLTK (Natural Language Toolkit): tokenization, normalization through stemming and lemmatization, and part-of-speech tagging.
