
Natural Language Processing with Python Workshop on April 9th

Tony Ojeda

Data Science & AI Executive

Published: March 23, 2016

Data Community DC and District Data Labs are hosting a Natural Language Processing with Python workshop on Saturday, April 9th, from 9am to 5pm. Register before March 26th for an early bird discount!

OVERVIEW

Natural Language Processing (NLP) is often taught at the academic level from the perspective of computational linguists. However, as data scientists, we have a richer view of the natural language world: unstructured data that by its very nature has latent information that is important to humans. NLP practitioners have benefited from machine learning techniques to unlock meaning from large corpora, and in this class we’ll explore how to do that with Python and the Natural Language Toolkit (NLTK).

NLTK is an excellent library for machine-learning based NLP, written in Python by experts from both academia and industry. Python allows you to create rich data applications rapidly, iterating on hypotheses. The combination of Python + NLTK means that you can easily add language-aware data products to your larger analytical workflows and applications.

WHAT YOU WILL LEARN

In this course we will begin by exploring NLTK through the corpora that ship with it, which will give us a feel for the features and functionality that NLTK offers. This will take up the first part of the course. However, most NLP practitioners want to work on their own corpora, so during the second half of the course we will focus on building a language-aware data product from a specific corpus: a topic identification and document clustering algorithm from a web crawl of blog sites. The clustering algorithm will start with a simple Lesk K-Means clustering and will then be improved with an LDA analysis.

COURSE OUTLINE

The following represents the one-hour modules that will make up the course.

Part One: Using NLTK

Introduction to NLTK: code + resources = magic
The counting of things: concordances, frequency distributions, tokenization
Tagging and parsing: PoS tagging, NERC, Syntactic Parsing
Classifying text: sentiment analysis, document classification
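
To make "the counting of things" concrete, here is a minimal sketch of tokenization, a frequency distribution, and a concordance. It is not workshop material: it uses only the Python standard library so that it runs without any downloads, and the helper names `tokenize` and `concordance` as well as the sample sentence are assumptions made up for illustration. NLTK offers its own versions of these ideas (for example `FreqDist` and `Text.concordance`).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def concordance(tokens, target, width=2):
    """Return every occurrence of `target` with `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines


# Hypothetical sample text, used only for this illustration.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

# A frequency distribution is a mapping from each token to its count.
freq = Counter(tokens)
print(freq.most_common(1))

# A concordance shows a word in each of its surrounding contexts.
for line in concordance(tokens, "sat"):
    print(line)
```

A real corpus would need a more careful tokenizer (punctuation, contractions, Unicode), which is one reason to reach for a library rather than a regular expression.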

Part Two: Building an NLP Data Product

Using the NLTK API to wrap a custom corpus
Word vectors for K-Means clustering
LDA for topic analysis
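
As a rough illustration of clustering documents by their word vectors, the sketch below builds raw term-count vectors and runs a plain K-Means loop over them. Everything in it is an assumption for illustration rather than the workshop's actual code: the function names, the four toy documents, the use of raw counts instead of weighted vectors, and the fixed starting centroids (K-Means is normally seeded randomly). It does not attempt the Lesk or LDA steps.

```python
import math
import re
from collections import Counter


def vectorize(docs):
    """Turn each document into a term-count vector over a shared, sorted vocabulary."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[t]) for t in vocab])
    return vectors, vocab


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, initial_centroids, max_iter=20):
    """Plain K-Means: assign each vector to its nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: distance(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy documents standing in for crawled blog posts.
docs = [
    "python code library python",
    "python library code",
    "senate vote election",
    "election vote senate vote",
]
vectors, vocab = vectorize(docs)

# Seeding from the first and third documents keeps the run reproducible.
labels = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)
```

With documents this small, raw counts separate the two themes cleanly; on a real crawl, stop words and document length tend to dominate the distances, so some weighting or normalization of the vectors is usually needed first.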

Notably not mentioned: morphology, n-gram language models, search, raw text preprocessing, word sense disambiguation, pronoun resolution, language generation, machine translation, textual entailment, question and answer systems, summarization, etc.

After taking this workshop, students will be able to create a Python module that wraps their own corpora and begin to leverage NLTK tools against it. They will also have an understanding of the features and functionality of NLTK, and a working knowledge of how to architect applications that use NLP. Finally, students who complete this course will have built an information extraction system that performs topic analyses on a corpus of documents.

More Info and Registration
