spaCy part 1
SHOAIB SHAIK
spaCy is an interesting production-ready, deployable package for natural language processing. It uses statistical models to learn the relations between words and build a base understanding of the text.
spaCy ships trained pipelines for different languages, like the ones below.
The pt_core_web_sm model is a small Portuguese pipeline trained on web text. Larger models can require a lot of disk space: for example, en_core_web_lg takes up 382 MB, while en_core_web_md needs 31 MB and en_core_web_sm only 12 MB.
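As a minimal sketch, here is how one of these pipelines is downloaded and loaded, assuming spaCy is installed and using en_core_web_sm as the example model:

# Download a trained pipeline once from the command line:
#   python -m spacy download en_core_web_sm
import spacy

# Load the small English pipeline; swap in en_core_web_md or en_core_web_lg
# if you need word vectors and can spare the disk space.
nlp = spacy.load("en_core_web_sm")
print(nlp.meta["lang"], nlp.meta["name"])  # en core_web_sm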
Let's see some of the basics that spaCy covers:
Tokenization
Lemmatization (shown in the sketch below)
NLU (natural language understanding)
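As a quick sketch of lemmatization, each token's dictionary base form can be read off with token.lemma_. The sample sentence here is my own, and the exact lemmas may vary by model version:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats were sitting on the mats.")

# token.lemma_ holds the dictionary base form of each token
print([(token.text, token.lemma_) for token in doc])
# e.g. [('The', 'the'), ('cats', 'cat'), ('were', 'be'),
#       ('sitting', 'sit'), ('on', 'on'), ('the', 'the'),
#       ('mats', 'mat'), ('.', '.')]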
When we work with spaCy, the first step is to pass the text through the tokenizer, which breaks it into tokens and produces a Doc object.
The Doc then moves on to the tagger, parser, and entity recognizer.
This whole sequence is known as the language processing pipeline.
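You can inspect which components a loaded pipeline runs after the tokenizer. A minimal sketch (the exact component list depends on the model and spaCy version):

import spacy

nlp = spacy.load("en_core_web_sm")

# Components that run, in order, after the tokenizer has produced the Doc
print(nlp.pipe_names)
# e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']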
Each pipeline component has a well-defined task:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I own a ginger cat.")
print([token.text for token in doc])
# >>> ['I', 'own', 'a', 'ginger', 'cat', '.']
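To see those well-defined tasks in action, here is a short sketch reading the tagger's part-of-speech tags, the parser's dependency labels, and the entity recognizer's results for the same sentence. The attributes are spaCy's standard token and doc API; this particular sentence may yield no named entities:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I own a ginger cat.")

# tagger -> token.pos_, parser -> token.dep_, entity recognizer -> doc.ents
for token in doc:
    print(token.text, token.pos_, token.dep_)
print(doc.ents)  # tuple of named entities; likely empty for this sentence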