spaCy part 1



spaCy is a production-ready, deployable Python package for natural language processing. It uses statistical models to work out the relationships between words and build a basic understanding of the text.


spaCy ships separate trained pipeline packages for different languages, for example:

pt_core_news_sm is a small Portuguese pipeline trained on news text. Larger models can require a lot of disk space: for example, en_core_web_lg takes up 382 MB, while en_core_web_md needs 31 MB and en_core_web_sm only 12 MB. The exact sizes vary between spaCy releases.


Let's look at some of the basics that spaCy covers:

Tokenizer

Lemmatization

NLU (natural language understanding)


When we work with spaCy, the first step is to run the text through the tokenizer, which breaks it into tokens and produces a Doc object.

The Doc then moves on to the tagger, the parser and the entity recognizer.


This whole sequence is known as the language processing pipeline.

Each pipeline component has a well-defined task:

  • Tokenizer (tokenizer): Segment text into tokens
  • Tagger (tagger): Assign part-of-speech tags
  • DependencyParser (parser): Assign dependency labels
  • EntityRecognizer (ner): Detect and label named entities
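
To make the idea of a pipeline concrete, here is a minimal toy sketch in plain Python. It is not how spaCy is implemented: ToyDoc, toy_tokenizer, toy_tagger, toy_ner and run_pipeline are hypothetical names invented for this illustration, and the "tagging" and "entity" rules are deliberately naive. It only shows the shape of the flow: the tokenizer turns text into a document, then each component adds its own annotations in order.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Doc: the text, its tokens, and annotations."""
    text: str
    tokens: list
    annotations: dict = field(default_factory=dict)


def toy_tokenizer(text):
    # Split into runs of word characters and single punctuation marks.
    return ToyDoc(text=text, tokens=re.findall(r"\w+|[^\w\s]", text))


def toy_tagger(doc):
    # Naive rule: a token is either a WORD or PUNCT (real taggers are statistical).
    doc.annotations["tagger"] = [
        "WORD" if re.fullmatch(r"\w+", tok) else "PUNCT" for tok in doc.tokens
    ]
    return doc


def toy_ner(doc):
    # Naive rule: capitalized tokens that are not the first token count as entities.
    doc.annotations["ner"] = [
        tok for i, tok in enumerate(doc.tokens) if i > 0 and tok[:1].isupper()
    ]
    return doc


# Components run in this order, after the tokenizer.
TOY_PIPELINE = [("tagger", toy_tagger), ("ner", toy_ner)]


def run_pipeline(text, components=TOY_PIPELINE):
    doc = toy_tokenizer(text)          # step 1: text -> document
    for _name, component in components:
        doc = component(doc)           # step 2..n: each component annotates the document
    return doc


doc = run_pipeline("I own a ginger cat.")
print(doc.tokens)
print(doc.annotations)
```

In spaCy itself, nlp.pipe_names lists the components that run after the tokenizer for a loaded pipeline.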



First, we import the library and load the English language model:

    import spacy
    nlp = spacy.load("en_core_web_md")

Then we process a sentence and print its tokens:

    doc = nlp("I own a ginger cat.")
    print([token.text for token in doc])
    # ['I', 'own', 'a', 'ginger', 'cat', '.']
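
Once a Doc exists, the annotations added by the tagger, parser and entity recognizer can be read from its tokens and entities. The sketch below is one possible way to collect them. The helper names token_rows and entity_rows are made up for this example; the attributes they read (text, lemma_, pos_, dep_ on tokens, and text and label_ on the entries of doc.ents) are standard spaCy attributes. The demo assumes en_core_web_md is installed and skips itself otherwise.

```python
def token_rows(doc):
    """Return (text, lemma, part-of-speech tag, dependency label) for each token."""
    return [(tok.text, tok.lemma_, tok.pos_, tok.dep_) for tok in doc]


def entity_rows(doc):
    """Return (text, label) for each named entity found in the Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    try:
        import spacy
        nlp = spacy.load("en_core_web_md")
    except (ImportError, OSError):
        # spacy.load raises OSError when the model package cannot be found.
        print("spaCy or en_core_web_md is not installed; skipping the demo.")
    else:
        doc = nlp("I own a ginger cat.")
        print(nlp.pipe_names)        # components that ran after the tokenizer
        for row in token_rows(doc):
            print(row)
        print(entity_rows(doc))
```

The exact tags and labels printed depend on the model version, so no expected output is shown here.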
