spaCy part 1
SHOAIB SHAIK
spaCy is an interesting production-ready, deployable package for natural language processing. It uses statistical models to learn the relations between words and build a base understanding of the text.
spaCy ships trained pipelines for different languages, like the ones below.
The pt_core_web_sm model is a small Portuguese pipeline trained on web text. Larger models can require a lot of disk space: for example, en_core_web_lg takes up 382 MB, while en_core_web_md needs 31 MB and en_core_web_sm only 12 MB.
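As a minimal sketch, here is how one of these pipelines is downloaded and loaded, assuming spaCy is installed and using en_core_web_sm as the example model:

# Download a trained pipeline once from the command line:
#   python -m spacy download en_core_web_sm
import spacy

# Load the small English pipeline; swap in en_core_web_md or en_core_web_lg
# if you need word vectors and can spare the disk space.
nlp = spacy.load("en_core_web_sm")
print(nlp.meta["lang"], nlp.meta["name"])  # en core_web_sm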
Let's see some of the basics that spaCy covers:
Tokenization
Lemmatization (shown in the sketch below)
NLU (natural language understanding)
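As a quick sketch of lemmatization, each token's dictionary base form can be read off with token.lemma_. The sample sentence here is my own, and the exact lemmas may vary by model version:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats were sitting on the mats.")

# token.lemma_ holds the dictionary base form of each token
print([(token.text, token.lemma_) for token in doc])
# e.g. [('The', 'the'), ('cats', 'cat'), ('were', 'be'),
#       ('sitting', 'sit'), ('on', 'on'), ('the', 'the'),
#       ('mats', 'mat'), ('.', '.')]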
When we work with spaCy, the first step is to pass the text through the tokenizer, which breaks it into tokens and produces a Doc object.
The Doc then moves on to the tagger, parser, and entity recognizer.
This whole sequence is known as the language processing pipeline.
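You can inspect which components a loaded pipeline runs after the tokenizer. A minimal sketch (the exact component list depends on the model and spaCy version):

import spacy

nlp = spacy.load("en_core_web_sm")

# Components that run, in order, after the tokenizer has produced the Doc
print(nlp.pipe_names)
# e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']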
Each pipeline component has a well-defined task:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I own a ginger cat.")
print([token.text for token in doc])
# >>> ['I', 'own', 'a', 'ginger', 'cat', '.']
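To see those well-defined tasks in action, here is a short sketch reading the tagger's part-of-speech tags, the parser's dependency labels, and the entity recognizer's results for the same sentence. The attributes are spaCy's standard token and doc API; this particular sentence may yield no named entities:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I own a ginger cat.")

# tagger -> token.pos_, parser -> token.dep_, entity recognizer -> doc.ents
for token in doc:
    print(token.text, token.pos_, token.dep_)
print(doc.ents)  # tuple of named entities; likely empty for this sentence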