Simple NLP with spaCy
Quick examples for common NLP tasks using spaCy:
import spacy
# Load English language model
nlp = spacy.load("en_core_web_sm")
# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)
# Tokenization
tokens = [token.text for token in doc]
print("Tokens:", tokens)
# Output: ['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '1', 'billion']
# Part-of-speech tagging
pos_tags = [(token.text, token.pos_) for token in doc]
print("POS Tags:", pos_tags)
# Output: [('Apple', 'PROPN'), ('is', 'AUX'), ('looking', 'VERB'), ...]
# Named Entity Recognition
entities = [(ent.text, ent.label_) for ent in doc.ents]
print("Entities:", entities)
# Output: [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
# Dependency parsing
for token in doc:
    print(f"{token.text} --> {token.dep_} --> {token.head.text}")
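# Semantic similarity
# Note: en_core_web_sm ships without static word vectors, so this score is only
# a rough estimate; en_core_web_md or en_core_web_lg give more meaningful results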
doc1 = nlp("I like cats")
doc2 = nlp("I love cats")
similarity = doc1.similarity(doc2)
print(f"Similarity: {similarity}")
# Sentence segmentation
text_multi = "This is the first sentence. This is another one. And a third!"
doc_multi = nlp(text_multi)
sentences = [sent.text for sent in doc_multi.sents]
print("Sentences:", sentences)
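When only sentence boundaries are needed, a lighter setup may be enough. The sketch below assumes spaCy v3 and swaps the parser-based segmentation used above for the rule-based sentencizer component on a blank English pipeline, so it needs no model download. The names nlp_rules, doc_rules, and rule_sentences are illustrative only.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) plus the rule-based
# sentencizer, instead of the parser-based boundaries from en_core_web_sm.
nlp_rules = spacy.blank("en")
nlp_rules.add_pipe("sentencizer")

doc_rules = nlp_rules("This is the first sentence. This is another one. And a third!")
rule_sentences = [sent.text for sent in doc_rules.sents]
print("Sentences:", rule_sentences)
```

The sentencizer splits on punctuation, so it is fast but can be less accurate than the parser on text with unusual punctuation.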
Installing spaCy and downloading the language model:
pip install spacy
python -m spacy download en_core_web_sm
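To confirm the install worked, one option is to check for the model package before loading it. This is a minimal sketch, assuming spaCy v3; MODEL_NAME is an illustrative constant, and the blank-pipeline fallback is only there so the script still runs when the model is missing.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"

if is_package(MODEL_NAME):
    # The model was installed as a Python package by the download command
    nlp = spacy.load(MODEL_NAME)
else:
    # Fallback for illustration only: tokenizer without tagger, parser, or NER
    print(f"{MODEL_NAME} not found; run: python -m spacy download {MODEL_NAME}")
    nlp = spacy.blank("en")

print("spaCy version:", spacy.__version__)
print("Pipeline components:", nlp.pipe_names)
```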
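spaCy exposes tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and sentence segmentation through the single Doc object returned by one nlp(text) call. The library is designed for production use and integrates well with other Python data science tools.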
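For more than a handful of texts, processing them as a stream is usually the better fit for production use. The sketch below uses nlp.pipe with an arbitrary batch_size of 2; it runs on a blank English pipeline (tokenizer only) so that no model download is required, and the texts list is made up for illustration. Replacing spacy.blank("en") with spacy.load("en_core_web_sm") would add tagging, parsing, and NER to each Doc.

```python
import spacy

# Assumption: tokenizer-only pipeline to keep the sketch self-contained
nlp = spacy.blank("en")

texts = [
    "Apple is looking at buying U.K. startup for $1 billion",
    "This is the first sentence.",
    "And a third!",
]

# nlp.pipe yields Doc objects in input order, processing the texts in batches
docs = list(nlp.pipe(texts, batch_size=2))
token_counts = [len(doc) for doc in docs]
print("Tokens per text:", token_counts)
```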