Simple NLP with spaCy

Quick examples for common NLP tasks using spaCy:

import spacy

# Load English language model
nlp = spacy.load("en_core_web_sm")

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Tokenization
tokens = [token.text for token in doc]
print("Tokens:", tokens)
# Output: ['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '1', 'billion']

# Part-of-speech tagging
pos_tags = [(token.text, token.pos_) for token in doc]
print("POS Tags:", pos_tags)
# Output: [('Apple', 'PROPN'), ('is', 'AUX'), ('looking', 'VERB'), ...]

# Named Entity Recognition
entities = [(ent.text, ent.label_) for ent in doc.ents]
print("Entities:", entities)
# Output: [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]

# Dependency parsing
for token in doc:
    print(f"{token.text} --> {token.dep_} --> {token.head.text}")

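# Semantic similarity
# Note: en_core_web_sm ships without static word vectors, so this score is
# only a rough estimate and spaCy emits a warning. Load en_core_web_md or
# en_core_web_lg for similarity based on real word vectors.
doc1 = nlp("I like cats")
doc2 = nlp("I love cats")
similarity = doc1.similarity(doc2)
print(f"Similarity: {similarity:.3f}")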

# Sentence segmentation
text_multi = "This is the first sentence. This is another one. And a third!"
doc_multi = nlp(text_multi)
sentences = [sent.text for sent in doc_multi.sents]
print("Sentences:", sentences)
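
By default, the similarity score above is the cosine similarity between the averaged token vectors of the two documents. Here is a minimal sketch of that calculation in plain Python. The 3-dimensional vectors are made-up toy values, not real spaCy vectors, and the helper names `cosine_similarity` and `average_vector` are illustrative only:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against division by zero for all-zero vectors.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


# Hypothetical toy vectors, one per token (assumed values for illustration).
doc1_vectors = [[0.1, 0.3, 0.5], [0.7, 0.2, 0.1], [0.4, 0.9, 0.2]]
doc2_vectors = [[0.1, 0.3, 0.5], [0.8, 0.3, 0.1], [0.4, 0.9, 0.2]]

score = cosine_similarity(average_vector(doc1_vectors), average_vector(doc2_vectors))
print(f"Similarity: {score:.3f}")
```

Because only the middle token's vector differs slightly, the two averaged vectors point in nearly the same direction and the score is close to 1.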

Installing spaCy and downloading the language model:

pip install spacy
python -m spacy download en_core_web_sm
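
To confirm that both commands succeeded before running the examples, you can check that the packages are importable. This is a small sketch using only the standard library. It relies on the fact that the downloaded model is installed as a regular Python package named `en_core_web_sm`, and the helper name `missing_packages` is made up for this example:

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


required = ["spacy", "en_core_web_sm"]
missing = missing_packages(required)

if missing:
    print("Missing:", ", ".join(missing))
else:
    print("spaCy and en_core_web_sm are installed")
```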

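spaCy exposes all of these tasks through a single pipeline: one call to nlp(text) returns a Doc whose tokens, part-of-speech tags, entities, dependencies, and sentences are available as attributes. The library is designed for production use and integrates well with other Python data science tools.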