Remove English Stop Words with NLTK Step by Step – NLTK Tutorial

By | July 5, 2019

English stop words often contribute little to the meaning of a text, so removing them can improve the accuracy of some machine learning models.

If you want to know how many English stop words NLTK contains, you can read:

List All English Stop Words in NLTK – NLTK Tutorial

In this tutorial, we will show how to remove English stop words using NLTK.

Preliminaries

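# Load library
# If the stop word corpus is not installed yet, run this once first:
#   import nltk; nltk.download('stopwords')
from nltk.corpus import stopwords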

Load English stop words

# Load English stop words
stop_words = stopwords.words('english')

Create a word token list

You can tokenize a sentence to get a word token list.

A Beginner Guide to Tokenize Words and Sentences with NLTK – NLTK Tutorial

# Create a word token list
tokenized_words = ['this', 'is', 'a', 'very', 'excite', 'website', 'for', 'machine', 'learning', 'beginners', '.']

Remove English stop words

# Remove stop words
words = [word for word in tokenized_words if word not in stop_words]
print(words)

The output is:

['excite', 'website', 'machine', 'learning', 'beginners', '.']
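
Note that the result above still contains the '.' token, and that the comparison is case-sensitive: the NLTK stop words are lowercase, so a capitalized token such as 'This' would not be removed. The following is a minimal sketch of one way to handle both cases, not part of NLTK itself. The function name remove_stop_words, its drop_punctuation parameter, and the short SAMPLE_STOP_WORDS list are assumptions made for illustration; in practice you would pass stopwords.words('english') as the stop word list.

```python
import string

# Hypothetical stand-in for stopwords.words('english'), used only so that
# this sketch runs without the NLTK corpus installed.
SAMPLE_STOP_WORDS = ['this', 'is', 'a', 'very', 'for']


def remove_stop_words(tokens, stop_words, drop_punctuation=False):
    """Return the tokens that are not stop words (case-insensitive match)."""
    # A set makes each membership test fast, compared with scanning a list.
    stop_set = set(word.lower() for word in stop_words)
    kept = []
    for token in tokens:
        if token.lower() in stop_set:
            continue
        # Optionally skip tokens made up only of punctuation, such as '.'.
        if drop_punctuation and token and all(ch in string.punctuation for ch in token):
            continue
        kept.append(token)
    return kept


tokens = ['This', 'is', 'a', 'very', 'excite', 'website', 'for',
          'machine', 'learning', 'beginners', '.']
filtered = remove_stop_words(tokens, SAMPLE_STOP_WORDS, drop_punctuation=True)
print(filtered)
```

With drop_punctuation left at its default of False, punctuation tokens are kept, which matches the behaviour of the list comprehension shown earlier.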
