English stop words often contribute little to the meaning of a text, and the accuracy of some machine learning models improves once these words have been removed.
If you want to know how many English stop words NLTK contains, you can read:
List All English Stop Words in NLTK – NLTK Tutorial
In this tutorial, we will introduce how to remove English stop words using NLTK.
Preliminaries
# Load library
from nltk.corpus import stopwords
Load English stop words
# Load English stop words
stop_words = stopwords.words('english')
Create a word token list
You can tokenize a sentence to get a word token list.
A Beginner Guide to Tokenize Words and Sentences with NLTK – NLTK Tutorial
# Create a word token list
tokenized_words = ['this', 'is', 'a', 'very', 'excite', 'website', 'for', 'machine', 'learning', 'beginners', '.']
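If you do not want to type the token list by hand, it can be produced from a raw sentence. The sketch below is a minimal stand-in that uses Python's built-in `re` module rather than NLTK's tokenizer. The function name `simple_tokenize` and the example sentence are assumptions made for illustration, and a real tokenizer handles many more cases (contractions, abbreviations, and so on).

```python
import re

def simple_tokenize(sentence):
    """Lowercase a sentence and split it into word and punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

# Hypothetical example sentence, chosen to reproduce the token list above
tokenized_words = simple_tokenize("This is a very excite website for machine learning beginners.")
print(tokenized_words)
```

The tokens are lowercased here because the NLTK English stop words are stored in lowercase, so the membership test in the next step matches them directly.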
Remove English stop words
# Remove stop words: keep only the tokens that are not in the stop word list
words = [word for word in tokenized_words if word not in stop_words]
print(words)
The output is:
['excite', 'website', 'machine', 'learning', 'beginners', '.']
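The same filtering step can be wrapped in a small reusable function. The sketch below is one possible way to do it, not part of NLTK: the name `remove_stop_words` is hypothetical, and `demo_stop_words` is a tiny hand-written stand-in for `stopwords.words('english')` so that the example runs without downloading the corpus. It converts the stop words to a set, which makes each lookup faster than searching a list, and compares tokens in lowercase so that capitalized words such as 'This' are also removed.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not stop words, ignoring letter case."""
    # A set gives constant-time membership tests on average
    stop_set = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop_set]

# Assumption: a tiny stand-in list; in practice pass stopwords.words('english')
demo_stop_words = ['this', 'is', 'a', 'very', 'for']

tokens = ['This', 'is', 'a', 'very', 'excite', 'website', 'for',
          'machine', 'learning', 'beginners', '.']
filtered = remove_stop_words(tokens, demo_stop_words)
print(filtered)
```

Punctuation such as '.' is kept, because it is not in the stop word list. If you want to drop it as well, you would need to filter it separately.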