To get the part-of-speech of each word in a sentence, we can use the nltk pos_tag() function. In this tutorial, we will show you how to use it.
Preliminary
To use pos_tag() in nltk, we should import it first.
from nltk import word_tokenize, pos_tag
Then we can start to extract the part-of-speech of a word.
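Both word_tokenize() and pos_tag() rely on data files that are not shipped inside the nltk package itself. As a minimal sketch, assuming network access, the function below asks nltk to download them. The helper name download_tagger_data is hypothetical, and the resource names are an assumption that depends on your nltk version: older releases use 'punkt' and 'averaged_perceptron_tagger', while newer ones look for 'punkt_tab' and 'averaged_perceptron_tagger_eng'.

```python
# Resource names are assumptions; which ones exist depends on the nltk version.
RESOURCES = (
    'punkt', 'punkt_tab',
    'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng',
)


def download_tagger_data(resources=RESOURCES):
    """Ask nltk to download tokenizer and tagger data (hypothetical helper)."""
    import nltk  # imported lazily; calling this function needs network access
    for name in resources:
        nltk.download(name, quiet=True)


# Run once before calling word_tokenize() or pos_tag():
# download_tagger_data()
```

You only need to do this once per environment; after that the data is read from the local nltk data directory.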
Tokenizing words in a sentence
We should first split the sentence into words. Here is a tutorial:
Tokenizing or Splitting Words and Sentences From String Using NLTK – NLTK Tutorial
s = 'TutorialExample.com is a programming tutorial site'
wx = word_tokenize(s)
Get the part-of-speech of a word
We will use nltk pos_tag() to extract the part-of-speech of each word.
pos = pos_tag(wx)
print(pos)
Running this code, we will get:
[('TutorialExample.com', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('programming', 'VBG'), ('tutorial', 'JJ'), ('site', 'NN')]
We can see that pos is a python list of python tuples. Each tuple holds a word and its part-of-speech tag.
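Because the result is an ordinary list of (word, tag) tuples, it can be processed with plain python. The sketch below does not call nltk at all: it copies the output shown above into a literal and then unpacks it. The noun filter is only an illustration and assumes the Penn Treebank tags used by the default tagger, where noun tags start with 'NN'.

```python
# Tagged output copied from the example above: a list of (word, tag) tuples.
pos = [('TutorialExample.com', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
       ('programming', 'VBG'), ('tutorial', 'JJ'), ('site', 'NN')]

# Unpack each tuple into the word and its part-of-speech tag.
for word, tag in pos:
    print(word, tag)

# Keep only the words whose tag starts with 'NN' (nouns and proper nouns).
nouns = [word for word, tag in pos if tag.startswith('NN')]
print(nouns)  # ['TutorialExample.com', 'site']
```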
Notice
pos_tag() can not get the part-of-speech of a word that is passed as a plain string. Look at this example code:
pos = pos_tag('TutorialExample.com')
print(pos)
Running this code, it will output:
[('T', 'NNP'), ('u', 'JJ'), ('t', 'NN'), ('o', 'IN'), ('r', 'NN'), ('i', 'VBP'), ('a', 'DT'), ('l', 'NN'), ('E', 'NNP'), ('x', 'VBZ'), ('a', 'DT'), ('m', 'JJ'), ('p', 'NN'), ('l', 'NN'), ('e', 'NN'), ('.', '.'), ('c', 'VB'), ('o', 'JJ'), ('m', 'NN')]
We can see that pos_tag() expects a python list of words. If we pass a single string, it is treated as a sequence of characters, so each character is tagged as a separate word.
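One way to tag a single word is to wrap it in a list first. The sketch below shows why the string is split (iterating over a python string yields its characters) and a small guard against it. The helper names as_token_list and tag_single_word are hypothetical, not part of nltk, and tag_single_word assumes nltk and its tagger data are installed.

```python
def as_token_list(text_or_tokens):
    """Return a list of tokens, wrapping a bare string in a one-item list."""
    if isinstance(text_or_tokens, str):
        return [text_or_tokens]
    return list(text_or_tokens)


word = 'TutorialExample.com'

# Iterating over a string yields single characters.
print(list(word)[:3])       # ['T', 'u', 't']

# Wrapping the string keeps the word whole.
print(as_token_list(word))  # ['TutorialExample.com']


def tag_single_word(word):
    """Tag one word with nltk (hypothetical helper; needs nltk tagger data)."""
    from nltk import pos_tag  # imported lazily so the code above runs without nltk
    return pos_tag(as_token_list(word))
```

Keep in mind that the tagger uses the surrounding words as context, so a tag obtained for an isolated word may differ from the tag the same word gets inside a full sentence.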