We have not found a documented list of the part-of-speech tags used by jieba, the Chinese word segmentation tool. In this tutorial, we will show you how to get this list.
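Look at the example code:
import jieba
import jieba.posseg

# word_tag_tab maps each dictionary word to its part-of-speech tag
d = jieba.posseg.dt.word_tag_tab
all_jieba_pos = []
for x, v in d.items():
    # keep each tag only once, in the order it is first seen
    if v not in all_jieba_pos:
        all_jieba_pos.append(v)
print(all_jieba_pos)
Run this code and you will get a list of the part-of-speech tags used in jieba: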
['nz', 'n', 'nr', 'm', 'i', 'l', 'd', 's', 't', 'mq', 'j', 'a', 'r', 'b', 'f', 'nrt', 'v', 'z', 'ns', 'q', 'vn', 'c', 'nt', 'u', 'o', 'zg', 'nrfg', 'df', 'p', 'g', 'y', 'ad', 'vg', 'ng', 'x', 'ul', 'k', 'ag', 'dg', 'rr', 'rg', 'an', 'vq', 'e', 'uv', 'tg', 'mg', 'ud', 'vi', 'vd', 'uj', 'uz', 'h', 'ug', 'rz']
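However, this list is not complete, because it only contains the tags that appear in jieba's dictionary. The rest of the answer is in the source code.
There we can find the tag eng, which jieba assigns to English words and which is not contained in the list above. This means the full list is: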
['nz', 'n', 'nr', 'm', 'i', 'l', 'd', 's', 't', 'mq', 'j', 'a', 'r', 'b', 'f', 'nrt', 'v', 'z', 'ns', 'q', 'vn', 'c', 'nt', 'u', 'o', 'zg', 'nrfg', 'df', 'p', 'g', 'y', 'ad', 'vg', 'ng', 'x', 'ul', 'k', 'ag', 'dg', 'rr', 'rg', 'an', 'vq', 'e', 'uv', 'tg', 'mg', 'ud', 'vi', 'vd', 'uj', 'uz', 'h', 'ug', 'rz', 'eng']
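The two steps above (collect the unique tags from the dictionary, then add the tags that only appear in the source code) can be combined into one small helper. This is only a minimal sketch: the function name collect_pos_tags and the sample table are made up for illustration and are not part of jieba. It works on any word-to-tag mapping, so it can be tried without jieba installed.

```python
def collect_pos_tags(word_tag_tab, extra_tags=("eng",)):
    """Return the unique tags of a word -> tag mapping, in first-seen order.

    extra_tags holds tags that never appear in the dictionary itself
    (here 'eng', as found in the source code) and are appended at the end.
    """
    # dict.fromkeys removes duplicates while keeping insertion order
    tags = list(dict.fromkeys(word_tag_tab.values()))
    for tag in extra_tags:
        if tag not in tags:
            tags.append(tag)
    return tags


# Made-up sample table, standing in for jieba.posseg.dt.word_tag_tab
sample = {"北京": "ns", "喜欢": "v", "上海": "ns"}
print(collect_pos_tags(sample))  # ['ns', 'v', 'eng']
```

With jieba installed, passing jieba.posseg.dt.word_tag_tab instead of sample should give the full list shown above. Using dict.fromkeys also avoids the repeated "v not in all_jieba_pos" list search, which matters little here but is faster for a dictionary with several hundred thousand entries.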