Best Practice to Create Word Embeddings Using Word2Vec – Word2Vec Tutorial

July 9, 2019

Word2Vec is an open-source tool for creating word embeddings, which are very useful in the NLP field. In this tutorial, we will introduce how to create word embeddings from a text file.


Word2Vec has two models (CBOW and Skip-Gram), and each model has two training strategies (negative sampling and hierarchical softmax) for creating word embeddings.
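For intuition, the difference between the two models can be sketched by the training pairs they derive from a sentence: CBOW predicts the center word from its context, while Skip-Gram predicts each context word from the center word. This is a minimal simplification, not the actual word2vec code:

```python
# Minimal sketch of the training pairs CBOW and Skip-Gram derive
# from a tokenized sentence; a simplification for illustration only.

def cbow_pairs(tokens, window):
    # CBOW: predict the center word from its surrounding context words.
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window):
    # Skip-Gram: predict each context word from the center word.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox".split()
print(cbow_pairs(sentence, 1))
print(skipgram_pairs(sentence, 1))
```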

Step 1: Download the Word2Vec Source Code and Compile It

You can download it from: https://code.google.com/archive/p/word2vec/ and compile the C source with make.

Step 2: Execute the word2vec Binary

Use CBOW Model:

1) Negative Sampling Method

./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 0 -cbow 1 -iter 3

2) Huffman Tree Method

./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 0 -hs 1 -binary 0 -cbow 1 -iter 3

Use Skip-Gram Model:

1) Negative Sampling Method

./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 0 -cbow 0 -iter 3

2) Huffman Tree Method

./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 0 -hs 1 -binary 0 -cbow 0 -iter 3

Parameter Explanation

-train data.txt: data.txt is the training corpus file
-output vec.txt: vec.txt is the output file that contains the word embeddings
-size 200: 200 is the dimension of the word embeddings
-window 5: the maximum context window size
-sample 1e-4: the threshold for down-sampling frequent words
-negative 5: the number of negative samples (0 disables negative sampling)
-hs 1: use hierarchical softmax, i.e. the Huffman tree method (0 disables it)
-cbow 1: use the CBOW model (0 uses Skip-Gram)
-iter 3: the number of training iterations over the corpus
-binary 0: save the word embeddings as a text file (1 saves a binary file)
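With -binary 0, the output is a plain text file: the first line holds the vocabulary size and the vector dimension, and each following line holds a word and its vector values. A minimal sketch for parsing such a file (the two-word sample below is made up for illustration):

```python
# Minimal sketch of parsing the word2vec text output format (-binary 0).
# In practice you would pass open("vec.txt") instead of the sample list.

def load_text_embeddings(lines):
    """Parse word2vec text format: header 'vocab_size dim', then 'word v1 v2 ...'."""
    it = iter(lines)
    vocab_size, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        assert len(values) == dim, "vector length must match the header dimension"
        vectors[word] = values
    assert len(vectors) == vocab_size, "word count must match the header vocab size"
    return vectors

# Made-up sample file contents: 2 words, 3-dimensional vectors.
sample = ["2 3", "king 0.1 0.2 0.3", "queen 0.4 0.5 0.6"]
emb = load_text_embeddings(sample)
print(emb["king"])  # [0.1, 0.2, 0.3]
```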

Notice: Skip-Gram + Negative Sampling is the combination most often used in real applications.
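For intuition, negative sampling draws "noise" words from the unigram distribution raised to the 3/4 power, which dampens the dominance of very frequent words. A minimal sketch with made-up word counts (not the actual word2vec table-based implementation):

```python
import random

# Minimal sketch of the negative-sampling noise distribution used by
# word2vec: word counts raised to the 0.75 power, then normalized.
# The counts below are made up for illustration.

counts = {"the": 100, "fox": 10, "jumps": 5}
weights = {w: c ** 0.75 for w, c in counts.items()}
total = sum(weights.values())
probs = {w: wt / total for w, wt in weights.items()}

def draw_negatives(k, rng):
    # Draw k noise words according to the smoothed distribution.
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=k)

rng = random.Random(0)
print(draw_negatives(5, rng))
```

The 0.75 exponent keeps frequent words more likely than rare ones, but less dominant than under the raw unigram distribution.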

Then you will get a word embeddings file like this:

[word embeddings example]
