Word2Vec is an open-source tool for creating word embeddings, which are widely used in the NLP field. In this tutorial, we will show how to create word embeddings from a text file.
Word2Vec offers two models (CBOW and Skip-Gram), and each model supports two training strategies: hierarchical softmax (Huffman tree) and negative sampling.
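To make the difference between the two models concrete, here is a minimal sketch in plain Python (not part of the word2vec tool; the helper names and window size are our own) of the training pairs each model extracts from a sentence. CBOW predicts the center word from its surrounding context, while Skip-Gram predicts each context word from the center word.

```python
def cbow_pairs(tokens, window=2):
    """(context words, center word) pairs, as CBOW sees them."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window=2):
    """(center word, context word) pairs, as Skip-Gram sees them."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps".split()
print(cbow_pairs(sentence)[0])       # (['quick', 'brown'], 'the')
print(skipgram_pairs(sentence)[:2])  # [('the', 'quick'), ('the', 'brown')]
```

Note that Skip-Gram produces many more (and simpler) training pairs than CBOW from the same text, which is one reason it tends to work better for rare words.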
Step 1: Download the Word2Vec Source Code and Compile It
You can download it from: https://code.google.com/archive/p/word2vec/ After extracting the source, run `make` in the source directory to build the `word2vec` binary.
Step 2: Execute the word2vec Binary
Using the CBOW Model:
1) Negative Sampling Method
./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 0 -cbow 1 -iter 3
2) Huffman Tree Method
./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 0 -hs 1 -binary 0 -cbow 1 -iter 3
Using the Skip-Gram Model:
1) Negative Sampling Method
./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 0 -cbow 0 -iter 3
2) Huffman Tree Method
./word2vec -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 0 -hs 1 -binary 0 -cbow 0 -iter 3
Parameter Explanation
-train data.txt: data.txt is the training corpus
-output vec.txt: vec.txt is the output file containing the word embeddings
-size 200: the dimension of the word embeddings
-window 5: the maximum distance between the current word and a context word
-sample 1e-4: the threshold for down-sampling very frequent words
-negative 5 / -hs 1: choose negative sampling (with 5 negative samples) or hierarchical softmax; set the unused one to 0
-cbow 1 / -cbow 0: choose CBOW (1) or Skip-Gram (0)
-iter 3: the number of training passes over the corpus
-binary 0: save the embeddings as a text file (1 saves a binary file)
Note: Skip-Gram with negative sampling is the combination most often used in real applications.
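The key idea of negative sampling is that, instead of normalizing over the whole vocabulary, the model contrasts each observed (word, context) pair with a few randomly drawn "negative" words. word2vec draws negatives with probability proportional to count(word) ** 0.75. Here is an illustrative stdlib-Python sketch of that sampling step (the function names are ours; the real tool implements this with a precomputed table in C):

```python
import random

def build_sampling_table(counts, power=0.75):
    """Return words and their (unnormalized) negative-sampling weights,
    using word2vec's smoothed unigram distribution: count ** 0.75."""
    words = list(counts)
    weights = [counts[w] ** power for w in words]
    return words, weights

def sample_negatives(words, weights, positive, k=5, rng=random):
    """Draw k negative words, re-drawing whenever we hit the positive target."""
    negatives = []
    while len(negatives) < k:
        word = rng.choices(words, weights=weights)[0]
        if word != positive:
            negatives.append(word)
    return negatives

counts = {"the": 100, "cat": 10, "sat": 5, "on": 50, "mat": 8}
words, weights = build_sampling_table(counts)
print(sample_negatives(words, weights, "cat", k=5, rng=random.Random(0)))
```

Raising counts to the 0.75 power flattens the distribution, so very frequent words like "the" are sampled less dominantly than their raw frequency would suggest.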
You will then get a word embeddings file (vec.txt here) containing one vector per vocabulary word.
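With -binary 0, the output is a plain text file: the first line holds the vocabulary size and the embedding dimension, and each following line holds a word followed by its vector values. A minimal Python loader for this format might look like the following sketch (the function name is ours):

```python
def load_text_embeddings(path):
    """Load a word2vec text-format file (-binary 0) into a dict
    mapping each word to a list of floats."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        # Header line: "<vocab_size> <dimension>"
        vocab_size, dim = map(int, f.readline().split())
        for line in f:
            parts = line.split()
            if not parts:          # skip any blank trailing line
                continue
            word, vector = parts[0], [float(x) for x in parts[1:]]
            assert len(vector) == dim
            embeddings[word] = vector
    assert len(embeddings) == vocab_size
    return embeddings
```

For real similarity queries, a library such as gensim can also read this file directly, but the format is simple enough to parse by hand as shown.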