How can I use a word2vec model to train a machine learning classifier using MATLAB?
I have a very rudimentary knowledge of MATLAB, having had to use it for a few Coursera classes I attended. But given that it is a language with libraries to do matrix manipulation, I am guessing that MATLAB machine learning algorithms (both built-in and ones you would create from scratch) use matrix input and output.
Word2vec (like other embeddings) is basically just a dictionary with words for keys and fixed-dimensional vectors (300-d here) for values. Assuming you are building a text classifier, something that takes in sentences (sequences of tokens) and predicts a sentiment (positive, neutral, negative), you could look up each word in the dictionary and extract its vector, so each sentence becomes a sequence of 300-d vectors. If you then combine each sentence's vectors into a single 300-d vector, for example by averaging them, your input is a matrix of shape (number-of-sentences, 300).
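One common way to get from a sequence of word vectors to one row per sentence is to average them. Here is a minimal sketch of that idea, assuming a toy 3-d embedding with made-up values in place of the real 300-d model; the name `sentence_vector` and the choice to skip unknown words are illustrative assumptions, not part of word2vec itself.

```python
# Toy embedding standing in for word2vec: 3-d instead of 300-d (hypothetical values).
embedding = {
    "good": [1.0, 0.0, 0.0],
    "bad": [0.0, 1.0, 0.0],
    "movie": [0.0, 0.0, 1.0],
}
DIM = 3


def sentence_vector(tokens, embedding, dim):
    """Average the vectors of the tokens found in the embedding.

    Tokens missing from the embedding are skipped; a sentence with no
    known tokens maps to the zero vector.
    """
    vectors = [embedding[t] for t in tokens if t in embedding]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values by dimension, so each column is averaged.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


sentences = [["good", "movie"], ["bad", "movie", "plot"], ["unknown"]]
# Feature matrix of shape (number-of-sentences, DIM): one row per sentence.
X = [sentence_vector(s, embedding, DIM) for s in sentences]
```

Averaging discards word order, but it gives every sentence the same fixed-size representation, which is what a matrix-based classifier expects.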
If you are asking about the mechanics of converting a binary word2vec model into something usable in MATLAB, I would recommend using gensim (a Python framework), which provides a very simple API to download the word2vec model. You can then iterate through the model and write its contents to a text file, which you can import into MATLAB. Here is some (untested) Python code that might be helpful to start with.
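    import gensim.downloader as api

    # Download (on first use) and load the pretrained Google News vectors.
    w2v = api.load("word2vec-google-news-300")

    with open("/path/to/your/textfile.tsv", "w", encoding="utf-8") as fout:
        # gensim >= 4.0 exposes the vocabulary as index_to_key;
        # on older versions, iterate over w2v.vocab.keys() instead.
        for word in w2v.index_to_key:
            vector = w2v[word]
            vector_str = ",".join("{:.7e}".format(x) for x in vector.tolist())
            # One line per word: the word, a TAB, then the comma-separated vector.
            fout.write("{}\t{}\n".format(word, vector_str))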
This will write out the contents of the model into a file where each line is the word followed by a TAB character, followed by a comma-separated list of 300 numbers for the word vector.
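To sanity-check the file before importing it elsewhere, you could parse a line back into a word and a list of numbers. This is only a sketch under the layout described above; `parse_line` is a hypothetical helper, and the sample line uses 3 values instead of 300.

```python
def parse_line(line):
    """Split one 'word<TAB>v1,v2,...' line into (word, list of floats)."""
    # Split on the last TAB: the vector part never contains one.
    word, vector_str = line.rstrip("\n").rsplit("\t", 1)
    return word, [float(x) for x in vector_str.split(",")]


# Hypothetical sample line in the same layout as the exported file.
sample = "hello\t1.0000000e-01,-2.5000000e+00,3.0000000e+00\n"
word, vector = parse_line(sample)
```

Any tool that can split on a TAB and then on commas should be able to read the file the same way.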