machine learning - Word2Vec Data Setup
In the word2vec skip-gram setup that follows, what is the data setup for the output layer? Is it a matrix that is 0 everywhere except for a single "1" in each of the C rows, which represent the words in the C context?

To add to and describe the data setup question:
What I mean is: how is the dataset presented to the NN? Let's consider "what does a single training example look like?". I assume the total input is a matrix where each row is a word in the vocabulary (and there is a column for each word as well, with each cell 0 except for the specific word, i.e. 1-hot encoded)? Thus, a single training example is 1xV (all zeros except for the specific word, whose value is 1). This aligns with the picture above in that the input is V-dim. I expected the total input matrix to have duplicated rows, however, where the same one-hot encoded vector is repeated each time the word is found in the corpus (as the output or target variable would be different).
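To make the input side concrete, here is a minimal sketch of the one-hot encoding described above. The toy corpus, the alphabetical vocabulary order, and the helper name `one_hot` are assumptions made for illustration, not part of any word2vec implementation.

```python
# Hypothetical toy corpus; "the" occurs twice on purpose.
corpus = "the quick brown fox jumps over the lazy dog".split()

# Vocabulary order is an arbitrary choice (alphabetical here).
vocab = sorted(set(corpus))
word_to_index = {word: i for i, word in enumerate(vocab)}
V = len(vocab)  # vocabulary size, the "V" in a 1xV training example


def one_hot(word):
    """Return a length-V list that is 0 everywhere except at the word's index."""
    vec = [0] * V
    vec[word_to_index[word]] = 1
    return vec


# One input row per word occurrence in the corpus, so repeated words
# produce repeated (identical) rows.
input_rows = [one_hot(word) for word in corpus]

for word, row in zip(corpus, input_rows):
    print(f"{word:>6}: {row}")
```

In this sketch the two occurrences of "the" yield the same input row, which matches the expectation of duplicated rows whose targets differ.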
The output (target) is more confusing to me. I expected it to exactly mirror the input: a single training example has a "multi"-hot encoded vector that is 0 except for a "1" in C of the cells, denoting that a particular word was in the context of the input word (C = 5 if we are looking, for example, 2 words behind and 3 words ahead of the given input word instance). The picture doesn't seem to agree with this, though. I don't understand what appears to be C different output layers that share the same W' weight matrix?
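The two readings of the target can be put side by side in a short sketch: one "multi"-hot vector per input word, versus C separate one-hot vectors (one per context position) that would all be scored against the same W'. The corpus, the window of 2 words behind and 3 ahead, and all function names are assumptions chosen to match the example in the question.

```python
# Hypothetical toy corpus and window, matching the "2 behind, 3 ahead" example.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word_to_index = {word: i for i, word in enumerate(vocab)}
V = len(vocab)
BEHIND, AHEAD = 2, 3


def context_indices(position):
    """Vocabulary indices of the words in the window around corpus[position]."""
    start = max(0, position - BEHIND)
    stop = min(len(corpus), position + AHEAD + 1)
    return [word_to_index[corpus[j]] for j in range(start, stop) if j != position]


def multi_hot_target(position):
    """Reading 1: a single length-V vector with a 1 for every context word."""
    target = [0] * V
    for idx in context_indices(position):
        target[idx] = 1
    return target


def one_hot_targets(position):
    """Reading 2: C separate length-V one-hot vectors, one per context position."""
    rows = []
    for idx in context_indices(position):
        row = [0] * V
        row[idx] = 1
        rows.append(row)
    return rows


position = 3  # the input word "fox"
multi = multi_hot_target(position)
separate = one_hot_targets(position)

print("multi-hot :", multi)
for row in separate:
    print("one-hot   :", row)
```

When the context words are all distinct, as they are for "fox" here, summing the C one-hot rows column by column gives back the multi-hot vector, so the two layouts carry the same information. If a word appears twice in the window, the multi-hot form would record it only once.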
The skip-gram architecture has word embeddings as its output (and its input). Depending on the precise implementation, the network may therefore produce two embeddings per word (one embedding for the word as an input word, and one embedding for the word as an output word; this is the case in the basic skip-gram architecture with the traditional softmax function), or one embedding per word (this is the case in a setup with hierarchical softmax as an approximation to the full softmax, for example).
You can find more information about these architectures in the original word2vec papers, such as Distributed Representations of Words and Phrases and their Compositionality by Mikolov et al.
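For the full-softmax case, the "two embeddings per word" can be sketched as two weight tables: one indexed by the input word and one indexed by the output word. This is only an illustrative forward pass under assumed sizes (V = 8, N = 3) and random initial weights; it does not cover the hierarchical softmax variant, and the names `W_in`, `W_out`, and `predict_context_distribution` are hypothetical.

```python
import math
import random

random.seed(0)
V, N = 8, 3  # assumed vocabulary size and embedding dimension

# W_in[i] is the embedding of word i as an input word (row i of W).
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
# W_out[j] is the embedding of word j as an output word
# (column j of W', stored here as a row for convenience).
W_out = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_context_distribution(word_index):
    """Multiplying a one-hot input by W just selects one row: the hidden layer."""
    hidden = W_in[word_index]
    # Score every vocabulary word with its output embedding (dot product).
    scores = [sum(h * o for h, o in zip(hidden, out_vec)) for out_vec in W_out]
    return softmax(scores)


probs = predict_context_distribution(2)
print(probs)
```

Because the same `W_out` is used to score each of the C context positions, the predicted distribution is identical for every position in this sketch; only the target word it is compared against changes.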