machine learning - Word2Vec Data Setup

July 15, 2013

In the word2vec skip-gram setup that follows, what is the data setup for the output layer? Is it a matrix that is 0 everywhere except for a single "1" in each of the C rows, which represent the words in the C context?

[Image: skip-gram architecture diagram with a V-dim input layer and C output layers sharing the weight matrix W']

To add more detail to the data setup question:

By that I mean: how would the dataset be presented to the NN? Let's consider what a single training example looks like. I assume the total input is a matrix where each row is a word in the vocabulary (there is a column for each word, and each cell is 0 except for the specific word, which is 1: one-hot encoded). Thus, a single training example is 1xV (all zeros except for the specific word, whose value is 1). This aligns with the picture above, in that the input is V-dim. I expected the total input matrix to have duplicated rows, however, where the same one-hot encoded vector is repeated each time the word is found in the corpus (as the output or target variable would be different).
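As a minimal sketch of the input layout described above, the following builds one 1xV one-hot row per word occurrence in a toy corpus. The corpus, the alphabetical vocabulary ordering, and the names `one_hot`, `word_to_index`, and `input_rows` are assumptions made for illustration; they are not part of the original question.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    row = [0] * size
    row[index] = 1
    return row


# Toy corpus (hypothetical, for illustration only).
corpus = ["the", "cat", "sat", "on", "the", "mat"]

# Vocabulary in alphabetical order: one column per distinct word.
vocab = sorted(set(corpus))
word_to_index = {word: i for i, word in enumerate(vocab)}

# One 1xV input row per word occurrence, so repeated words give repeated rows.
input_rows = [one_hot(word_to_index[word], len(vocab)) for word in corpus]
```

Here "the" occurs twice in the corpus, so its one-hot row appears twice in `input_rows`; the two rows are identical and only the associated targets would differ.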

The output (target) is more confusing to me. I expected it to mirror the input: a single training example has a "multi"-hot encoded vector that is 0 except for a "1" in C of the cells, denoting that a particular word was in the context of the input word (C = 5 if we are looking, for example, 2 words behind and 3 words ahead of the given input word instance). The picture doesn't seem to agree with this, though. I don't understand what appear to be C different output layers that share the same W' weight matrix.
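The two views of the target can be compared side by side. The sketch below is only an illustration under assumed names (`context_positions`, `skipgram_pairs`, `multi_hot`) and a toy sentence: it builds the single "multi"-hot target described above and, alternatively, C separate (input word, context word) pairs, which is one common reading of a diagram with C output layers that share W'.

```python
def context_positions(n_tokens, center, behind, ahead):
    """Positions inside the window around `center`, excluding `center` itself."""
    start = max(0, center - behind)
    stop = min(n_tokens, center + ahead + 1)
    return [p for p in range(start, stop) if p != center]


def skipgram_pairs(tokens, behind, ahead):
    """One (input word, context word) pair per context position."""
    pairs = []
    for center, word in enumerate(tokens):
        for p in context_positions(len(tokens), center, behind, ahead):
            pairs.append((word, tokens[p]))
    return pairs


def multi_hot(tokens, center, behind, ahead, word_to_index):
    """A single target vector with a 1 for every word seen in the window."""
    row = [0] * len(word_to_index)
    for p in context_positions(len(tokens), center, behind, ahead):
        row[word_to_index[tokens[p]]] = 1
    return row


# Toy sentence (hypothetical); window of 2 words behind and 3 ahead, so C = 5.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
word_to_index = {w: i for i, w in enumerate(sorted(set(tokens)))}

pairs = skipgram_pairs(tokens, behind=2, ahead=3)
pairs_for_sat = [pair for pair in pairs if pair[0] == "sat"]
target_for_sat = multi_hot(tokens, 2, 2, 3, word_to_index)
```

For the input word "sat", the pair view keeps five separate targets (including "the" twice), while the multi-hot view collapses them into one vector with only four cells set. Near the edges of the sentence the window is truncated, so fewer than C pairs are produced there.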

The skip-gram architecture has word embeddings as its output (and its input). Depending on the precise implementation, the network may therefore produce two embeddings per word (one embedding for the word as an input word, and one embedding for the word as an output word; this is the case in the basic skip-gram architecture with the traditional softmax function), or one embedding per word (this is the case in a setup with the hierarchical softmax as an approximation to the full softmax, for example).
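To make the two-embeddings case concrete, here is a rough sketch of a full-softmax skip-gram forward pass in plain Python. The sizes `V` and `N`, the random initialisation, and the names `W_in`, `W_out`, `predict`, and `skipgram_loss` are assumptions chosen for illustration, not taken from any particular word2vec implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


V, N = 5, 3  # vocabulary size and embedding size (toy values)
rng = random.Random(0)

# Input embeddings: row i is the vector for word i used as an input word.
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
# Output embeddings: row j is the vector for word j used as an output word.
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]


def predict(center_index):
    """Distribution over the vocabulary given one input word."""
    hidden = W_in[center_index]  # one-hot input times W_in is a row lookup
    scores = [sum(h * o for h, o in zip(hidden, out_vec)) for out_vec in W_out]
    return softmax(scores)


def skipgram_loss(center_index, context_indices):
    """Sum of negative log-probabilities over the context words."""
    probs = predict(center_index)
    return -sum(math.log(probs[j]) for j in context_indices)


loss = skipgram_loss(3, [4, 0, 2, 4, 1])
```

In this sketch every word ends up with two vectors, a row of `W_in` and a row of `W_out`. The same output distribution is scored against each of the C context words, which is consistent with C output layers sharing a single output weight matrix.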

You can find more information about these architectures in the original word2vec papers, such as Distributed Representations of Words and Phrases and their Compositionality by Mikolov et al.
