Linguistic Regularities in Sparse and Explicit Word Representations. Omer Levy and Yoav Goldberg, CoNLL 2014.
Motivation
• A distance measure between words, $d(w_i, w_j)$, would be useful for:
  • Document clustering (e.g., deciding whether a snippet such as "Astronomers have been able to capture and record the moment when a massive star begins to blow itself apart" belongs under Space Science or International Politics)
  • Machine translation
  • Information retrieval
  • Question answering
  • …
• Vector representation?
Vector representations
• Explicit
• Continuous ~ embeddings ~ neural
• Example analogy: Russia is to Moscow as Japan is to Tokyo
Figures from (Mikolov et al., NAACL, 2013) and (Turian et al., ACL, 2010)
Continuous vector representations
• Interesting properties: directional similarity
• Analogy between words: woman − man ≈ queen − king, i.e., king − man + woman ≈ queen

Older continuous vector representations:
• Latent Semantic Analysis
• Latent Dirichlet Allocation (Blei, Ng, Jordan, 2003)
• etc.

A recent result (Levy and Goldberg, NIPS, 2014) mathematically showed that the two representations (neural embeddings and explicit PMI vectors) are "almost" equivalent.
Figure from (Mikolov et al., NAACL, 2013)
Goals of the paper
• Analogy: $a$ is to $a^*$ as $b$ is to $b^*$
• With simple vector arithmetic: $a - a^* = b - b^*$
• Given 3 words: $a$ is to $a^*$ as $b$ is to ?
• Can we solve analogy problems with the explicit representation?
• Compare the performance of the
  • continuous (neural) representation
  • explicit representation
• Baroni et al. (ACL, 2014) showed that embeddings outperform explicit representations.
Explicit representation
• Term-context vectors:

                computer   data   pinch   result   sugar
  apricot           0        0      1       0        1
  pineapple         0        0      1       0        1
  digital           2        1      0       1        0
  information       1        6      0       4        0

• Apricot: (computer: 0, data: 0, pinch: 1, …) $\in \mathbb{R}^{|vocab|} \approx \mathbb{R}^{100{,}000}$
• Sparse!
• Pointwise Mutual Information:
  $\mathrm{PMI}(word, context) = \log_2 \frac{P(word, context)}{P(word)\,P(context)}$
• Positive PMI: replace the negative PMI values with zero.
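To make the construction concrete, here is a minimal numpy sketch (not from the paper) that turns the toy count table above into a PPMI matrix:

```python
import numpy as np

# Toy term-context count matrix from the table above.
# Rows: apricot, pineapple, digital, information
# Columns: computer, data, pinch, result, sugar
counts = np.array([[0, 0, 1, 0, 1],
                   [0, 0, 1, 0, 1],
                   [2, 1, 0, 1, 0],
                   [1, 6, 0, 4, 0]], dtype=float)

total = counts.sum()
p_wc = counts / total                              # joint P(word, context)
p_w = counts.sum(axis=1, keepdims=True) / total    # marginal P(word)
p_c = counts.sum(axis=0, keepdims=True) / total    # marginal P(context)

with np.errstate(divide="ignore"):                 # log2(0) -> -inf is fine here
    pmi = np.log2(p_wc / (p_w * p_c))

ppmi = np.maximum(pmi, 0)                          # positive PMI: clip negatives to 0
print(np.round(ppmi, 2))
```

Each row of `ppmi` is the sparse explicit vector for one word; in a real corpus the context vocabulary has on the order of 100,000 columns rather than five.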
Continuous vector representations
• Example: Italy: $(-7.35, 9.42, 0.88, \ldots) \in \mathbb{R}^{100}$
• Continuous values and dense
• How to create? Example: the skip-gram model (Mikolov et al., arXiv:1301.3781):
  $L = \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)$
  $p(w_o \mid w_i) = \frac{\exp(v_o \cdot v_i)}{\sum_{w=1}^{V} \exp(v_w \cdot v_i)}$
• Problem: the denominator sums over the whole vocabulary of size $V$
• Hard to calculate the gradient
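The cost of that denominator is easy to see in a toy numpy sketch (the vocabulary size and dimension below are invented values): computing a single probability touches every word vector.

```python
import numpy as np

V, d = 100_000, 100                  # toy vocabulary size and dimension
vecs = np.random.randn(V, d) * 0.1   # one vector v_w per vocabulary word

def skipgram_softmax(o, i):
    """p(w_o | w_i) under the full softmax: O(V) work per probability."""
    scores = vecs @ vecs[i]          # one dot product per vocabulary word
    scores -= scores.max()           # stabilize exp()
    return np.exp(scores[o]) / np.exp(scores).sum()

print(skipgram_softmax(o=42, i=7))
```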
Formulating the analogy objective
• Given 3 words: $a$ is to $a^*$ as $b$ is to ?
• Cosine similarity: $\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$
• Prediction: $\arg\max_{b^*} \cos(b^*,\, a^* - a + b)$
• If using normalized vectors: $\arg\max_{b^*} \left[ \cos(b^*, a^*) - \cos(b^*, a) + \cos(b^*, b) \right]$
• Alternative? Instead of adding similarities, multiply them:
  $\arg\max_{b^*} \frac{\cos(b^*, a^*) \cos(b^*, b)}{\cos(b^*, a)}$
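A sketch of both objectives, assuming a row-normalized embedding matrix `W` and a vocabulary list `words` (both hypothetical inputs); the small `eps` guard and the shift of cosines into [0, 1] are practical details in the spirit of the paper's 3CosMul, not shown on the slide:

```python
import numpy as np

def solve_analogy(W, words, a, a_star, b, eps=1e-3):
    """Solve 'a is to a_star as b is to ?' with both objectives.

    W     -- embedding matrix with L2-normalized rows (hypothetical input)
    words -- vocabulary list, aligned with the rows of W
    """
    idx = {w: k for k, w in enumerate(words)}
    # With unit-length rows, one matrix-vector product gives cos(x, w) for all x.
    cos_a  = W @ W[idx[a]]
    cos_as = W @ W[idx[a_star]]
    cos_b  = W @ W[idx[b]]

    add = cos_as - cos_a + cos_b                  # additive objective
    # Shift cosines from [-1, 1] into [0, 1] so the product/ratio stays
    # non-negative; eps guards against division by zero.
    mul = ((cos_as + 1) / 2) * ((cos_b + 1) / 2) / ((cos_a + 1) / 2 + eps)

    for scores in (add, mul):
        scores[[idx[a], idx[a_star], idx[b]]] = -np.inf   # exclude query words
    return words[add.argmax()], words[mul.argmax()]
```

For example, `solve_analogy(W, words, "man", "king", "woman")` should return "queen" under both objectives if the embeddings capture the regularity; excluding the three query words from the candidates is standard in these evaluations.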
Corpora
• MSR: ~8,000 syntactic analogies
• Google: ~19,000 syntactic and semantic analogies

  a        b        a*        b*
  good     better   rough     ?
  better   good     rougher   ?
  good     best     rough     ?
  …        …        …         …
• Learn the different representations from the same corpus. Very important!! (Cf. the recent controversies over inaccurate evaluations for the GloVe word representation, Pennington et al., EMNLP 2014.)
Embedding vs Explicit (Round 1)
[Bar chart: accuracy of the two representations on both test sets]
• MSR: Embedding 54%, Explicit 29%
• Google: Embedding 63%, Explicit 45%
Many analogies are recovered by the explicit representation, but many more by the embedding.
Using the multiplicative form
• Given 3 words: $a$ is to $a^*$ as $b$ is to ?
• Cosine similarity: $\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$
• Additive objective: $\arg\max_{b^*} \left[ \cos(b^*, a^*) - \cos(b^*, a) + \cos(b^*, b) \right]$
• Multiplicative objective: $\arg\max_{b^*} \frac{\cos(b^*, a^*) \cos(b^*, b)}{\cos(b^*, a)}$
A relatively weak justification (for the additive form): England − London + Baghdad = ? Expected answer: Iraq.

$\cos(\mathit{Iraq}, \mathit{England}) - \cos(\mathit{Iraq}, \mathit{London}) + \cos(\mathit{Iraq}, \mathit{Baghdad}) = 0.15 - 0.13 + 0.63$
$\cos(\mathit{Mosul}, \mathit{England}) - \cos(\mathit{Mosul}, \mathit{London}) + \cos(\mathit{Mosul}, \mathit{Baghdad}) = 0.13 - 0.14 + 0.75$
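Working out the arithmetic (a step the slide leaves implicit) shows both the failure and the fix. Additively, Iraq scores 0.15 − 0.13 + 0.63 = 0.65 while Mosul scores 0.13 − 0.14 + 0.75 = 0.74, so the single large cos(·, Baghdad) term makes the wrong answer win. Multiplying instead (ignoring the ε guard for simplicity), Iraq gets (0.15 × 0.63) / 0.13 ≈ 0.73 versus Mosul's (0.13 × 0.75) / 0.14 ≈ 0.70, and the balanced objective recovers the correct answer.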
Embedding vs Explicit (Round 2)
[Bar chart: accuracy with the additive (Add) vs. multiplicative (Mul) objective]
• Embedding, MSR: Add 54%, Mul 59%
• Embedding, Google: Add 63%, Mul 67%
• Explicit, MSR: Add 29%, Mul 57%
• Explicit, Google: Add 45%, Mul 68%
[Bar chart: accuracy with the multiplicative objective only]
• MSR: Embedding 59%, Explicit 57%
• Google: Embedding 67%, Explicit 68%
Summary
• On analogies, the continuous (neural) representation is not magical.
• Analogies are possible in the explicit representation.
• On analogies, the explicit representation can be as good as the continuous (neural) one.
• The objective (function) matters!
Agreement between representations

           Both Correct   Both Wrong   Only Embedding Correct   Only Explicit Correct
  MSR      43.97%         28.06%       15.12%                   12.85%
  Google   57.12%         22.17%        9.59%                   11.12%
Comparison of objectives

  Objective        Representation   MSR      Google
  Additive         Embedding        53.98%   62.70%
  Additive         Explicit         29.04%   45.05%
  Multiplicative   Embedding        59.09%   66.72%
  Multiplicative   Explicit         56.83%   68.24%
Recurrent Neural Network
• Recurrent neural networks (Mikolov et al., NAACL, 2013)
• Input-output relations:
  $y(t) = g(V s(t))$
  $s(t) = f(W s(t-1) + U w(t))$
• $y \in \mathbb{R}^{|vocab|}$, $w(t) \in \mathbb{R}^{|vocab|}$, $s(t) \in \mathbb{R}^{m}$
• $U \in \mathbb{R}^{|vocab| \times m}$, $W \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{m \times |vocab|}$
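A minimal numpy sketch of one step of these relations, written with row vectors so the dimensions above work out as stated; the toy sizes and the choices f = sigmoid, g = softmax are assumptions consistent with the RNNLM literature, not taken from the slide:

```python
import numpy as np

V_size, m = 10_000, 50                     # toy vocabulary and hidden sizes
U = np.random.randn(V_size, m) * 0.01      # input weights,     |vocab| x m
W = np.random.randn(m, m) * 0.01           # recurrent weights, m x m
Vout = np.random.randn(m, V_size) * 0.01   # output weights,    m x |vocab|

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(s_prev, word_id):
    """One step: s(t) = f(W s(t-1) + U w(t)), y(t) = g(V s(t))."""
    w = np.zeros(V_size)
    w[word_id] = 1.0                       # one-hot input word w(t)
    s = sigmoid(s_prev @ W + w @ U)        # hidden state s(t)
    logits = s @ Vout
    y = np.exp(logits - logits.max())
    return s, y / y.sum()                  # g = softmax over the vocabulary

s, y = rnn_step(np.zeros(m), word_id=3)    # y is a distribution over next words
```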
Continuous vector representations (skip-gram, continued)
$L = \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)$
$p(w_o \mid w_i) = \frac{\exp(v_o \cdot v_i)}{\sum_{w=1}^{V} \exp(v_w \cdot v_i)}$
• Problem: the denominator sums over the whole vocabulary of size $V$; hard to calculate the gradient.
• Change the objective: $p(w_o \mid w_i) = \sigma(v_o \cdot v_i) = \frac{1}{1 + \exp(-v_o \cdot v_i)}$
• Has a trivial solution!
• Introduce artificial negative instances (negative sampling):
  $L_{NS} = \log \sigma(v_o \cdot v_i) + \sum_{k=1}^{K} \mathbb{E}_{w_k \sim P_n(w)} \left[ \log \sigma(-v_k \cdot v_i) \right]$
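A minimal numpy sketch of the negative-sampling term for a single (input, output) pair; the toy dimensions, the value of K, and the random vectors standing in for draws from the noise distribution $P_n(w)$ are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_in, v_out, noise_vecs):
    """L_NS = log sigma(v_o . v_i) + sum_k log sigma(-v_k . v_i).

    noise_vecs -- K vectors for words drawn from the noise distribution P_n(w)
    """
    pos = np.log(sigmoid(v_out @ v_in))           # true context word
    neg = np.log(sigmoid(-(noise_vecs @ v_in)))   # K "artificial" negatives
    return pos + neg.sum()                        # quantity to maximize

d, K = 100, 5
rng = np.random.default_rng(0)
loss = neg_sampling_loss(rng.normal(size=d) * 0.1,
                         rng.normal(size=d) * 0.1,
                         rng.normal(size=(K, d)) * 0.1)
```

The point of the changed objective: each update now costs K + 1 dot products instead of a sum over the whole vocabulary.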
References
• Slides from: http://levyomer.wordpress.com/2014/04/25/linguistic-regularities-in-sparse-and-explicit-word-representations/
• Mikolov, Tomas, et al. "Distributed Representations of Words and Phrases and Their Compositionality." NIPS 2013.
• Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. "Linguistic Regularities in Continuous Space Word Representations." HLT-NAACL 2013.