
- Nathan
Collective journal for participants in the Advanced Natural Language Processing Seminar at the Language Technologies Institute, Carnegie Mellon University, in Spring 2011.
~/sw/nlp/candc/candc-1.00 % echo "I bought a house on Thursday that was red ." | bin/pos --model models/pos | bin/parser --parser models/parser --super models/super
tagging total: 0.01s usr: 0.00s sys: 0.00s
total total: 2.53s usr: 2.45s sys: 0.09s
# this file was generated by the following command(s):
# bin/parser --parser models/parser --super models/super
1 parsed at B=0.075, K=20
1 coverage 100%
(det house_3 a_2)
(dobj on_4 Thursday_5)
(ncmod _ house_3 on_4)
(xcomp _ was_7 red_8)
(ncsubj was_7 that_6 _)
(cmod that_6 house_3 was_7)
(dobj bought_1 house_3)
(ncsubj bought_1 I_0 _)
I|PRP|NP bought|VBD|(S[dcl]\NP)/NP a|DT|NP[nb]/N house|NN|N on|IN|(NP\NP)/NP Thursday|NNP|N that|WDT|(NP\NP)/(S[dcl]\NP) was|VBD|(S[dcl]\NP)/(S[adj]\NP) red|JJ|S[adj]\NP .|.|.
1 stats 5.8693 232 269
use super = 1
beta levels = 0.075 0.03 0.01 0.005 0.001
dict cutoffs = 20 20 20 20 150
start level = 0
nwords = 10
nsentences = 1
nexceptions = 0
nfailures = 0
run out of levels = 0
nospan = 0
explode = 0
backtrack on levels = 0
nospan/explode = 0
explode/nospan = 0
nsuccess 0 0.075 1 <--
nsuccess 1 0.03 0
nsuccess 2 0.01 0
nsuccess 3 0.005 0
nsuccess 4 0.001 0
total parsing time = 0.008075 seconds
sentence speed = 123.839 sentences/second
word speed = 1238.39 words/second
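As an aside, the grammatical-relation lines in the output above are easy to consume programmatically. Below is a minimal sketch (my own, not part of the C&C distribution) that turns a line such as (ncmod _ house_3 on_4) into a structured tuple, assuming only the word_index convention and the '_' placeholder slots visible in the output:

import re

def parse_gr_line(line):
    """Parse one grammatical-relation line, e.g. '(ncmod _ house_3 on_4)',
    into (relation, slots), where each slot is None for '_' or a
    (word, index) pair."""
    tokens = line.strip().lstrip('(').rstrip(')').split()
    relation, args = tokens[0], tokens[1:]
    slots = []
    for arg in args:
        if arg == '_':
            slots.append(None)
            continue
        match = re.match(r'(.+)_(\d+)$', arg)
        slots.append((match.group(1), int(match.group(2))) if match else arg)
    return relation, slots

print(parse_gr_line('(ncsubj bought_1 I_0 _)'))
# -> ('ncsubj', [('bought', 1), ('I', 0), None])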
For this week's reading, I read the required paper and the following related paper:
Hoifung Poon and Pedro Domingos. Unsupervised Semantic Parsing. EMNLP 2009.
http://aclweb.org/anthology/D/D09/D09-1001.pdf
Both papers aim to map sentences to logical forms, but their approaches and settings are very different. Below I highlight some key differences between the related paper and the required paper.
Setting: Supervised versus unsupervised. Kwiatkowski et al. train on sentences paired with their corresponding logical representations; Poon and Domingos' approach is unsupervised.
Approach: Kwiatkowski et al. use a top-down approach. They start with lexical entries that map complete sentences to their full logical forms; these entries are then iteratively split and refined with a restricted higher-order unification procedure (see the sketch after this list). Poon and Domingos use a bottom-up approach: they start with lambda-form clusters at the atom level and then recursively build up larger clusters using two operations (merge and compose).
What they learn: Kwiatkowski et al. learn a CCG grammar, and thus both syntax and semantics. Poon and Domingos focus only on semantics and rely on an existing parser (the Stanford parser) for syntax.
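To make the contrast in the Approach item concrete, here is a toy sketch, my own simplification and not code or notation from either paper, of the kind of top-down split performed by Kwiatkowski et al.'s restricted higher-order unification. Logical forms are nested Python tuples, only the simplest split (extracting a constant as the argument) is shown, and the example sentence and logical form are hypothetical:

def substitute(form, const, var='x'):
    """Replace occurrences of the constant `const` in a nested-tuple
    logical form with the variable `var`."""
    if form == const:
        return var
    if isinstance(form, tuple):
        return tuple(substitute(part, const, var) for part in form)
    return form

def argument_splits(form):
    """Enumerate (function-body, argument) pairs: pulling a constant c
    out of `form` leaves the body of a function lambda x.body such that
    body[x := c] reconstructs `form`. This covers only the simplest
    case of the splits considered in the paper."""
    for const in form[1:]:
        if isinstance(const, str):
            yield substitute(form, const), const

# "new york borders vermont" paired with next_to(ny, vt)
sentence_form = ('next_to', 'ny', 'vt')
for body, arg in argument_splits(sentence_form):
    print('lambda x.', body, ' applied to ', arg)
# lambda x. ('next_to', 'x', 'vt')  applied to  ny
#   e.g. candidate entries: "new york"        := NP   : ny
#                           "borders vermont" := S\NP : lambda x. next_to(x, vt)

The actual procedure also considers splits where the extracted part is itself a function, and rather than accepting candidate entries outright it scores them within a log-linear probabilistic CCG model.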
The nice thing about Kwiatkowski et al.'s approach is that it is more general than previous work (it can handle different languages and meaning representations) while still performing comparably to less general approaches.
What I liked about Poon and Domingos' approach is the idea of clustering to account for syntactic variations of the same meaning. However, their work was more difficult to evaluate because no gold standard was available. They therefore performed a task-based evaluation (question answering) and compared their approach with information extraction systems. Because of this evaluation setup, their performance on semantic parsing itself was less clear to me.