Thursday, January 27, 2011

Metaphor Comics

I can't resist sharing Dinosaur Comics' take on conceptual metaphor (below) and metaphor-related idioms [1], [2]:

The paper I mentioned on metaphors in sign language will be the subject of discussion for Monday's Linguistics Reading Group (which meets at 2:30 in GHC 7501). Anyone who's interested is welcome to participate.

- Nathan

Reading for 2/3/11: Haghighi and Klein, 2010

An Entity-Level Approach to Information Extraction

Authors:  Aria Haghighi and Dan Klein
Venue:  ACL 2010
Leader:  Dhananjay

Request (new!):  When you post to the blog, please include:

  1. Your name (plus "leader") if you are leading the discussion
  2. Which focus paper this post relates to
  3. Whether this is the pre-meeting review or the post-meeting summary
  • Leave a comment on this post (non-anonymously) giving the details of the related paper you will read (include a URL), by Monday, January 31.
  • Post your commentary (a paragraph) as a new blog post, by Wednesday, February 2.

Summary of week 2 commentary

As the focus paper this week was a survey, everyone chose to read papers detailing one of the methods it discussed. With one exception, the papers people chose to read focused on either metaphor detection or metaphor interpretation.

Dani, Alan, Dhananjay and I read papers on metaphor detection.
Dani's paper, Metaphor Identification Using Verb and Noun Clustering, combines a small amount of seed knowledge in the form of source-target domain mappings with word clustering in order to generalize those mappings. The clustering is done using parse information and a spectral clustering algorithm. To evaluate, they sampled randomly from the output of their system, and had human annotators judge the sampled sentences, obtaining a precision of .79.

Alan's paper, Comparing Semantic Role Labeling with Typed Dependency Parsing in Computational Metaphor Identification, focused on a slightly different task: finding patterns in text that commonly indicate the use of metaphor. The paper found that semantic role labels are slightly more useful than typed dependency arcs for extracting semantic relations from text, but overall it mostly discussed the problem rather than its solution.
Dhananjay read the paper Catching Metaphors, which used a maximum entropy classifier to detect the metaphorical usage of verbs. The features used in the model were the prior belief of each verb being used metaphorically and the type of the verb's arguments. The paper used WSJ data that the authors annotated. A high accuracy of 96.98% is reported, although this is weakened by the fact that over 90% of the annotated verbs were marked as metaphorical.
I read the paper Hunting Elusive Metaphors Using Lexical Resources. This paper looked at a wider range of phenomena than other metaphor detection papers (nouns, verbs, and adjectives), but used relatively simplistic techniques. To discover IsA metaphors, they simply checked whether the first noun was a hyponym of the second, using WordNet. For verb and adjective metaphors, they used a method based on computing the frequency with which a noun's hyponyms occur as arguments of the predicate. Their evaluation was not well explained, and they had unimpressive results.

Next, Matt, Michael, Dong, and Weisi read papers that looked at metaphor interpretation.
Matt and Michael read the paper

A Fluid Knowledge Representation for Understanding and Generating Creative Metaphors. This paper automatically creates a knowledge base from WordNet and uses that knowledge to find semantic links between seemingly unrelated nouns. They find facts of the forms is_ADJ:NOUN and VERBs:NOUN by looking at parsed dictionary entries. They then create a graph of nouns, with links between nouns that have very similar facts. Connected nouns in the graph are then considered to be semantically related. The paper does not address in detail the problem of using this information for the task of metaphor interpretation.
Dong and Weisi's paper,
Automatic Metaphor Interpretation as a Paraphrasing Task, addressed the problem of finding literal paraphrases of metaphorical verbs. They use a variety of methods to obtain and filter a list of possible paraphrases, using WordNet similarity, likelihood given the context, and a selectional preference measure. They hand-annotated a set of sentences with a ranked list of possible verb paraphrases, and evaluated their system on first-choice accuracy and mean reciprocal rank, getting an accuracy of .81. Weisi compares this task to word sense disambiguation, noting that it is much easier, since the problem of detecting metaphor is already taken care of.

Finally, Brendan read a paper about analogical reasoning, A Logical Approach to Reasoning by Analogy. This paper is not about metaphor, but rather about giving a precise account of reasoning by analogy. The paper gives a formal definition of reasoning by analogy, then generalizes it and discusses an implementation of it in a logical programming language. Brendan discusses how this might be useful in metaphor interpretation, and how the task of metaphor detection is not particularly interesting by itself.

Overall, the topic of metaphor in NLP seems to suffer from a lack of a good definition, and no standardization of evaluation. None of the papers present results that are comparable to any of the others, so it is hard to say conclusively what sorts of techniques are preferable.

Wednesday, January 26, 2011

Comments for week 2

I read the paper:

T. Veale and Y. Hao. 2008. A fluid knowledge representation for understanding and generating creative metaphors. In Proceedings of COLING 2008, pages 945–952, Manchester, UK.

The goal of this paper is metaphor interpretation and generation (ambitious!). Michael gives a good overview of their approach, which I would break down into three steps:

1. Extract facts
2. Link facts
3. Use knowledge representation to interpret metaphors

#1 seems straightforward and they accomplish it using WordNet and the web. They also have empirical results demonstrating the quality of their facts.

#2 seems more difficult, and they give only a small amount of detail on how they identify closely related facts using semantic relations in WordNet. #3 they mostly explain by examples linking together two seemingly unrelated nouns like "Pope" and "Don (Crime Father)". It seems to me that the algorithm can be thought of as constructing a graph using some heuristic rules and then finding a path (any path?) through the graph from A to B. It's unclear to me how this could be used to interpret a new metaphor, and the authors don't seem to address this directly. This also seems to contrast with earlier cited work, which is also called a slipnet and is referred to as a "probabilistic network". As far as I can tell, there is nothing stochastic about their approach. I briefly endeavored to read the earlier work (Hofstadter, 1994) but failed, owing to the fact that it is 80 pages.

They also don't have any way of measuring the quality of their interpretations which, in fairness, seems like a difficult task.


Related Paper - Jan 27

I read the paper :

E. Shutova, L. Sun and A. Korhonen. 2010. Metaphor Identification Using Verb and Noun Clustering. In Proceedings of COLING 2010, Beijing, China.

The paper describes a word clustering approach to metaphor identification. Their decision to use word clustering is based on the hypothesis that target concepts associated with a source concept appear in similar lexico-syntactic environments, and that clustering will capture this relatedness by association. The method starts with a small set of seed source-target domain mappings, extracts rich features from a shallow parser, and uses a spectral method to perform noun and verb clustering. The resulting noun clusters are treated as target concepts in the same source domain, and the resulting verb clusters as the source-domain lexicon.

As for the results, they were able to get some nice metaphors representing broad semantic classes, such as {swallow anger, hurl comment, spark enthusiasm, etc.}, from the seeds {stir excitement, throw remark, cast doubt}, which the WordNet-based baseline cannot acquire. They evaluated the method using precision and got 0.79 (baseline 0.44). I don't find these numbers convincing, though, since they randomly sampled sentences annotated by the systems and asked five human annotators to judge them, but did not report the size of the sample (or maybe I missed it?). Also, though there is no large annotated corpus for metaphor identification, it would be nice if they had reported recall on smaller data, just to get an idea of the method's coverage.
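To make the clustering step concrete, here is a minimal sketch using scikit-learn's SpectralClustering. The verbs and co-occurrence counts below are invented for illustration; the actual paper derives rich features from a shallow parse of real corpora.

```python
# Toy sketch: verbs represented by (invented) co-occurrence counts with
# three context features, grouped with spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering

words = ["stir", "spark", "swallow", "throw", "hurl", "cast"]
X = np.array([
    [5, 0, 1],   # stir
    [4, 1, 0],   # spark
    [5, 1, 1],   # swallow
    [0, 6, 1],   # throw
    [1, 5, 0],   # hurl
    [0, 1, 6],   # cast
], dtype=float)

sc = SpectralClustering(n_clusters=3, random_state=0)  # default RBF affinity
labels = sc.fit_predict(X)

clusters = {}
for w, l in zip(words, labels):
    clusters.setdefault(int(l), []).append(w)
print(clusters)
```

Verbs with similar context profiles end up in the same cluster, which is the relatedness-by-association effect the paper relies on.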

Readings for Jan. 27

As a refresher I read,

"Comparing Semantic Role Labeling with Typed Dependency Parsing in
Computational Metaphor Identification"

First off, the focus paper was slightly different from what I expected. Not that it was a bad read, but it seemed more like a general history of attempts at the metaphor problem and the current state of things. It was a lot easier to read than normal papers, and it presents some really interesting work, namely MIDAS, which I thought took a very smart approach to the problem. Generally speaking, I feel like aggregating a huge repository of hand-generated metaphors isn't really the most elegant solution, which I guess emphasizes not only the difficulty of the problem itself, but also the question of whether there is any slick solution to "successfully" triumphing over metaphor detection and resolution.

The additional paper I read attempted to use semantic role labeling in place of typed dependency parsing to improve CMI, an approach that aims to identify patterns that indicate metaphors rather than to identify each metaphor itself. I suppose you could call it a higher-order version of metaphor detection. Unfortunately, the paper doesn't have much to report in terms of results. Much of it was spent asking questions and giving background rather than discussing the significance of the research. It turns out that semantic role labeling proves slightly more effective at extracting relationships that have more semantic importance, but it's a double-edged sword in that the granularity may be too fine for effective use as input to other systems.


Commentary for Jan 27th

I read the paper Hunting Elusive Metaphors Using Lexical Resources. Saisuresh Krishnakumaran and Xiaojin Zhu. 2007.

This paper focuses on just the task of metaphor identification, although they consider metaphors expressed by nouns, adjectives, and verbs, unlike much of the related work, which looks at only nouns or only verbs. The authors first discuss various general challenges in the task. They cover context-sensitive metaphors, metaphors that require reference resolution to identify, and metaphors that are not identifiable using lexical semantics alone. The authors then restrict their attention to three forms of metaphor: noun1-IsA-noun2, verb-object, and adjective-noun. To determine if a given sentence is metaphorical, they parse the sentence using Klein and Manning's unlexicalized phrase-structure parser, then look for each form of metaphor separately. To find the first type, they use the following simple heuristic: if two nouns are in an IsA relationship and the latter is not a hypernym of the former, the sentence is metaphorical. The other two types are handled with the same method. For a predicate-noun pair, a corpus is searched for every instance of that predicate, and the probability of each noun being its argument is computed. If neither the target noun nor any of its hyponyms has a high enough probability in this distribution, the sentence is marked as metaphorical.
The methods this paper uses are fairly crude, and they do not do a very good job of explaining their evaluation. They appear to use data that they annotated with WordNet in mind, which seems problematic, since they use WordNet as a resource when running on the test set. They do not report their accuracy clearly, but they appear to get F measures in the 55-65% range.
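As a concrete illustration of the IsA heuristic, here is a toy sketch. A tiny hand-built hypernym table stands in for WordNet (which the real system queries), and the function name and example nouns are my own.

```python
# Stand-in for WordNet's hypernym closure: each noun maps to its
# (invented) list of hypernyms.
TOY_HYPERNYMS = {
    "lawyer": ["professional", "person", "entity"],
    "shark": ["fish", "animal", "entity"],
    "poodle": ["dog", "animal", "entity"],
}

def is_metaphorical_isa(noun1, noun2, hypernyms=TOY_HYPERNYMS):
    """Flag 'noun1 is a noun2' as metaphorical when noun2 is not among
    noun1's hypernyms (and is not noun1 itself)."""
    if noun1 == noun2:
        return False
    return noun2 not in hypernyms.get(noun1, [])

print(is_metaphorical_isa("lawyer", "shark"))  # "My lawyer is a shark"
print(is_metaphorical_isa("poodle", "dog"))    # literal IsA
```

The first call returns True (metaphorical) and the second False (literal), matching the paper's hypernymy test.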

Related paper for Jan 27

This week I read a paper - Catching Metaphors (Matt Gedigian, John Bryant, Srini Narayanan, Branimir Ciric).

It tries to classify whether a particular verb occurrence is literal or metaphorical using a maxent classifier; the paper restricts the problem to verbs. It works on a corpus with PropBank annotations. The feature set consists of the bias of the verb (the proportion of metaphorical to literal occurrences) and the types of the arguments in the annotation. Based on cross-validation, the authors claim an accuracy of 96.98%.
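To make the setup concrete, here is a hedged sketch of a maxent (logistic regression) classifier over the two feature types described. The training examples, feature names, and values are all invented, not the paper's data.

```python
# Maxent sketch: logistic regression over a per-verb metaphorical bias
# feature plus a coarse argument-type indicator feature.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [
    ({"verb_bias": 0.9, "arg_type=abstract": 1}, 1),  # "grasp the idea"
    ({"verb_bias": 0.9, "arg_type=physical": 1}, 0),  # "grasp the rope"
    ({"verb_bias": 0.1, "arg_type=physical": 1}, 0),  # "eat the apple"
    ({"verb_bias": 0.8, "arg_type=abstract": 1}, 1),  # "attack the plan"
]

vec = DictVectorizer()
X = vec.fit_transform(feats for feats, _ in train)
y = [label for _, label in train]

clf = LogisticRegression().fit(X, y)

# Classify a new (invented) instance: high-bias verb, abstract argument.
test_x = vec.transform({"verb_bias": 0.9, "arg_type=abstract": 1})
print(clf.predict(test_x)[0])
```

With these toy features the classifier labels the high-bias, abstract-argument instance as metaphorical (class 1).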

Comments on Davies and Russell

I read: Todd R. Davies and Stuart Russell ``A Logical Approach to Reasoning by Analogy.'' In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, Milan, Italy: Morgan Kaufmann, 1987.

It was completely different than the focus paper. The focus paper was talking about NLP work in metaphors, which was very poorly defined and often just "figurative language with unusual argument types" or so. In fact, when the paper quoted Nunberg 1987, I think that was an awfully good takedown of the entire premise of the article -- aren't metaphors just another word sense? What's interesting about metaphors is that their **semantics** is derived from, or somehow implicated by, the semantics of the other non-metaphorical word senses. The metaphor recognition task of finding selectional restriction violations seems kind of contrived. How is it useful or meaningful to claim "cold" as in "cold person" is metaphorical? Maybe the other theoretical work cited has more details (like Lakoff or Gentner) but it wasn't explained.

The Davies and Russell paper focuses on defining analogical reasoning and giving it a normative account. It's from KR&R AI, no language involved. It says that analogical reasoning often takes the form of

inferring a conclusion property Q holds of target object T
because T is similar to source object S by sharing properties P

[[Note: I think "source" and "target" are standard terms in the literature. Maybe Lakoff introduced them? Lakoff predates this work and is cited.]]

P(S) ∧ Q(S) ∧ P(T), therefore Q(T)

The paper points out there are analogical reasoning systems that use heuristic similarity of S and T to justify Q(S) => Q(T).

They work out a "determination rule" among predicates that I interpret as saying the properties P and Q are either correlated or inverse-correlated, but not unrelated. (Actually a deterministic correlation):

(∀x P(x) => Q(x)) v (∀x P(x) => ~Q(x))

The important property this has is non-redundancy. If you just said (P(x) => Q(x)) as background knowledge, that's not analogical reasoning, because you get the target conclusion without having to use information about the source object. Instead, you say that P determines whether or not Q is true, but don't take a stance whether it's a positive or negative implication. You then apply information about the source to derive the implication for the target.
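A minimal sketch of this inference pattern, with invented predicate names and a dictionary standing in for a knowledge base, might look like:

```python
# Analogical inference under a determination rule Det(P, Q): P fixes
# Q's truth value without fixing its sign. Given P(S), Q(S)'s value,
# and P(T), the source's Q value carries over to the target.
def analogize(source_facts, target_facts, p, q):
    """Return the conclusion {q: value} for the target, or {} if the
    determination premises aren't satisfied."""
    if p in source_facts and q in source_facts and p in target_facts:
        return {q: source_facts[q]}  # copy Q(S)'s truth value to Q(T)
    return {}

# Hypothetical example: nationality determines language spoken.
source = {"born_in_france": True, "speaks_french": True}  # source S
target = {"born_in_france": True}                         # target T
print(analogize(source, target, "born_in_france", "speaks_french"))
```

Note that nothing here encodes P(x) => Q(x) directly; the conclusion is only licensed once the source's Q value is observed, which is exactly the non-redundancy property discussed above.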

[[They cite different work by Davies that relates this to statistical correlation and regression]]

Properties 'P' have to do with relevance, so you don't make inferences based on similarity in spurious properties. They contrast this with methods based on heuristic similarity between S and T.

This is mostly the first half of the paper. I got confused when they made it more general; the determination rule is actually a second-order thing, Det(P,Q). They talk a little about an implementation within a logic programming system. The examples weren't very convincing of its usefulness.

Anyways, this seems like a reasonable starting point to me for interpretation of metaphor. Naively, I might think that the semantic implications of a metaphorical statement (with its target sense T) can be inferred by analogical reasoning from the source (non-metaphorical) sense S. Actually this seems kind of definitional for what a metaphor is. (Oh: what IS a metaphor, anyway? Why doesn't the focus paper tell us?? The Wilks definition is crap.)

But there's lots of hoops to jump through before getting to interpretation. It would probably be useful to read less formal background theory like Gentner or something to understand the problem better.

Comments week 2

This week I have read the following related paper:

E. Shutova. 2010. Automatic Metaphor Interpretation as a Paraphrasing Task. In Proceedings of NAACL 2010, Los Angeles, USA.

This paper frames metaphor interpretation as a paraphrasing task. Given a metaphorical expression, the system returns a ranked list of literal substitutions. The author focuses only on single-word metaphors expressed by a verb. First, paraphrases are ranked according to their likelihood in the context. Unrelated substitutions are then removed by keeping only terms that are a hypernym of, or share a hypernym with, the metaphor according to WordNet. A selectional preference measure is then used to filter out metaphors and rerank the paraphrases (how they did the reranking wasn't very clear to me). Their evaluation showed that the last step increased performance a lot. I liked their approach, because the steps they performed are intuitive and relatively simple.

They evaluate their system in two different ways: by the accuracy of the paraphrases ranked first, and by the MRR with a cutoff at rank 5. I think their performance was pretty good, with an accuracy of 0.81 when only looking at the first returned paraphrase. However, there are often multiple suitable substitutions for a metaphor (their annotators also had to list all suitable literal paraphrases they could come up with for the particular verb). It would have been interesting to look not only at the rank at which the first correct paraphrase occurred, but also, for example, at how many of the paraphrases in the top x were actually correct (accuracy instead of MRR).
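For reference, the two measures are easy to state in code. This is a generic sketch of accuracy-at-1 and MRR with a rank-5 cutoff, not the paper's evaluation scripts; the candidate and gold paraphrase lists are invented.

```python
# ranked_lists: per-instance system output, best candidate first.
# gold_sets: per-instance sets of acceptable literal paraphrases.
def accuracy_at_1(ranked_lists, gold_sets):
    hits = sum(ranked[0] in gold for ranked, gold in zip(ranked_lists, gold_sets))
    return hits / len(ranked_lists)

def mrr_at_5(ranked_lists, gold_sets):
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for rank, cand in enumerate(ranked[:5], start=1):
            if cand in gold:
                total += 1.0 / rank  # reciprocal rank of first correct hit
                break
    return total / len(ranked_lists)

# Invented example: paraphrase candidates for two metaphorical verbs.
ranked = [["provoke", "cause", "stir"], ["say", "state", "utter"]]
gold = [{"provoke", "elicit"}, {"state"}]
print(accuracy_at_1(ranked, gold))  # 0.5  (first list hits at rank 1)
print(mrr_at_5(ranked, gold))       # 0.75 = (1/1 + 1/2) / 2
```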

Related Paper for Jan 27

For this week, I selected the paper:

A Fluid Knowledge Representation for Understanding and Generating Creative Metaphors

Tony Veale and Yanfen Hao

The authors present a conceptual representation based on the idea of a slipnet, wherein concepts are fluidly connected using information from WordNet and the World Wide Web. Talking Points of the form is_ADJ:NOUN and VERB:NOUN are automatically extracted from parses of WordNet dictionary glosses for lexical concepts. For example, talking points for "Hamas" could include "is_political:movement" and points for "musician" could include "composes:music". Additional talking points are gathered from the Web using customized search queries to detect attributes of the form has_ADJ:facet, such as "has_magical:skill" for "Wizard". Once talking points are associated with concepts, a slipnet can be constructed by linking points that are semantically related according to WordNet. For example, we can follow the path "composes:music" (attribute of "composer") to "composes:speech", "writes:speech", and finally "writes:novel" (attribute of "author") to see that composer is semantically related to author. This approach is particularly interesting for metaphor interpretation in that it offers both a method for detecting that words are semantically similar in ways that are not directly obvious (not first-order related in WordNet) as well as a measure of the "slippage" between them (number of steps in the slipnet to relate one to the other).
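The slippage idea can be sketched as breadth-first search over a small talking-point graph. The graph below is hand-built to mirror the composer/author example; the real slipnet is constructed automatically from WordNet and Web extraction.

```python
# Toy slipnet: talking points as nodes, edges between semantically
# close points; slippage = number of slips (edges) on the shortest path.
from collections import deque

slipnet = {
    "composes:music": ["composes:speech"],
    "composes:speech": ["composes:music", "writes:speech"],
    "writes:speech": ["composes:speech", "writes:novel"],
    "writes:novel": ["writes:speech"],
}

def slippage(start, goal, graph):
    """Breadth-first search; returns the edge count of a shortest path,
    or None if the talking points are unconnected."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

print(slippage("composes:music", "writes:novel", slipnet))  # 3 slips
```

Here "composer" reaches "author" in three slips, giving both a yes/no relatedness answer and a graded measure of how far the meaning had to slip.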

Friday, January 21, 2011

Comment for readings of the second week

I have read the paper "Automatic Metaphor Interpretation as a Paraphrasing Task". For the focus paper, I think it has done a good job covering the features and resources needed. As for the selected paper, since its author is the same person as the focus paper's, the process of metaphor processing is divided into two sub-tasks, and this paper tackles only metaphor interpretation: the instance has already been identified as a metaphor, and the system simply needs to find the most appropriate substitute for the metaphorical verb. Although the task is not exactly Word Sense Disambiguation (WSD), this setup amounts to doing WSD in one of its easy cases, and is therefore much easier than WSD. First of all, identifying the metaphor itself is non-trivial. In fact, if we can successfully identify the metaphors, we are already beating the most frequent sense (MFS) baseline, which is already state of the art. Whenever a metaphor is identified, that basically says the most frequent sense, which in general is not metaphorical, has been pruned from consideration. This makes selection among the remaining senses much easier, because the distribution of word senses is usually highly skewed toward the MFS; once it is removed, prediction is much easier. Also, the instances of metaphor that are chosen demonstrate much stronger contextual constraints (in this case selectional restrictions), which makes the task easier still. Finally, the evaluation used is based on substitutes, not WordNet synonyms, which could be much more fine-grained. Despite all this, I do think the paper has done a good job formulating the problem; the high prevalence of metaphor justifies the task, and identifying metaphors could, from a different angle, make the general WSD task easier.

Thursday, January 20, 2011

CCG parsers

(1) NLTK has a tiny little CCG parser. Link

(2) Clark and Curran's industrial-strength parser that Noah was talking about. Link.

Here's the parser in action... They have a web demo!

I bought a house on Thursday that was red .

Command-line is fun. I bolded the line with the type-annotated parse. They also have a separate tool that outputs semantic structures based on this.

~/sw/nlp/candc/candc-1.00 % echo "I bought a house on Thursday that was red ." | bin/pos --model models/pos | bin/parser --parser models/parser --super models/super
tagging total: 0.01s usr: 0.00s sys: 0.00s
total total: 2.53s usr: 2.45s sys: 0.09s
# this file was generated by the following command(s):
# bin/parser --parser models/parser --super models/super

1 parsed at B=0.075, K=20
1 coverage 100%
(det house_3 a_2)
(dobj on_4 Thursday_5)
(ncmod _ house_3 on_4)
(xcomp _ was_7 red_8)
(ncsubj was_7 that_6 _)
(cmod that_6 house_3 was_7)
(dobj bought_1 house_3)
(ncsubj bought_1 I_0 _)
I|PRP|NP bought|VBD|(S[dcl]\NP)/NP a|DT|NP[nb]/N house|NN|N on|IN|(NP\NP)/NP Thursday|NNP|N that|WDT|(NP\NP)/(S[dcl]\NP) was|VBD|(S[dcl]\NP)/(S[adj]\NP) red|JJ|S[adj]\NP .|.|.

1 stats 5.8693 232 269

use super = 1
beta levels = 0.075 0.03 0.01 0.005 0.001
dict cutoffs = 20 20 20 20 150
start level = 0
nwords = 10
nsentences = 1
nexceptions = 0
nfailures = 0
run out of levels = 0
nospan = 0
explode = 0
backtrack on levels = 0
nospan/explode = 0
explode/nospan = 0
nsuccess 0 0.075 1 <--
nsuccess 1 0.03 0
nsuccess 2 0.01 0
nsuccess 3 0.005 0
nsuccess 4 0.001 0
total parsing time = 0.008075 seconds
sentence speed = 123.839 sentences/second
word speed = 1238.39 words/second

Reading for 1/27/11: Shutova, 2010

Models of Metaphor in NLP

Author:  Ekaterina Shutova
Venue:  ACL 2010
Leader:  Daniel

  • Leave a comment on this post (non-anonymously) giving the details of the related paper you will read (include a URL), by Monday, January 24.
  • Post your commentary (a paragraph) as a new blog post, by Wednesday, January 26.

Commentary for January 20

I read "Using String Kernels for Learning Semantic Parsers" (Kate, 2006) as a related paper. While much in the spirit of this week's paper, it uses beam search over CFG derivations in a meaning representation language, with a string kernel SVM at each node in the parse tree, to perform semantic parsing. The parsing algorithm is a modified Earley parser. In general, I found their approach very satisfying, though it certainly was not as expressive as a CCG with unification. At the same time, I liked the simplicity of this model. Further, the model held up reasonably well when noise was injected to simulate speech recognizer errors.
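For intuition about string kernels, here is a deliberately simplified one that counts shared contiguous substrings up to length k. Kate's paper uses a more sophisticated subsequence-based kernel; this sketch only conveys the general idea, and the example strings are invented.

```python
# Simplified string kernel: similarity = number of shared contiguous
# substrings (up to length k), weighted by their counts in each string.
def substring_kernel(s, t, k=3):
    def substring_counts(x):
        bag = {}
        for n in range(1, k + 1):
            for i in range(len(x) - n + 1):
                sub = x[i:i + n]
                bag[sub] = bag.get(sub, 0) + 1
        return bag
    a, b = substring_counts(s), substring_counts(t)
    # Dot product in the (implicit) substring feature space.
    return sum(a[u] * b[u] for u in a if u in b)

print(substring_kernel("fly to boston", "fly to denver"))
print(substring_kernel("fly to boston", "list flights"))
```

A kernel like this can be plugged into an SVM (e.g., via a precomputed Gram matrix) without ever building the substring feature vectors explicitly, which is the appeal of the kernel trick for strings.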

Summary of the first week reading on semantic parsing

I find reading the comments by different people a great pleasure. People have looked at the problem from different perspectives. For example, some have used unsupervised methods to attack the problem, which cost less but are weaker in evaluation because of the lack of a gold standard.

As far as I could see, the comments for the selected paper fall into the following categories:

Lexicon induction and weight initialization: Michael has read the paper "Learning for Semantic Parsing with Statistical Machine Translation" (Wong and Mooney, 2006), which utilizes the IBM alignment models to generate the lexicon. That paper uses the lexicon generated by IBM Model 5 directly for semantic parsing, while the Kwiatkowski paper uses only IBM Model 1 to initialize the feature weights. It is suggested that IBM Model 5, when used directly, requires more expert knowledge of the task to work well.

Unsupervised methods: Dong has read the paper "Unsupervised Semantic Parsing" (Hoifung Poon and Pedro Domingos, EMNLP 2009), which employs an unsupervised, bottom-up method to obtain the semantic representation. The method has the advantage of using clustering to account for syntactic variations of the same meaning. However, because of its unsupervised nature, it is difficult to evaluate, given the lack of a gold standard. The final evaluation, a comparison with information extraction methods, does not seem to fully reflect performance on semantic parsing itself.

Alternative models:
Matt read the paper "Wide-coverage semantic representations from a CCG parser" and analyzed the generality between the focus paper model and the selected paper model. He suggested that examining the failure mode and coming up with new ways to split the logical form would be a good way to improve the performance.
Daniel read the paper "A Generative Model for Parsing Natural Language to Meaning Representations" (Lu et al., 2008), which jointly models the sentence and its semantics, similar to CCG. The drawback of the model is that it generates the trees top-down and thus uses less context, and therefore needs re-ranking at the end. Daniel also compared the training methods used by the focus paper's model and the selected paper's model: the former alternates between refining the lexicon and re-parsing the training data, while the latter uses EM.

Brendan read the review "Constraint-based approaches to grammar: alternatives to transformational syntax" and gave a detailed description of the mechanics of the CCG formalism through a few examples. Because there are different variations of the CCG formalism, he suggested that the one used in the focus paper might be different from the one introduced in the review.

Similar problems:
Alan read the paper "Learning Context-Dependent Mappings from Sentences to Logical Form". He suggested that the focused paper looks at a more general problem which maps logical forms over multiple languages, compared to the selected paper. He also suggested that "The 2009 paper presents both a simple method of contextual analysis along with a bare-bones linear model to produce roughly 80% accuracy. Looking at the results, we also see that even examining just the most recent statement before a sentence, almost doubles the accuracy."
I have also read the paper "Learning Context-Dependent Mappings from Sentences to Logical Form". I feel the contribution of the paper is that it utilizes a hidden derivation to represent the output, and manages to learn a model over it and the input sentence. One shortcoming, I feel, is that the method relies heavily on heuristically generated rules obtained from the ATIS data set, which might hinder its generalization to other domains.

For the focus paper, people compared the focus model with their selected models. In general, the focus model generalizes to different representations of semantics, and the approaches used in different components of the model could be improved, e.g., the logical-form splitting method and so on.

Wednesday, January 19, 2011

Summary of CCG review (Week 1)

For my related reading, I read a review of Combinatory Categorial Grammar by Mark Steedman and Jason Baldridge. CCG is a lexicon-heavy formalism (like LFG or HPSG, perhaps) yielding semantics with logical forms (like typed lambda calculus). The expression for a lexical item specifies how it wants to combine with constituents on the left or right. The resulting compositions have their own expressions, which can combine further until you have an interpretation for the entire sentence. For example,

Marcel proved completeness

Marcel: NP
proved: (S\NP)/NP
completeness: NP

A/B means "combine with right, of type B, yielding A." So [proved completeness] joins together to yield
[proved completeness]: S\NP

A\B means "combine with left, of type B, yielding A." So this joins with the left to get
[Marcel [proved completeness]]: S

You can also associate lambda calculus expressions with these expressions (I don't know their names), to get the logical forms that we see in the focus paper. There are quite a few more details for how these composition operations work, of course, but there's a core set of principles that seem reasonable.
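The two application rules above can be sketched in a few lines of code. This toy version matches category strings only (no composition, type-raising, or lambda terms) and assumes the outermost connective is the rightmost slash; the function name is my own.

```python
# Simplified CCG forward/backward application over category strings.
def apply_categories(left, right):
    """Forward application: A/B  B  =>  A.
    Backward application:   B  A\\B  =>  A.
    Returns the result category, or None if neither rule applies."""
    strip = lambda c: c[1:-1] if c.startswith("(") and c.endswith(")") else c
    if "/" in left:                      # left seeks its argument rightward
        a, b = left.rsplit("/", 1)
        if strip(b) == right:
            return strip(a)
    if "\\" in right:                    # right seeks its argument leftward
        a, b = right.rsplit("\\", 1)
        if strip(b) == left:
            return strip(a)
    return None

vp = apply_categories("(S\\NP)/NP", "NP")  # proved + completeness
print(vp)                                  # S\NP
print(apply_categories("NP", vp))          # Marcel + [proved completeness] -> S
```

Running the two steps reproduces the derivation above: the transitive verb consumes its object to form a VP, and the VP consumes the subject on its left to yield S.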

The review is fairly long (60 pages) and goes over a variety of linguistic phenomena that CCG can handle, and talks a lot about how it relates to other syntactic theories. According to the review, it is the best thing since sliced bread.

I was hoping this would help me understand Section 4 of the focus paper but I'm still confused. I get the impression they only use part of CCG as it's described by Steedman and Baldridge, and then they do something weird too.

Comments for week 1

Unlike the related paper I chose to read [1], the focus paper actually has a method for evaluating the quality of its semantic results -- which is a good thing. It also seems that they have restricted their domain to a fairly small logical space, geo data, as opposed to general-interest WSJ articles as in [1]. Despite the narrower domain, they still allow ambiguity in the parse to be resolved with a probabilistic model, whereas the earlier work seems to be rule-based, with a fixed set of rule annotations for each category in the CCG. As for the focus paper, the results seem promising for this data set (Geo) and are comparable to the previous state of the art. Unfortunately, the authors don't give much insight into the failure modes for this data set and how they might be addressed. One issue seems to be previously unseen words/usages causing the recall number to be low. Another issue that might be interesting to explore is how allowing new ways of splitting the logical expressions improves performance.

[1] Bos, J., Clark, S., Steedman, M., Curran, J. R., & Hockenmaier, J. (2004). Wide-coverage semantic representations from a CCG parser. In Proceedings of the International Conference on Computational Linguistics

Commentary for Jan 20th

I read the related paper "A Generative Model for Parsing Natural Language to Meaning Representations" (Lu et al, 2008). In this paper, the authors describe a generative model that simultaneously generates a sentence and a meaning representation. Similar to how CCG directly incorporates semantic information in the parse trees, this model describes both the form and meaning of a sentence at the same time. However, the model generates the trees top-down and is thus only able to use a very small amount of context. Due to this shortcoming, a discriminative reranker is required as a final step. Featurizing this model could potentially be interesting, as doing so would allow the model to weaken some of its many independence assumptions. This model is less interesting algorithmically than the UBL approach, as it is trained with just the Expectation Maximization algorithm. The process used in UBL of alternating between refining the lexicon and reparsing the training data is a more novel concept that could potentially be useful for other problems involving lexica or other similar pieces of information.

Commentary for week of Jan 20

I chose the related paper "Learning for Semantic Parsing with Statistical Machine Translation" (Wong and Mooney, 2006), in which the authors utilize IBM word alignment models used in statistical machine translation to generate a lexicon for training a semantic parser. Words in each English sentence in a training set are aligned to productions in the derivation of its meaning representation in a formal language using IBM model 5. This is similar to the technique used by Kwiatkowski et al. of using IBM model 1 to initialize weights of lexical features for their parser. The differences between the two are that (1) Kwiatkowski et al. only use word alignments to initialize feature weights rather than directly using the final lexicon, and (2) IBM model 1 considers only cooccurrence statistics while model 5 considers notions of word order, distortion, and fertility. Directly using a model 5 lexicon relies more strongly on a larger set of modeling assumptions made specifically with SMT in mind. While the results shown by Wong and Mooney are favorable, the authors state in their conclusions that additional gains would likely be seen by developing alignment models with assumptions better fitting the task.
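For intuition, IBM Model 1 (the model Kwiatkowski et al. use for initialization) reduces to a simple EM loop over co-occurrence counts. A toy sketch, not the code used in either paper:

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """EM estimation of word-translation probabilities t(f | e) under IBM
    Model 1, which models only co-occurrence (no word order, distortion,
    or fertility)."""
    t = defaultdict(lambda: 1.0)  # uniform start; any constant works here
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for e_sent, f_sent in bitext:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)  # normalizer over alignments
                for e in e_sent:
                    p = t[(f, e)] / z  # posterior that f aligns to e
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():  # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

# Toy parallel data: "house" should come to explain "maison".
bitext = [(["the", "house"], ["la", "maison"]),
          (["the", "book"], ["le", "livre"]),
          (["a", "house"], ["une", "maison"])]
t = ibm_model1(bitext, iterations=20)
```

Model 5 layers word order, distortion, and fertility on top of this co-occurrence core; those are exactly the extra SMT-specific modeling assumptions the commentary above refers to.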

Commentary for Jan. 19th, 2011

To refresh, I read the paper "Learning Context-Dependent Mappings from Sentences to Logical Form." Reading this paper proved easier than reading the assigned paper, and it also helped clarify some details in the assigned paper that I wasn't familiar with. Both papers tackle excellent problems and show relatively promising results. The assigned paper attempts to tackle more abstract, higher-level problems, while the paper on context-dependent mapping focuses a bit more on a single, explicit problem. By "higher-level" problems I refer to such goals as successfully mapping logical forms across multiple languages, as well as discerning the meanings behind words for the purposes of non-trivial semantic analysis.

As far as the context-dependent problem goes, it's really nice to see fairly simple, intuitive rules that produce generally high accuracy. The paper presents both a simple method of contextual analysis and a bare-bones linear model that together produce roughly 80% accuracy. Looking at the results, we also see that examining even just the most recent statement before a sentence almost doubles the accuracy. Alas, the marginal benefit of looking further into the past drops almost to nothing. The assigned paper is a different kind of monster. Although it took some time to understand even just what was going on, the use of higher-order unification to do the lexical splitting and of CCG to reconstruct the expression seemed pretty cool (once it started to make some sense). The approach is fairly complicated, and maybe not intuitive, but yielding high accuracy over multiple languages and meaning representations definitely marks progress.

-Alan Zhu

Dong's comments for week 1

For this week's reading I read the required paper and the following related paper:

Unsupervised Semantic Parsing, Hoifung Poon, Pedro Domingos, EMNLP 2009

Both papers aim to map sentences to logical form, but their approaches and settings are very different. I will highlight some key differences between the related paper and the required paper.

Setting: Supervised versus unsupervised. Kwiatkowski et al. use as training data sentences with corresponding logical representations. Poon et al.’s approach is unsupervised.

Approach: Kwiatkowski et al. use a top-down approach. They start with logical forms that map sentences completely. These forms are then iteratively refined with a restricted higher-order unification procedure. Poon et al. use a bottom-up approach. They start with lambda-form clusters at the atom level and then recursively build up larger clusters using two operations (merge and compose).

What they learn: Kwiatkowski et al. learn a CCG grammar (thus both syntax as well as semantics). Poon et al. only focus on semantics, and use an existing parser (Stanford parser) for the syntax.

The nice thing about Kwiatkowski et al.'s approach is that it's more general than previous work (it can handle different languages and meaning representations), while still achieving performance comparable to that of less general approaches.

What I liked about Poon et al.'s approach is the idea of clustering to account for syntactic variations of the same meaning. However, their work was more difficult to evaluate, because no gold standard was available. They therefore performed a task-based evaluation (question answering), comparing their approach with information extraction systems. Because of this evaluation setup, their performance on semantic parsing itself was less clear to me.

Monday, January 17, 2011

Comments for the required paper and additional paper

For the required paper, the problem is to map natural language sentences to logical forms, namely lambda-calculus expressions. The proposed method uses Combinatory Categorial Grammar (CCG) to induce the final logical forms, summing out the ambiguous trees induced by the different initial lexical items aligned to the sentence. The main contribution of the paper, I feel, is the category-splitting method that induces the lexicon without manually engineered rules. Also, the lexical items can be multi-word expressions, whereas in prior work such as "Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars" the lexical items are usually single-word predicates. The question I have is about the type system of lambda calculus; e.g., I don't understand what logical form would be mapped to the type <e, <e,t>>, since <e,t> does not seem to be a truth value.
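For reference, here is a worked example of my own (not from the paper): a type like <e, <e,t>> fits a curried two-place predicate, such as a GeoQuery-style borders relation:

```latex
% A curried two-place predicate has type <e, <e,t>>:
\lambda x.\lambda y.\,\mathit{borders}(y, x) \;:\; \langle e, \langle e, t \rangle\rangle
% Applying it to one entity yields type <e,t> -- a one-place predicate,
% which is indeed a function, not yet a truth value:
\lambda y.\,\mathit{borders}(y, \mathit{texas}) \;:\; \langle e, t \rangle
% Applying that to a second entity finally yields type t:
\mathit{borders}(\mathit{oklahoma}, \mathit{texas}) \;:\; t
```

So <e,t> being a function rather than a truth value is exactly the point: truth values only appear once all entity arguments have been supplied.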

For the additional paper, I read the paper:

Luke S. Zettlemoyer, Michael Collins. Learning Context-dependent Mappings from Sentences to Logical Form. In Proceedings of the Joint Conference of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (ACL-IJCNLP), 2009.

The idea is to map context-dependent sentences to logical forms. The proposed method first produces an incomplete logical form for the target sentence; this incomplete logical form is then combined with previous logical forms to obtain the final logical form. For the second step, the authors propose a model whose solution space consists of derivations -- different configurations for deriving a final logical form. Inference is done by searching for the best derivation with beam search, based on the score of the derivation. Learning is done with Collins' perceptron, which tunes the weight vector corresponding to the feature vector of the derivation. I feel the most interesting idea is that they use derivations as the output and search for the best derivation, which in turn generates the final logical form.
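For concreteness, the Collins-style perceptron update is very compact. A generic sketch, with `predict` standing in for beam search over derivations (all names here are illustrative, not from the paper):

```python
from collections import defaultdict

def structured_perceptron(train, predict, feats, epochs=5):
    """Collins-style structured perceptron.

    train:   list of (x, y_gold) pairs
    predict: (x, w) -> highest-scoring output under weights w
             (in the paper's setting, beam search over derivations)
    feats:   (x, y) -> dict mapping feature names to counts
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y_gold in train:
            y_hat = predict(x, w)
            if y_hat != y_gold:
                # Promote gold features, demote predicted ones.
                for k, v in feats(x, y_gold).items():
                    w[k] += v
                for k, v in feats(x, y_hat).items():
                    w[k] -= v
    return w
```

The search procedure is a black box to the learner, which is what makes the scheme so portable: any argmax over derivations (exact or beam-approximate) plugs in as `predict`.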

Thursday, January 13, 2011

Reading for 1/20/11: Kwiatkowski et al., 2010

Here is the first reading:

Inducing Probabilistic CCG Grammars from Logical Form with Higher-order Unification
Authors:  Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman
Venue:  EMNLP 2010
Leader:  Weisi


  • Leave a comment on this post (non-anonymously) giving the details of the related paper you will read (include a URL), by Monday, January 17.
  • Post your commentary (a paragraph) as a new blog post, by Wednesday, January 19.
  • If you haven't received a message from the 11-713 mailing list listing the schedule for leading discussions, let Tae know.

Tuesday, January 11, 2011

Welcome to 11-713

This blog will serve as a forum for discussion by participants in the Advanced NLP Seminar (11-713) in Spring 2011.  Participants in the seminar will post regular blog entries relating to seminar readings.  Details on this requirement will be spelled out in the first meeting on Thursday, January 13.