Thursday, April 21, 2011

Pre-Meeting Alan

Pre-meeting Alan (leader)
Focus paper: Genre Distinctions for Discourse in the Penn Treebank
Related paper: Automatic sense prediction for implicit discourse relations in text


The related paper I read focuses on automatically identifying the sense of implicit discourse relations. The authors consider only the four most general senses: Comparison, Contingency, Temporal, and Expansion.

A significant portion of the paper is spent on various word pair features and how they can help determine the discourse relation between two text spans. The authors analyze prior work that used word pair features and identify its shortcomings in trying to capture semantic oppositions. They then introduce their own features, including polarity tags, verb classes, modality, context, language-model-based probabilities (WSJ-LM), etc. They use various sections of the Penn Discourse Treebank for training and testing, and run four different binary classification tasks, one to identify each relation. The classifiers they tried included Naive Bayes, MaxEnt, and AdaBoost, implemented in MALLET.
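
To make the word pair idea concrete, here is a minimal sketch assuming the usual formulation, where every word in the first span is paired with every word in the second. The function name, the whitespace tokenization, and the `w1|w2` feature format are my own choices for illustration, not something taken from the paper.

```python
from itertools import product


def word_pair_features(arg1, arg2):
    """Pair every token of the first span with every token of the second.

    Hypothetical helper: lowercasing and whitespace tokenization are
    simplifying assumptions, not the paper's preprocessing.
    """
    tokens1 = arg1.lower().split()
    tokens2 = arg2.lower().split()
    # One binary feature per (word from span 1, word from span 2) pair.
    return {f"{w1}|{w2}" for w1, w2 in product(tokens1, tokens2)}


pairs = word_pair_features("The rain stopped", "we left")
print(sorted(pairs))
```

Even this toy example shows why such features get sparse quickly: two short spans of three and two words already yield six distinct features.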

As an evaluation metric they use F-score on distinguishing a single sense from everything that is not that sense ("other"). The baseline is a random assignment of classes in proportion to the true distribution in the test set. The largest gain over that baseline is in the Contingency prediction task, using the combination of polarity, verb information, first and last words, modality, and context.
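
The evaluation setup can be sketched roughly as follows. This is only an illustration under my own assumptions: the labels and predictions are made-up toy data, the function names are hypothetical, and the baseline is implemented by sampling each prediction from the test set's own labels, which is one simple way to get a random assignment proportional to the true distribution.

```python
import random


def f_score(gold, predicted, target):
    """F-score for one sense (target) against everything else."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_baseline(gold, rng):
    """Draw each prediction from the empirical label distribution of gold."""
    return [rng.choice(gold) for _ in gold]


# Toy data, invented for illustration only.
gold = ["Contingency", "other", "other", "Contingency", "other", "other"]
predicted = ["Contingency", "other", "Contingency", "other", "other", "other"]

score = f_score(gold, predicted, "Contingency")
baseline = random_baseline(gold, random.Random(0))
print(score, f_score(gold, baseline, "Contingency"))
```

With a baseline like this, the expected precision and recall both equal the proportion of the target sense in the test set, so rarer senses start from a lower baseline F-score.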
