Friday, April 22, 2011
Post-meeting comment (Dong)
We first compared PDTB (Penn Discourse TreeBank) annotation with RST (Rhetorical Structure Theory). We then discussed the Penn Treebank itself. The corpus has mostly been treated as news text, and many people don't realize that it contains other genres as well, such as poetry. We then discussed some other datasets, such as the Brown corpus, which dates from the 1960s. For the focus paper, the conclusion was that it was data-driven and not too theoretical, but it was unclear what the direct applications would be. This type of research (analysis) is very different from most papers found at NLP conferences nowadays. The analysis itself is probably not that much work, but arriving at an analysis like this often requires looking at the data a lot and trying many analyses first. We also discussed discourse in general, but it remains somewhat vague.