This week we talked about discourse relations, the Penn Treebank, and the Penn Discourse Treebank.
We discussed the assumptions people have made about the Penn Treebank and the surprising variety of genres in the material it contains. The focus paper was a little different from most of the ones we have read: it was neither theory-heavy nor algorithm-heavy, and we noted that NLP papers usually fall into one of those two categories. Instead, the paper was very much experiment-based, and we agreed that it probably took the authors a long time to settle on the right perspective on the dataset and the right relations to examine.
One point that came out of the discussion is that discourse processing depends heavily on other tasks, one of which is coreference resolution. Kevin raised the distinction between intra-sentence and inter-sentence relations. The paper's results suggest that a lot of discourse structure lies within sentences, which could be quite useful for tasks such as machine translation.
The other related papers covered topics close to the focus paper. Dong and Weisi read about identifying the arguments of discourse connectives. Daniel talked about Rhetorical Structure Theory, a theory of discourse structure whose best-known corpus, the RST Discourse Treebank, annotates a subset of the Wall Street Journal articles in the Penn Treebank. Dhananjay talked about genre detection that uses the frequencies of common words as style markers. We also noted that discourse is usually not a big focus for undergraduates, and we mentioned some interesting research projects that could be done with Wikipedia.