Thursday, March 24, 2011
Post meeting summary by Weisi Duan
Today's meeting concentrated on issues in the evaluation of topic models. The discussion revolved mainly around the focus paper, which presents tasks designed to demonstrate the semantics of the topics obtained from topic models. Inter-rater agreement, where it can be measured, would be nice to have, but as Noah suggested, the statistical significance of the results is still valid without it. For automatic evaluation, Dong suggested that the measures in the related paper do not address the issue strongly.

Daniel read a paper about various statistical measures for topic models and suggested that the comparison does not make much sense. Alan read a paper about applying topic models to the polylingual case to help machine translation, and it was suggested that machine translation has not been helped much by topic models so far. Dipanjan asked about current work on applying topic models to structured prediction, and my reading turned out to be a good fit, since the model is an integrated Bayesian model for word sense disambiguation (WSD). Dhananjay read a paper about adapting PLSA over time. Brendan asked how exactly the perplexity on the test data is calculated, in other words, how the document-topic distributions are fixed for held-out documents.

People pointed out various confusing points in the focus paper that could be improved (e.g., Alan noted that showing the topic results to the raters while claiming there is no bias does not make sense). It would be great if the results turned out to meet people's expectations, such as CTM performing the best in the tasks. It was suggested that a lot of work could be done to improve the focus paper.
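On the inter-rater agreement point: a standard chance-corrected measure for two raters is Cohen's kappa. The sketch below is a minimal illustration (the function name and the toy labels are my own, not from the discussion), computing observed agreement minus the agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items the raters label identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: chance that independent raters with these
    # marginal label frequencies would agree.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy example: two raters judging topic quality on four topics.
kappa = cohens_kappa(["good", "bad", "good", "good"],
                     ["good", "bad", "bad", "good"])
```

Here observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5; a value near 0 would mean the raters agree no more than chance.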
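On Brendan's perplexity question: one common recipe is to fix the learned topic-word distributions, estimate each held-out document's topic proportions separately (e.g., by folding in), and then compute perplexity as the exponentiated negative average log-likelihood per token. The sketch below assumes those two quantities (`phi` and `thetas`) have already been estimated somehow; it only shows the final perplexity computation, and the variable names are my own:

```python
import numpy as np

def heldout_perplexity(phi, thetas, docs):
    """Perplexity of held-out docs given fixed topic-word probabilities
    phi (K x V) and per-document topic proportions thetas (D x K).
    Each doc is a list of word ids; p(w | d) = sum_k thetas[d,k] * phi[k,w]."""
    log_lik, n_tokens = 0.0, 0
    for theta, doc in zip(thetas, docs):
        word_probs = theta @ phi          # length-V mixture over the vocabulary
        log_lik += float(np.sum(np.log(word_probs[doc])))
        n_tokens += len(doc)
    return float(np.exp(-log_lik / n_tokens))

# Sanity check: uniform topics over a 4-word vocabulary give perplexity 4.
phi = np.full((2, 4), 0.25)
thetas = np.array([[0.5, 0.5]])
ppl = heldout_perplexity(phi, thetas, [[0, 1, 2]])
```

The subtle part Brendan raised is exactly the step this sketch takes as given: how `thetas` for unseen documents is fixed (folding in with a few EM or Gibbs steps, or holding out half of each document) changes the number reported, so papers should state which variant they use.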