Because I have a meeting and a class back to back on Thursday afternoon, I may not have enough time to cover posts submitted after 1:00pm. However, I will do my best to cover all the new posts in time.
For this week’s focus paper, people read different papers on the interpretation and application of the topics generated by topic models.
Daniel read the paper “Evaluation Methods for Topic Models” by Wallach et al., ICML 2009. The paper explores various methods for evaluating LDA. The authors conduct experiments in two settings: held-out documents, in which entire documents are held out, and document completion, in which only the latter half of each document is held out. The methods include harmonic mean sampling, annealed importance sampling, estimated theta, Chib-style estimation, and left-to-right evaluation. The evaluation methods are compared by checking which assigns higher probability to the held-out data, as well as by their variance and computational complexity. As Daniel noted, this way of comparing evaluation methods does not seem well justified.
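To make the document-completion setting concrete, here is a minimal sketch: given an already-fitted topic-word matrix, estimate a document's topic mixture from its first half and score the held-out second half. The function name and the simple fold-in EM used to estimate theta are my own illustrative choices, not the paper's estimators:

```python
import numpy as np

def document_completion_ll(phi, first_half, second_half, alpha=0.1, iters=50):
    """Estimate theta from the first half of a document by simple
    fold-in EM, then score the held-out second half.
    phi: (K, V) topic-word matrix; the halves are lists of word ids."""
    K = phi.shape[0]
    theta = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibility of each topic for each observed word
        resp = theta[:, None] * phi[:, first_half]        # (K, n)
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: re-estimate theta with a small Dirichlet smoother
        theta = resp.sum(axis=1) + alpha
        theta /= theta.sum()
    # log-likelihood of the held-out half under the folded-in theta
    return float(np.log(theta @ phi[:, second_half]).sum())
```

The appeal of this setting, as I understand it, is that theta is estimated only from observed text, so the held-out probability is a fair prediction rather than a fit to the test words themselves.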
Dong read the paper “Automatic Evaluation of Topic Coherence” by Newman et al., NAACL 2010, which is an extension of the focus paper. The paper explores 15 different automatic coherence measures drawing on WordNet, Wikipedia, Google, and other external resources. Dong suggests that more error analysis could have been given, and that it is not known how well the method generalizes across domains. Dong also finds the comparison of the ratings not entirely convincing. Regarding the focus paper, Dong suggests that sensitivity to the parameters might affect the strength of the results.
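One family of measures in the paper scores a topic by the average pairwise PMI of its top words, with co-occurrence statistics taken from a reference corpus such as Wikipedia. A minimal sketch, where the function name and the count dictionaries are my own illustrative interface rather than the paper's:

```python
import itertools
import math

def pmi_coherence(topic_words, doc_freq, co_doc_freq, n_docs, eps=1e-12):
    """Mean pairwise PMI over the top words of a topic, using
    document (co-)occurrence counts from a reference corpus."""
    scores = []
    for w1, w2 in itertools.combinations(topic_words, 2):
        p1 = doc_freq[w1] / n_docs
        p2 = doc_freq[w2] / n_docs
        p12 = co_doc_freq.get(frozenset((w1, w2)), 0) / n_docs
        # PMI > 0 means the pair co-occurs more often than chance
        scores.append(math.log((p12 + eps) / (p1 * p2)))
    return sum(scores) / len(scores)
```

A topic whose top words frequently co-occur in the reference corpus gets a high score; a topic mixing unrelated words is penalized, which is the intuition the human coherence ratings are meant to capture.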
I myself read the paper “A Topic Model for Word Sense Disambiguation” by Jordan Boyd-Graber, David Blei, and Xiaojin Zhu, EMNLP 2007. The paper is not exactly about the evaluation of topic models; rather, it explores an application of the semantic information in the topics obtained by topic models. The WSD model combines P(topic | corpus) and P(sense | topic). For the second component, the authors use WordNet-Walk, a model over all possible paths in WordNet leading to a specific target word. The model is elegant in that its components are modular, but it does not work very well because of the structure of WordNet. There are also issues with the estimation of P(sense | topic), where the path length is not given enough attention. The advantage of the whole model is that, being modular, it can be integrated into bigger models; however, it is not known how well the approximate inference would work there. Regarding the focus paper, I feel the authors could have provided the inter-rater agreement, because the raters are not very reliable.
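On that last point, a standard statistic for the inter-rater agreement I am asking for is Cohen's kappa; a minimal sketch for two raters (my own illustration, not anything from either paper):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters over the same items:
    observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # chance agreement: probability both raters pick the same label
    chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - chance) / (1 - chance)
```

Kappa is 1 for perfect agreement and near 0 when the raters agree no more often than chance, which would directly quantify how (un)reliable the ratings are.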
Dhananjay read the paper “Topic Evolution in a Stream of Documents” by A. Gohr et al., SIAM Data Mining 2009. The paper discusses adapting topic models over time using PLSA. A PLSA model is fit for each time period, and for each new period the estimates from the previous period are used for initialization. As Dhananjay noted, comparison with a randomly initialized PLSA model demonstrates an improvement of 5% in perplexity.
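The warm-start idea can be sketched as follows, assuming a plain PLSA fit by EM; this is my own compact sketch of the mechanism, not the authors' implementation, and only the topic-word matrix is carried over between periods:

```python
import numpy as np

def plsa_em(counts, K, iters=50, init_topic_word=None, seed=0):
    """Plain PLSA fit by EM on a (docs x vocab) count matrix.
    Passing the topic-word matrix of the previous time period as
    init_topic_word warm-starts EM instead of a random start."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    topic_word = (rng.random((K, V)) if init_topic_word is None
                  else init_topic_word.copy())
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    doc_topic = rng.random((D, K))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: P(z | d, w) for every doc-word pair
        post = doc_topic[:, :, None] * topic_word[None, :, :]   # (D, K, V)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        expected = counts[:, None, :] * post                    # expected counts
        # M-step: re-estimate both distributions from expected counts
        topic_word = expected.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True)
        doc_topic = expected.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    return topic_word, doc_topic

def perplexity(counts, topic_word, doc_topic):
    """Perplexity of the counts under the fitted model."""
    probs = doc_topic @ topic_word            # P(w | d), shape (D, V)
    ll = (counts * np.log(probs + 1e-12)).sum()
    return float(np.exp(-ll / counts.sum()))
```

The point of the warm start is that when consecutive periods share topics, EM begins near a good solution, which is where the reported perplexity improvement over random initialization would come from.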
Alan read the paper “Polylingual Topic Models” by Mimno et al., EMNLP 2009. The paper explores using LDA to infer topics over direct translations, capture similarities between vocabularies in different languages, and detect differences in topic emphasis between languages. The evaluation is conducted on held-out data using likelihood. To evaluate the possibility of using the model to adapt machine translation systems, the authors also measure its ability to align documents in one language with their translations in another, as noted by Alan.
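The alignment evaluation can be sketched by matching documents on the similarity of their topic distributions; here I use Jensen-Shannon divergence as the similarity measure, which is an assumption on my part rather than necessarily the paper's exact metric:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two topic distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: float((a * np.log((a + eps) / (b + eps))).sum())
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def align_documents(source_thetas, target_thetas):
    """For each source-language document, pick the target-language
    document with the most similar topic distribution."""
    return [int(np.argmin([js_divergence(p, q) for q in target_thetas]))
            for p in source_thetas]
```

If the model works, a document and its translation should land on nearly the same topic distribution even though they share no vocabulary, so the true translation should be the nearest neighbor.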