Focus Paper: Reading Tea Leaves: How Humans Interpret Topic Models. Chang et al. NIPS 2009
Related Paper: Evaluation Methods for Topic Models. Wallach et al. ICML 2009
This paper explores several intrinsic methods for evaluating topic models, focusing on LDA. Each method estimates the probability of held-out test documents under the trained model. The test documents are either held out entirely or used in a document-completion setting, in which the first half of each test document is observed and only the latter half is held out. For the fully held-out setting, the authors compare two commonly used methods, the harmonic mean method and annealed importance sampling, with two methods not previously used for topic model evaluation: a Chib-style estimator and left-to-right evaluation. For the document-completion setting, they compare two older methods, annealed importance sampling and estimated θ, with left-to-right evaluation. The methods are compared by which one assigns higher probability to the held-out data, and by the variance and computational cost of each. The authors conclude that the Chib-style estimator and left-to-right evaluation perform best.

This criterion for comparing evaluation methods does not seem well justified, and the authors do not explain in any detail why they chose these particular procedures.
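To make left-to-right evaluation concrete, here is a minimal sketch of how I understand the idea. It is not the authors' code. It assumes the topic-word distributions `phi` and the Dirichlet parameter `alpha` are already fixed, and it scores one fully held-out document as the sum of log P(w_n | w_<n). Each particle keeps its own topic assignments. At every position it resamples the earlier assignments with one Gibbs sweep, adds the predictive probability of the next word, and then samples that word's topic. The function names `exact_log_prob` and `left_to_right_log_prob` and the toy `phi`, `alpha`, and `doc` values are hypothetical. The brute-force `exact_log_prob` is included only as a sanity check on tiny inputs.

```python
import itertools
import math
import random


def exact_log_prob(doc, phi, alpha):
    """Brute-force log P(w | phi, alpha): sum over every topic assignment z.

    theta is integrated out, so P(z | alpha) is a Dirichlet-multinomial and
    P(w | z, phi) = prod_n phi[z_n][w_n]. Cost is K**N, so tiny inputs only.
    """
    K, N = len(alpha), len(doc)
    a_sum = sum(alpha)
    total = 0.0
    for z in itertools.product(range(K), repeat=N):
        log_pz = math.lgamma(a_sum) - math.lgamma(a_sum + N)
        for t in range(K):
            log_pz += math.lgamma(alpha[t] + z.count(t)) - math.lgamma(alpha[t])
        log_pw = sum(math.log(phi[t][w]) for t, w in zip(z, doc))
        total += math.exp(log_pz + log_pw)
    return math.log(total)


def left_to_right_log_prob(doc, phi, alpha, num_particles=1000, seed=0):
    """Left-to-right estimate of log P(w | phi, alpha) = sum_n log P(w_n | w_<n)."""
    rng = random.Random(seed)
    K = len(alpha)
    a_sum = sum(alpha)
    # Each particle keeps its own topic assignments z_<n and topic counts.
    z = [[] for _ in range(num_particles)]
    counts = [[0] * K for _ in range(num_particles)]
    log_prob = 0.0
    for n, w in enumerate(doc):
        p_n = 0.0
        for zr, cr in zip(z, counts):
            # One Gibbs sweep over the earlier positions, each resampled given
            # the other earlier assignments (only words before position n are used).
            for m in range(n):
                cr[zr[m]] -= 1
                weights = [phi[t][doc[m]] * (cr[t] + alpha[t]) for t in range(K)]
                zr[m] = rng.choices(range(K), weights=weights)[0]
                cr[zr[m]] += 1
            # P(w_n | z_<n) = sum_t phi[t][w_n] * (n_t + alpha_t) / (n + sum(alpha)).
            joint = [phi[t][w] * (cr[t] + alpha[t]) / (n + a_sum) for t in range(K)]
            p_n += sum(joint)
            # Extend this particle by sampling z_n from P(z_n | w_n, z_<n).
            t_new = rng.choices(range(K), weights=joint)[0]
            zr.append(t_new)
            cr[t_new] += 1
        log_prob += math.log(p_n / num_particles)
    return log_prob


# Toy model (made-up numbers): 2 topics over a 3-word vocabulary.
phi = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
alpha = [1.0, 1.0]
doc = [0, 0, 2, 1, 0, 2]
print("exact        ", exact_log_prob(doc, phi, alpha))
print("left-to-right", left_to_right_log_prob(doc, phi, alpha, num_particles=2000))
```

Exhaustive enumeration grows as K^N, which is why the paper needs estimators at all. On a toy document like this one, the left-to-right estimate should land close to the exact value as the number of particles grows.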