Related Paper: Polylingual Topic Models. Mimno et al., EMNLP 2009
Name: Alan
Focus Paper: Reading tea leaves: How humans interpret topic models. Chang et al. NIPS 2009
The related paper introduces the polylingual topic model (PLTM), which discovers topics aligned across multiple languages. The authors examine document tuples that are direct translations of one another (the EuroParl corpus) as well as tuples that are not translations but are very likely to be about similar concepts (Wikipedia articles on the same subject in different languages). The aim is to evaluate whether PLTM can accurately infer topics over direct translations, infer correspondences between vocabularies in different languages, and detect differences in topic emphasis between languages.
PLTM is an extension of latent Dirichlet allocation (LDA): each tuple of aligned documents shares a single tuple-level distribution over topics, while each language has its own topic-word distributions, and topic assignments can be inferred using Gibbs sampling. The model's generalization ability is evaluated by the probability of previously unseen held-out documents given posterior estimates. To assess the possibility of using PLTM for adapting machine translation systems, the authors also measure the model's ability to align documents in one language with their translations in another language.
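To make the model structure concrete, here is a minimal sketch of the PLTM generative process: one topic mixture is drawn per document tuple and shared across languages, while each language keeps its own topic-word distributions. The toy vocabularies, topic count, and hyperparameter values below are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabularies for two "languages" (illustrative only).
vocabs = {"en": ["economy", "market", "sport", "team"],
          "de": ["wirtschaft", "markt", "sport", "mannschaft"]}
K = 2            # number of topics, shared across languages
alpha, beta = 0.5, 0.1   # symmetric Dirichlet hyperparameters (assumed values)

# Each language l has its own topic-word distributions phi[l], shape (K, V_l).
phi = {l: rng.dirichlet([beta] * len(v), size=K) for l, v in vocabs.items()}

def generate_tuple(n_words=5):
    """Generate one aligned document tuple: a single theta is shared
    by the documents in every language of the tuple."""
    theta = rng.dirichlet([alpha] * K)       # tuple-level topic mixture
    docs = {}
    for l, vocab in vocabs.items():
        z = rng.choice(K, size=n_words, p=theta)           # topic assignments
        docs[l] = [vocab[rng.choice(len(vocab), p=phi[l][k])] for k in z]
    return theta, docs

theta, docs = generate_tuple()
```

The key design point this sketch highlights is that topic alignment across languages is built into the model: because every document in a tuple draws its topic assignments from the same theta, topic k in English and topic k in German are forced to describe the same underlying theme, even though their word distributions are learned separately.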