Pre-meeting (Dong Nguyen).
Related paper: Labeled LDA: A supervised topic model for credit attribution in multi-label corpora
Focus paper: Characterizing Microblogs with Topic Models
The related paper introduced Labeled LDA, which is the main method used in the focus paper. Labeled LDA defines a one-to-one correspondence between LDA's latent topics and user tags. Unlike previous models such as Supervised LDA, it allows a document to be associated with multiple labels. Inference is similar to that of standard LDA, except that the topics of a particular document are restricted to the set of topics associated with that document's labels. I liked the way they evaluated the model: they tested it on a range of tasks, namely topic visualization, snippet extraction and multi-label text classification (compared against a strong baseline, a one-vs-rest SVM).
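The restriction on a document's topics can be illustrated with a minimal collapsed Gibbs sampling sketch. It assumes that documents are lists of integer word ids, that each document's labels are given as a list of topic indices, and that the priors `alpha` and `beta` are symmetric. The function name and default values are hypothetical, and the code is not taken from either paper's implementation. The only change from a standard LDA sampler is that each token's topic is resampled over the document's label set instead of over all topics.

```python
import random


def labeled_lda_gibbs(docs, doc_labels, num_topics, vocab_size,
                      alpha=0.1, beta=0.01, iters=50, seed=0):
    """Hypothetical Labeled LDA sampler: topics per document are limited to its labels."""
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]                # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]   # topic-word counts
    nk = [0] * num_topics                                 # tokens per topic
    z = []                                                # topic assignment per token

    # Initialise every token with a random topic drawn from its document's labels.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.choice(doc_labels[d])
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            labels = doc_labels[d]
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Standard LDA conditional, evaluated only for this document's labels.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + vocab_size * beta) for j in labels]
                k = rng.choices(labels, weights=weights)[0]
                # Add the new assignment back.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw


docs = [[0, 1, 2], [2, 3, 4], [0, 4]]
labels = [[0], [1, 2], [0, 2]]
z, ndk, nkw = labeled_lda_gibbs(docs, labels, num_topics=3, vocab_size=5)
print(z)
```

Topics outside a document's label set never receive counts from that document, so `ndk[d]` is zero for every non-label topic. This is what allows each learned topic to be read as the word distribution of a single tag.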
I'm not sure what to think of the focus paper. Much of it builds on four dimensions: substance, style, status and social. Although these seem to make sense, they were identified by interviewing a small, non-representative group. The way they identified the labeled dimensions in Twitter, and the mapping of these onto the four dimensions, also seem somewhat ad hoc, so I find their characterization of Twitter hard to assess. Finally, the ranking experiments do not use the 4S dimensions but only the topic distribution of each tweet, so it would have been nice if they had also compared against standard LDA.