The focus paper for this week, "Characterizing Microblogs with Topic Models", uses a variant of LDA, Labeled LDA, to analyze the topics present in Twitter, and then applies those topics to two tasks: rating tweets and recommending users to follow. The authors construct several datasets for this purpose: a small corpus of tweets labeled with topics and a set of tweets rated by the users who received them. Overall, the features extracted from their topic model significantly improve performance on both tasks.
Only three related papers were read this week: one on modeling conversations in Twitter, one on Labeled LDA, and one on analyzing Twitter discourse.
Dhananjay and Alan read "Unsupervised Modeling of Twitter Conversations", Alan Ritter, Colin Cherry, Bill Dolan, NAACL 2010. This paper discusses unsupervised methods for analyzing dialog acts in sequences of tweets. The authors present three models: the EM Conversation model, the Conversation+Topic model, and the Bayesian Conversation model, of which the Conversation+Topic model is their principal contribution. They perform several types of evaluation: qualitative analysis of the system's output, held-out likelihood, and a new task, conversation ordering, which consists of restoring the order of a scrambled conversation.
Weisi and Dong read "Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora", Daniel Ramage et al., EMNLP 2009. This paper presents the modified form of LDA that is used in the focus paper. The model allows some documents to be annotated with labels, which constrain the latent topics that may be associated with each such document. The authors evaluate the model on many different tasks, comparing against a one-vs-rest SVM, although Weisi notes that a comparison with vanilla LDA would have helped in understanding the quantitative effect of adding the labels.
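
To make the label constraint concrete, here is a minimal sketch of a collapsed Gibbs sampler in which each token's topic may only be drawn from the labels of its document. It is a toy illustration, not the authors' implementation: the function name labeled_lda_gibbs, the hyperparameter values, and the tiny corpus are all hypothetical, and it assumes every document has at least one label and that topics and labels correspond one-to-one.

```python
import random


def labeled_lda_gibbs(docs, doc_labels, num_topics, vocab_size,
                      alpha=0.1, beta=0.01, iters=50, seed=0):
    """Toy collapsed Gibbs sampler with topics restricted to document labels.

    docs: list of documents, each a list of integer word ids.
    doc_labels: list of non-empty sets of topic ids allowed per document.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assignments = []

    # Initialise every token with a topic drawn from its document's labels.
    for d, words in enumerate(docs):
        allowed = sorted(doc_labels[d])
        z_d = []
        for w in words:
            k = rng.choice(allowed)
            z_d.append(k)
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, words in enumerate(docs):
            allowed = sorted(doc_labels[d])
            for i, w in enumerate(words):
                # Remove the token's current assignment from the counts.
                k_old = assignments[d][i]
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1
                doc_topic[d][k_old] -= 1

                # Standard LDA conditional, evaluated only over allowed topics.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in allowed
                ]
                k_new = rng.choices(allowed, weights=weights, k=1)[0]

                assignments[d][i] = k_new
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1
                doc_topic[d][k_new] += 1

    return assignments, topic_word


# Hypothetical corpus: three documents over a five-word vocabulary.
docs = [[0, 1, 0, 2], [3, 4, 3], [0, 3, 1, 4]]
doc_labels = [{0}, {1}, {0, 1}]
assignments, topic_word = labeled_lda_gibbs(
    docs, doc_labels, num_topics=2, vocab_size=5)
```

In this sketch a document with a single label has all of its tokens assigned to that topic, while a document with several labels has its words divided among them. If every document were allowed every topic, the sampler would reduce to the usual unconstrained LDA update.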
I read "Beyond Microblogging: Conversation and Collaboration via Twitter", Honeycutt, C. and Herring, S., HICSS 2009. This paper analyzes dialog in Twitter, particularly as it relates to the use of the @ symbol. The authors sampled 37k tweets over the course of a day and hand-annotated a sample of these in order to answer questions about the presence and function of dialog in Twitter, and about the relation of the @ symbol to dialog. They found that dialogs are fairly prevalent on Twitter and tend to be medium-length exchanges between two people. Further, the use of the @ symbol was strongly related to discourse acts that involve interacting with other people.