I have read the paper “Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora” by Daniel Ramage et al., which appeared in EMNLP 2009. This paper tackles the problem of making topics represent known labels, which standard LDA does not address. Unaware of this paper, I recently came up with a similar model, in which I tied word senses to topics so as to discover sense indicators for different senses. I have submitted a paper on this, and now I am not sure whether it will be rejected...

This related paper is nice in that it conducts numerous experiments to show the model is useful in certain situations. However, for some experiments the setup is not very clear and readers have to figure it out for themselves; e.g., in the snippet extraction task, it may be that they first train on the same data and then test extraction on it. Also, in the Tagged Web Page task, they independently select 3000 documents for cross-validation, which suggests a leak of test data into training. Although the leak is the same for both models, I feel it would be nice if they had clearly separated the training and testing data.
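To make the train/test separation point concrete, here is a minimal sketch of the kind of disjoint document-level split I have in mind (the function name and parameters are my own illustration, not from the paper): evaluation documents are held out before any model fitting, so nothing selected for testing is ever seen during training.

```python
import random

def disjoint_split(doc_ids, test_frac=0.2, seed=0):
    """Split document ids into disjoint train/test sets, so no document
    used for evaluation can leak into training."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)   # fixed seed for reproducibility
    cut = int(len(ids) * (1 - test_frac))
    return ids[:cut], ids[cut:]

train_ids, test_ids = disjoint_split(range(100))
assert not set(train_ids) & set(test_ids)   # the two sets never overlap
```

Independently re-sampling the 3000 evaluation documents, as the paper appears to do, skips exactly this disjointness guarantee.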
For the focus paper, I am not sure the way they calculate the Fleiss kappa makes sense: they suggest they are doing one-vs-other for each single category, and it is unknown how they treat agreement on the “other” category. I feel it matters here how the probability of chance agreement in the Fleiss kappa is computed. The other thing is that it would be nice if they presented a comparison against standard LDA in the ranking experiments; that way we could be sure the label information L-LDA brings in makes a difference, since perhaps standard LDA could achieve the same results.
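To illustrate why the treatment of the “other” category matters, here is a minimal sketch of Fleiss' kappa (my own illustration, not the paper's code). Collapsing the labels to one-vs-other changes both the observed agreement and the chance-agreement term, so the per-category kappa can differ from the full multi-category kappa, and unanimous “other” items count as agreements.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-item rating lists; every item is
    rated by the same number of raters."""
    n = len(ratings[0])                     # raters per item
    N = len(ratings)                        # number of items
    cats = sorted({c for row in ratings for c in row})
    # counts[i][j]: how many raters assigned category j to item i
    counts = [[Counter(row)[c] for c in cats] for row in ratings]
    # mean per-item observed agreement
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    # chance agreement from the marginal category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(len(cats))]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

raw = [["A", "A", "B"], ["B", "B", "B"], ["C", "B", "C"], ["A", "C", "A"]]
k_full = fleiss_kappa(raw)                  # full 3-category kappa (~0.234)
# one-vs-other for category "A": all other labels collapse into "other"
ovr = [[c if c == "A" else "other" for c in row] for row in raw]
k_a = fleiss_kappa(ovr)                     # binarized kappa (0.25)
```

On this toy data the two values already disagree, which is why I would have liked the paper to spell out how unanimous “other” items enter the chance-agreement term.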