Pre-meeting (Dong Nguyen).
Related paper: Automatic Evaluation of Topic Coherence, Newman et al.
Focus paper: Reading tea leaves: How humans interpret topic models, Chang et al.
I think the focus paper addressed an important topic. Topic models are widely used for visualization and exploration, so it is important to be able to measure
how interpretable a topic is. I liked their evaluation setup,
although I'm wondering how sensitive their results are to the inference methods and parameters used.
My related paper essentially builds on the work by Chang et al. by proposing
automatic methods to measure topic coherence/interpretability.
They tried a range of external methods, using WordNet, Wikipedia, and Google
(15 different measures in total).
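One of the simpler measures in this family is mean pairwise PMI over a topic's top words, estimated from document co-occurrence counts (in Newman et al.'s case, from Wikipedia). A minimal sketch of that idea; the function name, count representation, and toy numbers below are my own illustration, not the paper's code:

```python
from itertools import combinations
from math import log

def pmi_coherence(topic_words, doc_freq, joint_freq, num_docs, eps=1e-12):
    """Mean pairwise PMI over a topic's top words.

    doc_freq:   word -> number of documents containing the word
    joint_freq: frozenset({w1, w2}) -> number of documents containing both
    eps guards against zero co-occurrence counts.
    """
    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1 = doc_freq[w1] / num_docs
        p2 = doc_freq[w2] / num_docs
        p12 = joint_freq.get(frozenset((w1, w2)), 0) / num_docs
        scores.append(log((p12 + eps) / (p1 * p2)))
    return sum(scores) / len(scores)

# Toy reference corpus of 100 documents (invented counts):
doc_freq = {"space": 20, "nasa": 10, "orbit": 10}
joint_freq = {
    frozenset(("space", "nasa")): 8,
    frozenset(("space", "orbit")): 6,
    frozenset(("nasa", "orbit")): 5,
}
score = pmi_coherence(["space", "nasa", "orbit"], doc_freq, joint_freq, 100)
```

A coherent topic (words that co-occur far more often than chance) gets a positive score, while an incoherent one drifts toward zero or below.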
I like the range of methods they tried, but it would have been nice if they
had said more about error analysis; as it is, most of the paper is devoted
to explaining the different measures. They found that the Wikipedia-based method achieved very good results.
They used annotators to rate the topics on a 3-point scale, and they treat inter-annotator agreement as their upper bound. It seems a bit strange that some of the methods score even higher than that, and many of the methods are very close to the upper-bound score. So I'm wondering whether their comparison with the upper bound (which they use to conclude that their methods perform really well) makes sense.
In addition, I wonder how well their methods work when the domains are very specific and not well covered by the Web/WordNet/Wikipedia. It also wasn't clear to me how they mapped terms to Wikipedia pages, which is not a trivial thing to do.