Friday, April 1, 2011

Post Meeting - Alan

A good portion of the discussion was dedicated to LDA and labeled LDA, which I found useful, since I initially had only a very cloudy idea of how LDA works, based solely on its being mentioned in the past two focus papers. I'm not sure how well it works in practice, but it seems to be a favorite among people who do research on topic models, so the method has some credibility behind it.

My big concern with the supplemental paper I read was that the evaluation metrics were all pretty hand-wavy. When it comes down to it, it's hard to judge the success of a new technique when there isn't a well-specified and intuitive standard for showing that an approach does something significant. Visually speaking, their results seem alright, but it's hard to say anything really constructive about them. Also, the fact that they used only a small subset of the data, and truncated conversations to 3-6 posts (when some conversations consisted of more than 200 posts), is pretty questionable. For what it's worth, the data set they collected is fairly complete and fairly large, which will probably be useful for future research. It was also interesting to see topic models used for a slightly different purpose.

To conclude, we also discussed the current popularity of Twitter as a research subject, which comes from its well-implemented API and the amount of data publicly available. The data is fairly raw, which makes it noisy and hard to model, but it also captures human correspondence on a more genuine level than normal text. There is probably some cool stuff to be found, but the general success of Twitter-focused research ventures is questionable.

-Alan
