Wednesday, March 2, 2011
Pre-meeting Post from Weisi Duan
I have read the paper “A Sequential Model for Multi-Class Classification” by Yair Even-Zohar and Dan Roth. The paper discusses a prototype model for coarse-to-fine learning, but in the domain of multi-class classification rather than structured prediction. The model is a pipeline of classifiers covering different feature spaces and output spaces: the output space of each classifier is pruned by thresholding to produce the output space of the next. This seems to suggest an explosion of classifiers if not engineered carefully. During training, each classifier is trained on the output space pruned by the previous classifier and on the instances relevant to that pruned space. The authors provide a proof that a larger output space induces more error, while a smaller output space reduces the training error.

The idea gives the feeling that it basically trains a set of independent classifiers at each level, and does not take advantage of the overlap of the confusion lists on certain labels, which could yield better estimates for the weights of the features f(x, overlapped_label). For example, in WSD, if we engineer the features to reflect only semantic correlation and do not bind them to the target words, we would have one single classifier instead of one classifier per target word, and the feature weights would be better estimated because the partition of the training examples enforced by the target words is removed.

For the focus paper, I am curious about the methodology they used to generate the senses, aside from time constraints.
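The pipelined pruning scheme described above can be sketched in a few lines. This is only a toy illustration of my reading of the model, with made-up scoring functions and function names of my own invention, not the authors' actual method:

```python
# Toy sketch of the sequential model: each stage scores the labels that
# survived the previous stage and keeps only those within `threshold` of
# the best score, so later (finer-grained) stages face a smaller output
# space. All names and the thresholding rule here are my own assumptions.

def prune(scores, threshold):
    """Keep labels whose score is within `threshold` of the best one."""
    best = max(scores.values())
    return {lab for lab, s in scores.items() if s >= best - threshold}

def sequential_predict(x, stages, labels, threshold=0.5):
    """stages: scoring functions f(x, label) -> float, coarse to fine."""
    candidates = set(labels)
    for score_fn in stages[:-1]:
        scores = {lab: score_fn(x, lab) for lab in candidates}
        candidates = prune(scores, threshold)
    # The last stage picks a single label from the pruned output space.
    return max(candidates, key=lambda lab: stages[-1](x, lab))

# Example: a coarse animal-vs-vehicle stage followed by a fine stage.
labels = ["cat", "dog", "car", "bus"]
coarse = lambda x, lab: 1.0 if (lab in ("cat", "dog")) == x["is_animal"] else 0.0
fine = lambda x, lab: 1.0 if lab == x["gold"] else 0.0
x = {"is_animal": True, "gold": "dog"}
print(sequential_predict(x, [coarse, fine], labels))  # prints "dog"
```

Note that the fine stage here only ever sees two candidates, which is the point of the design: training and prediction at each level happen over a pruned output space, at the cost of one classifier per surviving branch.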