Friday, March 11, 2011
Pre-meeting post from Weisi Duan
I read the paper “Meta Clustering” by Rich Caruana et al. (ICDM 2006). The paper describes meta-clustering, which is mentioned in the focus paper. It has many similarities to the focus paper, and the focus paper seems to have borrowed some ideas from it, e.g. the similarity metric between clusterings. The meta-clustering procedure is decomposed into three steps: 1) generate the base-level clusterings; 2) define a similarity metric between the base-level clusterings; 3) cluster the base-level clusterings themselves using a clustering method such as agglomerative clustering. The goal is to present a meta-clustering of the base-level clusterings to the user, so that the user spends less time going through all the base-level clusterings to find the best one. For 1), random feature weights are assigned before clustering, and PCA is applied to remove correlated features. For 2), the metric is the percentage of pairs of instances that the two clusterings treat differently (the pair is in the same cluster in one clustering and in two different clusters in the other). The evaluation is done on several data sets with user labels, using compactness and accuracy as performance measures. The main lesson from the experiments, as the authors suggest, is that when the correct clustering criterion is not specified in advance, searching for a single, optimal compact clustering is not appropriate, since the correctness criterion might not correlate strongly with compactness. This is also one of the main motivations of this paper and the focus paper.
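To make the metric in step 2 concrete, here is a minimal Python sketch of the pairwise-disagreement distance as I understand it from the description above. It is not the authors' code. The function names `clustering_distance` and `distance_matrix` are my own, and I assume each clustering is given as a flat list of cluster labels over the same instances. The quantity computed is one minus the Rand index.

```python
from itertools import combinations


def clustering_distance(labels_a, labels_b):
    """Fraction of instance pairs that the two clusterings treat differently.

    A pair is treated differently when it shares a cluster in one
    clustering but is split across two clusters in the other.
    (Hypothetical helper; assumes flat label lists over the same instances.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same instances")
    n = len(labels_a)
    if n < 2:
        return 0.0  # no pairs to compare
    disagreements = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a != same_in_b:
            disagreements += 1
    total_pairs = n * (n - 1) // 2
    return disagreements / total_pairs


def distance_matrix(clusterings):
    """Symmetric matrix of pairwise distances between base-level clusterings."""
    m = len(clusterings)
    dist = [[0.0] * m for _ in range(m)]
    for p, q in combinations(range(m), 2):
        d = clustering_distance(clusterings[p], clusterings[q])
        dist[p][q] = dist[q][p] = d
    return dist


# Three toy base-level clusterings of four instances (made-up data).
base_clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, labels renamed
    [0, 1, 0, 1],
]
matrix = distance_matrix(base_clusterings)
```

Because the metric only asks whether a pair is together or apart, it ignores how the clusters are named, so the first two toy clusterings are at distance 0. A matrix like this is the kind of input an agglomerative method in step 3 would need. The double loop is quadratic in the number of instances, which is fine for a sketch but would need a faster formulation on large data sets.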