I read the paper "A Comparison of Extrinsic Clustering Evaluation Metrics Based on Formal Constraints" by Enrique Amigó, Julio Gonzalo, Javier Artiles, and Felisa Verdejo.
The paper evaluates various metrics for clustering. The authors list the following criteria that a constraint must meet to be useful in evaluating a metric: (1) each constraint should address a limitation of the metric, (2) for any metric, there should be an analytical way to prove whether the metric satisfies the constraint, and (3) the constraint should discriminate between metric families.
Based on these criteria, the authors suggest four constraints: (1) Cluster homogeneity: a coarser cluster containing heterogeneous items should score lower than finer clusters of homogeneous items. (2) Completeness: homogeneous items should appear in the same cluster. (3) Rag bag: disorder in a noisy (heterogeneous) cluster is preferable to disorder in a homogeneous cluster. (4) Cluster size vs. quantity: a small error in a big cluster is preferable to a large number of small errors in small clusters.
On the basis of these constraints, the authors compare four types of measures: (1) set matching: purity and inverse purity, (2) counting pairs: Rand statistic, Jaccard coefficient, and Fowlkes and Mallows (FM), (3) entropy-based measures, and (4) B-cubed.
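To make the last family concrete, here is a minimal sketch of B-cubed precision and recall as I understand them: for each item, precision is the fraction of items in its cluster that share its gold category, recall is the fraction of items in its gold category that share its cluster, and both are averaged over all items. The function name `bcubed` and the dict-based input format are my own choices for illustration, not something taken from the paper.

```python
from collections import Counter


def bcubed(clusters, categories):
    """Return (precision, recall) averaged over items.

    clusters:   dict mapping item -> system cluster label (assumed format)
    categories: dict mapping item -> gold category label (assumed format)
    """
    items = list(clusters)
    # How many items share each (cluster, category) combination.
    joint = Counter((clusters[i], categories[i]) for i in items)
    cluster_size = Counter(clusters[i] for i in items)
    category_size = Counter(categories[i] for i in items)

    precision = 0.0
    recall = 0.0
    for i in items:
        same_both = joint[(clusters[i], categories[i])]  # includes item i itself
        precision += same_both / cluster_size[clusters[i]]
        recall += same_both / category_size[categories[i]]
    n = len(items)
    return precision / n, recall / n


# Toy data: cluster 1 mixes two categories, cluster 2 is a singleton.
toy_clusters = {"a": 1, "b": 1, "c": 1, "d": 2}
toy_categories = {"a": "x", "b": "x", "c": "y", "d": "y"}
toy_precision, toy_recall = bcubed(toy_clusters, toy_categories)
```

On this toy data the mixed cluster lowers precision, while splitting category "y" across two clusters lowers recall.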
Set matching metrics fail on cluster completeness and rag bag, as they are biased towards small clusters. Entropy-based measures generally fail on rag bag. Pair-counting metrics satisfy both homogeneity and completeness, but don't address rag bag or cluster size vs. quantity. The B-cubed family of metrics satisfies all four constraints.
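The rag bag difference can be checked on a small example of my own (it is not taken from the paper). A stray item of category "F" is placed either in a clean cluster or in an already noisy one. In this sketch each cluster is simply a list of the gold labels of its members, and the helper names are hypothetical.

```python
from collections import Counter


def purity(clustering):
    """Set-matching purity: each cluster is credited with its majority label."""
    n = sum(len(cluster) for cluster in clustering)
    return sum(max(Counter(cluster).values()) for cluster in clustering) / n


def bcubed_precision(clustering):
    """Average, over items, of the share of same-label items in the item's cluster."""
    n = sum(len(cluster) for cluster in clustering)
    total = 0.0
    for cluster in clustering:
        counts = Counter(cluster)
        # Each of the k items with a given label contributes k / len(cluster).
        total += sum(k * k for k in counts.values()) / len(cluster)
    return total / n


clean = ["A"] * 4                 # homogeneous cluster
noisy = ["B", "C", "D", "E"]      # rag bag: every item from a different category

stray_in_clean = [clean + ["F"], noisy]
stray_in_noisy = [clean, noisy + ["F"]]

print(purity(stray_in_clean), purity(stray_in_noisy))
print(bcubed_precision(stray_in_clean), bcubed_precision(stray_in_noisy))
```

Purity is the same for both placements, so it cannot express the preference. B-cubed precision is higher when the stray item goes into the noisy cluster, and B-cubed recall is 1 in both cases because no category is split, so the combined B-cubed score prefers the rag bag placement.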
In the focus paper, the distance between two clusters is represented by the sum of different clusterings for a pair of items. It may seem that only homogeneity and completeness are addressed here. However, since none of the clusterings represents the true clustering, constraints (3) and (4) are probably not required.