I read "Joint Unsupervised Coreference Resolution with Markov Logic" by Poon and Domingos (EMNLP'08). The paper argues that unsupervised approaches, though attractive because unlabeled training data is abundant, remain largely unexplored because they are more difficult. The method uses Markov logic to perform joint inference. The head word of each mention is determined using Stanford parser rules, which gives better precision than simply choosing the rightmost word as the head. Two mentions are clustered together if they have the same head word. Since this does not work for pronouns, predicates for gender, entity type, and number are checked for them. In addition, apposition and predicate nominals are incorporated as predicates. I did not fully digest how inference is carried out in this network.
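To make the head-word idea concrete for myself, here is a minimal Python sketch. It is not the paper's method: the paper expresses these constraints as weighted Markov logic formulas and infers all clusters jointly, whereas this is a hard, deterministic simplification. The function names (`rightmost_head`, `cluster_by_head`, `agrees`) and the attribute dictionaries are my own hypothetical choices for illustration.

```python
from collections import defaultdict


def rightmost_head(mention_text):
    """Naive heuristic: treat the last token of the mention as its head word."""
    return mention_text.split()[-1].lower()


def cluster_by_head(mentions, head_fn=rightmost_head):
    """Group mention strings that share the same head word (hard matching)."""
    clusters = defaultdict(list)
    for mention in mentions:
        clusters[head_fn(mention)].append(mention)
    return dict(clusters)


def agrees(pronoun_attrs, entity_attrs):
    """Check gender / number / entity-type agreement on the attributes both sides define.

    The attribute keys are hypothetical; the paper models these as predicates.
    """
    shared = set(pronoun_attrs) & set(entity_attrs)
    return all(pronoun_attrs[key] == entity_attrs[key] for key in shared)


mentions = [
    "President Obama",
    "Barack Obama",
    "the president of the United States",
]
clusters = cluster_by_head(mentions)

# A pronoun cannot be linked by head word, so it falls back to agreement checks.
she = {"gender": "female", "number": "singular", "type": "person"}
obama = {"gender": "male", "number": "singular", "type": "person"}
can_link = agrees(she, obama)
```

Note that the rightmost-word heuristic assigns "states" as the head of the third mention, where a parser-based rule would presumably pick "president". That is the kind of error the Stanford parser rules are meant to avoid.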
The results show a 7% increase in F-measure over the baseline H&K system. However, in the variants where the head word is determined similarly (by choosing the rightmost word), precision decreases, although recall remains good.
Since they cluster mentions with the same head word without using any other feature, such as distance, some mentions whose head words are common nouns may be incorrectly clustered together.
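A small Python sketch of this concern, under my own assumptions: mentions are represented as hypothetical `(head, sentence_index)` pairs, and the `max_sentence_gap` threshold is something I made up to show what a distance feature might do. Nothing like it appears in the paper.

```python
def head_match_links(mentions, max_sentence_gap=None):
    """Return index pairs of mentions linked by identical head words.

    mentions: list of (head_word, sentence_index) tuples (hypothetical format).
    max_sentence_gap: optional, hypothetical distance cutoff; None disables it.
    """
    links = []
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            head_i, sent_i = mentions[i]
            head_j, sent_j = mentions[j]
            if head_i != head_j:
                continue
            if max_sentence_gap is not None and abs(sent_i - sent_j) > max_sentence_gap:
                continue
            links.append((i, j))
    return links


# Three mentions headed by the common noun "company"; the third occurs far
# away in the document and may well refer to a different company.
mentions = [("company", 0), ("company", 1), ("company", 40)]

links_head_only = head_match_links(mentions)
links_with_distance = head_match_links(mentions, max_sentence_gap=3)
```

With head matching alone, all three mentions end up linked. With the made-up distance cutoff, only the two nearby mentions are linked. A hard cutoff like this would of course hurt recall for long-range coreference, so a soft feature would probably be the better fit in a Markov logic setting.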