Focus paper: Collective Cross-Document Relation Extraction Without Labeled Data. Limin Yao, Sebastian Riedel, and Andrew McCallum. EMNLP 2010.
Related paper: Factorie: Probabilistic programming via imperatively defined factor graphs. Andrew McCallum, Karl Schultz, and Sameer Singh. NIPS 2009.
This paper describes Factorie, the system that underlies the work of the focus paper. Factorie is a language that combines imperative and declarative styles for specifying conditional undirected graphical models, and it provides routines for learning the parameters of such models using MCMC methods. A model is specified by describing the variables present, the factors that score assignments to those variables and how they are shared, and a proposal function that generates candidate changes to the current assignment. This last step is optional; a generic method such as Gibbs sampling can be used instead. For parameter estimation the authors use a method called SampleRank, which avoids having to compute marginals or fully decode the input data.
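To make the three ingredients concrete (variables, scoring factors, and a proposal function), here is a minimal Python sketch of a SampleRank-style learner on a toy labeling task. It is not Factorie's API, which is Scala-based. The feature templates, the function names (`features`, `samplerank_step`, `train`), the toy tokens and labels, and the perceptron-style update are all assumptions made for illustration. The idea it tries to capture is that each proposed change yields a pair of assignments, and the weights are adjusted only when the model ranks the pair differently from the ground-truth objective.

```python
import random

def features(tokens, labels):
    """Sparse feature counts for one full assignment (hypothetical toy templates)."""
    f = {}
    for i, (tok, lab) in enumerate(zip(tokens, labels)):
        key = ("emit", tok, lab)              # factor over one label and its token
        f[key] = f.get(key, 0.0) + 1.0
        if i > 0:
            key = ("trans", labels[i - 1], lab)  # factor over adjacent labels
            f[key] = f.get(key, 0.0) + 1.0
    return f

def score(weights, feats):
    """Unnormalized model score: dot product of weights and feature counts."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def objective(labels, gold):
    """Ground-truth objective: number of correctly labeled positions."""
    return sum(a == b for a, b in zip(labels, gold))

def samplerank_step(weights, tokens, current, proposal, gold, lr=1.0):
    """One update on the pair (current, proposal); returns the next state."""
    # Assumption: for simplicity we rescore the whole assignment. A real system
    # would score only the factors that touch the changed variables.
    f_cur, f_prop = features(tokens, current), features(tokens, proposal)
    obj_cur, obj_prop = objective(current, gold), objective(proposal, gold)
    if obj_cur != obj_prop:
        better, worse = (f_prop, f_cur) if obj_prop > obj_cur else (f_cur, f_prop)
        # Update only when the model fails to rank the better assignment higher.
        if score(weights, better) <= score(weights, worse):
            for k in set(better) | set(worse):
                delta = better.get(k, 0.0) - worse.get(k, 0.0)
                weights[k] = weights.get(k, 0.0) + lr * delta
    # Walk: move to the proposal only if the (updated) model prefers it.
    if score(weights, f_prop) > score(weights, f_cur):
        return proposal
    return current

def train(tokens, gold, label_set, steps=2000, seed=0):
    rng = random.Random(seed)
    weights = {}
    current = [label_set[0]] * len(tokens)
    for _ in range(steps):
        # Proposal function: change the label of one randomly chosen position.
        i = rng.randrange(len(tokens))
        proposal = list(current)
        proposal[i] = rng.choice([l for l in label_set if l != current[i]])
        current = samplerank_step(weights, tokens, current, proposal, gold)
    return weights, current

# Hypothetical toy citation fragment with two labels.
tokens = ["Yao", ",", "Riedel", "2010"]
gold = ["AUTHOR", "OTHER", "AUTHOR", "OTHER"]
weights, final_labels = train(tokens, gold, ["AUTHOR", "OTHER"])
print(final_labels)
```

Note that no marginals and no full decoding appear anywhere in the loop: learning touches only the current assignment and one proposed neighbor at a time, which is the property the summary above attributes to SampleRank.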
To demonstrate the system, the authors apply Factorie to the problem of joint segmentation and coreference of paper citations. They obtain a 20-25% error reduction and a 3-15 times speedup over the next best system, which uses Markov Logic Networks.