Collective Cross-Document Relation Extraction Without Labelled Data
Limin Yao, Sebastian Riedel, and Andrew McCallum
The focus paper proposes an approach that jointly models entity type prediction and relation extraction, and also explicitly models the compatibility between the two (here focusing on selectional preferences). Furthermore, information is shared across documents to exploit redundancy. Freebase is used as a source of distant supervision. Their evaluation showed substantial performance gains, especially when tested on out-of-domain data.
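Selectional preferences of the kind the focus paper models can be illustrated with a minimal sketch: each relation constrains the entity types its arguments may take, so an entity-type prediction and a relation prediction can be checked against each other. The relation names and type inventory below are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical selectional preferences: each relation constrains the
# entity types of its two arguments. The entries are illustrative only.
preferences = {
    "born_in": ("PERSON", "LOCATION"),
    "employed_by": ("PERSON", "ORGANIZATION"),
}

def compatible(relation, arg1_type, arg2_type, prefs=preferences):
    """Return True if the predicted entity types satisfy the
    relation's selectional preferences."""
    return prefs.get(relation) == (arg1_type, arg2_type)

# A joint model would prefer assignments where this check succeeds,
# rather than predicting types and relations independently.
print(compatible("born_in", "PERSON", "LOCATION"))      # compatible
print(compatible("born_in", "LOCATION", "PERSON"))      # violates the preference
```

In the actual paper this compatibility is a soft factor in a graphical model rather than a hard filter, but the hard check above conveys the intuition.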
The related papers were all closely related to the focus paper. Three people read the paper on distant supervision. The other papers were about applying constraints to semi-supervised learning, joint entity and relation extraction, and Factorie, a programming language for graphical models.
Weisi, Dhananjay and Dong read “Distant supervision for relation extraction without labeled data”. This paper introduces the ‘distant supervision’ paradigm and serves as a basis for the focus paper. Freebase is used to extract training instances, and logistic regression is then used as the classifier. The authors used both lexical and syntactic features, and found that syntactic features outperform lexical features for ambiguous or lexically distant relations. Evaluation was done with held-out data and manual evaluation. Negative examples were created by selecting random entity pairs that do not stand in any Freebase relation. It is not entirely clear why they chose to sample 1% of these pairs as negative examples rather than some other fraction.
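The labelling scheme described above can be sketched in a few lines. This is a simplified illustration, assuming entity pairs have already been identified in each sentence; the knowledge-base entries and the `distant_supervision` helper are hypothetical, not the paper's actual pipeline.

```python
import random

# Toy stand-in for Freebase: (entity1, entity2) -> relation.
# All entries are hypothetical illustrations.
kb = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Google", "Mountain View"): "headquartered_in",
}

# Sentences paired with the entity pair they mention (entity-pair
# identification is assumed to have happened upstream).
sentences = [
    ("Barack Obama was born in Hawaii.", ("Barack Obama", "Hawaii")),
    ("Google is based in Mountain View.", ("Google", "Mountain View")),
    ("Barack Obama visited Google.", ("Barack Obama", "Google")),
]

def distant_supervision(sentences, kb, neg_rate=0.01, seed=0):
    """Label each sentence with the KB relation of its entity pair;
    sample a small fraction of unrelated pairs as 'NONE' negatives."""
    rng = random.Random(seed)
    labelled = []
    for text, pair in sentences:
        if pair in kb:
            labelled.append((text, pair, kb[pair]))
        elif rng.random() < neg_rate:
            labelled.append((text, pair, "NONE"))
    return labelled

# neg_rate=1.0 keeps every unrelated pair, just for demonstration;
# the paper samples only 1% of them.
train = distant_supervision(sentences, kb, neg_rate=1.0)
```

The `neg_rate` parameter makes the 1% sampling choice explicit: lowering it trades fewer negative examples against a less skewed class distribution, which is presumably the balance the authors were tuning.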
Daniel read “Factorie: Probabilistic programming via imperatively defined factor graphs”. Factorie combines an imperative and a declarative language for specifying conditional undirected graphical models, and was used in the focus paper to construct the graphical model. In their evaluation, the authors obtained a 20-25% error reduction and a 3-15x speedup over the next best system, which used Markov Logic Networks.
Alan read “Coupled Semi-Supervised Learning for Information Extraction”. The paper takes a semi-supervised approach to extracting both entities and relations. Semi-supervised learning often suffers from low accuracy; in their approach, multiple extractors are trained together, and the resulting constraints between them are used to increase the accuracy of each extractor. In their evaluation, adding these constraints yielded significantly higher precision.
Brendan read “Joint entity and relation extraction using Card-Pyramid Parsing”. A pyramid structure is placed over the chunks of a sentence, with nodes representing possible relations between pairs of chunks. Their motivation and goals do not seem entirely clear; for example, it is unclear whether they intended to produce single coherent trees. In their evaluation, joint inference sometimes improved performance.
In addition, Michael read “Bi-directional Joint Inference for Entity Resolution and Segmentation Using Imperatively-Defined Factor Graphs” and Matt read “Learning 5000 relational extractors”, but there were no write-ups for these.