Related paper - Distant supervision for relation extraction without labeled data
(Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky)
The paper introduces distant supervision as a new paradigm for training classifiers. The paradigm is applied to mine relations from unlabelled data.
Distant supervision is achieved by referring to Freebase. For each pair of entities that Freebase lists in a relation, the sentences in the training data that mention both entities are mined to extract textual features used to train the classifier. The features considered are
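* Lexical features - (1) the sequence of words between the two entities (2) the POS tags of those words (3) a flag for which entity came first (4) a window of k words to the left of e1 and to the right of e2, along with their POS tags
* Syntactic features - (1) the dependency path between the two entities (2) one window node for each entity, i.e. a node connected to the entity that is not part of the dependency path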
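The lexical features listed above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function name `lexical_features`, the half-open `(start, end)` span convention, and the dictionary layout are all assumptions made for the example. Syntactic features are left out because they need a dependency parser.

```python
from typing import Dict, List, Tuple

# Hypothetical convention: an entity is a half-open token range [start, end).
Span = Tuple[int, int]


def lexical_features(
    tokens: List[str],
    pos_tags: List[str],
    e1: Span,
    e2: Span,
    k: int = 2,
) -> Dict[str, object]:
    """Collect the four lexical features for one sentence and entity pair."""
    # (3) flag: does e1 appear before e2 in the sentence?
    e1_first = e1[0] < e2[0]
    left, right = (e1, e2) if e1_first else (e2, e1)

    # Pair every token with its POS tag once, then slice.
    tagged = list(zip(tokens, pos_tags))

    return {
        # (1) + (2) words between the two entities, with their POS tags
        "between": tagged[left[1]:right[0]],
        "e1_first": e1_first,
        # (4) up to k words left of the first entity and right of the second
        "left_window": tagged[max(0, left[0] - k):left[0]],
        "right_window": tagged[right[1]:right[1] + k],
    }
```

For a sentence such as "Astronomer Edwin Hubble was born in Marshfield , Missouri" with entities "Edwin Hubble" and "Marshfield" and k = 1, the "between" feature would hold "was born in" with its tags, and the windows would hold "Astronomer" and ",".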
In the preprocessing step, consecutive words with the same named entity tag (and occurring consecutively in the parse tree) are 'chunked'. Negative examples are provided by randomly sampling entity pairs that are unrelated according to Freebase. Evaluation is done separately on held-out data from Freebase and by human judgment. Syntactic features outperform lexical features in cases where the sentences that mention the relations are ambiguous.
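The chunking and negative-sampling steps might look something like the sketch below. It is a simplification under stated assumptions: `chunk_entities` merges on the named entity tag alone and ignores the parse-tree condition, the tag `"O"` is assumed to mark non-entity tokens, and `related_pairs` is a hypothetical in-memory stand-in for a Freebase lookup.

```python
import random
from itertools import groupby
from typing import List, Set, Tuple


def chunk_entities(
    tokens: List[str], ner_tags: List[str], outside: str = "O"
) -> List[Tuple[str, str]]:
    """Merge runs of consecutive tokens that share a named entity tag."""
    chunks: List[Tuple[str, str]] = []
    for tag, group in groupby(zip(tokens, ner_tags), key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag == outside:
            # Non-entity tokens stay as individual tokens.
            chunks.extend((word, tag) for word in words)
        else:
            chunks.append((" ".join(words), tag))
    return chunks


def sample_negative_pairs(
    entities: List[str],
    related_pairs: Set[Tuple[str, str]],
    n: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Randomly pick ordered entity pairs that are absent from related_pairs."""
    rng = random.Random(seed)
    candidates = [
        (a, b)
        for a in entities
        for b in entities
        if a != b and (a, b) not in related_pairs
    ]
    return rng.sample(candidates, min(n, len(candidates)))
```

Enumerating every candidate pair is only workable for a small entity list; at a realistic scale one would presumably draw random pairs and reject the related ones instead.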