Coupled Semi-Supervised Learning for Information Extraction
As a starting note, I want to comment that the focus paper was fairly easy to digest this time around. Compared to previous papers, it gave sufficient background and more tangible examples, which made it easier to sanity-check what was going on.
The supplemental paper I read this week looks at semi-supervised learning for extracting both categories (entities) and relations. Supervised learning is costly because it requires large amounts of labeled data. Semi-supervised learning seeks to mitigate that cost, but at a highly undesirable sacrifice of accuracy. The selling point here is that different information extractors may be able to tell us something about one another. So instead of learning each information extractor on its own, we can train multiple extractors together and apply the resulting constraints to increase the accuracy of each extractor. Intuitively, this seems like a step in the right direction, since it seems fairly likely that some relations can help restrict the possibilities of certain other relations. The paper also briefly mentions another problem this could apply to: semantic drift in bootstrap learning methods.
In terms of results, several algorithms are each compared against a version of themselves that adds a coupling procedure, which filters out candidates using mutual exclusion and type checking. The versions with the additional constraints show significantly higher average precision for the instances promoted by their bootstrapping learner. Pretty cool stuff from in-house.
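To make the coupling step more concrete for myself, here is a minimal Python sketch of what filtering by mutual exclusion and type checking could look like. This is my own simplification and not the paper's implementation: the function names, the data layout, and the example categories and instances are all hypothetical, and I assume type checking means both arguments of a relation candidate must already be promoted instances of the expected categories.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def filter_by_mutual_exclusion(
    candidates: Dict[str, Set[str]],
    promoted: Dict[str, Set[str]],
    mutex: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Drop category candidates already promoted to a mutually exclusive category."""
    kept: Dict[str, Set[str]] = {}
    for category, instances in candidates.items():
        # Collect everything promoted to any category declared exclusive with this one.
        blocked: Set[str] = set()
        for other in mutex.get(category, set()):
            blocked |= promoted.get(other, set())
        kept[category] = {x for x in instances if x not in blocked}
    return kept


def filter_by_type_checking(
    candidates: Dict[str, Set[Pair]],
    promoted: Dict[str, Set[str]],
    arg_types: Dict[str, Tuple[str, str]],
) -> Dict[str, Set[Pair]]:
    """Keep relation candidates whose arguments match the expected categories.

    Assumption (mine): an argument "matches" only if it is already a promoted
    instance of the expected category.
    """
    kept: Dict[str, Set[Pair]] = {}
    for relation, pairs in candidates.items():
        first_type, second_type = arg_types[relation]
        kept[relation] = {
            (a, b)
            for a, b in pairs
            if a in promoted.get(first_type, set())
            and b in promoted.get(second_type, set())
        }
    return kept


# Hypothetical toy data, made up for illustration only.
promoted = {
    "city": {"Pittsburgh"},
    "company": {"Google"},
    "person": {"Larry Page"},
}
mutex = {"city": {"company", "person"}}
category_candidates = {"city": {"Boston", "Google"}}
relation_candidates = {
    "ceoOf": {("Larry Page", "Google"), ("Pittsburgh", "Google")}
}
arg_types = {"ceoOf": ("person", "company")}

kept_categories = filter_by_mutual_exclusion(category_candidates, promoted, mutex)
kept_relations = filter_by_type_checking(relation_candidates, promoted, arg_types)
```

In this toy run, "Google" is rejected as a city candidate because it is already a promoted company, and the ("Pittsburgh", "Google") pair is rejected for ceoOf because "Pittsburgh" is not a promoted person. That is the kind of cheap sanity check that would keep a bootstrapping learner from promoting bad instances and drifting.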