Supplemental Paper: Tailoring Word Alignments to Syntactic Machine Translation
Focus Paper: Discriminative Modeling of Extraction Sets for Machine Translation
The supplemental paper I read was by the same authors as the focus paper. Its purpose was to address the problems caused by word alignment errors in syntactic MT systems that extract tree transducer rules. When the word alignments of a sentence pair violate the constituent structure of the target sentence, the minimal translation units grow larger. Units that span larger segments generalize poorly and block the extraction of many smaller rules that would otherwise be available.
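To make the notion of an alignment "violating" a constituent concrete, here is a minimal Python sketch. It is a simplification that I am assuming for illustration, not the exact extraction criterion from the paper: a target constituent counts as violated when some target word outside it aligns to a source word that falls within the source range covered by the constituent. The function name `violates_constituent` and the data layout are hypothetical.

```python
def violates_constituent(links, span):
    """Check whether an alignment breaks a target-side constituent.

    links: set of (source_index, target_index) alignment links.
    span:  (start, end) half-open range of target indices for the constituent.
    Simplified, assumed criterion; not taken verbatim from the paper.
    """
    start, end = span
    # Source positions aligned to target words inside the constituent.
    inside = [s for s, t in links if start <= t < end]
    if not inside:
        # In this sketch an unaligned constituent imposes no constraint.
        return False
    lo, hi = min(inside), max(inside)
    # Violation: a target word outside the constituent aligns into [lo, hi].
    return any(lo <= s <= hi for s, t in links if not (start <= t < end))
```

With a monotone alignment such as `{(0, 0), (1, 1), (2, 2)}` the constituent covering target words 0 and 1 is intact, whereas the crossing alignment `{(0, 0), (2, 1), (1, 2)}` violates it, so a rule for that constituent alone could not be extracted.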
An unsupervised word alignment model is presented that extends both the HMM model of Ney and Vogel (1996) and the system described by Galley et al. (2006). The innovative step is to generate a parse tree for the target-language sentence and use a syntax-sensitive distortion component that conditions on that tree. The idea is that the tree can alter the probabilities of transitions between alignment positions so that distortions which respect tree structure are preferred. The regular HMM alignment model uses only string distance in its distortion model. The key aspect of this paper is to use instead a new kind of shortest path between two positions, defined by a first-order Markov walk through the tree: it pops up from a leaf, moves across to a different branch, and pushes down to a new leaf, with every step taken probabilistically.
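The pop, move, and push decomposition described above can be sketched in Python as follows. This is only an illustration under my own assumptions: the tree encoding (nested tuples with a label in the first slot), the function names, and the geometric-style parameters `p_pop`, `p_push`, and `move_decay` are all hypothetical, and the score is unnormalized. The paper's actual parameterization is learned with EM and is not reproduced here.

```python
def leaf_paths(tree, prefix=()):
    """Return, for each leaf from left to right, its child-index path from the root.

    A leaf is a plain string; an internal node is (label, child1, child2, ...).
    """
    if isinstance(tree, str):
        return [prefix]
    paths = []
    for k, child in enumerate(tree[1:]):
        paths.extend(leaf_paths(child, prefix + (k,)))
    return paths


def tree_walk(tree, i, j):
    """Describe the walk from leaf i to leaf j as (pops, move, pushes).

    pops:   levels climbed from leaf i before moving sideways
    move:   signed sibling offset taken just below the lowest common ancestor
    pushes: levels descended after the move to reach leaf j
    """
    if i == j:
        return (0, 0, 0)
    paths = leaf_paths(tree)
    a, b = paths[i], paths[j]
    c = 0  # length of the shared prefix, i.e. depth of the lowest common ancestor
    while a[c] == b[c]:
        c += 1
    return (len(a) - c - 1, b[c] - a[c], len(b) - c - 1)


def walk_score(pops, move, pushes, p_pop=0.3, p_push=0.3, move_decay=0.5):
    """Unnormalized score that penalizes long walks (hypothetical parameters)."""
    return (p_pop ** pops) * (move_decay ** abs(move)) * (p_push ** pushes)
```

For the tree `("S", ("NP", "the", "cat"), ("VP", "sat", ("PP", "on", ("NP", "the", "mat"))))`, moving from "the" to "cat" stays inside the NP and needs no pops or pushes, while moving from "cat" to "sat" must pop out of the NP and push into the VP. Both pairs are one position apart in the string, so a plain string-distance distortion model would treat them identically, but the tree walk scores the constituent-crossing transition lower.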
The model is trained with EM. Performance is measured with the standard alignment error rate (AER) metric, and evaluation is done on manually aligned French-English and Chinese-English data sets. Their model succeeds in drastically reducing AER, meaning that the number of alignments that violate constituent structure is significantly reduced. However, the big question of whether this improves actual translation quality is left open, as the model was not incorporated into a full MT system.
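For reference, AER as commonly defined compares a predicted alignment A against sure gold links S and possible gold links P (with S contained in P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A small Python sketch follows; the function name and the choice to return 0.0 when both A and S are empty are my own assumptions.

```python
def alignment_error_rate(predicted, sure, possible):
    """Compute AER from sets of (source_index, target_index) links.

    predicted: links proposed by the aligner (A)
    sure:      gold links annotated as sure (S)
    possible:  gold links annotated as possible (P); sure links are added to it
    """
    possible = set(possible) | set(sure)  # enforce S as a subset of P
    predicted, sure = set(predicted), set(sure)
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 0.0  # assumed convention for the degenerate empty case
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denominator
```

Lower is better: a prediction that contains every sure link and nothing outside the possible links scores 0.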