Related Paper: Tailoring Word Alignments to Syntactic Machine Translation
This paper presents an unsupervised word alignment method designed to produce alignments that specifically benefit syntactic machine translation models based on tree transducers. The problem with standard IBM Model 4 alignments is that they frequently cross constituent boundaries, which prevents rules from being extracted and leads to poorer performance. At a high level, the method uses target-side trees to softly prefer alignments that do not violate constituents. It is a modification of the HMM alignment model in which the distortion model uses a weighted tree distance instead of string distance. The model is trained with plain EM. It drastically reduces the number of alignments that cross constituent boundaries, but only mildly improves alignment scores, generally favoring recall over precision. Unfortunately, the authors do not show the effect of the model when it is plugged into an end-to-end system, so it is difficult to say whether the method is actually beneficial.
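To make the distortion change concrete, here is a minimal sketch under stated assumptions. It is not the paper's exact parameterization. It assumes the HMM jump probability from target position i to j is proportional to exp(-lam * d(i, j)), where d is either string distance |i - j| or a tree distance computed by climbing from leaf i to the lowest common ancestor and descending to leaf j, with hypothetical per-step weights `w_up` and `w_down`. The function names, weights, `lam`, and the toy tree are all illustrative.

```python
import math
from typing import Callable, Dict, List


def path_to_root(node: int, parent: Dict[int, int]) -> List[int]:
    """Return [node, parent(node), ..., root]."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: int, b: int, parent: Dict[int, int],
                  w_up: float = 1.0, w_down: float = 1.0) -> float:
    """Weighted path length from leaf a up to the LCA, then down to leaf b.

    w_up and w_down are hypothetical weights, not values from the paper.
    """
    up_path = path_to_root(a, parent)
    down_path = path_to_root(b, parent)
    ancestors_of_b = set(down_path)
    # The first node on a's root path that is also an ancestor of b is the LCA.
    lca = next(n for n in up_path if n in ancestors_of_b)
    steps_up = up_path.index(lca)
    steps_down = down_path.index(lca)
    return w_up * steps_up + w_down * steps_down


def distortion_matrix(n: int, dist_fn: Callable[[int, int], float],
                      lam: float = 1.0) -> List[List[float]]:
    """Row i holds p(next target position j | previous position i),
    assumed proportional to exp(-lam * dist_fn(i, j))."""
    matrix = []
    for i in range(n):
        scores = [math.exp(-lam * dist_fn(i, j)) for j in range(n)]
        z = sum(scores)
        matrix.append([s / z for s in scores])
    return matrix


# Toy target tree: (S (NP the man) (VP saw her)); node ids are arbitrary.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
leaves = [3, 4, 5, 6]  # the, man, saw, her

tree_p = distortion_matrix(
    len(leaves), lambda i, j: tree_distance(leaves[i], leaves[j], parent))
string_p = distortion_matrix(len(leaves), lambda i, j: abs(i - j))

# From "man" (position 1): string distance treats "the" and "saw" alike,
# while tree distance prefers the jump that stays inside the NP.
print(string_p[1][0] == string_p[1][2])  # True
print(tree_p[1][0] > tree_p[1][2])       # True
```

With string distance, a jump from "man" to "the" and a jump to "saw" are equally likely; with tree distance, the jump that stays within the NP constituent is preferred. This is the kind of soft preference for constituent-respecting alignments described above.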