Thursday, April 14, 2011
Pre-meeting Post from Weisi Duan
I have read the paper “Discriminative Training and Maximum Entropy Models for Statistical Machine Translation” by Franz Josef Och and Hermann Ney, which appeared in ACL 2002. The paper recasts the source-channel approach to machine translation in a log-linear framework, which makes adding features to the model much easier. More specifically, the probabilistic components of the source-channel model (the language model and the translation model) can themselves serve as features, combined with other features drawn from different information sources. Training is done with generalized iterative scaling (GIS). The training criterion in GIS seems to be MAP-like, in that all probability mass for a given source sentence is allocated to the hypothesis closest to a possible gold-standard translation. Inference is done with dynamic programming, using an n-best list for global features. The evaluation reports seven different measures (SER, WER, PER, mWER, BLEU, SSER, IER), and as more distinct features are added, performance improves on these measures.
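To make the log-linear combination concrete, here is a minimal sketch (the feature names and values are hypothetical, not from the paper): each candidate translation e of a source sentence f gets an unnormalized score exp(Σ_m λ_m h_m(e, f)), the scores are normalized over an n-best list to give p(e|f), and decoding picks the highest-scoring candidate.

```python
import math

def log_linear_score(features, weights):
    """Unnormalized log score: sum_m lambda_m * h_m(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def posterior(candidates, weights):
    """Normalize over an n-best list.

    candidates: dict mapping a candidate translation e to its
    feature values h_m(e, f). Returns p(e | f) under the model.
    """
    scores = {e: log_linear_score(h, weights) for e, h in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {e: math.exp(s) / z for e, s in scores.items()}

# Hypothetical features for two candidate translations of one source
# sentence: log language-model prob, log translation-model prob, length.
weights = {"log_lm": 1.0, "log_tm": 1.0, "length": 0.5}
candidates = {
    "hypothesis A": {"log_lm": -4.0, "log_tm": -3.0, "length": 5},
    "hypothesis B": {"log_lm": -5.0, "log_tm": -2.5, "length": 5},
}
p = posterior(candidates, weights)
best = max(p, key=p.get)  # MAP decision: pick the highest-posterior hypothesis
```

With the two source-channel probabilities as the only features and both weights fixed to 1, this reduces to the classic source-channel decision rule; the point of the log-linear view is that the weights, and any extra features, can then be trained discriminatively with GIS.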