Pre-meeting (Dong Nguyen).
Related paper: Statistical Phrase-Based Translation
Focus paper: Discriminative Modeling of Extraction Sets for Machine Translation
The related paper presents a translation model and decoder, and compares different ways to build phrase translation tables. Their decoder uses a beam search algorithm. Each search step selects a sequence of untranslated foreign words and an English phrase to translate it, then updates the hypothesis cost.
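The search step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's decoder: the names `Hypothesis`, `expand`, and `decode`, the toy phrase table, and the `beam_size` and `max_len` parameters are all assumptions made for the example, and the cost here is only a sum of phrase costs (no language model, reordering cost, or future cost estimate).

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    covered: frozenset  # indices of foreign words already translated
    output: tuple       # English words produced so far
    cost: float         # accumulated cost (lower is better)


def expand(hyp, foreign, phrase_table, max_len=3):
    """Yield each hypothesis reachable by translating one more foreign span."""
    n = len(foreign)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            span = range(start, end)
            # Only spans made entirely of untranslated words are allowed.
            if any(i in hyp.covered for i in span):
                continue
            source = tuple(foreign[start:end])
            for english, cost in phrase_table.get(source, []):
                yield Hypothesis(
                    hyp.covered | set(span),
                    hyp.output + english,
                    hyp.cost + cost,
                )


def decode(foreign, phrase_table, beam_size=5):
    """Return the cheapest full translation found, or None if there is none."""
    n = len(foreign)
    # stacks[k] holds hypotheses that cover exactly k foreign words.
    stacks = [[] for _ in range(n + 1)]
    stacks[0].append(Hypothesis(frozenset(), (), 0.0))
    for k in range(n):
        # Beam pruning: expand only the cheapest hypotheses in each stack.
        stacks[k].sort(key=lambda h: h.cost)
        for hyp in stacks[k][:beam_size]:
            for new in expand(hyp, foreign, phrase_table):
                stacks[len(new.covered)].append(new)
    if not stacks[n]:
        return None
    return min(stacks[n], key=lambda h: h.cost)


# Toy phrase table (hypothetical entries): foreign phrase -> [(English, cost)]
toy_table = {
    ("das",): [(("the",), 1.0)],
    ("haus",): [(("house",), 1.0)],
    ("das", "haus"): [(("the", "house"), 1.5)],
}
best = decode(["das", "haus"], toy_table)
print(best.output, best.cost)
```

Because this sketch has no reordering cost, it does not prefer monotone translations; a real decoder would add one to the hypothesis cost.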
They experimented with three different methods to build phrase translation tables:
* Phrases from word-based alignments (GIZA++)
* Taking syntactic phrases into account
* Joint phrase model
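To make the first method concrete, here is a rough sketch of collecting phrase pairs that are consistent with a word alignment. It is a simplified illustration under stated assumptions, not the paper's exact procedure: the function name `extract_phrases`, the `max_len` limit, and the toy sentence pair are hypothetical, the alignment is given by hand rather than produced by GIZA++, and the usual extension over unaligned words is left out.

```python
def extract_phrases(foreign, english, alignment, max_len=3):
    """Collect phrase pairs consistent with a word alignment.

    alignment is a set of (foreign_index, english_index) links.
    A pair is kept if it contains at least one link and no link
    connects a word inside the pair to a word outside it.
    """
    pairs = set()
    for f_start in range(len(foreign)):
        for f_end in range(f_start, min(f_start + max_len, len(foreign))):
            # English positions linked to any word in the foreign span.
            linked = [e for f, e in alignment if f_start <= f <= f_end]
            if not linked:
                continue
            e_start, e_end = min(linked), max(linked)
            if e_end - e_start + 1 > max_len:
                continue
            # Reject the pair if an English word inside the span is linked
            # to a foreign word outside the foreign span.
            if any(
                e_start <= e <= e_end and not (f_start <= f <= f_end)
                for f, e in alignment
            ):
                continue
            pairs.add(
                (
                    " ".join(foreign[f_start:f_end + 1]),
                    " ".join(english[e_start:e_end + 1]),
                )
            )
    return pairs


# Toy example with a hand-written alignment (an assumption for illustration).
pairs = extract_phrases(["das", "haus"], ["the", "house"], {(0, 0), (1, 1)})
print(sorted(pairs))
```

Counting how often each extracted pair occurs across a corpus would then give relative-frequency estimates for the phrase translation table.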
They used the Europarl corpus for evaluation. Most experiments translated German to English, and they also experimented with some additional language pairs. Some observations they made were:
* Small phrases of up to three words lead to a high level of accuracy.
* Syntactic restrictions hurt performance.
* Heuristics based on word alignments work well.
* What works depends on language pair and size of training corpus.