Title page for etd-0908109-093951
Title
Combining Outputs from On-Line Translation Systems on Mobile Devices
Department
Year, semester
Language
Degree
Number of pages
62
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2009-07-25
Date of Submission
2009-09-08
Keywords
System combination, Machine translation
Statistics
The thesis/dissertation has been browsed 5642 times and downloaded 0 times.
Chinese Abstract (translated)
This study presents two on-line machine translation combination systems that integrate three different on-line translation engines. We use the IWSLT07 training corpus to train the language model and the translation model. The first combination system applies selection, substitution, insertion, and deletion modules to correct the on-line translation hypotheses. The second combination system uses a classifier to assign each word in a hypothesis to a different post-processing operation and then corrects the hypothesis according to the classification results. We combine translation hypotheses from Google, Yahoo, and TransWhiz, and evaluate Chinese-to-English combination on the IWSLT07 test set. Experimental results show that the first combination system improves the BLEU score from 19.15% for the single best translation system to 20.55% after combination, while the second improves it from 19.15% to 20.47%. Compared with the best of the combined on-line translation systems, the two systems gain 1.40 and 1.32 BLEU points, respectively.
Abstract
In this research, we propose two different frameworks for combining the outputs of multiple on-line machine translation systems. We train the language model and the translation model on the IWSLT07 training data. The first framework consists of several modules, including selection, substitution, insertion, and deletion. In the second framework, after selection, we use a maximum entropy classifier to label each word in the selected hypothesis according to Damerau-Levenshtein distance; based on these labels, each word is handled by a different post-processing operation. We evaluate both combination frameworks on the IWSLT07 task, which contains tourism-related sentences; the translation direction of our test set is Chinese to English. Three on-line machine translation systems, Google, Yahoo, and TransWhiz, are used in the investigation. The experimental results show that the first combination framework improves the BLEU score from 19.15% to 20.55%, and the second from 19.15% to 20.47%. These frameworks achieve absolute improvements of 1.40 and 1.32 BLEU points, respectively.
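The thesis text is not available in this record, so as an illustration only: the word-level labels in the second framework are derived from Damerau-Levenshtein edit operations (match, substitution, insertion, deletion, and transposition, mirroring the post-processing modules listed under Section 3.2). A minimal sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance over token sequences, with the function name chosen for illustration:

```python
def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    the minimum number of substitutions, insertions, deletions, and
    adjacent transpositions needed to turn sequence a into sequence b."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i tokens of a and first j of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i tokens
    for j in range(n + 1):
        d[0][j] = j          # insert all j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
            # adjacent transposition (the extra operation over Levenshtein)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

Because it operates on arbitrary sequences, the same function works at the word level, e.g. `damerau_levenshtein(["the", "hotel", "room"], ["hotel", "the", "room"])` counts the swapped words as a single transposition rather than two substitutions.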
Table of Contents
List of Tables iv
List of Figures v
List of Algorithms vi
Acknowledgements vii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Purposes 2
1.3 Organization 3
Chapter 2 Related Works 4
2.1 Machine Translation 4
2.2 Machine Translation System Combination 5
2.2.1 Sentence-Level Combination 5
2.2.2 Phrase-Level Combination 6
2.2.3 Word-Level Combination 6
2.3 Evaluation of Machine Translation 9
2.3.1 Distance-Based Measures 9
2.3.2 N-Gram-Based Measures 10
2.4 Maximum Entropy Model 13
2.5 On-Line Machine Translation 14
2.5.1 Google Translate 14
2.5.2 TransWhiz 14
2.5.3 Yahoo Babel Fish 16
2.6 Part-Of-Speech Tagger 16
Chapter 3 Method 19
3.1 Sequential Post-Processing 19
3.1.1 Selection 20
3.1.2 Substitution 20
3.1.3 Insertion 22
3.1.4 Deletion 23
3.2 Classification-Based Post-Processing 24
3.2.1 Selection 26
3.2.2 Maximum Entropy Classifier 26
3.2.3 Transposition 29
3.2.4 Substitution 30
3.2.5 Deletion 32
Chapter 4 Translation Service on Mobile Devices 34
4.1 System Architecture 34
4.2 Translation Server 34
4.2.1 Data-Transfer Module 35
4.2.2 Translation-Combination Module 35
4.3 Translation Client 37
4.3.1 Example 37
Chapter 5 Experiments 41
5.1 Setup 41
5.2 Results 42
5.2.1 Sequential Post-Processing 42
5.2.2 Classification-Based Post-Processing 43
Chapter 6 Conclusion and Further Work 46
6.1 Conclusion 46
6.2 Further Work 46
Bibliography 48
Fulltext
The electronic full text is licensed only for individual, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available (on campus or off campus)
Available:
Campus: not available (permanently restricted)
Off-campus: not available (permanently restricted)


Printed copies
Availability information for printed copies is relatively complete for academic year 102 (2013) and later. To inquire about printed theses from academic year 101 or earlier, please contact the printed-thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available
