Title page for etd-0908109-093951
Title
Combining Outputs from On-Line Translation Systems on Mobile Devices
Department
Year, semester
Language
Degree
Number of pages
62
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2009-07-25
Date of Submission
2009-09-08
Keywords
System combination, Machine translation
Statistics
The thesis/dissertation has been browsed 5642 times and downloaded 0 times.
Chinese Abstract (translated)
This study presents two on-line machine translation combination systems that integrate three different on-line translation engines. We use the IWSLT07 training corpus to train the language model and the translation model. The first combination system applies selection, substitution, insertion, and deletion modules to correct the on-line translation hypotheses. The second combination system uses a classifier to assign each word in a hypothesis to a different post-processing operation and then corrects the hypothesis according to the classification results. We combine translation hypotheses from Google, Yahoo, and TransWhiz, and evaluate Chinese-to-English combination on the IWSLT07 test set. Experimental results show that the first combination system improves the BLEU score from 19.15% for the single best translation system to 20.55% after combination, while the second improves it from 19.15% to 20.47%. Compared with the best of the combined on-line translation systems, the two systems gain 1.40 and 1.32 BLEU points, respectively.
Abstract
In this research, we propose two different frameworks for combining the outputs of multiple on-line machine translation systems. We train the language model and the translation model on the IWSLT07 training data. The first framework consists of several modules, including selection, substitution, insertion, and deletion. In the second framework, after selection, we use a maximum entropy classifier to label each word in the selected hypothesis according to Damerau-Levenshtein distance; based on these labels, each word is handled by a different post-processing operation. We evaluate both combination frameworks on the IWSLT07 task, which contains tourism-related sentences; the translation direction of our test set is Chinese to English. Three on-line machine translation systems, Google, Yahoo, and TransWhiz, are used in the investigation. The experimental results show that the first combination framework improves the BLEU score from 19.15% to 20.55%, and the second from 19.15% to 20.47%. These frameworks achieve absolute improvements of 1.40 and 1.32 BLEU points, respectively.
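The thesis text is not available in this record, so as an illustration only: the word-level labels in the second framework are derived from Damerau-Levenshtein edit operations (match, substitution, insertion, deletion, and transposition, mirroring the post-processing modules listed under Section 3.2). A minimal sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance over token sequences, with the function name chosen for illustration:

```python
def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    the minimum number of substitutions, insertions, deletions, and
    adjacent transpositions needed to turn sequence a into sequence b."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i tokens of a and first j of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i tokens
    for j in range(n + 1):
        d[0][j] = j          # insert all j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
            # adjacent transposition (the extra operation over Levenshtein)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

Because it operates on arbitrary sequences, the same function works at the word level, e.g. `damerau_levenshtein(["the", "hotel", "room"], ["hotel", "the", "room"])` counts the swapped words as a single transposition rather than two substitutions.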
Table of Contents
List of Tables iv
List of Figures v
List of Algorithms vi
Acknowledgements vii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Purposes 2
1.3 Organization 3
Chapter 2 Related Works 4
2.1 Machine Translation 4
2.2 Machine Translation System Combination 5
2.2.1 Sentence-Level Combination 5
2.2.2 Phrase-Level Combination 6
2.2.3 Word-Level Combination 6
2.3 Evaluation of Machine Translation 9
2.3.1 Distance-Based Measures 9
2.3.2 N-Gram-Based Measures 10
2.4 Maximum Entropy Model 13
2.5 On-Line Machine Translation 14
2.5.1 Google Translate 14
2.5.2 TransWhiz 14
2.5.3 Yahoo Babel Fish 16
2.6 Part-Of-Speech Tagger 16
Chapter 3 Method 19
3.1 Sequential Post-Processing 19
3.1.1 Selection 20
3.1.2 Substitution 20
3.1.3 Insertion 22
3.1.4 Deletion 23
3.2 Classification-Based Post-Processing 24
3.2.1 Selection 26
3.2.2 Maximum Entropy Classifier 26
3.2.3 Transposition 29
3.2.4 Substitution 30
3.2.5 Deletion 32
Chapter 4 Translation Service on Mobile Devices 34
4.1 System Architecture 34
4.2 Translation Server 34
4.2.1 Data-Transfer Module 35
4.2.2 Translation-Combination Module 35
4.3 Translation Client 37
4.3.1 Example 37
Chapter 5 Experiments 41
5.1 Setup 41
5.2 Results 42
5.2.1 Sequential Post-Processing 42
5.2.2 Classification-Based Post-Processing 43
Chapter 6 Conclusion and Further Work 46
6.1 Conclusion 46
6.2 Further Work 46
Bibliography 48
Fulltext
The electronic full text is licensed only for individual, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available (on campus or off campus)
Available:
Campus: not available (permanently restricted)
Off-campus: not available (permanently restricted)


Printed copies
Availability information for printed copies is relatively complete for academic year 102 (2013) and later. To inquire about printed theses from academic year 101 or earlier, please contact the printed-thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available
