National Sun Yat-sen University, thesis/dissertation, Rule-based Machine Translation in Limited Domain for PDAs

Title	Rule-based Machine Translation in Limited Domain for PDAs (應用於智慧型行動裝置之特定領域機器翻譯系統)
Department	Department of Computer Science and Engineering
Year, semester	Fall semester of Academic Year 98	Language	English
Degree	Master	Number of pages	65
Author	江欣倩 Shin-Chian Chiang
Advisor	陳嘉平 Chia-Ping Chen
Convenor	李宗南 Chung-Nan Lee
Advisory Committee	范俊逸 Chun-I Fan; 江明朝 Ming-Chao Chiang
Date of Exam	2009-08-17	Date of Submission	2009-09-10
Keywords	Machine translation, Tree adjoining grammar, Rule-based machine translation, Lexicalized tree automata-based grammar, Synchronous lexicalized tree adjoining grammar
Statistics	The thesis/dissertation has been browsed 5617 times and downloaded 0 times.

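Abstract (translated from the Chinese)
This study implements a machine translation system for smart mobile devices using a rule-based approach. A rule-based translation system has three main modules: analysis, transfer, and generation. The grammars used are lexicalized tree automata-based grammar and synchronous tree adjoining grammar. Lexicalized tree automata-based grammar is used in the analysis module; synchronous tree adjoining grammar is used for transfer and generation. The parser in the analysis module is adapted from an existing parser. The synchronous tree adjoining grammar parser matches the source-language parse tree against the source side of the synchronous tree adjoining grammar rules. A successful match means the matched part can be converted to the target-language side of the rule, and the target sides are then combined into candidate target-language trees. These candidate trees are scored by a language model and rule probabilities, and the highest-scoring one is output. During composition, hypotheses that score below a threshold are pruned to keep the number of hypotheses manageable. Overall, our system differs from other rule-based systems in two ways: grammar rules can be extracted automatically, and the rule type is highly flexible. The synchronous tree adjoining grammar parser is designed for these flexible rules. In the experiments, we use a Chinese-English travel corpus as training data to produce the required rules, and a travel-domain corpus as the test set, obtaining a BLEU score of 17%.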
Abstract
In this thesis, we implement a rule-based machine translation (MT) system for Personal Digital Assistants (PDAs). A rule-based MT system generally has three modules: analysis, transfer, and generation. The grammars used in our system are lexicalized tree automata-based grammar (LTA) and synchronous lexicalized tree adjoining grammar (SLTAG). LTA is used for analysis, and SLTAG is used for transfer and generation. For the analysis module, we adapt an existing parser to run on PDAs. The SLTAG parser in the transfer module searches the source parse tree for matches against the source sides of SLTAG rules. Target parse trees are then grown from the target sides of the matched rules, each hypothesis is scored with a language model and rule probabilities, and the highest-scoring hypothesis is output. To avoid generating too many hypotheses, the generation step prunes hypotheses whose scores fall below a threshold. Compared with other rule-based MT systems, ours can build rules automatically and uses a flexible rule type; the SLTAG parser is written specifically for this rule type. In the experiments, Chinese-English BTEC data serve as training and test data, and the system achieves a BLEU score of 17% on the test data.
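The scoring and pruning step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the `Hypothesis` structure, the toy bigram language model, the flat penalty for unseen bigrams, and the log-space combination of the two scores are all assumptions made for illustration. The sketch also prunes a finished list of hypotheses, whereas the abstract describes pruning during generation.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A candidate target-side translation (hypothetical structure)."""
    words: list
    rule_log_prob: float  # assumed: sum of log-probabilities of the rules used


def lm_log_prob(words, bigram_log_probs, unseen=-10.0):
    """Toy bigram language model score; unseen bigrams get a flat penalty."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(bigram_log_probs.get(pair, unseen)
               for pair in zip(padded, padded[1:]))


def score(hyp, bigram_log_probs, lm_weight=1.0):
    """Combine rule probability and language model score in log space."""
    return hyp.rule_log_prob + lm_weight * lm_log_prob(hyp.words, bigram_log_probs)


def prune_and_pick(hyps, bigram_log_probs, threshold):
    """Drop hypotheses scoring below `threshold`; return (best, survivors)."""
    scored = [(score(h, bigram_log_probs), h) for h in hyps]
    survivors = [(s, h) for s, h in scored if s >= threshold]
    if not survivors:
        return None, []
    best = max(survivors, key=lambda pair: pair[0])[1]
    return best, [h for _, h in survivors]


# Usage with made-up numbers: a fluent word order outscores a scrambled one.
bigrams = {("<s>", "where"): -1.0, ("where", "is"): -0.5, ("is", "the"): -0.5,
           ("the", "station"): -1.0, ("station", "</s>"): -0.5}
good = Hypothesis(["where", "is", "the", "station"], -2.0)
bad = Hypothesis(["station", "the", "is", "where"], -2.0)
best, kept = prune_and_pick([good, bad], bigrams, threshold=-20.0)
```

Working in log space keeps the combined score a simple sum and avoids underflow when many rule probabilities are multiplied; the threshold here is an absolute score, though a threshold relative to the current best hypothesis would be an equally plausible reading of the abstract.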

Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Goal
  1.4 Thesis Organization
  1.5 Summary
Chapter 2 Related Works
  2.1 Machine Translation for Handheld Devices
    2.1.1 Rule-based MT on Handheld Devices
    2.1.2 Statistical MT on Handheld Devices
    2.1.3 MT through Internet on Handheld Devices
  2.2 Stanford Parser
  2.3 Tree Sequence
  2.4 Basic Decoding Method
  2.5 Translation Quality Evaluation
    2.5.1 Bilingual Evaluation Understudy
    2.5.2 NIST
  2.6 Summary
Chapter 3 Fundamental Theory
  3.1 Lexicalized Grammar
  3.2 Lexicalized Tree Automata-based Grammars
  3.3 Tree Adjoining Grammar
  3.4 Synchronous Lexicalized Tree Adjoining Grammar
  3.5 Summary
Chapter 4 Rule-based Machine Translation System
  4.1 Rule Building
    4.1.1 Corpus Parsing
    4.1.2 Rule Transformation
    4.1.3 Tree-to-Tree Alignment
    4.1.4 SLTAG Rule Extraction
    4.1.5 Compact Data Structure
  4.2 Translation Engine
    4.2.1 LTA Parser
    4.2.2 SLTAG Parser
  4.3 Summary
Chapter 5 Experiments
  5.1 Experiment Setup
  5.2 Preparation
  5.3 Experiments
    5.3.1 The Results of Analysis
    5.3.2 The Results of Transfer and Generation
  5.4 Summary
Chapter 6 Conclusion and Future Work
Bibliography
Appendix A


Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it. Thesis access permission: not available on campus or off campus. Available: Campus: not available; Off-campus: not available.
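Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To check printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Availability: publicly available.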
