博碩士論文 etd-0910109-143628 詳細資訊
Title page for etd-0910109-143628
論文名稱
Title
應用於智慧型行動裝置之特定領域機器翻譯系統
Rule-based Machine Translation in Limited Domain for PDAs
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
65
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2009-08-17
繳交日期
Date of Submission
2009-09-10
關鍵字
Keywords
詞彙化樹狀自動機文法、同步樹鄰接文法、機器翻譯、樹鄰接文法、規則式機器翻譯
Machine translation, Tree adjoining grammar, Rule-based machine translation, Lexicalized tree automata-based grammar, Synchronous lexicalized tree adjoining grammar
中文摘要
本研究是實做智慧行動裝置的機器翻譯系統,機器翻譯採用規則式方法來完成。規則式翻譯系統有三個主要模組:分析、轉換和生成。所使用的文法規則有詞彙化樹狀自動機文法及同步樹鄰接文法。詞彙化樹狀自動機文法使用在分析模組中;同步樹鄰接文法使用在轉換與生成。分析模組用到的剖析器是由現存剖析器改寫而成,而同步樹鄰接文法剖析器是用來比對原始語言剖析樹與同步樹鄰接文法的原始端,比對成功者代表可以轉換成同步樹鄰接文法的目標語言端,再由目標語言端合成出可能的目標語言樹,這些合成出的目標語言樹會經由語言模型及規則機率評分,以分數最高者為輸出。在合成過程中,為避免假說過多,會刪去低於門檻值的假說。整體而論,我們的系統和其他規則式系統不同之處在於:可自動擷取文法規則、使用了極具彈性的規則型態。同步樹鄰接文法剖析器即是為此具彈性的規則所設計。在實驗中,我們以中英旅遊語料為訓練語料,產生出所需要的規則,再以旅遊領域語料做測試集,可以得到17% BLEU值。
Abstract
In this thesis, we implement a rule-based machine translation (MT) system for personal digital assistants (PDAs). A rule-based MT system generally has three modules: analysis, transfer, and generation. Our system uses two grammars: lexicalized tree automata-based grammar (LTA) and synchronous lexicalized tree adjoining grammar (SLTAG). LTA is used for analysis, and SLTAG is used for transfer and generation. The parser in the analysis module is adapted from an existing parser to run on PDAs. The SLTAG parser in the transfer module matches the source parse tree against the source side of the SLTAG rules; each successful match can be converted to the target side of the corresponding rule. Candidate target-language trees are then assembled from these target sides, and each hypothesis is scored using a language model and the rule probabilities; the highest-scoring hypothesis is output. To keep the number of hypotheses manageable, the generation step prunes hypotheses whose scores fall below a threshold. Our system differs from other rule-based MT systems in that its rules are extracted automatically and its rule type is highly flexible; the SLTAG parser is designed specifically for this rule type. In our experiments, rules are extracted from a Chinese-English travel-domain corpus (BTEC), and the system achieves a BLEU score of 17% on a travel-domain test set.
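The scoring and pruning step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `Hypothesis` class, the toy bigram language model, the interpolation weights, the `unseen` penalty, and the absolute threshold are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate target-side output (hypothetical structure)."""
    words: tuple          # target-language words of the candidate tree's yield
    rule_logprob: float   # sum of log-probabilities of the rules applied


def lm_logprob(words, bigram_logprob, unseen=-10.0):
    """Toy bigram language model; `unseen` is an assumed penalty for unknown bigrams."""
    padded = ("<s>",) + tuple(words) + ("</s>",)
    return sum(bigram_logprob.get(pair, unseen) for pair in zip(padded, padded[1:]))


def score(hyp, bigram_logprob, lm_weight=1.0, rule_weight=1.0):
    """Combine language-model and rule scores as a weighted sum of log-probabilities."""
    return lm_weight * lm_logprob(hyp.words, bigram_logprob) + rule_weight * hyp.rule_logprob


def prune(hyps, bigram_logprob, threshold):
    """Keep only (score, hypothesis) pairs whose score is at or above the threshold."""
    scored = [(score(h, bigram_logprob), h) for h in hyps]
    return [(s, h) for s, h in scored if s >= threshold]


def best(hyps, bigram_logprob, threshold):
    """Return the highest-scoring surviving hypothesis, or None if all were pruned."""
    kept = prune(hyps, bigram_logprob, threshold)
    if not kept:
        return None
    return max(kept, key=lambda pair: pair[0])[1]


# Made-up example data: a tiny language model and two competing hypotheses.
toy_lm = {
    ("<s>", "i"): math.log(0.5),
    ("i", "want"): math.log(0.5),
    ("want", "tea"): math.log(0.5),
    ("tea", "</s>"): math.log(0.5),
}
h1 = Hypothesis(words=("i", "want", "tea"), rule_logprob=math.log(0.4))
h2 = Hypothesis(words=("tea", "want", "i"), rule_logprob=math.log(0.6))
winner = best([h1, h2], toy_lm, threshold=-20.0)
```

In this toy setting the second hypothesis has the higher rule probability but a much lower language-model score, so it falls below the threshold and is discarded before the final selection. A real decoder would typically prune while partial trees are being assembled rather than only at the end.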
目次 Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Goal
1.4 Thesis Organization
1.5 Summary
Chapter 2 Related Works
2.1 Machine Translation for Handheld Devices
2.1.1 Rule-based MT on Handheld Devices
2.1.2 Statistical MT on Handheld Devices
2.1.3 MT through Internet on Handheld Devices
2.2 Stanford Parser
2.3 Tree Sequence
2.4 Basic Decoding Method
2.5 Translation Quality Evaluation
2.5.1 Bilingual Evaluation Understudy
2.5.2 NIST
2.6 Summary
Chapter 3 Fundamental Theory
3.1 Lexicalized Grammar
3.2 Lexicalized Tree Automata-based Grammars
3.3 Tree Adjoining Grammar
3.4 Synchronous Lexicalized Tree Adjoining Grammar
3.5 Summary
Chapter 4 Rule-based Machine Translation System
4.1 Rule Building
4.1.1 Corpus Parsing
4.1.2 Rule Transformation
4.1.3 Tree-to-Tree Alignment
4.1.4 SLTAG Rule Extraction
4.1.5 Compact Data Structure
4.2 Translation Engine
4.2.1 LTA Parser
4.2.2 SLTAG Parser
4.3 Summary
Chapter 5 Experiments
5.1 Experiment Setup
5.2 Preparation
5.3 Experiments
5.3.1 The Results of Analysis
5.3.2 The Results of Transfer and Generation
5.4 Summary
Chapter 6 Conclusion and Future Work
Bibliography
Appendix A
電子全文 Fulltext
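This electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.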
論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available

紙本論文 Printed copies
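Availability information for printed theses is relatively complete from academic year 102 onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available:已公開 available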
