National Sun Yat-sen University, thesis/dissertation: Cross-Lingual Question Answering for Corpora with Question-Answer Pairs

Title	Cross-Lingual Question Answering for Corpora with Question-Answer Pairs
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 93 (ROC calendar; 2004-2005)	Language	English
Degree	Master	Number of pages	56
Author	Shiuan-Lung Huang (黃宣龍)
Advisor	Chih-Ping Wei (魏志平)
Convenor	Te-Min Chang (張德民)
Examination Committee	C.C. Yang (楊傳智)
Date of Exam	2005-07-20	Date of Submission	2005-08-02
Keywords	Text mining, Cross-lingual question answering, Question answering, Information retrieval

Abstract
A question answering system for a corpus of question-answer (QA) pairs accepts a user question written in natural language and retrieves the relevant QA pairs from the corpus. Most existing question answering techniques are monolingual: the language of the user question is the same as that of the QA pairs in the corpus. However, with the globalization of business environments and advances in Internet technology, more and more online information and knowledge is stored as QA pairs, in different languages, on the Internet and on intranets. To let users access these QA-pair documents with natural-language queries in such a multilingual environment, there is a pressing need for cross-lingual question answering (CLQA). In response, this study designs a statistical thesaurus-based CLQA technique. We evaluate the proposed technique empirically, using a monolingual question answering technique and a machine translation-based CLQA technique as performance benchmarks. Taking the monolingual technique as the performance reference, the results show that the proposed CLQA technique achieves satisfactory effectiveness. Moreover, the results suggest that the proposed thesaurus-based CLQA technique significantly outperforms the machine translation-based CLQA benchmark.
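The abstract says only that the proposed technique translates the user question through a statistical cross-lingual thesaurus (rather than a machine translation system) and then matches it against the QA pairs. The Python sketch below shows one plausible shape for such a pipeline. Its details are assumptions, not the thesis's formulas: the thesaurus is a simple co-occurrence table built from sentence-aligned pairs (`build_thesaurus`), each question term is mapped to its `top_k` strongest translations weighted by association strength (`translate_vector`), and QA pairs are ranked by cosine similarity, mixing question-side and answer-side scores with a hypothetical weight `alpha` (`rank_qa_pairs`). All names, parameters, and the toy Chinese/English data are illustrative, and question type identification is omitted.

```python
import math
from collections import Counter, defaultdict


def build_thesaurus(parallel_pairs):
    """Toy source->target association thesaurus from aligned sentence pairs.

    Assumed weighting (hypothetical): weight(s -> t) = cooc(s, t) / sum_t' cooc(s, t'),
    where cooc counts the aligned pairs that contain both s and t.
    """
    cooc = defaultdict(Counter)
    for src_tokens, tgt_tokens in parallel_pairs:
        # Count each term once per pair; sorting keeps iteration deterministic.
        for s in sorted(set(src_tokens)):
            for t in sorted(set(tgt_tokens)):
                cooc[s][t] += 1
    thesaurus = {}
    for s, counts in cooc.items():
        total = sum(counts.values())
        thesaurus[s] = {t: c / total for t, c in counts.items()}
    return thesaurus


def translate_vector(term_vector, thesaurus, top_k=2):
    """Map a source-language term vector into the target language.

    Each source term spreads its weight over its top_k thesaurus entries,
    scaled by association weight (ties broken alphabetically); unknown terms are dropped.
    """
    translated = Counter()
    for s, w in term_vector.items():
        entries = sorted(thesaurus.get(s, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        for t, p in entries[:top_k]:
            translated[t] += w * p
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_qa_pairs(question_tokens, qa_pairs, thesaurus, top_k=2, alpha=0.7):
    """Rank target-language QA pairs for a source-language question.

    score = alpha * cos(q, question part) + (1 - alpha) * cos(q, answer part),
    with alpha a hypothetical mixing weight. Returns (index, score), best first.
    """
    qvec = translate_vector(Counter(question_tokens), thesaurus, top_k)
    scored = []
    for idx, (q_tokens, a_tokens) in enumerate(qa_pairs):
        score = (alpha * cosine(qvec, Counter(q_tokens))
                 + (1 - alpha) * cosine(qvec, Counter(a_tokens)))
        scored.append((idx, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy usage: a Chinese question against English QA pairs (illustrative data only).
parallel = [(["密碼", "重設"], ["password", "reset"]),
            (["密碼", "忘記"], ["forgot", "password"]),
            (["帳號", "刪除"], ["delete", "account"])]
qa = [(["how", "reset", "password"], ["click", "forgot", "password", "link"]),
      (["how", "delete", "account"], ["contact", "support"])]
ranking = rank_qa_pairs(["忘記", "密碼"], qa, build_thesaurus(parallel))
# The password-related QA pair (index 0) ranks first; the account pair scores 0.0.
```

Spreading each source term over several weighted translations, instead of committing to a single best translation as a machine translation system would, is one way a thesaurus-based approach can hedge against translation ambiguity; whether the thesis does exactly this, and with which weights, is not stated in the abstract.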

Table of Contents
Chapter 1. Introduction
  1.1 Research Background
  1.2 Research Motivation and Objectives
  1.3 Organization of the Thesis
Chapter 2. Literature Review
  2.1 Monolingual Question Answering Techniques
  2.2 Cross-Lingual Question Answering (CLQA) Techniques
Chapter 3. Design of Cross-Lingual QA-Pair Question Answering (CLQA) Technique
  3.1 Cross-lingual Thesaurus Construction
  3.2 Question Representation
  3.3 Question Type Identification
  3.4 Term Vector Translation
  3.5 QA Matching
Chapter 4. Empirical Evaluation
  4.1 Data Collection
  4.2 Performance Benchmarks
  4.3 Evaluation Procedure and Criteria
  4.4 Parameter Tuning Experiments and Results
    4.4.1 Parameter Tuning for Monolingual Question Answering Technique
    4.4.2 Parameter Tuning for Machine Translation Based CLQA Technique
    4.4.3 Parameter Tuning for the Proposed Thesaurus Based CLQA Technique
  4.5 Comparative Evaluations
Chapter 5. Conclusion and Future Research Directions
References

Full text
This electronic full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission	Released on and off campus one year after submission (initially withheld)	Available	Campus: available; Off-campus: available	File	etd-0802105-142753.pdf
Printed copies
Available	Available
