博碩士論文 etd-0124106-010103 詳細資訊
Title page for etd-0124106-010103
論文名稱
Title
以詞彙網路與概念延伸改善詞義辨識
Word Sense Disambiguation Using WordNet and Conceptual Expansion
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
59
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2006-01-18
繳交日期
Date of Submission
2006-01-24
關鍵字
Keywords
詞彙網路、語義彙編資料檔、自然語言處理、詞義辨識
Natural Language Processing, SemCor, Word Sense Disambiguation, WordNet
統計
Statistics
本論文已被瀏覽 5738 次,被下載 0
The thesis/dissertation has been browsed 5738 times, has been downloaded 0 times.
中文摘要
如同一個英文單字可能有數種不同的解釋意思, 一種解釋意思也可能被表示成數個不同的英文單字。因此,對於在文句中詞義模糊的單字而言,如何選出最適當的代表意思是自然語言處理技術上的主要問題之一。然而,目前大部份詞義解析的方法若非只能解析部份詞類(如名詞)的單字,就是其解析詞義的精確度仍有賴突破,所造成的詞義模糊情況常令使用者感到困擾。
這篇研究中,提出一個新的利用詞彙網路(WordNet)、語義彙編資料庫檔案(SemCor)和全球資訊網(the Web)的詞義解析方法,試圖明確解析文句中不同詞類的單字,除了名詞之外,也能解析句子中的動詞、形容詞和副詞的模糊詞義。從語義彙編資料庫中隨機選擇文件檔案與字句,藉由測量詞對中詞義之間的語義相似度,篩選出詞對中目標詞可能的適當候選詞義;根據詞彙網路所提供的同義詞集資料庫,利用加權的機制考量同義詞之間可能的詞義差異,以決定候選詞義的對應同義詞集;並且使用對應同義詞集中的詞義以延伸候選詞義,再合併內文視窗技術形成查詢字串,以該查詢字串傳送到搜尋引擎搜尋全球資訊網上相關的文件之後,藉由找到的相關文件數量排序候選詞義,以判斷目標詞的最佳詞義。
本研究利用布朗大學建立的語義彙編資料檔作為效能評估的資料來源, 實驗結果顯示本方法解析名詞、動詞、形容詞和副詞等四種詞類之詞義,於使用第一個選擇詞義時的平均精確度為81.3% , 略優於Stetina et al.方法的80%和Mihalcea et al.方法的80.1%,且是其中唯一解析動詞的平均精確度達到70%的方法。若使用前三個選擇詞義時,本方法對該四種詞類的平均精確度超過96%,遠優於其它二種方法。本方法可望改善利用詞義分析技術的應用,例如機器翻譯、文件分類或資訊擷取等的效能。
Abstract
Just as a single English word can have several different meanings, a single meaning can be expressed by several different English words. The meaning of a word depends on the sense intended. Selecting the most appropriate meaning for an ambiguous word in context is therefore a critical problem for applications of natural language processing. At present, however, most word sense disambiguation methods either handle only a restricted set of parts of speech (for example, only nouns) or do not achieve satisfactory accuracy, and the remaining ambiguity often troubles users.
This study presents a new word sense disambiguation method that uses the WordNet lexical database, the SemCor text files, and the Web. In addition to nouns, the proposed method also attempts to disambiguate verbs, adjectives, and adverbs in sentences. The text files and sentences used in the experiments were randomly selected from SemCor. First, the semantic similarity between the senses of the two ambiguous words in a word pair is measured to select the applicable candidate senses of the target word in that pair. Next, a synonym weighting method, based on the synonym sets that WordNet provides, accounts for possible differences in sense among synonyms and determines the synonym set corresponding to each candidate sense. The candidate senses are then expanded with the senses in the corresponding synonym sets and combined with a context window to form new queries. These queries are submitted to a search engine, and the candidate senses are ranked by the number of matching Web documents found. The first sense in the ranked list is taken as the most appropriate sense of the target word.
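The final ranking step described in this paragraph can be illustrated with a minimal sketch. It is not the thesis implementation: the query syntax (synonyms joined with OR, followed by the context-window words), the function names, and the hit counts are all assumptions made for illustration, and the search engine is replaced by a hypothetical lookup table so the example runs offline.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_query(synonyms: Sequence[str], context: Sequence[str]) -> str:
    """Join a candidate sense's synonyms with OR and append the context words.

    The query syntax here is an assumption, not taken from the thesis.
    """
    sense_part = " OR ".join(f'"{s}"' for s in synonyms)
    context_part = " ".join(f'"{c}"' for c in context)
    return f"({sense_part}) {context_part}".strip()


def rank_senses(
    candidates: Dict[str, Sequence[str]],
    context: Sequence[str],
    hit_count: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Rank candidate senses by number of matching documents, highest first."""
    scored = [
        (sense, hit_count(build_query(synonyms, context)))
        for sense, synonyms in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy candidate senses and context window, invented for illustration only.
candidates = {
    "bank#1 (financial institution)": ["bank", "depository"],
    "bank#2 (river side)": ["bank", "riverbank"],
}
context = ["deposit", "money"]

# Hypothetical stand-in for a search engine: made-up document counts per query.
fake_counts = {
    build_query(["bank", "depository"], context): 1200,
    build_query(["bank", "riverbank"], context): 35,
}


def fake_hit_count(query: str) -> int:
    """Return the made-up document count for a query (0 if unknown)."""
    return fake_counts.get(query, 0)


ranking = rank_senses(candidates, context, fake_hit_count)
best_sense = ranking[0][0]
```

Passing the hit-count source in as a function keeps the ranking logic separate from any particular search engine, so a real lookup could be substituted without changing `rank_senses`.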
The proposed method and the methods of Stetina et al. and Mihalcea et al. are evaluated on the SemCor text files. The experimental results show that, when only the top-ranked sense is considered, the proposed method achieves an average accuracy of 81.3% over nouns, verbs, adjectives, and adverbs, slightly better than the 80% of Stetina et al.’s method and the 80.1% of Mihalcea et al.’s method. It is also the only one of the three methods whose accuracy on verbs reaches 70% for the top-ranked sense. When the top three senses are considered, its average accuracy over the four parts of speech exceeds 96%, which is superior to the other two methods. The proposed method is expected to improve the performance of applications that rely on word sense disambiguation, such as machine translation, document classification, or information retrieval.
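The "top one sense" and "top three senses" figures above are top-k accuracies: an instance counts as correct when the annotated sense appears among the first k ranked candidates. The following is a minimal sketch of that metric, assuming a simple dictionary layout; the function name and the toy instances are invented for illustration and do not reproduce the thesis data or its evaluation code.

```python
from typing import Dict, List


def top_k_accuracy(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of instances whose gold sense is among the first k ranked senses."""
    if not gold:
        return 0.0
    hits = sum(
        1 for instance, sense in gold.items()
        if sense in ranked.get(instance, [])[:k]
    )
    return hits / len(gold)


# Invented toy rankings: each instance maps to its candidate senses, best first.
ranked = {
    "w1": ["s1", "s2", "s3"],
    "w2": ["s2", "s1", "s3"],
    "w3": ["s3", "s1", "s2"],
    "w4": ["s1", "s3", "s2"],
}
# Invented gold annotations for the same instances.
gold = {"w1": "s1", "w2": "s1", "w3": "s2", "w4": "s1"}

top1 = top_k_accuracy(ranked, gold, k=1)  # w1 and w4 are correct: 0.5
top3 = top_k_accuracy(ranked, gold, k=3)  # every gold sense is in the top three: 1.0
```

Because top-3 accuracy counts every instance that top-1 counts, it can never be lower than top-1 accuracy on the same data.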
目次 Table of Contents
CHAPTER 1 INTRODUCTION 1
CHAPTER 2 THE DEVELOPMENT RESOURCES 4
2.1 WORDNET 4
2.2 SEMCOR 8
CHAPTER 3 RELATED WORK 11
CHAPTER 4 THE WORD SENSE DISAMBIGUATION METHOD 16
4.1 SELECTING THE CANDIDATE SENSES 19
4.1.1 Semantic Similarity Measure 20
4.2 REDUCING THE SYNONYM SETS OF THE CANDIDATE SENSES 27
4.2.1 The Synonym Weighting Method 28
4.3 CONTEXTUAL RANKING OF THE CANDIDATE SENSES 35
4.3.1 Candidate Sense Expansion 36
4.3.2 Gathering Documents from the Web 39
CHAPTER 5 PERFORMANCE EVALUATION 40
5.1 EXPERIMENTAL ENVIRONMENT 40
5.2 EXPERIMENTAL RESULTS 45
5.3 DISCUSSION 48
CHAPTER 6 CONCLUSIONS 49
REFERENCES 50
參考文獻 References
[1] E. Agirre and G. Rigau, “Word Sense Disambiguation Using Conceptual Density,” Proceedings of the 16th International Conference on Computational Linguistics, pp. 16-22, Copenhagen, 1996.
[2] S. Banerjee and T. Pedersen, “An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet,” Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, Mexico City, February 2002.
[3] S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Proc. 7th Int'l World Wide Web Conference, pp. 107-117, 1998.
[4] H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma, “Query Expansion by Mining User Logs,” IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, pp. 829-839, July/August 2003.
[5] C. Fellbaum, “WordNet: An Electronic Lexical Database,” The MIT Press, ISBN 0-262-06179-X, Second Edition, 1999.
[6] K. Fragos, Y. Maistros, and C. Skourlas, “Word Sense Disambiguation using WordNet Relations,” Proceedings of the 1st Balkan Conference in Informatics, pp. 633-643, Thessaloniki, Greece, 2003.
[7] Google, “Internet Search Engines,” http://www.google.com, 1997.
[8] C. Leacock and M. Chodorow, “Combining Local Context and WordNet Similarity for Word Sense Disambiguation,” WordNet: An Electronic Lexical Database, pp. 265-283, MIT Press, Cambridge MA, 1998.
[9] M. Lesk, “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone,” Proceedings of the 1986 SIGDOC Conference, pp. 24-26, New York, Association for Computing Machinery, 1986.
[10] G. A. Miller, “WordNet: A Lexical Database for English,” Communications of the ACM, Vol. 38, pp. 39-41, 1995.
[11] G. Miller, C. Leacock, R. Tengi, and R. Bunker, “A Semantic Concordance,” Proceedings of the 3rd DARPA Workshop on Human Language Technology, pp. 303-308, Plainsboro, New Jersey, 1993.
[12] R. Mihalcea and D. Moldovan, “Using WordNet and Lexical Operators to Improve Internet Searches,” IEEE Internet Computing, Vol. 4, No. 1, pp. 34-43, January 2000.
[13] R. Mihalcea and D. Moldovan, “A Highly Accurate Bootstrapping Algorithm for Word Sense Disambiguation,” International Journal on Artificial Intelligence Tools, Vol. 10, No. 1-2, pp. 5-21, 2001.
[14] Princeton University, “WordNet online,” http://wordnet.princeton.edu/perl/webwn, 2005.
[15] P. Rosso, F. Masulli, and D. Buscaldi, “Word sense disambiguation combining conceptual distance, frequency and gloss,” Natural Language Processing and Knowledge Engineering, pp. 120-12, Oct. 2003.
[16] P. Resnik and D. Yarowsky, “A Perspective on Word Sense Disambiguation Methods and Their Evaluation,” Proceedings of ACL SIGLEX Workshop on Tagging Text with Lexical Semantics, Why, What and How?, pp. 79-86, Washington DC, April 1997.
[17] J. Stetina, S. Kurohashi, and M. Nagao, “General Word Sense Disambiguation Method Based on a Full Sentential Context,” Proc. Workshop on Usage of WordNet in Natural Language Processing, pp. 1-8, Morgan Kaufmann, San Francisco, 1998.
電子全文 Fulltext
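This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it, to avoid violating the law.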
論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available


紙本論文 Printed copies
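Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.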
開放時間 available 已公開 available
