Thesis/dissertation record: title page for etd-0828103-162122
Title
從網頁文件中學習知識架構以支援網頁搜尋
Learning ontology from Web documents for supporting Web query
Department
Year, semester
Language
Degree
Number of pages
92
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2003-07-30
Date of Submission
2003-08-28
Keywords
query expansion, ontology learning, ontology
Statistics
This thesis/dissertation has been browsed 5678 times and downloaded 18 times.
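Chinese Abstract (translated)
This study proposes a query expansion mechanism based on an ontology. Automatic query expansion has a long history and is used in many kinds of retrieval systems, including web search. To help users identify effective search keywords and to distinguish the web pages of the domain they are likely targeting, automatic query expansion is often combined with an external knowledge base or dictionary in an attempt to improve search satisfaction and accuracy. In information technology, an ontology is regarded as an exchangeable, explicit language that defines the important concepts of a specific domain and the relations among them. A query expansion mechanism can therefore use the ontology's concepts and relations as the basis for choosing expansion terms. This study implements a process for constructing an ontology by machine, using documents on the Internet as the data source to produce an ontology for a specific domain. It also examines different expansion mechanisms and criteria for term selection. The automatically constructed ontology is validated through the search results obtained with the expansion mechanisms.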
Abstract
This thesis proposes a query expansion mechanism based on ontology. Automatic query expansion has facilitated web page search in several ways. An external knowledge resource can help users identify the search domain and effective keywords. An ontology is taken as the metadata of a knowledge domain, and queries can be expanded in different ways based on it. In this research, an ontology learning process is implemented. With no initial ontology as a backbone, a domain ontology is constructed semi-automatically from World Wide Web documents. Three expansion approaches based on concepts and their relations are proposed. The ontology learning results and the expansion approaches are evaluated by comparing the search results they produce in a typical IR system.
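As a rough illustration of the general idea of expanding a query through an ontology's concepts and relations, the following minimal Python sketch may help. It is not the thesis's method: the toy ontology, the function names `expand_query` and `to_boolean_query`, the `max_hops` parameter, and the OR-joined output format are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy ontology (an assumption for illustration only):
# each concept maps to the concepts it is directly related to.
ONTOLOGY = {
    "ontology": ["knowledge base", "semantic web"],
    "semantic web": ["rdf"],
    "knowledge base": [],
    "rdf": [],
}


def expand_query(terms, ontology, max_hops=1):
    """Return the original terms followed by concepts within max_hops relations."""
    expanded = list(terms)
    seen = set(terms)
    queue = deque((term, 0) for term in terms)
    while queue:
        concept, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow relations beyond the hop limit
        for neighbour in ontology.get(concept, []):
            if neighbour not in seen:
                seen.add(neighbour)
                expanded.append(neighbour)
                queue.append((neighbour, hops + 1))
    return expanded


def to_boolean_query(terms):
    """Join terms with OR, quoting multi-word terms as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


if __name__ == "__main__":
    print(to_boolean_query(expand_query(["ontology"], ONTOLOGY)))
```

In this sketch, raising `max_hops` pulls in more distantly related concepts, which would typically trade precision for recall; how the thesis actually selects and weights expansion terms is not stated in this record.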
Table of Contents
ABSTRACT III
CHINESE ABSTRACT IV
TABLE OF CONTENTS V
LIST OF FIGURES VIII
LIST OF TABLES IX

CHAPTER 1 INTRODUCTION 1
1.1 RESEARCH BACKGROUND 1
1.2 RESEARCH MOTIVATION 2
1.3 RESEARCH OBJECTIVES 3
1.4 THESIS ORGANIZATION 3
CHAPTER 2 LITERATURE REVIEW 4
2.1 QUERY EXPANSION 4
2.1.1 Query Model 4
2.1.2 Expansion approaches 6
2.2 ONTOLOGY 9
2.2.1 Ontology as Content Theory 9
2.2.2 Automatic Ontology Construction 11
2.3 WEB MINING 16
2.3.1 Text Mining Perspective 17
2.3.2 Internal Web Structure 18
CHAPTER 3 ONTOLOGY LEARNING PROCESS 19
3.1 THE COMPONENTS AND NOTIONS OF ONTOLOGY 20
3.2 DATA COLLECTION AND PREPROCESS 21
3.2.1 Parsing HTML Structure 23
3.2.2 Lexical Analysis 25
3.3 EXTRACTING CONCEPTS FROM HTML DOCUMENTS 27
3.4 FINDING RELATIONS BETWEEN CONCEPTS 29
CHAPTER 4 KNOWLEDGE-BASED QUERY EXPANSION 35
4.1 CRITERIA IN ONTOLOGY-BASED APPROACH 35
4.2 TERM SELECTION AND QUERY MODIFICATION 36
4.2.1 Searching Approaches 36
4.2.2 Query Modification Operations 44
CHAPTER 5 THE EVALUATION OF ONTOLOGY LEARNING WITH QUERY EXPANSION APPLICATION 46
5.1 EVALUATING ONTOLOGY 46
5.2 THE EXPERIMENTAL DESIGN OF QUERY EXPANSION 47
5.2.1 Experimental Environment 47
5.2.2 Testing Data Collection 47
5.3 THE EXPERIMENTAL DESIGN 49
5.3.1 Selecting Initial Query String 49
5.3.2 Query Modification Mechanisms 50
5.3.3 Retrieval Results 51
5.4 EXPERIMENTAL RESULTS AND EVALUATION 52
5.4.1 Recall and Precision Evaluation 52
5.4.2 Evaluating Different Expansion Approaches 55
5.4.3 Evaluating Different Support values 59
CHAPTER 6 CONCLUSIONS AND RESEARCH LIMITATION 61
REFERENCES 64
APPENDIX A. A PENN TREEBANK POS TAG SET 69
APPENDIX B. A PENN TREEBANK PHRASAL CATEGORIES 70
APPENDIX C. A LIST OF CONCEPT TERMS WITH WEIGHTS 71
APPENDIX D. A LIST OF CONCEPT PAIRS WITH DISTANCE 74
APPENDIX E. THE RANKING ALGORITHM USED IN LUCENE 80
APPENDIX F. TERM EXPANSION RESULTS IN THREE APPROACHES WITH VARIOUS SUPPORT VALUES 81
APPENDIX G. THE PAIR T-TEST OF EXPANSION EFFECTIVENESS IN DIFFERENT SUPPORT VALUES 90
References
Agirre, E. and Rigau, G. (1996). Word Sense Disambiguation using Conceptual Density. Proceedings of 15th International Conference on Computational Linguistics, COLING 96. Copenhagen, Denmark, 1996.
Agirre, E. and Ansa, O. and Hovy, E. and Martinez, D. (2000). Enriching very large ontologies using the WWW. ECAI 2000 Workshop on Ontology Learning, http://ol2000.aifb.uni-karlsruhe.de/, Berlin, August 2000.
Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval. Addison Wesley, Essex, England, 1999.
Berners-Lee, T., Hendler, J. & Lassila, O. (2001). The semantic web. Scientific American, May.
Bodner, R. and Song, F. (1996). Knowledge-based approaches to query expansion in information retrieval. In McCalla, G. (Ed.), Advances in Artificial Intelligence (pp. 146-158). New York: Springer.
Carchiolo, V. and Longheu, A. and Malgeri, M. (2000). Extracting Logical Schema from the Web. Workshop on Text and Web Mining, pages 64-71, 2000
Carnot, M.J. (2001). Concept Map-Based versus Web Page-Based Interfaces in Search and Browsing. ICTE Tallahassee 2001
Chandrasekaran, B. and Josephson, R. and Benjamins, R. (1999). What Are Ontologies, and Why Do We Need Them? IEEE Intelligent Systems, 14(1):20-26, January 1999.
Cormen, T.H. and Leiserson, C.E. and Rivest, R.L. (1990). Introduction to Algorithms. Cambridge, Mass.: MIT Press; New York: McGraw-Hill, 1990.
Cui, H. and Wen, J.R. and Nie, J.Y. and Ma, W.Y. (2003). Query Expansion by Mining User Logs. IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, July/August 2003.
Davenport, T.H. & Prusak, L. (1998). Working Knowledge: How Organizations Manage What They Know, Harvard Business School Press, 1998
Devedzic, V. (2002). Understanding Ontological Engineering. Communications of the ACM, Volume 45, No.4ve, April 2002, pp. 136-144.
Ding, Y. (2001). Ontology: The enabler for the Semantic Web. Journal of Information Science, 27(6)
Ding, Y. and Foo, S. (2002). Ontology Research and Development: Part 1 – A Review of Ontology Generation. Journal of Information Science 28(2).
DiPasquo, D. (1998). Using HTML Formatting to Aid in Natural Language Processing on the World Wide Web. Senior Honors Thesis, School of Computer Science, CMU, May 1998.
Fensel, D. (2001). Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce. Springer-Verlag, 2001.
Frakes, W.B. and Baeza-Yates, R.B. (1992). Information Retrieval: Data Structure & Algorithms, Prentice Hall, Englewood Cliffs, New Jersey.
Gandon, F. (2002). Ontology Engineering: a survey and a return on experience., Rapport de Recherche INRIA, RR4396, Mars 2002 http://www.inria.fr/rrrt/rr-4396.html
Gomez-Perez, A. and Benjamins, V.R. (1999). Overview of knowledge sharing and reuse components: Ontologies and problem-solving methods. IJCAI-99
Gruber, T.R. (1993). A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5 (1993) 2, 199-220
Hull, D. (1993). Using Statistical Testing in the Evaluation of Retrieval Experiments. SIGIR 1993: 329-338
Jichang, W. and Huan, H. and Gangshan, W. and Fugan, Z. (1997). Web mining: Knowledge Discovery on the web. In Proceedings of the ninth International Conference on Tools with Artificial Intelligence. Nov.
Khan, L. (2000). Ontology-based Information Selection. Ph.D. Dissertation, Department of Computer Science, University of Southern California, August 2000.
Klein, D. and Manning, C.D. (2002). Fast Exact Inference with a Factored Model for Natural Language Parsing. To appear in Advances in Neural Information Processing Systems 15 (NIPS 2002), December 2002.
Kietz, J-U. and Maedche, A. and Volz, R. (2000). A method for semi-automatic ontology acquisition from a corporate intranet. In Proceedings of the EKAW'00 Workshop on Ontologies and Text, Juan-Les-Pins, France, oct 2000
Korfhage, R. (1997). Information Storage and Retrieval. N.Y.: John Wiley, 1997.
Lee, D. (2002). Query Relaxation for XML Model. In Ph.D Dissertation, University of California, Los Angeles, June 2002
Maedche, A. and Staab, S. (2000). Mining Ontologies from Text. EKAW 2000: 189-202
Maedche, A. and Staab, S. (2001). Ontology learning for the semantic web. IEEE Intelligent Systems, 16(2), 2001.
Maedche, A. and Pekar, V. and Staab, S. (2002). Ontology Learning Part One - On Discovering Taxonomic Relations from the Web. In: Ning Zhong et al. (eds) Web Intelligence. Springer, 2002.
Manning, C. and Schutze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press, 1999.
Marcus, M. et al. (1999). The Penn Treebank Project. http://www.cis.upenn.edu/~treebank/home.html
Miller, R. (2002). WebSPHINX: A Personal, Customizable Web Crawler. http://www-2.cs.cmu.edu/~rcm/websphinx/
Mitchell, T. (1997). Machine Learning. McGraw Hill, 1997
Mizoguchi, R. & Ikeda, M. (1997). Towards Ontology Engineering, Proc. of PACES'97, pp.259-266, 1997
Page, L. and Brin, S. and Motwani, R. and Winograd, T. (1998). The PageRank Citation Ranking: Bringing Order to the Web. Manuscript in progress. http://google.stanford.edu/~backrub/pageranksub.ps
Peat, H.J. and Willett, P. (1991). The limitations of term co-occurrence data for query expansion in document retrieval systems. Journal of the ASIS, 42(5), (1991), 378--383.
Qiu, Y. and Frei, H.P. (1993). Concept Based Query Expansion. In Proc. of the 16th Int. ACM SIGIR Conf., pages 160-169, ACM Press, June 1993.
Raggett, D. (2003). Clean up your Web pages with HTML TIDY. http://www.w3.org/People/Raggett/tidy/
Rijsbergen, C. J. van (1979). Information Retrieval. Butterworth 1979.
Salton, G. and Buckley, C. (1988). Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management 24(5): 513-523 (1988)
Salton, G. (1989). Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, 1989.
Soderland, S. (1997). Learning to Extract Text-based Information from the World Wide Web. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, 1997, Newport Beach, California, pp. 251-254.
Srikant, R. and Agrawal, R. (1995). Mining Generalized Association Rules. VLDB 1995: 407-419
Velardi, P. and Missikoff, M. and Fabriani, P. (2001). Using Text Processing Techniques to Automatically enrich a Domain Ontology. ACM conference on Formal Ontologies in Information Systems (FOIS 2001), Maine, USA (2001)
Wood, L. et al. (1998). Document Object Model (DOM) Level 1 Specification. http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/
Yang, Y. and Pedersen J.P. (1997). A Comparative Study on Feature Selection in Text Categorization. Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), 1997.
Yuwono, B. and Lee, D. L. (1996). Search and ranking algorithms for locating resources on the world wide web. In S. Su, editor, Proceedings of the Twelfth International Conference on Data Engineering, volume 1996, pages 164 -- 171, New Orleans, LA, 1996. IEEE.
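Full text
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China. To avoid breaking the law, do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: available on campus after one year; never available off campus
Available:
Campus: available
Off-campus: not available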

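Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: already available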
