Title page for etd-0729104-222415 (thesis/dissertation detailed record)
Title
模糊群集式查詢擴展技術
Fuzzy Cluster-Based Query Expansion
Department
Year, semester
Language
Degree
Number of pages
69
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2004-07-27
Date of Submission
2004-07-29
Keywords
Fuzzy clustering, Fuzzy cluster-based query expansion, Term association, Information retrieval, Word mismatch, Query expansion, Document clustering, Cluster-based query expansion, Thesaurus, Text mining
Statistics
This thesis/dissertation has been browsed 5780 times and downloaded 2279 times.
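Abstract (translated from the Chinese)
With the rapid development of network and information technologies, a growing amount of information appears online in the form of text documents. Information retrieval (IR) refers to returning documents relevant to a user's query. Word mismatch poses a challenge for IR: it arises when users describe a concept with keywords different from those used in the documents. Query expansion is one method for handling word mismatch.

In this thesis, we therefore propose a fuzzy cluster-based query expansion technique to address word mismatch, using existing query expansion techniques (namely global analysis and cluster-based query expansion) as our evaluation benchmarks. Our empirical results show that the fuzzy cluster-based query expansion technique provides more accurate query results than the existing techniques.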
Abstract
Advances in information and network technologies have fostered the creation and availability of a vast amount of online information, typically in the form of text documents. Information retrieval (IR) pertains to determining the relevance between a user query and documents in the target collection, then returning those documents that are likely to satisfy the user’s information needs. One challenging issue in IR is word mismatch, which occurs when concepts can be described by different words in the user queries and/or documents. Query expansion is a promising approach for dealing with word mismatch in IR.

In this thesis, we develop a fuzzy cluster-based query expansion technique to solve the word mismatch problem. Using existing expansion techniques (i.e., global analysis and non-fuzzy cluster-based query expansion) as performance benchmarks, our empirical results suggest that the fuzzy cluster-based query expansion technique can provide a more accurate query result than the benchmark techniques can.
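The word mismatch problem and the basic idea of query expansion can be illustrated with a toy sketch. This is not the thesis's fuzzy cluster-based technique: the documents, the hand-made thesaurus, and the function names `retrieve` and `expand` are all hypothetical, and a real system would derive term associations from the collection rather than hard-code them.

```python
# Toy illustration of word mismatch and query expansion.
# All data and names here are hypothetical, not taken from the thesis.

DOCS = {
    "d1": "automobile engine repair manual",
    "d2": "car rental prices",
    "d3": "fuzzy clustering of text documents",
}

# Hand-made association thesaurus (assumption); real systems
# would build it automatically from the document collection.
THESAURUS = {"car": ["automobile"]}


def retrieve(query_terms, docs):
    """Return sorted ids of documents containing at least one query term."""
    terms = set(query_terms)
    return sorted(
        doc_id for doc_id, text in docs.items() if terms & set(text.split())
    )


def expand(query_terms, thesaurus):
    """Append associated terms from the thesaurus without adding a term twice."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


query = ["car"]
plain_hits = retrieve(query, DOCS)
expanded_hits = retrieve(expand(query, THESAURUS), DOCS)
print(plain_hits)      # ['d2']
print(expanded_hits)   # ['d1', 'd2']
```

The unexpanded query misses `d1` because that document says "automobile" rather than "car"; adding the associated term recovers it.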
Table of Contents
CHAPTER 1. INTRODUCTION 1
1.1 Background 1
1.2 Research Motivation and Objectives 4
1.3 Organization of the Thesis 5
CHAPTER 2. LITERATURE REVIEW 6
2.1 Query Expansion Methods 6
2.1.1 Global analysis 6
2.1.2 Local feedback 7
2.1.3 Non-fuzzy cluster-based query expansion techniques 8
2.2 Thesaurus Construction Techniques 10
2.3 Document Clustering 14
2.4 Fuzzy Clustering and Fuzzy Document Clustering 16
CHAPTER 3. DEVELOPMENT OF A FUZZY CLUSTER-BASED QUERY EXPANSION TECHNIQUE 19
3.1 Process of the Fuzzy Cluster-Based Query Expansion Technique 19
3.2 Thesauri Construction Process 20
3.2.1 Fuzzy document clustering 21
3.2.2 Local thesaurus construction 26
3.3 Query Process 27
3.3.1 Local query expansion 28
3.3.2 Document retrieval 29
CHAPTER 4. EMPIRICAL EVALUATION 31
4.1 Data Collection 31
4.2 Evaluation Procedure and Criteria 34
4.3 Benchmark Techniques 38
4.4 Evaluation Results 40
4.4.1 Effects of number of document clusters 42
4.4.2 Comparative evaluation 43
4.4.3 Effects of the number of query terms 44
CHAPTER 5. CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS 49
REFERENCES 51
APPENDIX A: CANDIDATE TERMS FOR INFORMATION RETRIEVAL 55
APPENDIX B: INTERSECTION INFORMATION 61
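Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available:
Campus: available
Off-campus: available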


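Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available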
