Title page for etd-0725104-204936
Title
網際網路搜尋資訊之涵意探究及其變化偵測
Concepts Extraction and Change Detection from Navigated Information over the Internet
Department
Year, semester
Language
Degree
Number of pages
63
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2004-07-20
Date of Submission
2004-07-25
Keywords
Spreading Activation Theory, Concept Change Detection, Concepts Extraction, Internet
Statistics
The thesis/dissertation has been browsed 5852 times and downloaded 2631 times.
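Chinese Abstract (translated)
The emergence of the Internet has made communication of information around the world much easier. The Internet interconnects the world's information, and users can query the information they need through Internet search engines. Although search engines can help users collect information, users cannot work out from the large volume of results the concepts those results contain. Moreover, information on the Internet changes as time passes, which makes it even harder for users to track changes in the concepts of a topic and what those changes mean. This research therefore proposes a two-stage incremental method that, for a topic of interest to the user, searches for related information and derives a concept structure graph that can represent the topic; it then uses spreading activation theory to detect concept changes over time and to identify the meaning of those changes.
The research then conducts experiments to verify the applicability of the proposed method. Experiment I evaluates the output of the first stage of the method; domain experts verified that the results have high precision and recall. Experiment II evaluates the concept changes tracked by the method; domain experts verified that these results also have a high agreement rate. These experiments demonstrate the practicality of the proposed method on real cases. With the help of this method, users can easily understand the concepts of the topics they are interested in and learn how those concepts change over time.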
Abstract
The emergence of the Internet has made global communication of information much easier than before. Users can find the information they want on the Internet by means of search engines. Although a search engine can help users retrieve material on a specified topic, users usually cannot grasp what the retrieved results mean as a whole. In addition, information on the Internet keeps changing, so users can hardly keep track of the changes, let alone comprehend what they mean. This research therefore proposes a two-stage incremental approach. The first stage derives a concept structure that represents the main concepts of the search results; the second stage keeps track of how those concepts change over time, based on spreading activation theory.
Experiments are conducted to examine the feasibility of the proposed approach. The first experiment evaluates the output of the first stage; measured against human experts' results, recall and precision are quite satisfactory. The second experiment examines the concept changes detected by the entire approach; domain experts show a high degree of agreement with the results. Both experiments support the feasibility of the proposed approach in real applications. With this approach, users can easily focus on the topics they are interested in and follow how those topics change over time.
Keywords: Internet, Concepts Extraction, Concept Change Detection, Spreading Activation Theory.
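The two stages described in the abstract can be pictured with a small sketch. This record does not give the thesis's actual formulas, so everything below is an assumption made for illustration: the function names (`cooccurrence_graph`, `spread_activation`, `changed_links`), the document-level co-occurrence counting, and the `decay` and `steps` parameters are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(documents):
    """Count how often two terms appear in the same document.

    Assumption: each document is already preprocessed into a list of terms.
    Returns a Counter keyed by alphabetically ordered term pairs.
    """
    weights = Counter()
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1
    return weights


def spread_activation(weights, seeds, decay=0.5, steps=2):
    """Propagate activation from seed terms along co-occurrence links.

    Assumption: each term passes `decay` times its incoming energy to its
    neighbours, split in proportion to the link weights. This is a generic
    textbook-style scheme, not the thesis's update rule.
    """
    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    activation = {term: 1.0 for term in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for term, energy in frontier.items():
            total = sum(neighbors[term].values())
            for other, w in neighbors[term].items():
                next_frontier[other] += decay * energy * w / total
        for term, energy in next_frontier.items():
            activation[term] = activation.get(term, 0.0) + energy
        frontier = next_frontier
    return activation


def changed_links(old, new):
    """Return the weight difference for every link that changed between snapshots."""
    return {
        pair: new.get(pair, 0) - old.get(pair, 0)
        for pair in set(old) | set(new)
        if new.get(pair, 0) != old.get(pair, 0)
    }
```

Under these assumptions, building one graph per collection date and passing two consecutive graphs to `changed_links` shows which links strengthened or appeared, while `spread_activation` shows which terms become reachable from a seed term once a new link exists.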
Table of Contents
CHAPTER 1 INTRODUCTION
1.1 OVERVIEW
1.2 OBJECTIVE OF THE RESEARCH
1.3 ORGANIZATION OF THE THESIS
CHAPTER 2 LITERATURE REVIEW
2.1 INFORMATION RETRIEVAL
(1) Boolean retrieval model
(2) Vector space model
(3) Probabilistic model
2.2 TEXT MINING
(1) Text Categorization
(2) Document Clustering
(3) Concept Extraction
2.3 CLUSTERING TECHNIQUES
2.4 SPREADING ACTIVATION THEORY
CHAPTER 3 PROPOSED APPROACH
3.1 CONCEPT EXTRACTION STAGE
Step 1: Preprocessing documents
Step 2: Establishing the co-occurrence graph
Step 3: Calculating co-occurrence frequency
Step 4: Clustering features
Step 5: Extracting concepts
3.2 CONCEPT CHANGE DETECTION STAGE
Step 1: Collecting features over time
Step 2: Changing the activation strengths of links
Step 3: Analyzing the change of clusters
Step 4: Detecting concept changes
CHAPTER 4 EXPERIMENTS AND RESULTS
4.1 EXPERIMENT I ON CONCEPTS EXTRACTION PERFORMANCE
4.2 EXPERIMENT II ON THE OVERALL PERFORMANCE
4.2.1 Experiment II.1
4.2.2 Experiment II.2
CHAPTER 5 CONCLUSIONS
5.1 CONCLUDING REMARKS
5.2 FUTURE WORK
REFERENCE
APPENDIX A THE EXPERIMENT RESULTS OF CONCEPT EXTRACTION
APPENDIX B PREDEFINED KEYWORDS BY EXPERTS
APPENDIX C DETAIL INFORMATION OF INFORMATION CHANGE
APPENDIX D SUMMARY OF RESULT CONCEPTS
APPENDIX E DETAIL INFORMATION OF CONCEPT CHANGE DETECTION
APPENDIX F SUMMARY OF RESULT CONCEPTS
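Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: released on and off campus after one year (withheld)
Available:
Campus: available
Off-campus: available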


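Printed copies
Public availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available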
