National Sun Yat-sen University, thesis/dissertation, The Study of Information Concepts Extracting and Change Detecting over the Internet

Title	The Study of Information Concepts Extracting and Change Detecting over the Internet (網際網路上資訊涵意探究與資訊變化追蹤之研究)
Department	Department of Information Management
Year, semester	Fall semester of Academic Year 91
Language	English
Degree	Master
Number of pages	70
Author	Chi-Ming Lai (賴志明)
Advisor	Te-Min Chang (張德民)
Convenor	Bing-Chiang Jeng (鄭炳強)
Advisory Committee	Bin-Yang Liu (劉賓陽)
Date of Exam	2002-07-16
Date of Submission	2003-01-23
Keywords	Keyword Extracting, Internet, Information Concepts Extraction, Change Detection
Statistics	The thesis/dissertation has been browsed 5794 times and downloaded 3191 times.

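Abstract (translated from the Chinese)
The rapid growth of the Internet has made it an important source of information. Today, when users want information related to a specific topic from the Internet, the tool they most often use is a search engine. However, the results a search engine returns are often voluminous and disorganized, and users cannot easily understand the concepts contained in that information. In addition, because data on the Internet is updated quickly, this information keeps changing, and users find it hard to track the changes and uncover their implications. This thesis proposes a method that helps users analyze search results and their underlying concepts, and at the same time track the meaning of changes in that information. The method has two parts. The first part analyzes the concepts in the information retrieved for a specific topic, using the keyword extraction method RCBKE to find the keywords that represent the topics in the information and the relationships among them. The second part collects information on a specific topic over a period of time and analyzes the implications and trends of the changes. Both parts are evaluated experimentally to verify the applicability of the proposed method.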
Abstract
Acquiring information over the Internet has become commonplace. Users, however, have difficulty grasping the overall concept conveyed by the information retrieved for a specific topic of interest. Moreover, that information keeps changing over time. This thesis therefore proposes an approach that helps users better understand the search results for a topic of interest and detect the implications of changes in that information over time. The first part of the approach gathers information on a user-specified topic and analyzes the overall meaning and the relations represented by those pieces of information, so that users can grasp the general concept the search results indicate. For this purpose a keyword extraction approach, called RCBKE, is proposed to identify keywords together with their relationships. Evaluations are performed, and the results show that RCBKE can discover representative keywords. The second part tracks and investigates how information on the topic changes over a given time period, so that users can easily recognize the change patterns of the specified topic. An example illustrating the approach is presented, and it supports the feasibility of the proposed approach.
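
The two-part idea in the abstract (extract representative keywords, then compare them across time) can be illustrated with a minimal Python sketch. It is not RCBKE, whose details are not given in this record; it substitutes a plain term-frequency count for the keyword extraction step. The function names `top_keywords` and `keyword_changes`, the stopword list, and the parameter `k` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the thesis).
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "are", "with"}
)


def top_keywords(documents, k=5):
    """Return the k most frequent non-stopword terms across the documents.

    This is a plain frequency-based stand-in, not the RCBKE method.
    """
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return {term for term, _ in counts.most_common(k)}


def keyword_changes(earlier_docs, later_docs, k=5):
    """Compare the top-k keyword sets of two collection periods."""
    before = top_keywords(earlier_docs, k)
    after = top_keywords(later_docs, k)
    return {
        "new": after - before,      # keywords that appear only in the later period
        "dropped": before - after,  # keywords that no longer rank in the top k
        "kept": before & after,     # keywords present in both periods
    }
```

Calling `keyword_changes` on two lists of page texts collected at different times yields the keywords that entered, left, or stayed in the top-k set, which is one simple way to surface a change pattern for a topic.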

Table of Contents
CHAPTER 1. INTRODUCTION
  1.1 Overview
  1.2 Objective of this research
  1.3 Organization of the thesis
CHAPTER 2. LITERATURE REVIEW
  2.1 Information retrieval
    2.1.1 Boolean retrieval
    2.1.2 Vector-space model
    2.1.3 Latent semantic indexing
  2.2 Automatic keyword extraction
    2.2.1 Frequency-based methods
    2.2.2 Natural language analysis
    2.2.3 Learning-based methods
    2.2.4 Lexical cohesion approach
  2.3 Clustering analysis
    2.3.1 Clustering techniques
    2.3.2 Clustering on relational data
  2.4 Current information change tracking systems
CHAPTER 3. CLUSTER-BASED KEYWORD EXTRACTION
  3.1 The RCBKE approach
  3.2 Experiments and Results
    3.2.1 Example I
    3.2.2 Experiment II
  3.3 Summary
CHAPTER 4. PROPOSED APPROACH AND ITS APPLICATIONS
  4.1 The proposed approach
  4.2 An illustrated application
  4.3 Summary
CHAPTER 5. CONCLUSIONS
  5.1 Concluding Remarks
  5.2 Future work
REFERENCES
Appendix A. Detailed results of experiment II
Appendix B. Summary of extracted keywords in change detecting experiment
Appendix C. Summary of new concepts

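Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: released on and off campus after one year (withheld)
Available: Campus: already released; Off-campus: already released
etd-0123103-202444.pdf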
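Printed copies
Availability information for printed theses is relatively complete from Academic Year 102 onward. To look up availability information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: already released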

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS