Title page for thesis/dissertation etd-0806101-111117
Title
次序性文件探勘:事件演化關連之研究
Discovery of Evolution Patterns from Sequences of Documents
Department
Year, semester
Language
Degree
Number of pages
68
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2001-07-19
Date of Submission
2001-08-06
Keywords
Feature Extraction, Feature Selection, Document Clustering, Frequent Temporal Patterns, Feature-Based Evolution Patterns, Text Mining
Statistics
This thesis/dissertation has been viewed 5726 times and downloaded 2899 times.
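Abstract (translated from the Chinese)
As text databases grow in both content and volume, text mining has become a rapidly growing application in the field of knowledge discovery. Prior research on text mining techniques has focused mainly on finding patterns within individual documents, through techniques such as document categorization, document clustering, query expansion, and event tracking, while research on discovering relationships between documents has been very scarce. The purpose of this research is to discover, from sequences of documents, the relationships between documents, which we call evolution patterns, and to propose a technique for mining them. The discovered evolution patterns can support environmental scanning and knowledge management, and can assist document management and document retrieval techniques, for example event tracking.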
Abstract
Due to the ever-increasing volume of textual documents, text mining is a rapidly growing application of knowledge discovery in databases. Past text mining techniques, such as text categorization, document clustering, query expansion, and event tracking, predominantly concentrated on discovering intra-document patterns. Mining inter-document patterns from textual documents has been largely ignored in the literature. This research focuses on discovering inter-document patterns, called evolution patterns, from document-sequences and proposes the evolution pattern discovery (EPD) technique for mining evolution patterns from a set of ordered sequences of documents. The discovery of evolution patterns can be applied in such domains as environmental scanning and knowledge management, and can be used to facilitate existing document management and retrieval techniques (e.g., event tracking).
Table of Contents
Chapter 1. Introduction....................................................1
1.1 Background.............................................................1
1.2 Research Motivation....................................................2
1.3 Research Objective.....................................................5
1.4 Organization of the Thesis.............................................5

Chapter 2. Problem Analysis and Definition.................................7
2.1 Characteristics of Document-sequences..................................7
2.2 Definition of Evolution Patterns......................................10
2.3 Requirements for Mining Feature-based Evolution Patterns..............13

Chapter 3. Literature Review..............................................18
3.1 Temporal Patterns Discovery Algorithm.................................18
3.2 Document Clustering...................................................22
3.2.1 Preprocessing Phase.................................................23
3.2.2 Document Representation Phase.......................................24
3.2.3 Clustering Phase....................................................24

Chapter 4. Evolution Pattern Discovery Technique..........................27
4.1 Process of Evolution Patterns Discovery (EPD) Technique...............27
4.2 Example...............................................................32
4.3 Algorithm of Evolution Patterns Discovery (EPD) Technique.............36

Chapter 5. Empirical Evaluation...........................................37
5.1 Evaluation Application: Event Tracking................................37
5.1.1 Traditional Event Tracking Technique................................37
5.1.2 Event Tracking Techniques Supported by Evolution Patterns...........39
5.2 Parameter Tuning Experiments..........................................44
5.2.1 Data Set............................................................45
5.2.2 Evaluation Criteria.................................................46
5.2.3 Parameter Tuning Procedure..........................................49
5.2.4 Evaluation Results for Parameters Tuning Experiments................49
5.3 Empirical Evaluation for Supporting Event Tracking....................56
5.3.1 Data Set............................................................56
5.3.2 Evaluation Criterion................................................57
5.3.3 Evaluation Procedure................................................58
5.3.4 Comparative Evaluation..............................................59

Chapter 6. Conclusions and Future Research Directions.....................63

References................................................................65
References
[A67] Aguilar, F. J., Scanning the Business Environment, Macmillan Publisher, New York, 1967.
[ADW94] Apte, C., Damerau, F. and Weiss, S., “Automated Learning of Decision Rules for Text Categorization,” ACM Transactions on Information Systems, Vol. 12, No. 3, 1994, pp.233-251.
[APL98] Allan, J., Papka, R. and Lavrenko, V., “On-line New Event Detection and Tracking,” 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM press, New York, 1998, pp.37-45.
[AS94] Agrawal, R. and Srikant, R., “Fast Algorithms for Mining Association Rules,” Proceedings of 1994 International Conference on Very Large Data Bases, Santiago, Chile, Sep. 1994, pp.487-499.
[AS95] Agrawal, R. and Srikant, R., “Mining Sequential Patterns,” Proceedings of International Conference on Data Engineering, Taipei, Taiwan, March 1995, pp.3-14.
[B92] Brill, E., “A Simple Rule-Based Part of Speech Tagger,” Proceedings of the Third Conference on Applied Natural Language Processing, ACL, Trento, Italy, 1992.
[B94] Brill, E., “Some Advances in Rule-Based Part of Speech Tagging,” Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, 1994.
[BS97] Berson, A. and Smith, S. J., Data Warehousing, Data Mining & OLAP, McGraw-Hill, Inc., 1997.
[C99] Choo, C. W., “The Art of Scanning the Environment,” Bulletin of the American Society for Information Science, 1999, pp.21-24.
[CH99] Chang, C. H. and Hsu, C. C., “Enabling Concept-Based Relevance Feedback for Information Retrieval on the WWW,” IEEE Transactions on Knowledge and Data Engineering, 1999.
[DPH98] Dumais, S., Platt, J., Heckerman, D., and Sahami, M., “Inductive Learning Algorithms and Representations for Text Categorization,” Proceedings of the 1998 ACM 7th International Conference on Information and Knowledge Management (CIKM ’98), 1998, pp.148-155.
[H81] Hambrick, D. C., “Specialization of Environmental Scanning Activities Among Upper Level Executives,” Journal of Management Studies, Vol. 18, 1981, pp.299-320.
[JL92] Jennings, D. and Lumpkin, J., “Insights Between Environmental Scanning Activities and Porter’s Generic Strategies: An Empirical Analysis,” Journal of Management, Vol.18, No. 4, 1992, pp.791-803.
[LR94] Lewis, D. and Ringuette, M., “A Comparison of Two Learning Algorithms for Text Categorization,” Proceedings of Symposium on Document Analysis and Information Retrieval, 1994.
[NGL97] Ng, H. T., Goh, W. B., and Low, K. L., “Feature Selection, Perceptron Learning, and A Usability Case Study for Text Categorization,” Proceedings of Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’97), 1997, pp.67-73.
[PFL00] Popescul, A., Flake, G. W., Lawrence, S., Ungar, L. H. and Giles C. L., “Clustering and Identifying Temporal Trends in Document Databases,” IEEE, 2000.
[SA96] Srikant, R. and Agrawal, R., “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proceedings of the 5th International Conference on Extending Database Technology (EDBT), Avignon, France, March 1996.
[SHP95] Schutze, H., Hull, D. A. and Pedersen, J. O., “A Comparison of Classifiers and Document Representations for the Routing Problem,” Proceedings of 18th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995.
[V93] Voutilainen, A., “Nptool: A Detector of English Noun Phrases,” Proceedings of Workshop on Very Large Corpora, Ohio, June 1993.
[V94] Voorhees, E. M., “Query Expansion Using Lexical-Semantic Relations,” Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, 1994, pp.61-69.
[WD01] Wei, C. and Dong, Y. X., “A Mining-based Category Evolution Approach to Managing Online Document Categories,” Proceedings of 34th Hawaii International Conference on System Sciences, Maui, Hawaii, Jan. 2001.
[WHY00] Wei, C., Hwang, S., and Yang, W., “Mining Frequent Temporal Patterns in Process Databases,” Proceedings of Workshop on Information Technologies & Systems, December 9-10, 2000, Brisbane, Australia.
[WLH00] Wei, C., Lee, Y., and Hsu, C., “Empirical Comparison of Fast Clustering Algorithms for Large Data Sets,” Proceedings of 33rd Hawaii International Conference on System Sciences, January 2000.
[YC94] Yang, Y. and Chute, C. G., “An Example-Based Mapping Method for Text Categorization and Retrieval,” ACM Transactions on Information Systems, Vol. 12, No. 3, 1994, pp.252-277.
[YCB99] Yang, Y., Carbonell, J. G., Brown, R. D., Pierce, T., Archibald, B. T., and Liu, X., “Learning Approaches for Detecting and Tracking News Events,” IEEE Intelligent Systems and Their Applications, 1999, pp.32-43.
[YL99] Yang, Y., and Liu, X., “A Re-examination of Text Categorization Methods,” Proceedings of SIGIR ’99: 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999, pp.42-49.
[YPC98] Yang, Y., Pierce, T. and Carbonell, J., “A Study on Retrospective and On-line Event Detection,” Proceedings of SIGIR ’98: 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, New York, 1998, pp.28-36.
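Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available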


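Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available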
