Title page for etd-0907105-161646
Title
由文件序列中萃取事件階段之研究─導入時間概念為基礎之技術
Event Episode Discovery from Document Sequences: A Temporal-based Approach
Department
Year, semester
Language
Degree
Number of pages
83
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2005-07-28
Date of Submission
2005-09-07
Keywords
Document Clustering, Knowledge Management, Text Mining, Event Tracking, Event Evolution, Evolution Patterns
Statistics
This thesis/dissertation has been browsed 5713 times and downloaded 2028 times.
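Chinese Abstract
Documents with a temporal order, such as news articles, personal e-mail, blogs, or customer service records for a product, accumulate day by day and are ubiquitous. Each such document describes an event topic. An event either occurs at a particular time and place, such as the London Underground bombings, or serves a specific purpose, such as the minutes of meetings discussing the proposed merger of National Tsing Hua University and National Chiao Tung University. Events develop in stages as time passes, and this thesis aims to discover the structure of these event episodes within large collections of documents accumulated over time. Traditional techniques rely on document clustering alone; this thesis introduces the concept of time, proposes a temporal-IDF measure, and shows in an empirical study that it yields better clustering results than traditional techniques.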
Abstract
Recent advances in information and networking technologies have contributed significantly to global connectivity and greatly facilitated and fostered information creation, distribution, and access. The resultant ever-increasing volume of online textual documents creates an urgent need for new text mining techniques that can intelligently and automatically extract implicit and potentially useful knowledge from these documents for decision support. This research focuses on identifying and discovering event episodes together with their temporal relationships that occur frequently (referred to as evolution patterns in this study) in sequences of documents. The discovery of such evolution patterns can be applied in such domains as knowledge management and used to facilitate existing document management and retrieval techniques (e.g., event tracking). Specifically, we propose and design an evolution pattern (EP) discovery technique for mining evolution patterns from sequences of documents. We experimentally evaluate our proposed EP technique in the context of facilitating event tracking. Measured by miss and false alarm rates, the evolution-pattern supported event-tracking (EPET) technique exhibits better tracking effectiveness than a traditional event-tracking technique. The encouraging performance of the EPET technique demonstrates the potential usefulness of evolution patterns in supporting event tracking and suggests that the proposed EP technique could effectively discover event episodes and evolution patterns in sequences of documents.
Table of Contents
1 Introduction 1
1.1 Background and Motivation 1
1.2 Solution Overview 6
2 Problem Definition and Literature Review 9
2.1 Preliminaries 10
2.2 Event-Episode-Story Structure in Document Sequences 13
2.3 Event Episode Identification 15
2.4 Research Challenges 16
2.5 Literature Review 19
2.6 Other Techniques Related to Organization of Event-based Document Stream 23
3 Design of Temporal-based Event Episode Identification Technique 27
3.1 Problem Analysis: Document Clustering Utilizing Temporal Characteristics 28
3.1.1 Modeling Event Episode Identification as a Document Clustering Problem 28
3.1.3 Selecting Representative Features Contributing to Similarity Measures 31
3.2 TF
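Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: withheld for one year, then publicly available on and off campus
Available:
Campus: available
Off-campus: available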


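Printed copies
Public availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available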
