博碩士論文 etd-0730100-014038 詳細資訊
Title page for etd-0730100-014038
論文名稱
Title
以資訊萃取為基礎之事件偵測技術
Development of Information Extraction-based Event Detection Technique
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
73
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2000-07-26
繳交日期
Date of Submission
2000-07-30
關鍵字
Keywords
文件分類、事件偵測、環境掃瞄、資訊萃取
environmental scanning, text categorization, information extraction, event detection
統計
Statistics
本論文已被瀏覽 5694 次,被下載 9713
This thesis/dissertation has been browsed 5694 times and downloaded 9713 times.
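Chinese Abstract
Organizations must continually scan their external environment to understand how it is changing and to grasp those changes in time to make appropriate decisions. As the external environment grows more complex, and as the rapid development of information technology produces an information explosion, the burden of environmental scanning has become heavier. Using information technology to lighten that burden has therefore become imperative. Event detection is one technique that supports environmental scanning: given a stream of news stories, it compares the similarity of the words used in different stories to detect whether a story describes a news event that has already occurred or one that has not. However, traditional event detection, which relies on differences in wording between news stories, still has shortcomings. For example, wording may differ with reporters' habits, which lowers detection accuracy. In addition, traditional event detection techniques provide neither event classification nor tracking of follow-up or related events. In this thesis, we propose an information extraction-based event detection technique that combines text categorization and information extraction to address the limitations and problems of earlier event detection techniques. When evaluated on real news data, the new technique achieved a miss rate of 0% and a false alarm rate of 9.6%, outperforming traditional event detection techniques.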
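The two error measures reported above can be illustrated with a small sketch. The definitions used here are an assumption based on common usage in new-event detection (a miss is a story about a new event labelled as old; a false alarm is a story about an already seen event labelled as new); the record does not state the exact formulas used in the thesis, and the function name and toy labels are hypothetical.

```python
def detection_error_rates(is_new_truth, is_new_predicted):
    """Return (miss_rate, false_alarm_rate) for a new-event detection run.

    Assumed definitions (not taken from the thesis):
      miss rate        = new-event stories labelled "old" / all new-event stories
      false alarm rate = old-event stories labelled "new" / all old-event stories
    """
    if len(is_new_truth) != len(is_new_predicted):
        raise ValueError("truth and prediction lists must have the same length")

    misses = false_alarms = new_total = old_total = 0
    for truth, predicted in zip(is_new_truth, is_new_predicted):
        if truth:
            new_total += 1
            if not predicted:
                misses += 1
        else:
            old_total += 1
            if predicted:
                false_alarms += 1

    # Guard against empty classes so the function never divides by zero.
    miss_rate = misses / new_total if new_total else 0.0
    false_alarm_rate = false_alarms / old_total if old_total else 0.0
    return miss_rate, false_alarm_rate


# Toy labels, invented for illustration only (True = story starts a new event).
truth = [True, False, False, True, False]
predicted = [True, True, False, True, False]
miss_rate, false_alarm_rate = detection_error_rates(truth, predicted)
```

In this toy run no new event is missed, while one of the three old-event stories is wrongly flagged as new.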
Abstract
Environmental scanning is the process of acquiring and using information about events, trends, and relationships in an organization's external environment. It allows an organization to adapt to its environment and to develop effective responses that secure or improve its future position. An event detection technique that identifies the onset of new events in streams of news stories would facilitate an organization's environmental scanning. However, traditional feature-based event detection techniques, which decide whether a news story describes an unseen event by comparing the similarity of its words with those of past news stories, have limitations (e.g., the features that appear in a news document may not actually represent the event it describes). In this study, we therefore developed an information extraction-based event detection (NEED) technique that combines information extraction and text categorization to address the problems inherent in traditional feature-based event detection. The empirical evaluation showed that the NEED technique outperformed the traditional feature-based techniques in miss rate and false alarm rate, and achieved an event association accuracy comparable to that of its counterpart.
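The traditional feature-based approach described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's benchmark or the NEED technique itself: the bag-of-words representation, the cosine similarity measure, the 0.5 threshold, the function names, and the sample stories are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_new_events(stories, threshold=0.5):
    """Process stories in arrival order; flag a story as a new event (True)
    when no earlier story is at least `threshold` similar to it."""
    seen = []   # word-count vectors of all earlier stories
    flags = []
    for story in stories:
        vector = bag_of_words(story)
        best = max((cosine_similarity(vector, past) for past in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vector)
    return flags


# Invented sample stories, for illustration only.
stories = [
    "Airline jet crashes near airport after takeoff",
    "Jet crashes near airport shortly after takeoff, airline says",
    "Central bank raises interest rate by a quarter point",
]
flags = detect_new_events(stories)
```

Because the decision rests only on shared words, two reports of the same event written with different vocabulary can fall below the threshold, which is the kind of limitation the abstract attributes to feature-based detection.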
目次 Table of Contents
Chapter 1. Introduction
1.1 Background
1.2 Research Motivation and Objectives
1.3 Organization of the Thesis
Chapter 2. Literature Review
2.1 Event Detection
2.2 Text Categorization
2.2.1 Preprocessing Phase
2.2.2 Document Representation Phase
2.2.3 Induction Phase
2.3 Information Extraction
Chapter 3. Development of Information Extraction-based Event Detection (NEED) Technique
3.1 Architecture of Information Extraction-based Event Detection (NEED) Technique
3.2 Learning and Extraction Subsystem
3.3 Detection Subsystem
3.4 Complete Algorithm of Information Extraction-based Event Detection (NEED) Technique
Chapter 4. Empirical Evaluation
4.1 Evaluation Design
4.1.1 Data Collection
4.1.2 Evaluation Criteria
4.1.3 Performance Benchmarks
4.1.4 Evaluation Procedure
4.2 Evaluation Result
4.2.1 Parameter Tuning
4.2.2 Comparative Evaluation of Event Detection Techniques
4.2.3 Evaluation of Event Association Accuracy
Chapter 5. Conclusion and Future Research Directions
Appendix A Regular Expression Rule
Appendix B Ontology of "Airplane Crash" Event Topic
Appendix C Ontology of "Interest Rate Adjustment" Event Topic
Appendix D Ontology of "Business Merger" Event Topic
Appendix E Ontology of "Business Partnership" Event Topic
Appendix F Ontology of "Computer Virus" Event Topic
References
電子全文 Fulltext
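This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.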
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
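Information on the public availability of printed theses is relatively complete from academic year 102 onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.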
開放時間 available 已公開 available
