Thesis access permission: released immediately on campus; off-campus access withheld for one year
Available:
Campus: available
Off-campus: available
Title |
Use of Text Summarization for Supporting Event Detection |
Department |
Year, semester |
Language |
Degree |
Number of pages |
57 |
Author |
Advisor |
Convenor |
Advisory Committee |
Date of Exam |
2003-07-22 |
Date of Submission |
2003-08-12 |
Keywords |
text summarization, event detection, environmental scanning |
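Chinese Abstract (English translation) |
In an era of information explosion and rapid information flow, the external environment that organizations face has become increasingly complex and fast-changing, so organizations must continuously monitor that environment and respond promptly to, and keep track of, its changes and trends. With the rise of the Internet and online electronic news, the volume of information about an organization's external environment has also grown, and using information technology to support environmental scanning has therefore become an important part of organizational strategic management. Event detection is one technique that supports environmental scanning: by comparing the textual similarity between a newly arriving news document and past news documents, it determines whether the news event the new document describes has already occurred or not. In typical news documents, however, reporters add supplementary material to make a report more complete. This supplementary material is usually not highly relevant to the topic the document is meant to describe, and it tends to reduce the accuracy of event detection. This thesis therefore proposes an event detection technique based on text summarization: for each news document, it extracts the sentences most relevant to the topic to represent that document, and then applies the traditional text-similarity comparison to determine whether the document reports a new, previously unseen event. In an experimental evaluation on real news data, the proposed technique achieved accuracy similar to or better than that of traditional event detection techniques. |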
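The detection step described above (summarize each story, then compare it with earlier stories by text similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the thesis learns which sentences to keep (Section 3.2, News Summarization Phase), whereas here a simple Luhn-style term-frequency score stands in for that learned summarizer. The stop-word list, the sentence cutoff `k`, the similarity `threshold`, and all function names are hypothetical. Stories are represented as term-frequency vectors and compared by cosine similarity, and a story counts as a new event when it is not similar enough to any earlier one.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stop-word list (the thesis uses its own list, Appendix A).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was",
              "for", "by", "at", "with"}

def tokenize(text):
    """Lowercase alphanumeric tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def split_sentences(text):
    """Naive sentence splitter on whitespace following . ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, k=2):
    """Stand-in extractive summarizer (NOT the thesis's learned model):
    keep the k sentences whose terms are most frequent in the whole story,
    scored by average term frequency, and preserve their original order."""
    sentences = split_sentences(text)
    doc_tf = Counter(tokenize(text))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(doc_tf[t] for t in toks) / (len(toks) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect_events(stories, threshold=0.3, k=2):
    """Process stories in arrival order; return True for a story whose summary
    is less similar than `threshold` to every earlier story's summary."""
    seen, flags = [], []
    for story in stories:
        vec = Counter(tokenize(summarize(story, k)))
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags

if __name__ == "__main__":
    stories = [
        "Typhoon Nari floods Taipei. Typhoon Nari closes Taipei schools. A local bakery opened last week.",
        "Typhoon Nari floods Taipei again. Typhoon Nari damage in Taipei grows. Stocks rose slightly.",
        "Earthquake shakes Hualien. Earthquake damages Hualien bridges. Weather was mild.",
    ]
    print(detect_events(stories))  # [True, False, True]
```

In this toy run the unrelated last sentence of each story is dropped before comparison, which is the role the summarization step plays: the second story is matched to the first typhoon report, while the earthquake story is flagged as a new event.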
Abstract |
Environmental scanning, which acquires and uses information about events, trends, and changes in an organization's external environment, is an important process in the strategic management of an organization and allows the organization to adapt quickly to changes in its external environment. Event detection, which detects the onset of new events from news documents, is essential to facilitating an organization's environmental scanning activity. However, traditional feature-based event detection techniques detect events by comparing the similarity between the features of news stories, and they suffer from several problems. For example, for illustration and comparison purposes, a news story may contain sentences or paragraphs that are not highly relevant to defining its event. If such less relevant sentences or paragraphs are not removed before detection, the effectiveness of traditional event detection techniques may suffer. In this study, we developed a summary-based event detection (SED) technique that filters out less relevant sentences or paragraphs in a news story before performing feature-based event detection. Using a traditional feature-based event detection technique (INCR) as the benchmark, the empirical evaluation results showed that the proposed SED technique could achieve detection effectiveness (measured by miss and false alarm rates) comparable to or even better than that of the INCR technique for data corpora in which the percentage of news stories discussing old events is high. |
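The two effectiveness measures named above can be computed as in the following sketch. It assumes the usual topic detection and tracking definitions, which the thesis's own formulas (Section 4.1.2) may refine: the miss rate is the fraction of true first stories of new events that the detector fails to flag as new, and the false alarm rate is the fraction of stories about old events that it wrongly flags as new. The function name and the boolean encoding (True means "starts a new event") are hypothetical.

```python
def miss_and_false_alarm(predicted, actual):
    """Return (miss_rate, false_alarm_rate) for first-story detection.

    predicted, actual: equal-length lists of booleans, where True means the
    story is (predicted to be / actually is) the first story of a new event.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    new_total = sum(1 for a in actual if a)      # true first stories
    old_total = len(actual) - new_total          # stories about old events
    misses = sum(1 for p, a in zip(predicted, actual) if a and not p)
    false_alarms = sum(1 for p, a in zip(predicted, actual) if p and not a)
    miss_rate = misses / new_total if new_total else 0.0
    false_alarm_rate = false_alarms / old_total if old_total else 0.0
    return miss_rate, false_alarm_rate

# One missed new event out of two, one false alarm out of two old stories.
print(miss_and_false_alarm([True, True, False, False],
                           [True, False, True, False]))  # (0.5, 0.5)
```

Both rates are better when lower. In a detector that flags a story as new when its best similarity to earlier stories falls below a threshold, raising the threshold flags more stories as new, which tends to lower the miss rate and raise the false alarm rate; this trade-off is why the threshold is treated as a tuning parameter.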
Table of Contents
Chapter 1 Introduction
  1.1 Background
  1.2 Research Motivation and Objective
  1.3 Organization of the Thesis
Chapter 2 Literature Review
  2.1 Event Detection
  2.2 Text Summarization
    2.2.1 Edmundson's Approach
    2.2.2 Kupiec et al.'s Approach
    2.2.3 Teufel and Moens' Approach
    2.2.4 Mani and Bloedorn's Approach
    2.2.5 Neto et al.'s Approach
    2.2.6 Myaeng and Jang's Approach
    2.2.7 Summary of Text Summarization Approaches
Chapter 3 Development of Summary-based Event Detection (SED) Technique
  3.1 Process of Summary-based Event Detection (SED) Technique
  3.2 News Summarization Phase
    3.2.1 News Summarization Learning Task
    3.2.2 News Summary Generation Task
  3.3 Event Detection Phase
Chapter 4 Empirical Evaluation
  4.1 Evaluation Design
    4.1.1 Data Collection and Summary Preparation
    4.1.2 Evaluation Criteria for Event Detection
    4.1.3 Performance Benchmarks for Event Detection
  4.2 Evaluation Result
    4.2.1 Parameter Tuning
    4.2.2 Comparative Evaluation of Event Detection Techniques
Chapter 5 Conclusions and Future Research Directions
Appendix A: List of Stop Words
Appendix B: Sentence Representation Schemes Employed by Existing Text Summarization Approaches
References |
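Fulltext |
This electronic full text is licensed to users solely for personal, non-commercial searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid legal liability. |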
Printed copies |
Available: available |