Title page for thesis/dissertation etd-0812103-212948
Title
以文件摘要技術支援事件偵測
Use of Text Summarization for Supporting Event Detection
Department
Year, semester
Language
Degree
Number of pages
57
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2003-07-22
Date of Submission
2003-08-12
Keywords
text summarization, event detection, environmental scanning
Statistics
This thesis/dissertation has been browsed 5758 times and downloaded 3264 times.
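Abstract (translated from the Chinese)
In an era of information explosion and rapid information flow, the external environment that organizations face has become increasingly complex and fast-changing, so organizations must continually monitor that environment and respond promptly to its changes and trends. With the rise of the Internet and online news, the volume of information about an organization’s external environment has grown accordingly, and using information technology to support environmental scanning has become an important part of an organization’s strategic management. Event detection is one of the techniques that support environmental scanning: it compares the textual similarity between a newly arrived news document and past news documents to determine whether the event described by the new document has or has not occurred before. In ordinary news documents, however, reporters add supplementary material to make a report more complete. This supplementary material is usually not highly relevant to the main topic of the news document and tends to lower the accuracy of event detection. This thesis therefore proposes an event detection technique based on text summarization: for each news document, it extracts the sentences that are highly relevant to the topic to represent that news story, and then applies the traditional text-similarity comparison to determine whether the story describes a previously unseen event. In an experimental evaluation on real news data, the proposed technique achieved accuracy similar to or better than that of a traditional event detection technique.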
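The process described in this abstract (summarize each story, then compare it with earlier stories) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the thesis learns its summarizer from data, whereas the `summarize` function here swaps in a simple word-frequency heuristic, and the function names, the `keep` parameter, and the `threshold` value of 0.3 are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a fuller system would also drop stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(story, keep=2):
    """Keep the `keep` sentences with the highest average word frequency.

    Assumption: a stand-in heuristic, not the learned summarizer of the thesis.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", story) if s.strip()]
    freq = Counter(tokenize(story))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:keep]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def detect_new_events(stories, threshold=0.3, keep=2):
    """Return one flag per story: True if it looks like a new event.

    A story is flagged as new when its summary is less similar than
    `threshold` (hypothetical value) to every earlier story's summary.
    """
    seen = []
    flags = []
    for story in stories:
        vec = Counter(tokenize(summarize(story, keep)))
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags
```

For example, given two stories about the same earthquake followed by one about a bank, `detect_new_events` would be expected to flag the first and third as new events and the second as a previously seen one.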
Abstract
Environmental scanning, which acquires and uses information about events, trends, and changes in an organization’s external environment, is an important process in the strategic management of an organization and permits the organization to adapt quickly to changes in that environment. Event detection, which detects the onset of new events from news documents, is essential to facilitating an organization’s environmental scanning activity. However, traditional feature-based event detection techniques detect events by comparing the similarity between features of news stories, and this approach incurs several problems. For example, for illustration and comparison purposes, a news story may contain sentences or paragraphs that are not highly relevant to defining its event. Unless such less relevant sentences or paragraphs are removed before detection, the effectiveness of traditional event detection techniques may suffer. In this study, we developed a summary-based event detection (SED) technique that filters out less relevant sentences or paragraphs in a news story before performing feature-based event detection. Using a traditional feature-based event detection technique (i.e., INCR) as the benchmark, the empirical evaluation results showed that the proposed SED technique could achieve comparable or even better detection effectiveness (measured by miss and false alarm rates) than the INCR technique for data corpora in which the percentage of news stories discussing old events is high.
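The abstract measures detection effectiveness by miss and false alarm rates but does not define them here. The sketch below assumes the definitions commonly used in new-event detection evaluations: the miss rate is the share of truly new stories that the system failed to flag as new, and the false alarm rate is the share of old-event stories that the system wrongly flagged as new. The function name and argument names are hypothetical.

```python
def miss_and_false_alarm(predicted_new, actual_new):
    """Return (miss_rate, false_alarm_rate) for per-story boolean flags.

    Assumed definitions (not taken from the thesis text):
      miss rate        = missed new stories / all truly new stories
      false alarm rate = old stories flagged as new / all truly old stories
    """
    if len(predicted_new) != len(actual_new):
        raise ValueError("both flag lists must have the same length")

    pairs = list(zip(predicted_new, actual_new))
    misses = sum(1 for p, a in pairs if a and not p)
    false_alarms = sum(1 for p, a in pairs if p and not a)
    n_new = sum(1 for a in actual_new if a)
    n_old = len(actual_new) - n_new

    # Guard against corpora that contain no new (or no old) stories.
    miss_rate = misses / n_new if n_new else 0.0
    false_alarm_rate = false_alarms / n_old if n_old else 0.0
    return miss_rate, false_alarm_rate
```

Under these assumed definitions the two rates trade off against each other: a detector that flags more stories as new tends to lower the miss rate while raising the false alarm rate.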
Table of Contents
Chapter 1 Introduction
1.1 Background
1.2 Research Motivation and Objective
1.3 Organization of the Thesis
Chapter 2 Literature Review
2.1 Event Detection
2.2 Text Summarization
2.2.1 Edmundson’s Approach
2.2.2 Kupiec et al.’s Approach
2.2.3 Teufel and Moens’ Approach
2.2.4 Mani and Bloedorn’s Approach
2.2.5 Neto et al.’s Approach
2.2.6 Myaeng and Jang’s Approach
2.2.7 Summary of Text Summarization Approaches
Chapter 3 Development of Summary-based Event Detection (SED) Technique
3.1 Process of Summary-based Event-Detection (SED) Technique
3.2 News Summarization Phase
3.2.1 News Summarization Learning Task
3.2.2 News Summary Generation Task
3.3 Event Detection Phase
Chapter 4 Empirical Evaluation
4.1 Evaluation Design
4.1.1 Data Collection and Summary Preparation
4.1.2 Evaluation Criteria for Event Detection
4.1.3 Performance Benchmarks for Event Detection
4.2 Evaluation Result
4.2.1 Parameter Tuning
4.2.2 Comparative Evaluation of Event Detection Techniques
Chapter 5 Conclusions and Future Research Directions
Appendix A: List of Stop Words
Appendix B: Sentence Representation Schemes Employed by Existing Text Summarization Approaches
References
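Full text
This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to break the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available:
Campus: available
Off-campus: available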


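Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available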
