博碩士論文 etd-0718105-152616 詳細資訊
Title page for etd-0718105-152616
論文名稱
Title
結合事件主軸摘要之議題回顧機制於新聞報導應用
Topic Retrospection with Storyline-based Summarization on News Reports
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
75
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2005-07-12
繳交日期
Date of Submission
2005-07-18
關鍵字
Keywords
議題回顧、事件偵測與追蹤、摘要機制、事件緒
topic retrospection, event threading, summarization, Topic Detection and Tracking (TDT)
統計
Statistics
本論文已被瀏覽 5693 次,被下載 3300 次
This thesis/dissertation has been browsed 5693 times and downloaded 3300 times.
中文摘要
隨著電子新聞資料庫建置完善,其已成為線上新聞閱讀者一個重要的資訊來源。但當使用者面對為數眾多的新聞報導時,仍沒有一個完善的機制協助使用者在短時間內,回顧一個已發生的議題。有鑑於過去在新聞事件偵測與追蹤 (TDT,Topic Detection and Tracking)的研究上,僅單純地考量如何偵測事件,並將其結果以新聞標題列表和關鍵字的方式呈現,本研究認為透過事件主軸的摘要機制,可以更有效地協助讀者在短時間內,獲知事件發展的概念。
因此,本研究中提出一個機制,用以偵測議題中的事件並建構之間的相互關係,再以此關係摘要成一篇議題回顧的報導,做為使用者快速了解議題發展的文本。此機制主要包括三部份:事件界定、建構議題主軸、主軸式摘要。建構出的議題主軸可以提供議題發展脈絡的主幹,並將相關性較低的事件排除。透過找出具代表性的文句,並以議題發展主軸為範本依據,而構成的摘要,除了可以提供足夠的資訊了解議題發展,也可以做為索引,協助使用者找到更多更詳細的資訊。
本研究採用實驗室實驗法,並配合問答模式來驗證提出之機制。從實驗結果發現,本機制可以讓新聞閱讀者更有效且有效率地,獲得事件發展的過程。
Abstract
Electronic newspapers have become a main information source for online news readers. Faced with numerous stories, however, readers need support in order to review a topic in a short time. Previous research in Topic Detection and Tracking (TDT) considers only how to identify events and presents the results as lists of news titles and keywords, so general news readers still need a summarized text that presents event evolution in order to retrospect the events under a news topic.
This thesis proposes a topic retrospection process and implements the SToRe system, which identifies the events under a news topic and constructs the relationships among them in order to compose a summary that gives readers a sketch of event evolution in the topic. The process consists of three main functions: event identification, main storyline construction, and storyline-based summarization. The constructed main storyline removes irrelevant events and presents the main theme. The summarization step extracts representative sentences and uses the main theme as a template to compose the summary. The summary not only provides enough information to comprehend the development of a topic, but can also serve as an index that helps readers find more detailed information.
A lab experiment in a question-and-answer (Q&A) setting was conducted to evaluate the SToRe system. The experimental results indicate that the SToRe system helps news readers capture the development of a topic more effectively and efficiently.
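As an illustration only, the three functions named in the abstract (event identification, main storyline construction, storyline-based summarization) might be sketched as follows. This is not the SToRe implementation: the table of contents suggests the thesis builds on self-organizing maps, whereas this sketch substitutes single-pass cosine clustering over bag-of-words vectors. All names (`Story`, `identify_events`, `main_storyline`, `summarize`) and both thresholds are hypothetical, and the regex tokenizer assumes English text; Chinese news would need a word segmenter.

```python
import re
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class Story:
    date: str  # ISO date string, so string order equals chronological order
    text: str


def bag(text):
    """Bag-of-words vector (assumption: English-like word tokens)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def identify_events(stories, threshold=0.4):
    """Stand-in for event identification: single-pass clustering by date."""
    events = []
    for story in sorted(stories, key=lambda s: s.date):
        vec = bag(story.text)
        best = max(events, key=lambda e: cosine(vec, e["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["stories"].append(story)
            best["centroid"] += vec  # accumulate term counts for the event
        else:
            events.append({"stories": [story], "centroid": vec})
    return events


def main_storyline(events, link_threshold=0.2):
    """Keep events linked to at least one other event; order them by start date."""
    kept = [
        e for e in events
        if len(events) == 1
        or any(cosine(e["centroid"], o["centroid"]) >= link_threshold
               for o in events if o is not e)
    ]
    return sorted(kept, key=lambda e: e["stories"][0].date)


def summarize(storyline):
    """One representative sentence per event, following the storyline order."""
    lines = []
    for event in storyline:
        sentences = [s.strip() for story in event["stories"]
                     for s in re.split(r"[.!?]", story.text) if s.strip()]
        best = max(sentences, key=lambda s: cosine(bag(s), event["centroid"]))
        lines.append(f"{event['stories'][0].date}: {best}.")
    return "\n".join(lines)
```

Calling `summarize(main_storyline(identify_events(stories)))` on a list of `Story` objects yields one dated line per retained event. In this sketch an event that shares no vocabulary with any other event is treated as irrelevant to the topic and dropped from the storyline.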
目次 Table of Contents
致謝 I
論文提要 II
ABSTRACT III
中文摘要 IV
TABLE OF CONTENTS V
LIST OF FIGURES VII
LIST OF TABLES VIII
CHAPTER 1 INTRODUCTION 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 3
1.4 Thesis Organization 4
CHAPTER 2 LITERATURE REVIEW 5
2.1 Topic Detection and Tracking 5
2.1.1 Topic Detection 6
2.1.2 Topic Tracking 7
2.2 Self-organizing Maps 8
2.2.1 SOM Algorithm 8
2.2.2 Growing Hierarchical Self-Organizing Map (GHSOM) 11
2.2.3 WEBSOM 12
2.2.4 The labeling method applied with SOM 14
2.3 Event Threading 15
2.4 Text Summarization 17
2.4.1 Summarization Method and Evaluation 18
2.4.2 Chinese Summarization 20
CHAPTER 3 RESEARCH FRAMEWORK 21
3.1 Definition 21
3.2 Problem Modeling 23
CHAPTER 4 TOPIC RETROSPECTION 26
4.1 Preprocess 27
4.2 Event identification 29
4.3 Main storyline construction 30
4.4 Storyline-based Summarization 34
CHAPTER 5 SYSTEM IMPLEMENT AND EXPERIMENTAL DESIGN 36
5.1 Data source 36
5.2 System Implementation 37
5.2.1 Preliminary Result 39
5.3 Experimental Design 40
5.3.1 Subjects 40
5.3.2 Experimental Procedure 41
5.3.3 Examinational Questions 43
CHAPTER 6 EXPERIMENTAL RESULTS 44
6.1 Subject Profile 44
6.2 Evaluating SToRe System 46
CHAPTER 7 CONCLUSION AND FUTURE WORK 52
7.1 Conclusion 52
7.2 Research Limitation 53
7.3 Future Work 53
REFERENCES 55
APPENDIX A SUMMARIZATION RESULT 60
APPENDIX B SNAPSHOT OF THE USER INTERFACE IN EXPERIMENTATION 63
APPENDIX C QUESTIONNAIRES 65
APPENDIX D THE EXAMPLE OF QUESTIONS 66
電子全文 Fulltext
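This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.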
論文使用權限 Thesis access permission: available on campus immediately; available off campus after one year
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
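Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services (圖資處). We apologize for any inconvenience.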
開放時間 Available: 已公開 available
