Thesis/Dissertation etd-0626116-121217: Detailed Record



Author 郭竣齊 (Chun-Chi Kuo) E-mail Not public
Department Graduate Institute of Information Management (資訊管理學系研究所)
Degree Master Graduation term Academic year 104 (ROC calendar), 2nd semester
Title (Chinese) 從新聞挖掘社會議題事件之研究
Title (English) Research on Detecting Emerging Events From News Data
Files
  • etd-0626116-121217.pdf
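  • This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
    Thesis access rights

    Print copy: public after 2 years (released 2018-07-28)

    Electronic copy: user-defined access: public on campus after 2 years, off campus after 2 years

    Language/Pages English/51
    Statistics This thesis has been viewed 5083 times and downloaded 0 times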
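    Abstract (Chinese) In today's Internet era, information such as online news is generated on the web continuously. Extracting important events from these large data streams and identifying the trends of important social issues is therefore an important research topic. To address this problem, this study proposes a method that combines Chinese text mining with topic modeling to automatically detect social events from publicly available online news.
    To validate the proposed method, we take the well-known online news of 「蘋果日報」 (Apple Daily) as an example and use the keyword 「陸客」 (tourists from mainland China) to collect news related to mainland Chinese tourists from 2008 to 2015. We first preprocess the raw data with Chinese natural language processing methods, then model it with the Online Latent Dirichlet Allocation (Online-LDA) topic model, identify the time periods in which the topics change the most, and extract the new events that occurred in those periods. We designed an experiment to verify the correctness of the new events found by our method, and the results show that the method can effectively detect new events in newly arriving Chinese news.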
    Abstract (English) Nowadays, the Internet provides diversified information. Enormous amounts of information, such as online news, are generated continuously. The rapid growth of online news makes it difficult to identify new and emerging events manually. To solve this problem, we propose an approach that uses text mining techniques and topic modelling to automatically detect new events from Chinese online news sources.
    To evaluate our method, we build our dataset by scraping the Chinese news website “AppleDaily” for the years 2008 to 2015, keeping the news articles that contain a keyword about tourists from China. We preprocess the raw data with a Chinese natural language processing tool and apply the Online-LDA topic model to find new events. Finally, we conduct an experiment to measure the performance of the proposed method. The experimental results show that our online event detection method is effective in detecting and tracking new events in Chinese news as articles arrive in streams.
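The abstract describes finding the time periods in which the topics change the most, and the table of contents lists a section on Jensen–Shannon divergence. This record does not say exactly how the thesis applies that measure, so the following is only a minimal sketch under stated assumptions: one topic's word distribution is compared between consecutive time slices, and a slice is flagged when the divergence exceeds a threshold. The function names, the threshold of 0.1, and the example distributions are all hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions (base 2, so in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def changed_slices(dist_by_slice, threshold):
    """Indices t whose distribution differs from slice t-1 by more than `threshold`."""
    return [
        t
        for t in range(1, len(dist_by_slice))
        if js_divergence(dist_by_slice[t - 1], dist_by_slice[t]) > threshold
    ]


# Hypothetical word distributions of one topic over four time slices (made-up numbers).
slices = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.18, 0.70],
]
flagged = changed_slices(slices, threshold=0.1)  # hypothetical threshold
print(flagged)
```

With these made-up numbers only the third slice (index 2) is flagged, because the probability mass moves from the first word to the last between slices 1 and 2. In practice the threshold would have to be tuned on the actual corpus.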
    Keywords (Chinese)
  • Text mining
  • Topic model
  • Latent Dirichlet Allocation
  • Event detection
  • Chinese natural language processing
    Keywords (English)
  • Topic Model
  • Online-LDA
  • Event Detection
  • Text mining
  • Chinese Natural Language Processing
    Table of contents
    CHAPTER 1 – Introduction
    1.1 Background
    1.2 Motivation
    CHAPTER 2 – Related Work
    2.1 Chinese Natural Language Processing
    2.2 Topic Model – Online LDA
    2.3 Event Detection and Tracking Systems
    CHAPTER 3 – The Proposed Approach
    3.1 Research Skeleton
    3.2 Data Preprocessing and News Schema Construction
    3.3 Online LDA Clustering
    3.4 Jensen–Shannon Divergence
    3.5 Emerging Terms Extracting
    3.6 Emerging Terms Clustering
    3.7 New Event Detecting
    CHAPTER 4 – Dataset and Experiment
    4.1 Dataset
    4.2 Dataset Preprocessing
    4.3 Event Detection
    4.4 Experiment Settings
    CHAPTER 5 – Evaluation Results
    5.1 Experiment Result
    5.2 Discussion
    CHAPTER 6 – Conclusion
    References
    Oral defense committee
  • 魏志平 - Convener
  • 康藝晃 - Committee member
  • 黃三益 - Advisor
    Defense date 2016-07-08 Submission date 2016-07-28
