Thesis access permission: not available on campus or off campus
Available:
Campus: never available
Off-campus: never available
Title |
Identifying groups with opposite stances using link-based categorization (利用連結分類辨認相反立場群組) |
||
Department |
|||
Year, semester |
Language |
||
Degree |
Number of pages |
89 |
|
Author |
|||
Advisor |
|||
Convenor |
|||
Advisory Committee |
|||
Date of Exam |
2005-07-12 |
Date of Submission |
2005-07-15 |
Keywords |
hypergraph, categorization, semantic orientation, link, max-cut |
||
Statistics |
The thesis/dissertation has been browsed 5701 times and downloaded 0 times. |
Abstract |
This thesis proposes a link-based approach to identifying the groups that support and oppose a specific issue in a weblog community. We model the interaction among bloggers as a graph in which each blogger taking part in the discussion of an issue is a vertex. Semantic orientation is then used to construct candidate opposite-opinion links: an edge is created between two bloggers whose stances appear to be opposed. A max-cut algorithm is subsequently applied to this graph to find the most likely partition into supporting and opposing groups. We compare the categorization results of a semantic orientation classifier with those of the simple link-based categorization, and we further compare the simple link-based categorization with a link-based categorization enhanced using a hypergraph. |
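The partitioning step described in the abstract can be illustrated with a small sketch. This record does not state which max-cut procedure the thesis uses, so the code below swaps in a plain one-flip local search, which is an assumption and not the author's method. The function names `cut_weight` and `local_search_max_cut`, the blogger names, and the unit edge weights are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# An opposite-opinion link: (blogger_a, blogger_b, weight).
Edge = Tuple[Hashable, Hashable, float]


def cut_weight(edges: Iterable[Edge], side: Dict[Hashable, int]) -> float:
    """Total weight of links whose endpoints fall in different groups."""
    return sum(w for u, v, w in edges if side[u] != side[v])


def local_search_max_cut(edges: Iterable[Edge]) -> Dict[Hashable, int]:
    """One-flip local search for max-cut (a heuristic, not an exact solver).

    Every blogger starts in group 0. A blogger is moved to the other group
    whenever that strictly increases the cut weight; the search stops when
    no single move helps.
    """
    edge_list: List[Edge] = list(edges)
    nodes = sorted({n for u, v, _ in edge_list for n in (u, v)}, key=str)
    side = {n: 0 for n in nodes}

    improved = True
    while improved:
        improved = False
        for n in nodes:
            # Weight of this blogger's links inside the same group vs. across groups.
            same = sum(w for u, v, w in edge_list
                       if n in (u, v) and u != v and side[u] == side[v])
            cross = sum(w for u, v, w in edge_list
                        if n in (u, v) and u != v and side[u] != side[v])
            if same > cross:  # flipping n turns 'same' links into cut links
                side[n] = 1 - side[n]
                improved = True
    return side


# Hypothetical opposite-opinion links among four bloggers.
links = [
    ("ann", "bob", 1.0),
    ("ann", "dan", 1.0),
    ("cat", "bob", 1.0),
    ("cat", "dan", 1.0),
]
side = local_search_max_cut(links)
total = cut_weight(links, side)
```

In this toy graph the search separates `ann` and `cat` from `bob` and `dan`, so all four links cross the cut. On larger graphs a local search of this kind can stop at a partition that is not the maximum cut, which is why approximation algorithms with stronger guarantees are often preferred.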
Table of Contents |
Abstract iii
Chinese Abstract iv
Table of Contents v
Lists of Figures vii
Lists of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 3
1.4 Thesis Organization 4
Chapter 2 Literature Review 5
2.1 Weblogs 5
2.2 Text-based Document Categorization 7
2.2.1 Preprocessing 8
2.2.2 Constructing Classifier 11
2.3 Graph Theory 13
2.3.1 Hypergraph 13
2.3.2 Bipartite Graph and Max-cut Algorithm 16
2.4 Semantic Orientation 17
Chapter 3 The Link-based Categorization 19
3.1 Problem Definition 19
3.2 Categorization Process 20
3.3 Data Collection and Preprocessing 23
3.4 Issue Clustering 26
3.5 Semantic Orientation 30
3.5.1 General Inquirer 31
3.5.2 Semantic Orientation Adaptation 32
Chapter 4 Enhancement and Max-cut Algorithm 36
4.1 Enhancement 36
4.1.1 Enhancing Algorithm 37
4.2 Max-cut Algorithm 41
Chapter 5 Experiment Results 44
5.1 Evaluating Criteria 44
5.2 Results 45
Chapter 6 Conclusion and Research Limitation 50
References 53
Appendix A. A POS Tag List 57
Appendix B. A List of The Top 300 Terms with The Highest tf*idf value 58
Appendix C. The List of the Issue Clustering Result 60
Appendix D. A List of General Inquirer for Categories of Positive and Negative 63 |
Printed copies |
Availability: publicly available |