博碩士論文 etd-0715105-212511 詳細資訊
Title page for etd-0715105-212511
論文名稱
Title
利用連結分類辨認相反立場群組
Identifying groups with opposite stances using link-based categorization
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
89
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2005-07-12
繳交日期
Date of Submission
2005-07-15
關鍵字
Keywords
分類、最大切集、連結、語意傾向、超圖形
hypergraph, categorization, semantic orientation, link, max-cut
統計
Statistics
本論文已被瀏覽 5699 次,被下載 0
The thesis/dissertation has been browsed 5699 times, has been downloaded 0 times.
中文摘要
本研究提出一個以連結為基礎的分類方法來辨認部落格社群中,支持和反對某一特定議題的群組。我們將部落格中討論的互動行為建構成一個圖,參與其中的討論者被視為圖中的一個點;緊接著根據語意傾向來建立可能的相反意見連結,相反立場的討論者間會建立起一條連結。本研究使用最大切集的演算法來找出最有可能的正、反意見雙方。使用語意傾向的分類結果與使用本研究所提出簡單連結方法的分類結果作比較;另外,簡單連結方法的分類結果也與利用超圖形強化連結方法的結果作比較。
Abstract
This thesis proposes a link-based approach to identifying supporting and opposing groups in a weblog community. We model the interaction behavior as a graph in which the bloggers involved in the discussion of a specific issue are the vertices. Semantic orientation is used to construct possible opposite-opinion links: bloggers with opposite stances are joined by an opposite link. A max-cut algorithm is then used to obtain an approximately optimal partition into supporting and opposing groups. The categorization results of the semantic orientation classifier are compared with those of the simple link-based categorization, and the simple link-based categorization is in turn compared with a link-based categorization enhanced using a hypergraph.
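The partitioning step described in the abstract can be illustrated with a small sketch. The abstract does not say which max-cut algorithm the thesis uses, so the code below substitutes a simple single-vertex local search, which is not necessarily the author's method. The function name `local_search_max_cut`, the blogger names, and the link weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (blogger, blogger, opposite-link weight)


def local_search_max_cut(
    edges: Iterable[Edge],
) -> Tuple[Set[Hashable], Set[Hashable], float]:
    """Split vertices into two groups so that the total weight of
    opposite links crossing the groups is locally maximal."""
    edges = list(edges)

    # Build a symmetric adjacency map: vertex -> {neighbor: weight}.
    adj: Dict[Hashable, Dict[Hashable, float]] = {}
    for u, v, w in edges:
        if u == v:
            continue  # a self-loop can never cross the cut
        adj.setdefault(u, {})
        adj.setdefault(v, {})
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    # Every vertex starts on side 0; flip a vertex whenever that
    # strictly increases the weight crossing the cut.
    side = {v: 0 for v in adj}
    improved = True
    while improved:
        improved = False
        for v in sorted(adj, key=str):  # fixed order for reproducibility
            same = sum(w for n, w in adj[v].items() if side[n] == side[v])
            cross = sum(w for n, w in adj[v].items() if side[n] != side[v])
            if same > cross:
                side[v] = 1 - side[v]
                improved = True

    group_a = {v for v, s in side.items() if s == 0}
    group_b = {v for v, s in side.items() if s == 1}
    cut = sum(w for u, v, w in edges if u != v and side[u] != side[v])
    return group_a, group_b, cut


# Hypothetical opposite-opinion links between four bloggers.
links = [
    ("alice", "bob", 1.0),
    ("alice", "dave", 1.0),
    ("carol", "bob", 1.0),
    ("carol", "dave", 1.0),
]
group_a, group_b, cut_weight = local_search_max_cut(links)
print(sorted(group_a), sorted(group_b), cut_weight)
```

With nonnegative weights, each flip strictly increases the cut weight, so the loop terminates, and the resulting cut carries at least half of the total link weight. Which of the two returned groups is the supporting side is not determined by the cut itself and would need to be decided separately, for example from the semantic orientation of each group's posts.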
目次 Table of Contents
Abstract
中文摘要
Table of Contents
Lists of Figures
Lists of Tables

Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Weblogs
2.2 Text-based Document Categorization
2.2.1 Preprocessing
2.2.2 Constructing Classifier
2.3 Graph Theory
2.3.1 Hypergraph
2.3.2 Bipartite Graph and Max-cut Algorithm
2.4 Semantic Orientation
Chapter 3 The Link-based Categorization
3.1 Problem Definition
3.2 Categorization Process
3.3 Data Collection and Preprocessing
3.4 Issue Clustering
3.5 Semantic Orientation
3.5.1 General Inquirer
3.5.2 Semantic Orientation Adaptation
Chapter 4 Enhancement and Max-cut Algorithm
4.1 Enhancement
4.1.1 Enhancing Algorithm
4.2 Max-cut Algorithm
Chapter 5 Experiment Results
5.1 Evaluating Criteria
5.2 Results
Chapter 6 Conclusion and Research Limitation
References
Appendix A. A POS Tag List
Appendix B. A List of The Top 300 Terms with The Highest tf*idf value
Appendix C. The List of the Issue Clustering Result
Appendix D. A List of General Inquirer for Categories of Positive and Negative
參考文獻 References
Agrawal, R., Rajagopalan S., Srikant R., & Xu Y. (2003). Mining newsgroups using networks arising from social behavior. Proceedings of the Twelfth International Conference on World Wide Web, 529-538.
Berge, C. (1989). Hypergraphs: combinatorics of finite sets. North Holland Publisher.
Berge, C. (1973). Graphs and Hypergraphs. North Holland Publisher.
Church, K. W., & Hanks, P. (1989). Word association norms, mutual information and lexicography. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 76-83.
Dittenbach, M., Merkl, D., & Rauber, A. (2000). The Growing Hierarchical Self-Organizing Map. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks.
Dumais, S., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. Proceedings of the Seventh International Conference on Information and Knowledge Management, 148-155.
Goemans, M. X., & Williamson, D. P. (1994). .879-approximation algorithms for MAX CUT and MAX 2SAT. Proceedings of the Twenty-sixth Annual ACM Symposium on Theory of Computing, 422-431.
Grimaldi, R. P. (1985). Discrete and Combinatorial Mathematics: An Applied Introduction. Addison-Wesley.
Han, E., Karypis, G., Kumar, V., & Mobasher, B. (1998). Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results. Data Engineering Bulletin, 21(2), 15-22.
Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.
Herring, S. C., Scheidt, L., Bonus, S., & Wright, E. (2004). Bridging the gap: a genre analysis of weblogs. Proceedings of the 37th Annual Hawaii International Conference on System Sciences.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the Tenth European Conference on Machine Learning.
Karypis, G., & Kumar, V. (1998). A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1), 359-392.
Kohonen, T. (1982). Self-organizing formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59-69.
Krishnamurthy, S. (2002). The dimensionality of blog conversations: the virtual enactment of September 11. In Maastricht, The Netherlands: Internet Research 3.0.
Lewis, D. D., & Ringuette, M. (1994). A comparison of two learning algorithms for text categorization. In Third Annual Symposium on Document Analysis and Information Retrieval.
Manning, C., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
Miller, R. (2004). WebSPHINX: A personal, customizable Web crawler.
http://www-2.cs.cmu.edu/~rcm/websphinx/
Rauber, A., Merkl, D., Dittenbach, M., & Pampalk, E. (2004). GHSOM: The Growing Hierarchical Self-Organizing Map.
http://www.ifs.tuwien.ac.at/~andi/ghsom/
Rijsbergen, C. J. van (1979). Information Retrieval. Butterworths.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management.
Semin, G. R., & Fiedler, K. (1988). The cognitive functions of linguistic categories in describing persons: Social cognition and language. Journal of Personality and Social Psychology, 54, 558-568.
Stone, P. J. (2005). General Inquirer: A computer-assisted approach for content analysis of textual data.
http://www.wjh.harvard.edu/~inquirer/
Toutanova, K., & Manning, C. D. (2000). Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 63-70.
Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Human Language Technology Conference-North American Chapter of the Association for Computational Linguistics, 252-259.
Turney, P. D. (2001). Mining the web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning.
Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4), 315-346.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Berlin: Springer-Verlag.
Wiener, E., Pedersen, J. O., & Weigend, A. S. (1995). A neural network approach to topic spotting. In Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval.
Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, 412-420.
Yandell, H., & Heath, S. (2003). GenJavaCore: An extensive string library from generationjava.com
電子全文 Fulltext
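This electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.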
論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available

紙本論文 Printed copies
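Availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.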
開放時間 available 已公開 available
