Thesis/dissertation etd-0715105-212511: detailed information
Title page for etd-0715105-212511
Identifying groups with opposite stances using link-based categorization
Year, semester
Number of pages
Advisory Committee
Date of Exam
Date of Submission
hypergraph, categorization, semantic orientation, link, max-cut
This thesis proposes a link-based approach to identifying supporting and opposing groups in a Weblog community. We model the interaction behavior as a graph: bloggers involved in the discussion of one specific issue are the vertices, and semantic orientation is used to construct candidate opposite-opinion links, so that bloggers with opposite stances are joined by an opposite link. A max-cut algorithm is then used to obtain an approximation of the optimal partition into supporting and opposing groups. The categorization results of the semantic orientation classifier are compared with those of simple link-based categorization, which is in turn compared with link-based categorization enhanced using a hypergraph.
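The partitioning step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the opposite links are already known, uses a simple local-search heuristic (which guarantees a cut of at least half the edges) rather than any particular max-cut algorithm from the thesis, and the blogger names and function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def local_search_max_cut(
    vertices: Iterable[Hashable], edges: Iterable[Edge]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Split vertices into two groups so that many edges cross the split.

    Simple local search: move a vertex to the other group whenever that
    strictly increases the number of crossing edges. Every move raises
    the cut size, so the loop terminates.
    """
    vertex_list: List[Hashable] = list(vertices)
    side: Dict[Hashable, int] = {v: 0 for v in vertex_list}
    neighbours: Dict[Hashable, List[Hashable]] = {v: [] for v in vertex_list}
    for u, v in edges:
        if u == v:
            continue  # a self-loop can never cross the cut
        neighbours[u].append(v)
        neighbours[v].append(u)

    improved = True
    while improved:
        improved = False
        for v in vertex_list:
            same = sum(1 for n in neighbours[v] if side[n] == side[v])
            cross = len(neighbours[v]) - same
            if same > cross:  # flipping v gains (same - cross) crossing edges
                side[v] = 1 - side[v]
                improved = True

    group_a = {v for v in vertex_list if side[v] == 0}
    group_b = {v for v in vertex_list if side[v] == 1}
    return group_a, group_b


def cut_size(edges: Iterable[Edge], group_a: Set[Hashable]) -> int:
    """Count the edges whose endpoints lie in different groups."""
    return sum(1 for u, v in edges if (u in group_a) != (v in group_a))


# Hypothetical bloggers; each edge is an assumed "opposite stance" link.
bloggers = ["alice", "bob", "carol", "dave"]
opposite_links = [
    ("alice", "bob"),
    ("alice", "dave"),
    ("carol", "bob"),
    ("carol", "dave"),
]

group_a, group_b = local_search_max_cut(bloggers, opposite_links)
crossing = cut_size(opposite_links, group_a)
```

In this toy graph every opposite link ends up crossing the partition, so the two groups can be read as the supporting and opposing camps. Which camp is which still has to be decided from other evidence, such as the semantic orientation of the posts.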
Table of Contents
Abstract
Chinese Abstract
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Weblogs
2.2 Text-based Document Categorization
2.2.1 Preprocessing
2.2.2 Constructing Classifier
2.3 Graph Theory
2.3.1 Hypergraph
2.3.2 Bipartite Graph and Max-cut Algorithm
2.4 Semantic Orientation
Chapter 3 The Link-based Categorization
3.1 Problem Definition
3.2 Categorization Process
3.3 Data Collection and Preprocessing
3.4 Issue Clustering
3.5 Semantic Orientation
3.5.1 General Inquirer
3.5.2 Semantic Orientation Adaptation
Chapter 4 Enhancement and Max-cut Algorithm
4.1 Enhancement
4.1.1 Enhancing Algorithm
4.2 Max-cut Algorithm
Chapter 5 Experiment Results
5.1 Evaluating Criteria
5.2 Results
Chapter 6 Conclusion and Research Limitation
References
Appendix A. A POS Tag List
Appendix B. A List of the Top 300 Terms with the Highest tf*idf Values
Appendix C. The List of the Issue Clustering Result
Appendix D. A List of General Inquirer Categories for Positive and Negative
Fulltext
Thesis access permission: not available on or off campus
Available:
Campus: never available
Off-campus: never available


Printed copies
Available: already public
