Thesis/Dissertation etd-0627116-214010: Detailed Record

Name 羅珮綺 (Pei-chi Lo) E-mail Not disclosed
Department Graduate Institute of Information Management
Degree Master Graduation term Academic year 104 (2015-2016), 2nd semester
Title (Chinese) 基於修辭結構的評論品質分析
Title (English) Quality Analysis of User Reviews Using Discourse Structure
  • etd-0627116-214010.pdf
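  • This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.

    Print copy: public after 2 years (released 2018-07-28)

    Electronic copy: user-defined access: released on campus after 2 years, off campus after 2 years

    Language/Pages English/57
    Statistics This thesis has been viewed 5040 times and downloaded 0 times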
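    Abstract (translated from Chinese) The rise of Web 2.0 has set off a new wave of public expression: more and more people post their thoughts, opinions, and even reviews of specific products on the Internet. Alongside genuine and useful content, this large volume of text may also contain malicious spam or meaningless text. Judging the quality of text has therefore become an important problem and a popular topic in text mining.
    Previous studies usually analyzed only the characteristics of words. Such analysis captures only combinations of words or numerical data such as overall review length, and gives no understanding of the overall structure and content of a document. Rhetorical Structure Theory (RST) analyzes the relations between clauses and converts a document into a hierarchical tree structure, giving a comprehensive view of the whole document's structure. We add RST to the analysis so that text quality analysis becomes more accurate and better matches the characteristics of human language.
    We propose a text quality analysis process that uses NLTK (Natural Language Toolkit) for natural language processing, identifies factors that may affect text quality, and uses different classification models to analyze real user reviews from the Amazon shopping site. We compare the results before and after adding RST features; the experiments show that the proposed model improves significantly over a model that uses only word-level features.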
    Abstract (English) The emergence of Web 2.0 has led to a new era of user-generated content. More and more people share their thoughts, opinions, and even reviews of specific products online. This textual data contains not only useful passages that are worth considering but also malicious comments and spam. Determining the quality of text has therefore become an important problem and a popular topic in text mining.
    Most existing studies analyze the characteristics of tokens, which capture only combinations of words or statistics such as document length and cannot represent the overall structure of a document. Rhetorical Structure Theory (RST) transforms a document into a hierarchical tree according to the relations between text spans. By incorporating RST, we obtain more accurate results when analyzing text quality.
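    To make the idea of a hierarchical tree over text spans concrete, the following is a minimal sketch, not the thesis's implementation. The `RSTNode` class, the helper functions, and the example review are all hypothetical; a real tree would come from a discourse parser rather than being built by hand.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    """Hypothetical RST tree node: leaves carry a text span, internal nodes a relation."""
    relation: Optional[str] = None   # e.g. "Elaboration"; None for leaves
    text: Optional[str] = None       # text span; None for internal nodes
    children: List["RSTNode"] = field(default_factory=list)


def depth(node: RSTNode) -> int:
    """Number of levels in the tree, counting the node itself."""
    if not node.children:
        return 1
    return 1 + max(depth(child) for child in node.children)


def relation_counts(node: RSTNode) -> Counter:
    """Count how often each rhetorical relation labels an internal node."""
    counts = Counter()
    if node.relation is not None:
        counts[node.relation] += 1
    for child in node.children:
        counts += relation_counts(child)
    return counts


# Hand-built example tree for an invented one-sentence review (assumption).
tree = RSTNode(relation="Contrast", children=[
    RSTNode(text="The battery lasts two days,"),
    RSTNode(relation="Elaboration", children=[
        RSTNode(text="but the screen scratches easily,"),
        RSTNode(text="even inside a case."),
    ]),
])

print(depth(tree))
print(relation_counts(tree))
```

    Structural quantities such as tree depth and relation frequencies are the kind of information that token-level statistics cannot express.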
    We propose a process for analyzing text quality that uses NLTK (Natural Language Toolkit) to process the text, identifies potential predictors, and applies different classification models to real user reviews from Amazon.com. We also conduct an evaluation comparing the results before and after adding RST features. Experiments show that our model outperforms previous models that consider only the structure of word tokens.
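    The comparison between token-only features and features extended with RST information can be sketched as follows. This is an illustration under stated assumptions, not the thesis's feature set: the feature names are invented, a simple regular expression stands in for NLTK tokenization to keep the example dependency-free, and the RST relation counts are supplied by hand instead of by a discourse parser.

```python
import re
from typing import Dict


def lexical_features(review: str) -> Dict[str, float]:
    """Token-level baseline features (hypothetical names)."""
    # Regex tokenization is a stand-in for an NLTK tokenizer (assumption).
    tokens = re.findall(r"[A-Za-z0-9']+", review)
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    avg_len = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
    return {
        "n_tokens": float(len(tokens)),
        "n_sentences": float(len(sentences)),
        "avg_token_len": avg_len,
    }


def combined_features(review: str, rst_relation_counts: Dict[str, int]) -> Dict[str, float]:
    """Baseline features plus one feature per RST relation count."""
    features = lexical_features(review)
    for relation, count in rst_relation_counts.items():
        features[f"rst_{relation.lower()}"] = float(count)
    return features


# Invented review text and hand-supplied relation counts (assumptions).
review = "Great phone. The battery lasts two days, but the screen scratches easily."
baseline = lexical_features(review)
with_rst = combined_features(review, {"Contrast": 1, "Elaboration": 1})

print(baseline)
print(with_rst)
```

    Training the same classifier once on `baseline`-style vectors and once on `with_rst`-style vectors gives the before-and-after comparison described above.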
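  • Keywords (Chinese)
  • Rhetorical Structure Theory (修辭結構理論)
  • Text Mining (文字探勘)
  • Natural Language Processing (自然語言處理)
  • Text Quality Analysis (文字品質分析)
  • User Review Analysis (使用者評論分析)
  • Keywords (English)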
  • Text Mining
  • Rhetorical Structure Theory
  • User Review
  • Quality Analysis
  • Natural Language Processing
  • Table of Contents Chapter 1 - Introduction
    1.1 Background and Motivation
    1.2 Research Purpose
    1.3 Expected Results and Contribution
    1.4 Thesis Organization
    Chapter 2 - Literature Review
    2.1 Rhetorical Structure
    2.2 Rhetorical Structure Theory in the Domain of Text Mining
    2.3 Quality Analysis
    Chapter 3 - Problem Definition
    3.1 Preliminaries
    3.2 Problem Description
    Chapter 4 - Research Approach
    4.1 Research Architecture
    4.2 Data Preparation
    4.3 Natural Language Processing
    4.4 Discourse Parsing
    4.5 Feature Building
    4.6 Quality Determination
    Chapter 5 - Experiments
    5.1 Dataset
    5.2 Experimental Design
    5.3 Experiment Result
    5.3.1 NLP/Discourse Parsing Result
    5.3.2 Prediction of Quality
    5.3.3 RST Weighting Scheme
    Chapter 6 - Conclusion
    Chapter 7 - References
  • 魏志平 - Committee chair
  • 康藝晃 - Committee member
  • 黃三益 - Advisor
  • Defense date 2016-07-08 Submission date 2016-07-28
