博碩士論文 etd-0627116-214010 詳細資訊
Title page for etd-0627116-214010
論文名稱
Title
基於修辭結構的評論品質分析
Quality Analysis of User Reviews Using Discourse Structure
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
57
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2016-07-08
繳交日期
Date of Submission
2016-07-28
關鍵字
Keywords
修辭結構理論、文字探勘、自然語言處理、文字品質分析、使用者評論分析
Text Mining, Rhetorical Structure Theory, User Review, Quality Analysis, Natural Language Processing
中文摘要
Web 2.0的興起,帶動大眾言論的新浪潮:越來越多人在網際網路上發表想法、意見,甚至是對於特定商品的評論。在這麼大量的文字資料中,除了真實並且有參考價值的資料之外,也有可能包含惡意的垃圾訊息,或者是沒有意義的文字。因此,要如何判斷文字的品質變成了很重要的課題之一,這也成為文字探勘領域的熱門議題。
在過去的研究中通常只以文字詞語的特性作為分析的標的,但這樣的分析方式只能觀察到文字詞語的組合,或是評論整體長短等數據型的資料,並沒有辦法對於文章整體的架構及內容有所了解。修辭結構理論(Rhetorical Structure Theory, RST)透過分析子句和子句之間的關係,將文章轉換成包含階層特性的樹狀結構,讓我們對於整個文章結構有全面性的了解。我們加入修辭結構理論作為分析的標的,使文章品質分析的結果更精確,也更符合人類語言的特性。
我們提出一個文字品質分析流程,使用了NLTK(Natural Language Toolkit)進行自然語言處理,找出可能影響文字品質的影響因素,並使用不同的分類模型分析Amazon購物網站的真實使用者評論。我們比較了加入RST特徵前和加入後的結果,實驗證明我們提出的模型比起單純採用文字詞語的模型有顯著的進步。
Abstract
The emergence of Web 2.0 has led to a new era of user-generated content. More and more people share their thoughts, opinions, and even reviews of specific products online. This large volume of textual data contains not only genuine, useful content that is worth considering, but also malicious comments, spam, and meaningless text. Determining the quality of text has therefore become an important problem and a popular topic in the field of text mining.
Most existing studies analyze only the characteristics of word tokens. Such analysis captures only combinations of words or statistical data such as the length of a document, and cannot represent the overall structure and content of the document. Rhetorical Structure Theory (RST) transforms a document into a hierarchical tree according to the relations between text spans, which gives a comprehensive view of the document's structure. By incorporating RST into the analysis, we obtain text quality results that are more accurate and better reflect the characteristics of human language.
We propose a process for analyzing text quality that uses NLTK (Natural Language Toolkit) for natural language processing, identifies potential predictors of text quality, and applies different classification models to real user reviews from Amazon.com. We also conduct an evaluation that compares the results before and after adding RST features. The experiments show that our model outperforms models that consider only word tokens.
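As a rough illustration of the idea in the abstract, the sketch below represents a review as a small RST-style tree and derives a feature dictionary that mixes a token-level feature with structural ones. It is not the thesis's implementation: the `RSTNode` class, the feature names, and the relation labels are hypothetical, the tree is written by hand rather than produced by a discourse parser, and whitespace splitting stands in for NLTK tokenization.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RSTNode:
    """Hypothetical discourse-tree node; leaves hold text spans."""
    relation: Optional[str] = None   # relation label for an internal node
    text: Optional[str] = None       # span text for a leaf node
    children: List["RSTNode"] = field(default_factory=list)


def tree_depth(node: RSTNode) -> int:
    """Number of levels on the longest path from this node to a leaf."""
    if not node.children:
        return 1
    return 1 + max(tree_depth(child) for child in node.children)


def relation_counts(node: RSTNode) -> Counter:
    """Count how often each relation label occurs in the tree."""
    counts: Counter = Counter()
    if node.relation is not None:
        counts[node.relation] += 1
    for child in node.children:
        counts.update(relation_counts(child))
    return counts


def leaf_texts(node: RSTNode) -> List[str]:
    """Collect the text spans of all leaves, left to right."""
    if not node.children:
        return [node.text or ""]
    spans: List[str] = []
    for child in node.children:
        spans.extend(leaf_texts(child))
    return spans


def build_features(tree: RSTNode) -> Dict[str, float]:
    """Combine a token-level feature with RST-derived features."""
    spans = leaf_texts(tree)
    tokens = " ".join(spans).split()  # stand-in for a real tokenizer
    features = {
        "n_tokens": float(len(tokens)),
        "n_spans": float(len(spans)),
        "rst_depth": float(tree_depth(tree)),
    }
    for relation, count in relation_counts(tree).items():
        features["rst_rel_" + relation.lower()] = float(count)
    return features


# Hand-built example tree (an assumption, not parser output).
review = RSTNode(relation="Contrast", children=[
    RSTNode(relation="Elaboration", children=[
        RSTNode(text="The battery lasts two days,"),
        RSTNode(text="which is longer than my old phone."),
    ]),
    RSTNode(text="But the screen scratches easily."),
])
print(build_features(review))
```

A feature dictionary like this could be fed to any classifier twice, once with and once without the `rst_` keys, to mirror the before/after comparison the abstract describes.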
目次 Table of Contents
Chapter 1 - Introduction 1
1.1 Background and Motivation 1
1.2 Research Purpose 2
1.3 Expected Results and Contribution 2
1.4 Thesis Organization 3
Chapter 2 - Literature Review 4
2.1 Rhetorical Structure 4
2.2 Rhetorical Structure Theory In The Domain Of Text Mining 8
2.3 Quality analysis 9
Chapter 3 - Problem Definition 11
3.1 Preliminaries 11
3.2 Problem Description 13
Chapter 4 - Research Approach 15
4.1 Research Architecture 15
4.2 Data Preparation 17
4.3 Natural Language Processing 19
4.4 Discourse Parsing 20
4.5 Feature Building 24
4.6 Quality Determination 28
Chapter 5 - Experiments 30
5.1 Dataset 30
5.2 Experimental Design 33
5.3 Experiment Result 34
5.3.1 NLP/Discourse Parsing Result 34
5.3.2 Prediction of Quality 36
5.3.3 RST Weighting Scheme 38
Chapter 6 - Conclusion 43
Chapter 7 - References 45
電子全文 Fulltext
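This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.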
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
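Public availability information for printed copies is relatively complete from academic year 102 onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.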
開放時間 available 已公開 available
