Title page for etd-0819114-155606
Title
利用文字探勘技術萃取旅館評價文章之研究
Use Text Mining Techniques to Identify Noteworthy Hotel Reviews from Travel Forums
Department
Year, semester
Language
Degree
Number of pages
52
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-07-31
Date of Submission
2014-09-22
Keywords
Latent Dirichlet allocation, Text classification, Word-sense disambiguation, SVM, WordNet
Statistics
The thesis/dissertation has been browsed 6101 times and downloaded 1350 times.
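Chinese Abstract
User-generated content (UGC) is content created by users themselves. Websites built on this model have multiplied in recent years, and the volume of user-generated content keeps growing as a result. Our study takes the well-known travel site TripAdvisor.com as an example. Its content is built by its users, who share travel experiences covering attractions, hotels, restaurants, and more. The value of a UGC travel site like this lies in users sharing the impressions and experiences they had first-hand, so that other users can draw on those earlier experiences to better understand hotels and restaurants. This information is read not only by travelers; staff in the travel industry also follow it closely.

  The reviews on TripAdvisor.com are numerous and disorganized, which places a heavy burden on managers who want to understand how their own hotel is rated on the site. To address this problem, our study gives hotel managers a fast and accurate way to find the reviews that deserve attention. What counts as noteworthy was determined by hotel practitioners: through interviews and content analysis, the characteristics of noteworthy reviews were grouped into content features, sentiment features, and review quality. We built a classification model from these characteristics using text mining techniques. Our experiments confirm that content features have the greatest impact, followed by sentiment features and review quality. The results can help hotel managers manage content and reply to noteworthy reviews, which can improve online ratings and raise a hotel's visibility, a key to success in the highly competitive travel industry.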
Abstract
The growth of user-generated content (UGC) has encouraged knowledge sharing among Internet users. A good example is the well-known travel site TripAdvisor.com, which enables users to share their experiences and express their opinions on attractions, accommodations, restaurants, etc. Travel-related UGC provides valuable information to users as well as to staff in the travel industry. In particular, finding the reviews that are noteworthy to a hotel is critical to its success in the competitive travel industry.
We employed two hotel managers to conduct a preliminary examination of hotel reviews on TripAdvisor.com and found that noteworthy reviews can be characterized by their content features, sentiment features, and quality. Through experiments using TripAdvisor.com data, we found that all the features are important in identifying noteworthy hotel reviews. Specifically, content features have the most impact, followed by sentiment and quality.
Table of Contents
CHAPTER 1 - Introduction
1.1 Background
1.2 Motivation
CHAPTER 2 - Literature Review
2.1 Content Feature Identification
2.2 Polarity Recognition
Emotion Identification
Negation and Quantifiers
2.3 Quality of Product Review
2.4 Recommended review in tourism domain
CHAPTER 3 - Problem Definition
3.1 Noteworthy Reviews
3.2 Research Problem Definition
CHAPTER 4 - The Approach
4.1 Topics Extraction
4.2 Sentiment Detection
4.3 Quality of Review Measure
4.4 Classification Model Construction
CHAPTER 5 - Evaluation
5.1 TripAdvisor web crawler
5.2 Select 500 reviews for experts labeling class
5.3 Selection attribute from LDA
5.4 Performance Results
CHAPTER 6 - Conclusions
References
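Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined release time
Available:
Campus: available
Off-campus: available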


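Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available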
