National Sun Yat-sen University, thesis/dissertation, Sentiment Analysis on Social Network: Using Emoticon Characteristic for Twitter Polarity Classification

Title	Sentiment Analysis on Social Network: Using Emoticon Characteristic for Twitter Polarity Classification (社群網路情緒分析：使用表情符號特徵於推文極性分類)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 105 (2016-2017)	Language	Chinese
Degree	Master	Number of pages	55
Author	Tzu-Hsuan Tseng (曾子瑄)
Advisor	Chia-Ping Chen (陳嘉平)
Convenor	Chung-Hsien Wu (吳宗憲)
Advisory Committee	Hsin-Min Wang (王新民); Chung-Nan Lee (李宗南); Liang-Chih Yu (禹良治)
Date of Exam	2017-07-21	Date of Submission	2017-08-22
Keywords	Machine Learning, Polarity Classification, Sentiment Analysis, Word Embedding, Neural Network
Statistics	This thesis/dissertation has been browsed 5700 times and downloaded 43 times.

Abstract
This thesis implements a Twitter sentiment analysis system for Subtask A of SemEval Task 4, message polarity classification for English tweets, and improves on the approach we used in our earlier SemEval-2017 submission. The system consists of data pre-processing, word embedding, and a sentiment classifier. Pre-processing comprises emoticon normalization, splitting of specific suffixes, and hashtag segmentation; these steps reduce data complexity and increase word-vector coverage so that the model learns better. For word embedding, we use the pre-trained word vectors provided by GloVe. Tweets often contain emoji, yet many pre-trained word-vector sets contain few or no emoji vectors. Because we consider emoji an important feature for Twitter sentiment classification, we train emoji vectors with a neural network, using the text related to each emoji: its description and its surrounding context. LSTM and CNN models serve as the sentiment classifiers. To keep the models from overfitting, we add validation data during training, and training stops when accuracy on the validation data no longer improves. Compared with our previous system, the proposed method improves average recall by about 4% for the LSTM model and about 5% for the CNN model.
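
The abstract describes two mechanisms that a short example can make concrete: stopping training once validation accuracy no longer improves, and reporting results as average recall. The following Python sketch illustrates both under stated assumptions. The function names, the `patience` parameter, and the callback-style training interface are hypothetical and not taken from the thesis, and average recall is read here as the unweighted mean of per-class recall over the negative, neutral, and positive classes.

```python
from typing import Callable, Sequence, Tuple


def average_recall(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = ("negative", "neutral", "positive"),  # assumed label set
) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for label in labels:
        support = sum(1 for t in y_true if t == label)
        if support == 0:
            continue  # skip classes with no gold examples
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / support)
    if not recalls:
        raise ValueError("no gold examples for any of the given labels")
    return sum(recalls) / len(recalls)


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],       # hypothetical: runs one training epoch
    validation_accuracy: Callable[[], float],  # hypothetical: scores the validation set
    max_epochs: int = 50,
    patience: int = 1,                         # assumed: epochs to wait without improvement
) -> Tuple[int, float]:
    """Train until validation accuracy stops improving; return (best epoch, best accuracy)."""
    best_epoch, best_acc = 0, float("-inf")
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        acc = validation_accuracy()
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_acc


if __name__ == "__main__":
    gold = ["positive", "positive", "neutral", "negative"]
    pred = ["positive", "neutral", "neutral", "negative"]
    print(round(average_recall(gold, pred), 4))  # 0.8333

    simulated = iter([0.5, 0.6, 0.58, 0.7])  # made-up validation accuracies
    print(train_with_early_stopping(lambda: None, lambda: next(simulated)))  # (2, 0.6)
```

With `patience=1`, the simulated run stops after the third epoch because accuracy fell from 0.6 to 0.58, so the later 0.7 is never reached. A larger `patience` trades extra training time for a chance to recover from such dips; the thesis record does not say which setting was used.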

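Table of Contents
Thesis Approval Form i
Acknowledgments ii
Abstract (Chinese) iii
Abstract (English) iv
Table of Contents v
List of Tables vii
List of Figures ix
Chapter 1 Research Background and Motivation 1
  1.1 Research Background 1
  1.2 Research Motivation 2
Chapter 2 Related Work 4
  2.1 Deep Neural Networks 4
    2.1.1 Feedforward Neural Networks 4
    2.1.2 Recurrent Neural Networks 6
    2.1.3 Convolutional Neural Networks 9
  2.2 Word Embedding 12
  2.3 SemEval 13
    2.3.1 Review of Past SemEval Editions 14
    2.3.2 SemEval-2017 Competition Results 16
Chapter 3 Methods and Procedures 21
  3.1 Data Pre-processing 21
    3.1.1 Emoticon Normalization 22
    3.1.2 Specific Suffix Splitting 23
    3.1.3 Hashtag Segmentation 23
  3.2 Emoji Embedding 25
    3.2.1 Emoji and Their Descriptions 26
    3.2.2 Emoji Skip-gram 28
  3.3 Tools 29
Chapter 4 Experiments 30
  4.1 Experimental Settings 30
    4.1.1 LSTM 31
    4.1.2 CNN 31
  4.2 Datasets and Baseline Experiments 31
  4.3 Data Pre-processing Experiments 32
  4.4 Emoji Embedding Experiments 33
    4.4.1 Emoji and Their Descriptions 34
    4.4.2 Emoji Skip-gram 36
    4.4.3 Testing Only on Tweets Containing Emoji 36
  4.5 Summary of Experimental Results 37
Chapter 5 Conclusion and Future Work 39
Bibliography 41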

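Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
Thesis access permission: user-defined release time
Available: Campus: available; Off-campus: available
etd-0722117-023140.pdf
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. For public information on printed theses from Academic Year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available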

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS