Theses and dissertations: detailed record for etd-0530117-135858
Title page for etd-0530117-135858
Title
以文字探勘分析大眾論壇內容探討兩岸工作議題的差異與關聯
Exploring the Differences and Associations of the Working Issues between Two Sides of the Taiwan Strait from Public Forums Using Text Mining
Department
Year, semester
Language
Degree
Number of pages
80
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2017-06-22
Date of Submission
2017-06-30
Keywords
topic modeling, cross-strait work, text mining, LDA (latent Dirichlet allocation), similarity analysis
Statistics
This thesis/dissertation has been viewed 5,975 times and downloaded 351 times.
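Chinese Abstract (English translation)
Amid the wave of internationalization, exchanges between countries are becoming ever more frequent, and the situation between Taiwan and China is uniquely sensitive: beyond political factors, geography and culture also deeply shape cross-strait relations. As cross-strait economic and trade ties have developed, investment in China by Taiwanese businesses has affected the working environment on both sides. More and more job opportunities have appeared in China, prompting Taiwanese to consider working there.
Discussing issues online is now commonplace. People working in Taiwan have long discussed work issues on the Internet, and employees working in China have also begun discussing work in China on public forums. This provides a large body of data for understanding how the public discusses the different issues of working in China and working in Taiwan. Taking grassroots public opinion as data makes it possible to analyze cross-strait work issues from a perspective different from earlier studies, and may uncover problems that have not previously been found or explored.
This study applies text mining methods, namely LDA, similarity analysis, word frequency charts, word clouds, and word network graphs, to articles on work in China and Taiwan collected from the Workinchina and Salary boards of the public forum PTT between January 2009 and October 2016, and analyzes how cross-strait work issues are discussed there. LDA is used to uncover the latent topics of public discussion; word frequency charts and word clouds are used to examine trends in cross-strait work issues; word network graphs use the intersections between topics to identify possible reasons why issues arise; and similarity analysis identifies differences between the two boards in structure and in the core issues discussed, clarifying the direction of discussion on cross-strait work issues.
Applying automated methods to public forum data allows a broader analysis in both volume and time span; supplemented by external data as corroboration and explanation, it also makes public views on cross-strait work easier to surface. The results can be provided to the public sector and businesses, and fed back to the public, to help them quickly understand and acquire knowledge about cross-strait work and to improve the quality of exchanges in this area.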
Abstract
With increasing internationalization, exchanges between countries are becoming more frequent, and the situation between Taiwan and China is unique and sensitive. Beyond political factors, geography and culture also profoundly shape cross-strait relations. As cross-strait economic and trade ties have developed, Taiwanese investment in China has affected the working environment on both sides: more and more job opportunities have appeared in China, attracting Taiwanese to work there.
Today, more and more people discuss issues on the Internet. Besides people working in Taiwan, employees working in China have also begun to discuss work in China on public forums, providing a wealth of data on how the public discusses working in China and working in Taiwan. Using grassroots public opinion as data, we can analyze cross-strait work issues from new perspectives and possibly uncover problems that have not previously been found or discussed.
In this research, we collected articles posted between January 2009 and October 2016 on the “Workinchina” and “Salary” boards of the public forum PTT. We used LDA, similarity analysis, word frequency charts, word clouds, and word network maps to analyze how cross-strait work issues are discussed on the forum. Automated methods allow forum data to be analyzed more extensively in both quantity and time span. Supported by external evidence, they also make public opinion on cross-strait work issues easier to reveal. The findings can provide feedback to government, business, and the public, helping them quickly understand and gain knowledge of cross-strait work issues and improving the quality of exchanges between Taiwan and China.
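As a rough illustration of the similarity step named in the abstract, the sketch below computes an adjusted cosine similarity between term-frequency vectors. This record does not say how the thesis applied the measure, so the sketch assumes the common collaborative-filtering formulation, in which each coordinate is centered by its mean over all vectors before the ordinary cosine is taken. The function names and the toy posts are hypothetical, and the posts are assumed to be already word-segmented.

```python
import math
from collections import Counter


def term_frequency_vectors(docs):
    """Turn tokenized documents into aligned term-frequency vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[t] for t in vocab] for c in counts]


def adjusted_cosine(vectors, i, j):
    """Adjusted cosine similarity between rows i and j of `vectors`.

    Assumed formulation: subtract each coordinate's mean over all rows,
    then take the ordinary cosine of the two centered rows.
    """
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    a = [vectors[i][d] - means[d] for d in range(dims)]
    b = [vectors[j][d] - means[d] for d in range(dims)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical, already-segmented posts (placeholders, not thesis data).
docs = [
    ["salary", "overtime", "salary", "boss"],
    ["salary", "overtime", "boss", "boss"],
    ["visa", "housing", "tax", "visa"],
]
vocab, vectors = term_frequency_vectors(docs)
sim_01 = adjusted_cosine(vectors, 0, 1)  # two posts with shared vocabulary
sim_02 = adjusted_cosine(vectors, 0, 2)  # posts with disjoint vocabulary
print(sim_01 > sim_02)
```

Because the centering uses the mean over all vectors, the measure is only informative with three or more vectors: with exactly two, the centered vectors are exact opposites and the score is always -1.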
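Table of Contents
Thesis Approval Form
Acknowledgements
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research background
1.2 Research motivation
1.3 Research objectives
1.4 Research framework
Chapter 2 Literature Review
2.1 Cross-strait work issues
2.2 Text mining and the processing of Chinese-language data
2.2.1 Text mining
2.3 LDA (Latent Dirichlet allocation)
2.4 Adjusted Cosine Similarity
Chapter 3 Research Method
3.1 Data collection and preprocessing
3.1.1 Data sources and collection
3.1.2 Data preprocessing methods
3.2 LDA Topic modeling
3.2.1 Number of LDA topics and parameters
3.2.2 LDAvis presentation method
3.3 Word frequency charts and word clouds
3.4 Word network graphs
3.5 Issue similarity computation
3.5.1 Similarity of work issues
3.5.2 Similarity of issues discussed across the strait
3.6 Summary
Chapter 4 Experimental Results
4.1 Observations on LDA results
4.1.1 Comparison of the main latent topics found by LDA with the dataset
4.1.2 Comparison of the topics found by automated LDA with real-world current events
4.1.3 Summary
4.2 Observations on word cloud and word frequency chart results
4.2.1 Analysis of Workinchina word frequency charts and word clouds
4.2.2 Analysis of Salary word frequency charts and word clouds
4.2.3 Summary
4.3 Observations on word network graph results
4.4 Observations on similarity results
4.4.1 Results of the work-issue similarity analysis
4.4.2 Similarity of issues discussed across the strait
4.5 General discussion
Chapter 5 Conclusion
5.1 Conclusion
5.2 Suggestions for future research
References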
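Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available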


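Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available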
