National Sun Yat-sen University, thesis/dissertation, Finding Potential Business through Text Mining Techniques Based on Automotive Industry

Title	Finding Potential Business through Text Mining Techniques Based on Automotive Industry (利用文字探勘技術尋找具有潛力的企業:以汽車產業為例)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 105	Language	English
Degree	Master	Number of pages	53
Author	Jia-Wun Cai (蔡佳芠)
Advisor	Keng-Pei Lin (林耕霈)
Convenor	Cheng-Jung Yang (楊政融)
Advisory Committee	Chih-Hua Tai (戴志華)
Date of Exam	2017-07-21	Date of Submission	2017-07-21
Keywords	text mining, word2vec, enterprise performance, data envelopment analysis, Malmquist productivity index
Statistics	This thesis/dissertation has been browsed 6111 times and downloaded 54 times.

Abstract
In recent years, technological breakthroughs have greatly changed human life. With the development of the Internet and accelerating global competition, economic globalization has become a major trend: people and knowledge move across borders ever faster, and the influence of international organizations and multinational corporations has grown accordingly. In terms of industrial development, a single technology can no longer meet demand, and most innovation comes from integrating technologies across domains. This study analyzes cross-domain technology collaboration and international supply chain relationships in the automotive industry, and uses international automotive technology trends to predict which Taiwanese automotive supply chain companies have potential for future development, so that they can be supported to strengthen the industry's competitiveness. As information technology advances, more and more financial information and news is published electronically, and enterprise decision makers can use news media to guide investment decisions. In this study, text mining is used to extract currently “hot” terms from the news, which are treated as new technologies or products. word2vec is then used to expand these terms into extended terms, that is, words semantically similar to them. Patents, on the other hand, are the most complete technical documents available to the public: each patent represents an output of research and development and describes the methods and technical details used. We therefore use the extended terms as keywords to search the USPTO database for patents whose text contains them, and identify the Taiwanese supply chain companies that have filed such patents. This study adopts the DEA-based Malmquist productivity index to evaluate the enterprise performance of these supply chain companies. The experimental results show that the Taiwanese companies identified by this method are indeed promising in their growth.
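The two computational steps named in the abstract, expanding a seed term to semantically similar words and scoring a company with a Malmquist productivity index, can be illustrated with a minimal sketch. This is not the thesis's implementation. The toy vectors, the function names `expand_terms` and `malmquist_index`, and the sample distance values are all hypothetical. A real pipeline would take the vectors from a word2vec model trained on the news corpus and obtain the distance-function values by solving DEA linear programs, neither of which is shown here. The index uses the usual geometric-mean form, where a value above 1 is conventionally read as productivity growth between period t and period t+1.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(seed, vectors, top_n=2):
    """Return the top_n vocabulary terms most similar to the seed term."""
    scored = [
        (term, cosine(vectors[seed], vec))
        for term, vec in vectors.items()
        if term != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:top_n]]


def malmquist_index(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Geometric mean of two distance-function ratios.

    d_a_b is the distance function of the period-a technology evaluated
    at the period-b input/output data (a, b in {t, t1}, with t1 = t + 1).
    """
    return math.sqrt((d_t_t1 / d_t_t) * (d_t1_t1 / d_t1_t))


# Hand-made 3-dimensional vectors (hypothetical stand-ins for embeddings).
toy_vectors = {
    "autonomous": [0.9, 0.1, 0.0],
    "self-driving": [0.85, 0.2, 0.05],
    "lidar": [0.6, 0.5, 0.1],
    "tire": [0.0, 0.1, 0.9],
}

extended = expand_terms("autonomous", toy_vectors, top_n=2)
print(extended)

# Made-up distance values for one company; real values come from DEA.
print(malmquist_index(d_t_t=0.8, d_t_t1=1.0, d_t1_t=0.7, d_t1_t1=0.9))
```

In this sketch the extended terms would then serve as patent search keywords, and the index would be computed once per candidate company and period pair.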

Table of Contents
CHAPTER 1-Introduction
1.1. Background and Motivation
1.2. Results and Contribution
1.3. Overall Architecture
CHAPTER 2-Literature Review
2.1. Text Mining
2.2. Word2Vec
2.3. Data Envelopment Analysis (DEA)
CHAPTER 3-Methodology
3.1. Research Process
3.2. Data Collection
3.3. Terms of Novel Technology Extraction
3.4. Keywords Analysis
3.5. Potential Business Discovery
CHAPTER 4-Experimental Results and Evaluation
4.1. Dataset description
4.2. Result on Terms of Novel Technology Extraction
4.3. Result on Keywords Analysis
4.4. Result on Potential Business Discovery
4.5. Evaluation
CHAPTER 5-Conclusion and Future work
References


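Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law. Thesis access permission: user-defined release date. Available: Campus: available. Off-campus: available. etd-0621117-123439.pdf
Printed copies
Public information on printed copies is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available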


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS