Title page for etd-0709116-190114
Title
探討文字指標對於企業績效的影響
The impact of textual indicators on business performance
Department
Year, semester
Language
Degree
Number of pages
82
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2016-07-21
Date of Submission
2016-08-09
Keywords
business performance, latent Dirichlet allocation, Tobit regression, support vector machine, text mining
Statistics
The thesis/dissertation has been browsed 5889 times and downloaded 451 times.
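Chinese Abstract (English translation)
Thanks to recent advances in information technology, corporate decision makers and investment-market participants can easily obtain the information they need from the Internet. Among this information, financial news reflects the views and suggestions of the investing public and offers a variety of perspectives on the direction of a company's development. Compared with the figures in financial statements, we believe that the textual information in financial news can convey more valuable signals, such as a company's possible credit default or market rumors about personnel changes in its management. Financial statements must often satisfy specific accounting standards, which may cause them to lose their true picture. Moreover, the data a company provides are tied to specific points in time, so investors who base their decisions on them face a timeliness problem; textual information can compensate for this shortcoming.
To process the large volume of textual information, this study applies the latent Dirichlet allocation (LDA) model to discover the latent topics in news and uses text mining techniques to analyze companies' annual reports, constructing a business performance corpus (Business performance-corpus). Next, the keywords in each topic are matched against this corpus, and the words matched for each topic are in turn matched against the keywords of the news. Through this process the textual information is quantified into a textual indicator, called the Intensity of Business Performance-corpus index (IBPCI).
In the prior literature, we find that almost no scholars have applied textual information to research on business performance, and most use only financial figures to measure performance. In view of this, this study combines financial indicators with the textual indicator to verify the impact of textual information on business performance. According to the experimental results, both the support vector machine (SVM) and the Tobit regression analysis confirm that textual information has influence in the measurement of business performance: the combination of financial and textual indicators outperforms financial indicators alone, which also shows that the textual indicator can play a role in supplementing financial figures. As an extension for future research, the method used here to build the textual indicator could be applied to construct a business performance prediction model that helps decision makers judge a company's operating performance.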
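The two-stage matching described above (topic keywords against the performance corpus, then the matched words against the news keywords) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual formula: the abstract does not give the scoring rule, so the ratio used here, the function names `match_topic_terms` and `ibpci`, and the toy data are all assumptions.

```python
from collections import Counter


def match_topic_terms(topic_keywords, performance_corpus):
    """Keep, for each topic, only the keywords that also appear in the corpus."""
    corpus = set(performance_corpus)
    return {topic: [w for w in words if w in corpus]
            for topic, words in topic_keywords.items()}


def ibpci(news_tokens, matched_terms):
    """Share of news tokens that hit each topic's matched terms.

    The ratio is an illustrative assumption, not the thesis's definition.
    """
    counts = Counter(news_tokens)
    total = sum(counts.values())
    if total == 0:
        return {topic: 0.0 for topic in matched_terms}
    return {topic: sum(counts[w] for w in set(words)) / total
            for topic, words in matched_terms.items()}


# Hypothetical toy data: two LDA topics, a small corpus, one tokenized article.
topics = {
    "topic_0": ["revenue", "growth", "weather"],
    "topic_1": ["default", "debt", "election"],
}
corpus_terms = ["revenue", "growth", "default", "debt", "profit"]
article = ["revenue", "growth", "beat", "forecast",
           "despite", "debt", "revenue", "concerns"]

matched = match_topic_terms(topics, corpus_terms)
scores = ibpci(article, matched)
```

Normalizing by article length keeps the score comparable across long and short news items; a raw count or a topic-probability weighting would be equally plausible choices.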
Abstract
Owing to the rapid growth of information technology, decision makers and market participants can easily access information on the Internet to form their investment decisions. Among all of this information, financial news can reflect the public's opinions and provide immediate information about a company's operating situation. Compared with traditional accounting-based ratio analysis, the textual information derived from financial news can give market participants more timely, relevant, and valuable cues about a company's operating status. Such information is essential for decision makers adjusting their strategies.
To deal with the large amount of textual information, this study proposes a novel decision-making architecture. Latent Dirichlet allocation (LDA), a topic modelling technique, is used to extract topics related to corporate operating status from a large amount of textual information. The extracted topics are matched with the corpus to determine the indicators, namely the “Intensity of Business Performance-corpus index (IBPCI)”, which transform the textual information into numerical ratios.
To examine the effectiveness of the IBPCI indicators, the experimental design in this study was divided into two scenarios: (1) SVM and (2) Tobit regression. According to the experimental results, the IBPCI indicators not only enhance the forecasting performance of both models (SVM and Tobit regression) but also improve their explanatory ability. Decision makers can use this model as a roadmap to modify their investment strategies and to maximize their personal wealth.
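For readers unfamiliar with the second scenario, the standard Tobit (censored regression) model can be written as below. This is a textbook sketch only: the abstract does not state the dependent variable or the censoring limits, so left-censoring at zero and the split of the regressors into financial variables $f_i$ and the textual indicator $\mathrm{IBPCI}_i$ are assumptions made for illustration.

```latex
\begin{aligned}
y_i^{*} &= \mu_i + \varepsilon_i, \qquad
\mu_i = f_i^{\top}\beta + \gamma\,\mathrm{IBPCI}_i, \qquad
\varepsilon_i \sim N(0,\sigma^{2}),\\
y_i &= \max\bigl(0,\; y_i^{*}\bigr),\\
\log L(\beta,\gamma,\sigma) &=
\sum_{i:\,y_i>0}\left[\log\phi\!\left(\frac{y_i-\mu_i}{\sigma}\right)-\log\sigma\right]
+\sum_{i:\,y_i=0}\log\Phi\!\left(-\frac{\mu_i}{\sigma}\right)
\end{aligned}
```

Here $\phi$ and $\Phi$ are the standard normal density and distribution function. Under this setup, testing whether textual information matters amounts to testing $\gamma = 0$ against the financial-only specification.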
Table of Contents
Thesis approval form
Acknowledgements
Chinese abstract
English abstract
CHAPTER 1 Introduction
1.1 Overview
1.2 Objective of the research
1.3 Organization of the research
CHAPTER 2 Literature Review
2.1 Financial variables
2.2 The application of text mining
2.3 Latent Dirichlet Allocation
2.4 Data Envelopment Analysis
CHAPTER 3 Proposed Approach
Step 1: Define “Superior-performing” and “Inferior-performing” firms
Step 2: Construct the business performance corpus
Step 3: Implement the Latent Dirichlet Allocation
Step 4: Quantify the textual indicator
Step 5: Select the financial variables
Step 6: Examine the impact of textual indicator
CHAPTER 4 Experiments and Results
4.1 Experiment design
Data description
Indicator establishment
Objective of the experiments
Performance measure
4.2 Experiment I
4.3 Experiment II
CHAPTER 5 Conclusions
5.1 Concluding remarks
5.2 Future work
REFERENCE
APPENDIX
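Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available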


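Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available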
