National Sun Yat-sen University, thesis/dissertation, Applying Data Mining Technique to Analyze Sequential Patterns in the Stock Market in Taiwan

Title	Applying Data Mining Technique to Analyze Sequential Patterns in the Stock Market in Taiwan (應用資料探勘技術分析股市漲跌型態之研究)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 91 (ROC calendar, 2002-2003)	Language	Chinese
Degree	Master	Number of pages	114
Author	Ming-Wei Yeh (葉明威)
Advisor	T.P. Liang (梁定澎)
Convenor	Chih-Ping Wei (魏志平)
Advisory Committee	San-Yi Huang (黃三益)
Date of Exam	2003-07-07	Date of Submission	2003-07-10
Keywords	Data Mining, Association Rule, Stock Analysis, Sequential Pattern
Statistics	The thesis/dissertation has been browsed 5939 times and downloaded 116 times.

Abstract
This research applies data mining techniques to the stock market, building an analysis model from historical market data to support investment decisions. The performance of the stock market is the aggregate of all investors' decisions, and investors' behavior is correlated over time. In Taiwan's stock market, for example, sector rotation is observed: many institutional and individual investors, following company policy or other factors, invest actively in one industry during a given period and then move to the next industry in sequence. In addition, according to business cycle theory, institutional investors choose investment targets based on industry characteristics and economic conditions, which produces a cyclical investment strategy. Such sequential investment behavior can be uncovered with data mining, in particular sequential pattern analysis, which studies the order in which two events occur. The algorithms for this technique have advanced considerably in recent years, so applying it to stock market behavior opens a new research direction. The objective of this research is to characterize the sequential patterns of investment in Taiwan's stock market. Using historical transaction data, we mine sequential patterns among individual stocks and build a behavior model of Taiwan's stock market to help investors make sound decisions.
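The idea of a sequential pattern between stocks can be illustrated with a small example. The sketch below is not the thesis's algorithm (the full text is not reproduced in this record). It is a minimal illustration, assuming each stock's history has already been reduced to the set of trading days on which it rose. The function name `mine_sequential_rules`, the tickers, the thresholds, and the look-ahead `window` are all hypothetical. It scores rules of the form "A rises, then B rises within `window` days" by support and confidence, the usual association rule measures.

```python
from itertools import permutations


def mine_sequential_rules(rises, n_days, window=3,
                          min_support=0.2, min_confidence=0.6):
    """Find rules 'A rises -> B rises within `window` trading days'.

    rises:      dict mapping a stock symbol to the set of 0-based day
                indices on which that stock rose (assumed input format).
    support:    share of all trading days on which A rose and B followed.
    confidence: share of A's rising days that were followed by B rising.
    """
    rules = []
    for a, b in permutations(sorted(rises), 2):
        a_days = rises[a]
        if not a_days:
            continue  # no rising days for A, so no rule can be scored
        # Count A's rising days that are followed by a rise in B
        # on any of the next `window` days.
        followed = sum(
            1 for d in a_days
            if any((d + k) in rises[b] for k in range(1, window + 1))
        )
        support = followed / n_days
        confidence = followed / len(a_days)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Highest confidence first, then alphabetical for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical toy data: 10 trading days and three made-up tickers.
toy_rises = {
    "AAA": {0, 3, 6},
    "BBB": {1, 4, 7},
    "CCC": {2, 9},
}
rules = mine_sequential_rules(toy_rises, n_days=10, window=1)
for a, b, support, confidence in rules:
    print(f"{a} rises -> {b} rises next day: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data the only rule that clears both thresholds is "AAA rises, then BBB rises the next day". A real study would also need to define what counts as a rise, choose the analysis period, and evaluate the rules on data not used to derive them.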

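Table of Contents
Chapter 1. Introduction 5
  1.1 Background and Objectives 5
  1.2 Research Method 7
  1.3 Research Procedure 9
Chapter 2. Literature Review 10
  2.1 Data Mining Techniques and Review 11
  2.2 Literature on Technical Analysis of the Stock Market 15
  2.3 Related Research on Data Mining in the Stock Market 18
  2.4 Index Constituent Stock Sampling Process and Related Literature 22
Chapter 3. Operational Definitions 29
  3.1 Wave Selection Method 30
  3.2 Research Subjects 33
  3.3 Operational Definitions of Excess Rise and Excess Fall in Stock Prices 37
  3.4 Performance Evaluation 39
Chapter 4. Research Method and Design 42
  4.1 Rule Generation 43
  4.2 Defining Association Rules 47
  4.3 Definition of Analysis Patterns 49
  4.4 Definition of Consecutive Rising Days 50
  4.5 Prototype System Design 51
Chapter 5. Analysis of Results 52
  5.1 Rule Table Generation 53
  5.2 Performance Measurement Methods and Indicators 56
  5.3 Performance Evaluation of Investment Rules 58
  5.4 Performance Evaluation Results 63
  5.5 Factors Affecting Rule Accuracy 66
Chapter 6. Conclusions and Suggestions 72
  6.1 Research Results 73
  6.2 Research Contributions 74
  6.3 Suggestions for Future Research and Limitations 75
References 76
  Chinese-language references 76
  Western-language references 78
Appendix Index 83
  Appendix 1: Curves of Each Wave during the Analysis Period 83
  Appendix 2: Rule Table for Rising Waves 89
  Appendix 3: Rule Table for Falling Waves 93
  Appendix 4: Investment Return Experiment Results 96
    Appendix 4-1: Using the Average Number of Consecutive Rising Days as the Stock Holding Strategy 96
    Appendix 4-2: Investment Return Experiment Results for Pairable Rules 100
  Appendix 5: Excess Returns of Each Rule 104
    Appendix 5-1: Using the Average Number of Consecutive Rising Days as the Stock Holding Strategy 104
    Appendix 5-2: Stock Holding Strategy Using the Rule-Pairing Method 108
  Appendix 6: Comparison of Results between Rule-Pairing Investment and Consecutive-Rising-Days Investment 111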

Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: available on campus, never available off campus (restricted)
Available: Campus: available; Off-campus: not available
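Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available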
