國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,時間序列分類分析方法：技術發展與評估 ,Time-Series Classification: Technique Development and Empirical Evaluation

論文名稱 Title	時間序列分類分析方法：技術發展與評估 Time-Series Classification: Technique Development and Empirical Evaluation
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	90 學年度第 2 學期 The spring semester of Academic Year 90	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	53
研究生 Author	楊景婷 Ching-Ting Yang
指導教授 Advisor	魏志平 Chih-Ping Wei
召集委員 Convenor	林東清 Tung-Ching Lin
口試委員 Advisory Committee	林福仁 Fu-Ren Lin
口試日期 Date of Exam	2002-07-23	繳交日期 Date of Submission	2002-07-31
關鍵字 Keywords	流失預測、時間序列相似度計算、時間序列分類分析、最近鄰居分類分析、電信業資料探勘、資料探勘 Telecommunications Data Mining, Time-Series Similarity, Data Mining, k Nearest Neighbor Classification, Churn Prediction, Time-Series Classification
統計 Statistics	本論文已被瀏覽 5757 次，被下載 4044 次 The thesis/dissertation has been browsed 5757 times, has been downloaded 4044 times.

中文摘要
現實生活中許多決策行為的預測，是利用時間序列(time-series)性質的資料，我們稱這類的應用為時間序列的分類分析問題。在過去分類分析方法的研究中，主要是集中於用單一(atomic)或彼此間獨立的屬性值，來學習建構出一個分類架構(classification model)。傳統分類分析技術處理時間序列的分類分析問題時，最直接的方式是將時間序列性質的資料透過統計的方式(例如：平均數計算、總合計算等)，轉變成非時間序列性資料。然而，這樣的統計轉換方式通常會造成部份的資訊流失。在本研究中，我們提出了結合最近鄰居法(k Nearest Neighbor Classification Approach)的時間序列分類分析方法(Time-Series Classification Technique)。實證評估的結果顯示，相較統計轉換方式處理時間序列資料的方法，我們所提出的時間序列分類分析方法有比較好的表現。
Abstract
Many interesting applications involve predicting a decision from a time-series sequence or a set of time-series sequences; such tasks are referred to as time-series classification problems. Past classification analysis research has predominantly focused on constructing a classification model from training instances whose attributes are atomic and independent. Applying traditional classification analysis techniques directly to time-series classification problems requires transforming the time-series data into non-time-series attributes through statistical operations (e.g., average or sum). However, such statistical transformation often results in information loss. In this thesis, we propose the Time-Series Classification (TSC) technique, which is based on the nearest neighbor classification approach. The empirical evaluation shows that the proposed TSC technique performs better than the statistical-transformation-based approach.
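The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the thesis's TSC technique: this record does not describe the similarity measure the thesis uses, so plain Euclidean distance is assumed here, and the function names (`summarize`, `euclidean`, `knn_predict`) and the toy usage data are hypothetical. The sketch only shows how collapsing a series to its average and sum can discard the shape information that a nearest neighbor classifier working on the full series can still use.

```python
from collections import Counter
from math import sqrt


def summarize(series):
    """Statistical transformation: collapse a series into (mean, sum)."""
    total = sum(series)
    return (total / len(series), total)


def euclidean(a, b):
    """Point-by-point distance between two equal-length sequences.

    Assumption: a stand-in measure, not the thesis's similarity function.
    """
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Majority vote among the k training series closest to `query`.

    train: list of (series, label) pairs with equal-length series.
    """
    ranked = sorted(train, key=lambda item: euclidean(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up): usage that falls vs. rises over five periods.
train = [
    ([9, 7, 5, 3, 1], "churn"),
    ([10, 7, 5, 3, 0], "churn"),
    ([1, 3, 5, 7, 9], "stay"),
    ([0, 3, 5, 7, 10], "stay"),
]
query = [8, 7, 5, 3, 2]  # declining usage

# Every series, including the query, has the same mean and sum, so the
# statistical transformation cannot tell the two classes apart.
distinct_summaries = {summarize(s) for s, _ in train} | {summarize(query)}

# Comparing the full series keeps the trend, so the declining query is
# matched to the declining training series.
prediction = knn_predict(train, query, k=3)
```

In this toy case `distinct_summaries` holds a single value, while the full-series nearest neighbor vote separates the two trends.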

目次 Table of Contents
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION
  1.1 RESEARCH BACKGROUND
  1.2 RESEARCH MOTIVATION AND OBJECTIVES
  1.3 ORGANIZATION OF THE THESIS
CHAPTER 2. LITERATURE REVIEW
  2.1 CLASSIFICATION ANALYSIS TECHNIQUES
    2.1.1 Decision Tree Induction
    2.1.2 Backpropagation Neural Network
    2.1.3 Nearest Neighbor Classification
  2.2 SIMILARITY SEARCH OF TIME-SERIES DATA
    2.2.1 Shape Search Method
    2.2.2 Discrete Fourier Transformation Method
    2.2.3 Window Matching/Assembly Method
CHAPTER 3. DEVELOPMENT OF TIME-SERIES CLASSIFICATION (TSC) TECHNIQUE
  3.1 SELECTION OF CLASSIFICATION STRATEGY FOR TSC PROBLEM
  3.2 PROPOSED TIME-SERIES CLASSIFICATION TECHNIQUE
CHAPTER 4. EMPIRICAL EVALUATION
  4.1 APPLICATION BACKGROUND: CHURN PREDICTION IN TELECOMMUNICATIONS DOMAIN
  4.2 DATA COLLECTION
  4.3 GENERATION OF EVALUATION DATA SETS AND EVALUATION PROCESS
  4.4 EVALUATION CRITERIA
  4.5 BENCHMARK TECHNIQUE
  4.6 PARAMETER TUNING EXPERIMENTS
    4.6.1 Effects on Window Size (w)
    4.6.2 Effects on Gap Size (g)
    4.6.3 Effects on Window Difference Threshold (e)
    4.6.4 Effects on Scale Difference Threshold (q)
    4.6.5 Effects on the Number of Nearest Neighbors Selected (k)
  4.7 EMPIRICAL EVALUATION
    4.7.1 Comparative Evaluation
    4.7.2 Sensitivity to Degree of Asymmetry in Class Distribution
CHAPTER 5. CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS
REFERENCES
APPENDIX A: PARAMETER TUNING RESULTS
APPENDIX B: EMPIRICAL EVALUATION RESULTS

電子全文 Fulltext
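This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
論文使用權限 Thesis access permission	校內校外完全公開 unrestricted
開放時間 Available	校內 Campus：已公開 available	校外 Off-campus：已公開 available
etd-0731102-205308.pdf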
紙本論文 Printed copies
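Public-availability information for printed copies is relatively complete from Academic Year 102 onward. To ask about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available	已公開 available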

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS