National Sun Yat-sen University, thesis/dissertation, Training Strategies for the Time Series Classification Problem

Title	Training Strategies for the Time Series Classification Problem (時間序列分類問題之訓練策略)
Department	Department of Computer Science and Engineering
Year, semester	Fall semester, Academic Year 104
Language	English
Degree	Master
Number of pages	69
Author	Guan-cheng Guo (郭冠呈)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Shih-Chung Chen (陳世中)
Advisory Committee	Kuo-Si Huang (黃國璽), Yung-Hsing Peng (彭永興), Sun-Yuan Hsieh (謝孫源)
Date of Exam	2015-08-20
Date of Submission	2015-09-02
Keywords	Time series classification, Training strategy, Variable gap longest common subsequence, Longest common subsequence, Variable gap longest common subsequence similarity measure, Behavior knowledge space, Dynamic time warping

Abstract
The time series classification problem has been studied for decades. The dynamic time warping (DTW) algorithm provides a powerful way to measure the distance between two time series, but it may not suit every type of time series. In 2014, Peng and Yang defined the variable gap longest common subsequence (VGLCS) problem, a variant of the longest common subsequence (LCS) problem with gap constraints. With a slight modification of the VGLCS algorithm, we propose the variable gap LCS similarity measurement (VGS) algorithm, which measures the similarity of two time series consisting of real numbers. We also propose a training approach for obtaining suitable gap constraints and parameters for the VGS algorithm, and we then use the VGS algorithm to solve time series classification problems. In addition, to reduce the error rates, we apply the behavior knowledge space (BKS) method to build ensemble classifiers that combine three classifiers: DDTW (derivative dynamic time warping), DTWW (DTW with warping window), and LCS/VGS. The experimental datasets are obtained from the UCR web site. The experimental results show that, compared with the previously best-known DTWW method, the BKS method reduces the error rate by about 21% to 22% on small datasets and by about 17% on large datasets.
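Two of the building blocks named in the abstract, DTW with a warping window and the BKS combination rule, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`dtw_window`, `bks_train`, `bks_predict`), the squared-difference local cost, and the fallback rule for decision combinations never seen in training are all assumptions made for illustration, and the VGS measure and its training procedure are not reproduced here.

```python
from collections import Counter, defaultdict
from math import inf


def dtw_window(a, b, window):
    """DTW distance between sequences a and b, restricted to a band.

    Cell (i, j) is filled only when |i - j| <= w. Squared difference is
    used as the local cost (an assumption, not taken from the thesis).
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # widen the band so cell (n, m) is reachable
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def bks_train(decisions, labels):
    """Build a BKS table: for each tuple of base-classifier decisions seen
    in training, count how often each true label occurred."""
    table = defaultdict(Counter)
    for cell, label in zip(decisions, labels):
        table[tuple(cell)][label] += 1
    return table


def bks_predict(table, cell, fallback_index=0):
    """Return the most frequent true label recorded for this cell.

    If the cell was never seen in training, fall back to the decision of
    one base classifier (a hypothetical rule chosen for this sketch).
    """
    counts = table.get(tuple(cell))
    if not counts:
        return cell[fallback_index]
    return counts.most_common(1)[0][0]


# Toy training data: each tuple holds the decisions of three base
# classifiers (for example DDTW, DTWW, and LCS/VGS) on one training series.
decisions = [("A", "A", "B"), ("A", "A", "B"), ("A", "A", "B"), ("B", "B", "B")]
labels = ["B", "B", "A", "B"]
table = bks_train(decisions, labels)

# A majority vote over ("A", "A", "B") would answer "A"; the BKS table
# answers "B" because that was the true label most often for this cell.
combined = bks_predict(table, ("A", "A", "B"))
```

The toy data shows why BKS can outperform plain voting: it learns which true label usually lies behind a given pattern of base-classifier decisions, including patterns where the majority is systematically wrong. In a 1-nearest-neighbor setting, `dtw_window` would supply the distance used by one of the base classifiers.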

Table of Contents
Thesis Verification Form (Chinese)
Thesis Verification Form (English)
Acknowledgments
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
List of Symbols
Chapter 1. Introduction
Chapter 2. Preliminaries
  2.1 The Longest Common Subsequence Problem
  2.2 The Variable Gap Longest Common Subsequence Problem
  2.3 Time Series Classification
    2.3.1 The Temporal-Proximity-Based Classification
    2.3.2 The Representation-Based Classification
    2.3.3 The Model-Based Classification
    2.3.4 Other Classification Methods
  2.4 The Behavior Knowledge Space Method
Chapter 3. The Proposed Algorithm
  3.1 The Variable Gap LCS Similarity Measurement Algorithm
  3.2 Training Strategies
Chapter 4. Experimental Results
  4.1 Performance Comparison of Various Algorithms
  4.2 Classification with Behavior Knowledge Space
Chapter 5. Conclusion
Bibliography


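Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.
Thesis access permission: user-defined availability
Available: on campus, available; off campus, available
etd-0801115-183737.pdf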
Printed copies
Available: available
