National Sun Yat-sen University, thesis/dissertation, A Restriction-Based Algorithm for Mining Interesting Frequent Periodic Patterns in Time Series Databases

Title	A Restriction-Based Algorithm for Mining Interesting Frequent Periodic Patterns in Time Series Databases
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 103	Language	English
Degree	Master	Number of pages	87
Author	Hsin-yi Wang (王心怡)
Advisor	Ye-In Chang (張玉盈)
Convenor	Gen-Huey Chen (陳健輝)
Advisory Committee	Chien-i Lee (李建億); Tei-Wei Kuo (郭大維)
Date of Exam	2015-06-05	Date of Submission	2015-06-25
Keywords	Frequent Patterns, Data Mining, Time-series Data Mining, Time-series Patterns, Periodicity
Statistics	This thesis has been viewed 5736 times and downloaded 56 times.

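Abstract (translated from the Chinese)
In recent years, time-series data mining has become an important topic and has attracted much related research. One major task in time-series data mining is to analyze temporal data and find time-related patterns that occur periodically (periodicity patterns). Periodic pattern mining can be applied to stock prediction, computer network fault analysis and detection of security breaches, earthquake prediction, and gene analysis. The difficulty is that it must not only use the information in the time-series database to find the frequent patterns, but also ensure that they occur frequently with the same period length. Nishi et al. therefore proposed a new algorithm that is concerned only with the times at which patterns are frequent, finding patterns with a flexible period in the time-series database. However, the algorithm of Nishi et al. has some problems when searching for the patterns that interest the user. When building the frequent patterns of length 1, it takes much time to store all of these frequent patterns in an array. Moreover, when checking whether candidates of length k (k ≥ 2) become frequent patterns, it must check whether every candidate is frequent, rather than focusing on generating the patterns that interest the user. To avoid these problems and improve performance, we propose the Restriction-Based algorithm to efficiently find interesting frequent periodic patterns. We use pruning strategies when generating the frequent patterns of length 1. These pruning strategies are applied not only to generate the frequent periodic patterns of length 1 but also to satisfy the restriction rules. In addition, we propose a method that focuses on generating the patterns that interest the user. Our algorithm can therefore avoid producing unwanted results. According to the simulation results, our proposed Restriction-Based algorithm is more efficient than the algorithm of Nishi et al.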
Abstract
In recent years, time-series data mining has become an important topic that attracts many researchers. One of the most important problems in time-series databases is finding the periodicity of patterns. Periodic pattern mining is useful for predicting stock price movements, analyzing computer network faults and detecting security breaches, predicting earthquakes, and analyzing gene expression. The problem is difficult because it requires not only using the information in the time-series database to find the frequent patterns, but also making sure that these frequent patterns occur with similar period lengths. Nishi et al. therefore proposed a new concept of time-series periodic patterns, which concerns patterns that are frequent over a flexible period of time in the time-series database, and stated how such flexible periodic patterns are defined. However, the algorithm proposed by Nishi et al. has some problems in finding the patterns that interest the user. When it derives the frequent periodic 1-patterns, it needs much time to store all the frequent patterns in an array. Moreover, when generating candidate periodic k-patterns (k ≥ 2), Nishi et al.'s algorithm may check all candidate periodic k-patterns instead of focusing on generating the patterns that interest the user, which also wastes execution time. To avoid these problems and improve performance, we propose a Restriction-Based algorithm that efficiently finds the user-interesting patterns. We present pruning strategies that are applied while deriving the frequent periodic 1-patterns. These pruning strategies not only check whether items are frequent periodic 1-patterns but also enforce the restriction for generating the candidate patterns, which reduces execution time. Furthermore, we propose a join policy that focuses on generating the user-interesting patterns, so our algorithm avoids producing unwanted results. Our simulation results show that the Restriction-Based algorithm is more efficient than Nishi et al.'s algorithm.
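To make the notion of a frequent periodic 1-pattern concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes a simple fixed-period definition (a symbol at a given offset is frequent if it appears in at least a `min_conf` fraction of the complete period segments), which may differ from the flexible-period definition of Nishi et al. The function name `periodic_one_patterns` and the `interesting` parameter are hypothetical; the latter only illustrates the general idea of discarding symbols the user does not care about before counting.

```python
from collections import defaultdict


def periodic_one_patterns(series, period, min_conf, interesting=None):
    """Return {(symbol, offset): confidence} for frequent periodic 1-patterns.

    Assumed definition (not taken from the thesis): a symbol at a given
    offset within the period is frequent if it appears in at least
    `min_conf` of the complete period segments.
    """
    segments = len(series) // period  # only complete segments are counted
    if segments == 0:
        return {}
    counts = defaultdict(int)
    for i in range(segments * period):
        symbol = series[i]
        # Hypothetical restriction: skip symbols the user is not interested
        # in before any counting is done.
        if interesting is not None and symbol not in interesting:
            continue
        counts[(symbol, i % period)] += 1
    return {
        key: count / segments
        for key, count in counts.items()
        if count / segments >= min_conf
    }


# Example: period 3 over "abcabdabcabc" (4 complete segments).
all_patterns = periodic_one_patterns("abcabdabcabc", 3, 0.75)
restricted = periodic_one_patterns("abcabdabcabc", 3, 0.75, interesting={"a", "c"})
```

In this example `c` at offset 2 appears in 3 of the 4 segments and is kept at a threshold of 0.75, while `d` is dropped. Restricting the symbols up front shrinks the set of 1-patterns that any later join step would have to combine.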

Table of Contents
LIST OF FIGURES	iii
LIST OF TABLES	vi
1. Introduction	1
  1.1 Sequential Pattern Mining	1
  1.2 Periodic Pattern Mining	3
  1.3 Applications	6
  1.4 Related Works	8
  1.5 Motivation	9
  1.6 Organization of the Thesis	11
2. A Survey of Algorithms for Mining Periodic Patterns	12
  2.1 The Max-Subpattern Hit Set Approach	12
  2.2 The PPA Algorithm	14
  2.3 The Suffix Tree Algorithm	17
    2.3.1 The Suffix-Tree-Based Algorithm	17
      2.3.1.1 The First Phase: Suffix-Tree-Based Representation	18
      2.3.1.2 The Second Phase: Periodicity Detection Algorithm Using the Suffix-Tree	18
    2.3.2 The Improved Suffix-Tree-Based Algorithm	19
      2.3.2.1 Discretization Technique	21
      2.3.2.2 The Mining Process	21
      2.3.2.3 Joining of Two Patterns	22
3. The Restriction-Based Algorithm	26
  3.1 Notations and Definitions	26
  3.2 The Mining Algorithm with Pruning Strategies	27
    3.2.1 The Joining Policy	28
    3.2.2 The Pruning Strategy	31
  3.3 Finding User Interested Patterns	39
  3.4 A Comparison	43
4. Performance	48
  4.1 The Performance Model	48
  4.2 Experiment Results	50
    4.2.1 Uniform Dataset	50
    4.2.2 IBM Dataset	58
5. Conclusion	66
  5.1 Summary	66
  5.2 Future Work	67
BIBLIOGRAPHY	68

References
[1] “IBM Quest Synthetic Data Generator.” http://sourceforge.net/projects/ibmquestdatagen/, 2010.
[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. of the 20th Int. Conf. on VLDB, Vol. 1215, pp. 487-499, 1994.
[3] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. of the 11th Int. Conf. on Data Eng., pp. 3-14, 1995.
[4] M. S. Chen, J. Han, and P. S. Yu, “Data Mining: An Overview from a Database Perspective,” IEEE Trans. on Knowledge and Data Eng., Vol. 8, No. 6, pp. 866-883, Dec. 1996.
[5] J. Han, G. Dong, and Y. Yin, “Efficient Mining of Partial Periodic Patterns in Time Series Database,” Proc. of the 15th Int. Conf. on Data Eng., pp. 106-115, 1999.
[6] J. Han, W. Gong, and Y. Yin, “Mining Segment-Wise Periodic Patterns in Time-Related Databases,” Proc. Int. Conf. on Knowledge Discovery and Data Mining, pp. 214-218, 1998.
[7] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, Vol. 29, pp. 1-12, 2000.
[8] G. C. Lan, T. P. Hong, V. S. Tseng, and S. L. Wang, “Applying the Maximum Utility Measure in High Utility Sequential Pattern Mining,” Expert Systems with Applications, Vol. 41, No. 11, pp. 5071-5081, Sept. 2014.
[9] G. Lee, U. Yun, and K. H. Ryu, “Sliding Window Based Weighted Maximal Frequent Pattern Mining over Data Streams,” Expert Systems with Applications, Vol. 41, No. 2, pp. 694-708, Feb. 2014.
[10] S. Ma and J. L. Hellerstein, “Mining Partially Periodic Event Patterns with Unknown Periods,” Proc. of the 17th Int. Conf. on Data Eng., pp. 205-214, 2001.
[11] F. Masseglia, P. Poncelet, and M. Teisseire, “Efficient Mining of Sequential Patterns with Time Constraints: Reducing the Combinations,” Expert Systems with Applications, Vol. 36, No. 2, pp. 2677-2690, March 2009.
[12] M. A. Nishi, C. F. Ahmed, M. Samiullah, and B. S. Jeong, “Effective Periodic Pattern Mining in Time Series Databases,” Expert Systems with Applications, Vol. 40, No. 8, pp. 3015-3027, June 2013.
[13] B. Ozden, S. Ramaswamy, and A. Silberschatz, “Cyclic Association Rules,” Proc. of the 14th Int. Conf. on Data Eng., pp. 412-421, 1998.
[14] J. Pei, J. Han, B. Mortazavi Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. C. Hsu, “Mining Sequential Patterns by Pattern-growth: The Prefixspan Approach,” IEEE Trans. on Knowledge and Data Eng., Vol. 16, No. 11, pp. 1424-1440, Nov. 2004.
[15] F. Rasheed, M. Alshalalfa, and R. Alhajj, “Efficient Periodicity Mining in Time Series Databases Using Suffix Trees,” IEEE Trans. on Knowledge and Data Eng., Vol. 23, No. 1, pp. 79-94, Jan. 2011.
[16] A. Y. Rodríguez-González, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, and J. Ruiz-Shulcloper, “Mining Frequent Patterns and Association Rules Using Similarities,” Expert Systems with Applications, Vol. 40, No. 17, pp. 6823-6836, Dec. 2013.
[17] C. Sheng, W. Hsu, and M. L. Lee, “Mining Dense Periodic Patterns in Time Series Data,” Proc. of the 22nd Int. Conf. on Data Eng., pp. 115-115, 2006.
[18] J. Wang and J. Han, “BIDE: Efficient Mining of Frequent Closed Sequences,” Proc. of the 20th Int. Conf. on Data Eng., pp. 79-90, 2004.
[19] H. W. Wu and A. J. Lee, “Mining Closed Flexible Patterns in Time-series Databases,” Expert Systems with Applications, Vol. 37, No. 3, pp. 2098-2107, March 2010.
[20] K. J. Yang, T. P. Hong, Y. M. Chen, and G. C. Lan, “Projection-Based Partial Periodic Pattern Mining for Event Sequences,” Expert Systems with Applications, Vol. 40, No. 10, pp. 4232-4240, Aug. 2013.
[21] K. J. Yang, T. P. Hong, Y. M. Chen, and G. C. Lan, “An Efficient Pruning and Filtering Strategy to Mine Partial Periodic Patterns from a Sequence of Event Sets,” Int. Journal of Data Warehousing and Mining, Vol. 10, No. 2, pp. 18-38, April 2014.
[22] W. Yang and G. Lee, “Efficient Partial Multiple Periodic Patterns Mining without Redundant Rules,” Proc. of the 28th Int. Computer Conf. on Software and Applications, pp. 430-435, 2004.
[23] S. J. Yen, Y. S. Lee, Y. T. Guo, and J. Y. Gu, “Mining Frequent Patterns from Incremental Databases,” Proc. of the 2011 IEEE Int. Conf. on Machine Learning and Cybernetics, Vol. 1, pp. 73-79, 2011.
[24] U. Yun, G. Lee, and K. H. Ryu, “Mining Maximal Frequent Patterns by Considering Weight Conditions over Data Streams,” Knowledge-Based Systems, Vol. 55, pp. 49-65, Jan. 2014.
[25] M. J. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning, Vol. 42, No. 1-2, pp. 31-60, Jan. 2001.
[26] Z. Zhao, D. Yan, and W. Ng, “Mining Probabilistically Frequent Sequential Patterns in Uncertain Databases,” Proc. of the 15th Int. Conf. on Extending Database Technology, pp. 74-85, 2012.

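Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to break the law. Thesis access permission: user-defined release date. Available: Campus: available; Off-campus: available. etd-0525115-130621.pdf
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available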

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS