National Sun Yat-sen University, thesis/dissertation: The Earliest-Time-Point Approach to Mining Frequent Up-to-Date Patterns in Temporal Databases

Title	The Earliest-Time-Point Approach to Mining Frequent Up-to-Date Patterns in Temporal Databases
Department	Department of Computer Science and Engineering
Year, semester	Spring semester, Academic Year 102 (2013-2014)	Language	English
Degree	Master	Number of pages	88
Author	Hsin-pao Huang
Advisor	Ye-In Chang
Convenor	Gen-huey Chen
Advisory Committee	You-Chiun Wang, Chien-i Lee
Date of Exam	2014-06-20	Date of Submission	2014-06-24
Keywords	Frequent Patterns, Up-to-date Patterns, Temporal Patterns, Temporal Data Mining, Data Mining

Abstract
Temporal data mining has recently become an important topic that attracts many researchers; its main concerns are analyzing temporal data and discovering temporal patterns. Although many algorithms have been proposed to discover temporal patterns, they derive only patterns that are frequent over the whole database. Hong et al. therefore proposed the concept of frequent up-to-date patterns: items or itemsets that are frequent within a flexible period extending from some earliest time point up to the current time. They also proposed the UDP-tree construction algorithm and the UDP-growth mining algorithm to find all frequent up-to-date patterns. The UDP-tree construction algorithm first derives the frequent up-to-date 1-patterns together with their frequencies and valid lifetimes, and then constructs a UDP-tree. The UDP-growth mining algorithm then finds the frequent up-to-date k-patterns (k ≥ 2) from the UDP-tree. However, these two algorithms have three problems in finding all frequent up-to-date patterns. First, when deriving frequent up-to-date k-patterns (k ≥ 1), they must check many times whether an item or itemset is frequent within each corresponding lifetime, so the UDP algorithm spends a long execution time on these checks. Second, when checking whether a candidate k-pattern (k ≥ 3) is a frequent up-to-date k-pattern, the UDP algorithm may have to evaluate the formula for every candidate up-to-date k-pattern, which also wastes execution time. Third, some of the results derived by the UDP algorithm are unreasonable. To avoid these problems and improve performance, we propose the Earliest-Time-Point approach, which applies two pruning strategies in the tree construction algorithm and the mining algorithm.
The first pruning strategy is applied when checking whether items or itemsets are frequent up-to-date k-patterns (k ≥ 1), and it reduces execution time. The second pruning strategy is applied when checking whether itemsets are frequent up-to-date k-patterns (k ≥ 3); it prunes candidates that cannot become frequent up-to-date patterns, so fewer candidates have to be checked. In addition, we propose an extension of the formula that decides the valid appearance time of a pattern, which avoids unreasonable results. Our approach therefore finds all frequent up-to-date patterns faster than the UDP algorithm and does not produce unreasonable results. Our simulation results show that the Earliest-Time-Point approach is more efficient than the UDP algorithm.
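The abstract states the up-to-date pattern idea and the pruning goal only in words. The Python sketch below illustrates both under a simplified, assumed definition that is not taken from the thesis: transactions are numbered 1..n with n the current time, a pattern's candidate lifetimes start at one of its appearance times t and end at n, and the pattern is frequent in [t, n] when its count there is at least minsup * (n - t + 1). `earliest_valid_start` tries the appearance times one by one, which is the kind of repeated per-lifetime checking the abstract criticizes. `up_to_date_start` adds one sound, Apriori-style use of earliest time points: if an itemset is frequent in [t, n], each of its subsets is frequent in the same window, so the search for the itemset can begin at the latest earliest start among its (k-1)-subsets, and the itemset can be pruned outright if any subset has no valid start. This only illustrates pruning with earliest time points; it is not the thesis's two pruning strategies or its tree-based algorithms, whose exact rules are not given here, and all function names are hypothetical.

```python
from itertools import combinations


def timelist(db, itemset):
    """Transaction numbers (1-based, oldest first) whose items contain `itemset`."""
    return [tid for tid, items in enumerate(db, start=1) if itemset <= items]


def earliest_valid_start(times, n, minsup, lower_bound=1):
    """Earliest appearance time t >= lower_bound at which the pattern is frequent
    in the window [t, n], or None if there is none.

    Assumed (simplified) definition, not the thesis's formula: with n the current
    time and `times` the sorted appearance times, the pattern is frequent in
    [t, n] when count(t..n) >= minsup * (n - t + 1).
    """
    m = len(times)
    for i, t in enumerate(times):         # try appearance times, oldest first
        if t < lower_bound:
            continue                       # skip starts ruled out by the caller's bound
        count = m - i                      # appearances inside [t, n]
        if count >= minsup * (n - t + 1):
            return t
    return None


def up_to_date_start(db, itemset, minsup, known):
    """Earliest valid start of a k-itemset (k >= 2), pruned with its subsets.

    `known` maps every up-to-date (k-1)-itemset to its earliest valid start;
    a subset missing from `known` was not up-to-date. If the itemset is frequent
    in [t, n], each (k-1)-subset is frequent in the same window, so the itemset
    cannot start earlier than the latest of its subsets' earliest starts.
    """
    bound = 1
    for sub in combinations(sorted(itemset), len(itemset) - 1):
        start = known.get(frozenset(sub))
        if start is None:
            return None                    # a subset is never frequent up to now: prune
        bound = max(bound, start)
    return earliest_valid_start(timelist(db, itemset), len(db), minsup, bound)


def mine(db, minsup):
    """Level-wise search returning {itemset: earliest valid start}."""
    n = len(db)
    known = {}
    for item in sorted(set().union(*db)):
        start = earliest_valid_start(timelist(db, frozenset([item])), n, minsup)
        if start is not None:
            known[frozenset([item])] = start
    level, k = list(known), 2
    while level:
        # Join up-to-date (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = []
        for cand in candidates:
            start = up_to_date_start(db, cand, minsup, known)
            if start is not None:
                known[cand] = start
                level.append(cand)
        k += 1
    return known


# Example: six transactions, oldest first; transaction 6 is the current time.
example_db = [{"a", "b"}, {"c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"a", "b"}]
patterns = mine(example_db, minsup=0.5)
for itemset, start in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), "frequent from time", start)
```

On this six-transaction example with minsup = 0.5, {a, c} is frequent from time 3 and {a, b, c} from time 5; the bound used for {a, b, c} is 4, taken from {b, c}. The sketch applies no minimum-lifetime rule, so {a, b, c} qualifies on a window only two transactions long. The thesis handles short lifetimes and valid appearance times with its own rules (its outline mentions short-lifetime 1-itemsets and an extension of the formula), which are not reproduced here.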

Table of Contents
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. Introduction
  1.1 Mining Association Rules
  1.2 Temporal Data Mining
    1.2.1 Pattern Discovery Technique for Temporal Data
    1.2.2 Challenges for Temporal Data Mining
    1.2.3 The Concept of Up-to-Date Patterns
  1.3 Applications
  1.4 Related Works
  1.5 Motivation
  1.6 Organization of the Thesis
2. A Survey of Data Mining Algorithms
  2.1 The Apriori Algorithm for Association Rules Mining
  2.2 The FP-tree Structure and the FP-growth Algorithm
  2.3 The FUFP-tree Algorithm
  2.4 The UDP-Tree Construction Algorithm and the UDP-Growth Mining Algorithm
    2.4.1 Up-to-Date Patterns
    2.4.2 The UDP-Tree Construction Algorithm
      2.4.2.1 The Construction Algorithm
    2.4.3 The UDP-Growth Mining Algorithm
      2.4.3.1 The Mining Algorithm
3. The Earliest-Time-Point Approach
  3.1 Definitions
  3.2 The Tree Construction Algorithm with the Pruning Strategy
    3.2.1 Step 1: Scan the Database to Find Out the Timelist and the Count of Each Item
    3.2.2 Step 2: Find the Frequent 1-Patterns (L1) and Short-lifetime 1-Itemsets (SL1)
    3.2.3 Step 3: Examine the Items which are Stored in SL1 to Find Up-to-date 1-Patterns with Their Actual Lifetime
    3.2.4 Step 4: Create the Header Table from Up-to-date 1-Patterns
    3.2.5 Step 5: Retain Items from Original Database by the Header Table within Their Corresponding Lifetime
    3.2.6 Step 6: Construct the Tree
  3.3 The Mining Algorithm with the Pruning Strategy
    3.3.1 Step 1: Deal with the Items in the Header Table One by One and Bottom-up
    3.3.2 Step 2: Form a Conditional Tree of Item I
    3.3.3 Step 3: Generate the Candidate Up-to-Date 2-Patterns
    3.3.4 Step 4: Retain the Up-to-Date 2-Patterns within Their Corresponding Lifetime into L2
    3.3.5 Step 5: Find Up-to-Date k-Patterns (k ≥ 3) within Their Corresponding Lifetime into Lk
  3.4 Extension of the Formula
  3.5 A Comparison
4. Performance
  4.1 The Performance Model
  4.2 Experiment Results
5. Conclusion
  5.1 Summary
  5.2 Future Work
BIBLIOGRAPHY

References
[1] “IBM Quest Synthetic Data Generator,” http://sourceforge.net/projects/ibmquestdatagen/, 2010.
[2] R. Agrawal, T. Imielinski, and A. Swami, “Database Mining: A Performance Perspective,” IEEE Trans. on Knowledge and Data Eng., Vol. 5, No. 6, pp. 914-925, Dec. 1993.
[3] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 207-216, 1993.
[4] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. of the Int. Conf. on Very Large Data Bases, pp. 487-499, 1994.
[5] R. Agrawal, R. Srikant, and Q. Vu, “Mining Association Rules with Item Constraints,” Proc. of the 3rd Int. Conf. on Knowledge Discovery in Databases and Data Mining, pp. 67-73, 1997.
[6] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pp. 255-264, 1997.
[7] H. Chen, L. Shu, J. Xia, and Q. Deng, “Mining Frequent Patterns in a Varying-Size Sliding Window of Online Transactional Data Streams,” Information Sciences, Vol. 215, No. 15, pp. 15-36, Dec. 2012.
[8] M. S. Chen, J. Han, and P. S. Yu, “Database Mining: An Overview from a Database Perspective,” IEEE Trans. on Knowledge and Data Eng., Vol. 8, No. 6, pp. 866-883, Dec. 1996.
[9] D. W. Cheung, J. Han, V. T. Ng, and C. Y. Wong, “Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Approach,” Proc. of the 12th IEEE Int. Conf. on Data Eng., pp. 106-114, 1996.
[10] T. C. Fu, “A Review on Time Series Data Mining,” Eng. Applications of Artificial Intelligence, Vol. 24, No. 1, pp. 164-181, Feb. 2011.
[11] M. Guillame-Bert and J. L. Crowley, “New Approach on Temporal Data Mining for Symbolic Time Sequences: Temporal Tree Associate Rules,” Proc. of the 23rd IEEE Int. Conf. on Tools with Artificial Intelligence, pp. 748-752, 2011.
[12] J. Han and J. Gao, “Research Challenges for Data Mining in Science and Engineering,” Next Generation of Data Mining, pp. 3-27, 2009.
[13] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 1-12, 2000.
[14] T. P. Hong and C. W. Lin, “Temporal Data Mining with Up-to-Date Pattern Trees,” Expert Systems with Applications, Vol. 38, No. 12, pp. 15143-15150, Nov. 2011.
[15] T. P. Hong, C. W. Lin, and Y. L. Wu, “Incrementally Fast Updated Frequent Pattern Trees,” Expert Systems with Applications, Vol. 34, No. 4, pp. 2424-2435, May 2008.
[16] T. P. Hong, Y. Y. Wu, and S. L. Wang, “An Effective Mining Approach for Up-to-Date Patterns,” Expert Systems with Applications, Vol. 36, No. 6, pp. 9747-9752, Aug. 2009.
[17] B. Iyad, F. Dmitriy, H. James, M. Fabian, and H. Milos, “Mining Recent Temporal Patterns for Event Detection in Multivariate Time Series Data,” Proc. of the 18th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 280-288, 2012.
[18] S. Laxman and P. S. Sastry, “A Survey of Temporal Data Mining,” Academy Proc. in Eng. Sciences, Vol. 31, No. 2, pp. 173-198, April 2006.
[19] D. Li and J. S. Deogun, Discovering Partial Periodic Sequential Association Rules with Time Lag in Multiple Sequences for Prediction, 2005.
[20] Y. Li, P. Ning, X. S. Wang, and S. Jajodia, “Discovering Calendar-based Temporal Association Rules,” Data and Knowledge Eng., pp. 193-218, 2003.
[21] H. Mannila, H. Toivonen, and A. I. Verkamo, “Efficient Algorithm for Discovering Association Rules,” Proc. of the AAAI Workshop on Knowledge Discovery in Database, pp. 181-192, 1994.
[22] B. Ozden, S. Ramaswamy, and A. Silberschatz, “Cyclic Association Rules,” Proc. of the 14th Int. Conf. on Data Eng., pp. 12-21, 1998.
[23] J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-based Method with Transaction Trimming for Mining Association Rules,” IEEE Trans. on Knowledge and Data Eng., Vol. 9, No. 5, pp. 813-825, Sept. 1997.
[24] J. F. Roddick and M. Spiliopoulou, “A Survey of Temporal Knowledge Discovery Paradigms and Methods,” IEEE Trans. on Knowledge and Data Eng., Vol. 14, No. 4, pp. 750-767, July 2002.
[25] J. Sourabh, J. Susheel, and J. Anurag, “An Assessment of Fuzzy Temporal Association Rule Mining,” Int. Journal of Application or Innovation in Eng. and Management, Vol. 2, No. 1, pp. 42-45, Jan. 2013.
[26] R. Srikant and R. Agrawal, “Mining Generalized Association Rules,” Proc. of the 21st Int. Conf. on Very Large Data Bases, pp. 407-419, 1995.
[27] B. Tom, S. Gilbert, V. Koen, and W. Geert, “The Use of Association Rules for Product Assortment Decisions,” The 5th Int. Conf. on Knowledge Discovery and Data Mining, pp. 254-260, 1999.
[28] K. Verma, O. P. Vyas, and R. Vyas, Temporal Approach to Association Rule Mining Using T-tree and P-Tree, 2005.

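Full Text
The electronic full text is licensed only for personal, non-commercial searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without permission. Thesis access permission: user-defined release date. Availability: on campus, available; off campus, available. File: etd-0523114-212312.pdf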
Printed Copies
Available.
