國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,一個於資料串流移動式視窗中以集合晶格來探勘最大頻繁集的方法,A Subset-Lattice Algorithm for Mining Maximal Frequent Itemsets over a Data Stream Sliding Window

論文名稱 Title	一個於資料串流移動式視窗中以集合晶格來探勘最大頻繁集的方法 A Subset-Lattice Algorithm for Mining Maximal Frequent Itemsets over a Data Stream Sliding Window
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	100 學年度第 2 學期 The spring semester of Academic Year 100	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	79
研究生 Author	王宣云 Syuan-Yun Wang
指導教授 Advisor	張玉盈 Ye-In Chang
召集委員 Convenor	陳健輝 Gen-Huey Chen
口試委員 Advisory Committee	李建億, 郭大維 Chien-I Lee; Tei-Wei Kuo
口試日期 Date of Exam	2012-06-29	繳交日期 Date of Submission	2012-07-09
關鍵字 Keywords	項目集、資料串流、最大頻繁項目集、晶格、移動式視窗模型 Maximal Frequent Itemset, Lattice, Sliding Window Model, Itemset, Data Stream
統計 Statistics	本論文已被瀏覽 5670 次，被下載 186 次 The thesis/dissertation has been browsed 5670 times, has been downloaded 186 times.

中文摘要
在資料串流(data stream)中，線上的資料關聯法則是在資料探勘這個領域中非常重要的一個部分。其中，探勘最大頻繁項目集(maximal frequent itemset) 也是一個重要的議題。一個最大頻繁項目集是指不存在一個頻繁項目集 (frequent itemset)是其超集，而這類的頻繁項目集稱之為最大頻繁項目集。由於資料串流具有速度快、連續性，沒有限制性及即時性等特性，因此，我們只可掃描資料庫一次。因此，之前在傳統資料庫探勘最大頻繁項目集的演算法並不適用於資料串流。此外，很多應用都著重於距離現在時間較近的資料串流，而移動式視窗(sliding window model)正是處理最近資料串流的模式。在移動式視窗裡，須要先定義視窗的大小。而MFIoSSW 演算法是一個以移動式視窗為模式，探勘最大頻繁項目集的演算法。MFIoSSW 演算法使用一個簡單的資料結構來探勘最大頻繁項目集。此演算法使用一個矩陣A 來儲存最大頻繁項目集及其他有用的資料。但是MFIoSSW 演算法會用較多的時間探勘最大頻繁項目集，當一個新的交易進來時，新舊交易之間的比較次數會比較多。因此，在此計畫中，我們想提出一個以移動式視窗為模型的演算法，為Subset-Lattice 演算法。 Subset-Lattice 演算法是利用晶格(lattice)資料結構來儲存交易的資料。晶格資料結構會儲存父節點和子節點之間的關係。在每一個晶格節點，我們會儲存項目集、支持值和交易的編號數列。當新的交易進來時，我們會根據交易的特性分成五個集合：(1) equivalent (2) subset (3) intersection (4) empty set (5) superset。根據這五種集合關係，我們要新增或更新資料時會比較有效率。
Abstract
Online mining of association rules in data streams is an important area of data mining, and mining maximal frequent itemsets is an important problem within it. A frequent itemset is called maximal if it is not a proper subset of any other frequent itemset. The mining task is to find the set of all maximal frequent itemsets. Because data streams are continuous, high-speed, unbounded, and real-time, the data can be scanned only once. Therefore, earlier algorithms for mining maximal frequent itemsets in traditional databases are not suitable for data streams. Furthermore, many applications are interested mainly in recent data, and the sliding window model is designed to handle the most recent part of a data stream. The sliding window model requires a window size to be defined. One algorithm for mining maximal frequent itemsets under the sliding window model is the MFIoSSW algorithm. It uses a compact array-based structure A to store the maximal frequent itemsets and other helpful itemsets. However, it takes a long time to mine the maximal frequent itemsets: when a new transaction arrives, it must be compared with a large number of old transactions. Therefore, in this project, we propose a sliding window approach, the Subset-Lattice algorithm. We use a lattice structure to store the information of the transactions. The lattice records the relationship between each child node and its parent node. In each node, we record the itemset, the support, and the list of transaction IDs. When a new transaction arrives, we consider five relations between it and the stored itemsets: (1) equivalent, (2) subset, (3) intersection, (4) empty set, (5) superset. With these five relations, we can add new transactions and update the supports efficiently.
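The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the Subset-Lattice algorithm itself, whose details are in the thesis body. The function names `relation` and `maximal_frequent_itemsets`, the direction chosen for "subset" and "superset" (new transaction relative to the stored itemset), and the use of an absolute support count are all assumptions made for illustration. The miner is a brute-force reference that enumerates every candidate itemset, so it is only practical for tiny windows.

```python
from collections import deque
from itertools import combinations


def relation(new, node):
    """Classify how a new transaction relates to a stored itemset.

    Assumption: "subset"/"superset" describe the new transaction
    relative to the stored itemset; the thesis may define them the
    other way round.
    """
    new, node = frozenset(new), frozenset(node)
    if new == node:
        return "equivalent"
    if new < node:
        return "subset"        # new transaction is contained in the node
    if new > node:
        return "superset"      # new transaction contains the node
    if new & node:
        return "intersection"  # partial overlap only
    return "empty set"         # no items in common


def maximal_frequent_itemsets(window, min_support):
    """Brute-force reference: test every itemset over the window's items."""
    items = sorted(set().union(*window)) if window else []
    frequent = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in window if candidate <= t)
            if support >= min_support:
                frequent.append(candidate)
    # Keep only frequent itemsets that have no frequent proper superset.
    return {f for f in frequent if not any(f < g for g in frequent)}


# Count-based sliding window of size 3: the deque drops the oldest
# transaction automatically when a fourth one is appended.
window = deque(maxlen=3)
for transaction in [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "d"}]:
    window.append(frozenset(transaction))

result = maximal_frequent_itemsets(window, min_support=2)
```

After the fourth transaction the window holds {a, b}, {b, c}, and {a, b, d}. With a minimum support of 2, the frequent itemsets are {a}, {b}, and {a, b}, and only {a, b} is maximal.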

目次 Table of Contents
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 Mining Association Rules
1.2 Data Streams Mining
1.2.1 Window Models in Data Streams
1.2.2 Two Types of the Sliding Window Model
1.2.3 Applications
1.3 Mining Maximal Frequent Itemsets in Data Streams
1.4 Related Works
1.5 Motivations
1.6 Organization of the Thesis
2. A Survey of Data Mining Algorithms
2.1 The Apriori Algorithm for Association Rules Mining
2.2 Mining Maximal Frequent Itemsets from Data Streams
2.2.1 Description of INSTANT
2.3 The DSTree for the Mining of Frequent Sets from Data Streams
2.4 The MFI-TransSW Algorithm for Frequent Itemsets Mining in Data Streams
2.4.1 The MFI-TransSW Algorithm Structure
2.4.2 Three Phases of The MFI-TransSW Algorithm
2.5 A CPS-tree Structure for Frequent Itemsets Mining in Data Streams
2.5.1 The Structure of the CPS-tree
2.5.2 The Algorithm of the CPS-tree
2.6 Mining Maximal Frequent Itemsets over a Stream Sliding Window
2.6.1 Transaction Addition
2.6.2 Transaction Deletion
3. The Subset-Lattice Algorithm
3.1 Data Structure
3.2 The Proposed Algorithm
3.2.1 Data Insertion
3.2.2 Data Deletion
3.2.3 The Time-Sensitive Sliding Window
3.3 A Comparison
4. Performance
4.1 The Simulation Model
4.2 Experiment Results
5. Conclusion
5.1 Summary
5.2 Future Work
BIBLIOGRAPHY

參考文獻 References
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Very Large Data Bases, pp. 487–499, 1994.
[2] F. Ao, Y. Yan, J. Huang, and K. Huang, “Mining Maximal Frequent Itemsets in Data Streams Based on FP Tree,” Proc. of the 5th Int. Conf. on Machine Learning and Data Mining in Pattern Recognition, pp. 479–489, 2007.
[3] J. Chang and W. Lee, “estWin: Online Data Stream Mining of Recent Frequent Itemsets by Sliding Window Method,” Information Sciences, Vol. 31, No. 2, pp. 76–90, 2005.
[4] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, “Catch the Moment: Maintaining Closed Frequent Itemsets over a Data Stream Sliding Window,” Knowledge and Information Systems, Vol. 10, No. 3, pp. 265–294, 2006.
[5] C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, “Mining Frequent Patterns in Data Streams at Multiple Time Granularities,” In: H. Kargupta, A. Joshi, K. Sivakumar, eds. Next Generation Data Mining. Cambridge, Massachusetts: MIT Press, pp. 191–212, 2003.
[6] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” ACM SIGMOD Record, Vol. 29, No. 2, pp. 1–12, 2000.
[7] J. L. Koh and S. N. Shin, “An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams,” Proc. of the Int. Conf. on Data Warehousing and Knowledge Discovery, pp. 352–362, 2006.
[8] C. K. S. Leung and Q. I. Khan, “DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams,” Proc. of the 6th IEEE Int. Conf. on Data Mining, pp. 928–932, 2006.
[9] H. Li and N. Zhang, “Mining Maximal Frequent Itemsets Over a Stream Sliding Window,” Proc. of Information Computing and Telecommunications, pp. 110–113, 2010.
[10] H. F. Li, “Interactive Mining of Top-K Frequent Closed Itemsets from Data Streams,” Expert Systems with Applications, Vol. 36, No. 7, pp. 10779–10788, 2009.
[11] H. F. Li, C. C. Ho, and S. Y. Lee, “Incremental Updates of Closed Frequent Itemsets over Continuous Data Streams,” Expert Systems with Applications, Vol. 36, No. 2, pp. 2451–2458, 2009.
[12] H. F. Li and S. Y. Lee, “Mining Frequent Itemsets over Data Streams Using Efficient Window Sliding Techniques,” Expert Systems with Applications, Vol. 36, No. 2, pp. 1466–1477, 2009.
[13] C. H. Lin, D. Y. Chiu, and Y. H. Wu, “Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window,” Proc. of the Int. Conf. on SDM, pp. 68–79, 2005.
[14] K. C. Lin, I. E. Liao, and Z. S. Chen, “An Improved Frequent Pattern Growth Method for Mining Association Rules,” Expert Systems with Applications, Vol. 38, No. 5, pp. 5154–5161, 2011.
[15] G. Mao, X. Wu, X. Zhu, G. Chen, and C. Liu, “Mining Maximal Frequent Itemsets from Data Streams,” Information Science, Vol. 33, No. 3, pp. 251–262, 2007.
[16] B. Mozafari, H. Thakkar, and C. Zaniolo, “Verifying and Mining Frequent Patterns from Large Windows over Data Streams,” Proc. of the Int. Conf. on ICDE, pp. 179–188, 2008.
[17] S. Tanbeer, C. Ahmed, B. Jeong, and Y. Lee, “Efficient Single-Pass Frequent Pattern Mining Using a Prefix-Tree,” Information Sciences, Vol. 179, No. 5, pp. 559–583, 2009.
[18] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, “Sliding Window-Based Frequent Pattern Mining over Data Streams,” Information Sciences, Vol. 179, No. 22, pp. 3843–3865, 2009.
[19] P. S. Tsai, “Mining Top-K Frequent Closed Itemsets over Data Streams Using the Sliding Window Model,” Expert Systems with Applications, Vol. 37, No. 10, pp. 6968–6973, 2010.
[20] M. J. Zaki and C. J. Hsiao, “Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4, pp. 462–478, 2005.
[21] M. J. Zaki and C. J. Hsiao, “Charm: An Efficient Algorithm for Closed Itemset Mining,” Proc. of the Int. Conf. on SDM, pp. 457–473, 2002.
[22] W. Zhang, H. Liao, and N. Zhao, “Research on the FP Growth Algorithm about Association Rule Mining,” Proc. of the 10th Int. Conf. on Business and Information Management, pp. 315–318, 2008.

電子全文 Fulltext
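This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined availability date
Available: Campus: available; Off-campus: available
etd-0709112-163748.pdf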
紙本論文 Printed copies
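Availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about the availability of printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: publicly available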


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS