Title page for etd-0709112-163748
Title
一個於資料串流移動式視窗中以集合晶格來探勘最大頻繁集的方法
A Subset-Lattice Algorithm for Mining Maximal Frequent Itemsets over a Data Stream Sliding Window
Department
Year, semester
Language
Degree
Number of pages
79
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2012-06-29
Date of Submission
2012-07-09
Keywords
Maximal Frequent Itemset, Lattice, Sliding Window Model, Itemset, Data Stream
Statistics
The thesis/dissertation has been browsed 5670 times and downloaded 186 times.
Abstract
Online mining of association rules in data streams is an important area of data
mining, and mining the maximal frequent itemsets is one of its key problems. A
frequent itemset is called maximal if no frequent itemset is a proper superset of it.
Because data streams are continuous, high-speed, unbounded, and real-time, the
data can be scanned only once. Therefore, earlier algorithms for mining the maximal
frequent itemsets in traditional databases are not suitable for data streams.
Furthermore, many applications are interested mainly in recent data, and the sliding
window model is designed to handle the most recent part of a stream. The sliding
window model requires the window size to be defined in advance. One algorithm for
mining the maximal frequent itemsets under the sliding window model is the
MFIoSSW algorithm. It uses a compact array-based structure A to store the maximal
frequent itemsets and other helpful itemsets. However, MFIoSSW takes a long time
to mine the maximal frequent itemsets, because each new transaction must be
compared with the old transactions many times. Therefore, in this project, we
propose a sliding window approach, the Subset-Lattice algorithm. It uses a lattice
structure to store the information of the transactions, including the relationships
between parent nodes and child nodes. In each node, we record the itemset, its
support, and the list of transaction IDs. When a new transaction arrives, we consider
five relations: (1) equivalent, (2) subset, (3) intersection, (4) empty set, and
(5) superset. Using these five relations, we can add new transactions and update the
supports efficiently.
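The definitions in the abstract can be illustrated with a small Python sketch. It is not the Subset-Lattice algorithm itself. The helper names `relation` and `maximal_frequent_itemsets` are hypothetical, the direction of "subset" and "superset" (the new transaction relative to a stored itemset) is an assumption, and the miner is a brute-force reference that recounts the whole window, so it is practical only for tiny inputs.

```python
from collections import deque
from itertools import combinations


def relation(new, node):
    """Classify how a new transaction relates to an itemset stored in a node.

    Assumption: "subset" means the new transaction is a proper subset of the
    node's itemset, and "superset" the reverse; the abstract does not fix
    the direction.
    """
    new, node = frozenset(new), frozenset(node)
    if new == node:
        return "equivalent"
    if new < node:
        return "subset"
    if new > node:
        return "superset"
    if new & node:
        return "intersection"  # overlap, but neither contains the other
    return "empty set"         # no items in common


def maximal_frequent_itemsets(window, min_support):
    """Brute-force reference: test every candidate itemset in the window."""
    transactions = [frozenset(t) for t in window]
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent.append(candidate)
    # Maximal: no other frequent itemset is a proper superset.
    return [f for f in frequent if not any(f < g for g in frequent)]


# Sliding window that keeps only the 4 most recent transactions.
window = deque(maxlen=4)
for transaction in ["abc", "ab", "bc", "abc", "cd"]:
    window.append(set(transaction))

result = maximal_frequent_itemsets(window, min_support=2)
```

After the fifth transaction arrives, the window holds ab, bc, abc, and cd. With a minimum support of 2, the frequent itemsets are a, b, c, ab, and bc, so the maximal frequent itemsets are ab and bc.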
Table of Contents
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 Mining Association Rules
1.2 Data Streams Mining
1.2.1 Window Models in Data Streams
1.2.2 Two Types of the Sliding Window Model
1.2.3 Applications
1.3 Mining Maximal Frequent Itemsets in Data Streams
1.4 Related Works
1.5 Motivations
1.6 Organization of the Thesis
2. A Survey of Data Mining Algorithms
2.1 The Apriori Algorithm for Association Rules Mining
2.2 Mining Maximal Frequent Itemsets from Data Streams
2.2.1 Description of INSTANT
2.3 The DSTree for the Mining of Frequent Sets from Data Streams
2.4 The MFI-TransSW Algorithm for Frequent Itemsets Mining in Data Streams
2.4.1 The MFI-TransSW Algorithm Structure
2.4.2 Three Phases of The MFI-TransSW Algorithm
2.5 A CPS-tree Structure for Frequent Itemsets Mining in Data Streams
2.5.1 The Structure of the CPS-tree
2.5.2 The Algorithm of the CPS-tree
2.6 Mining Maximal Frequent Itemsets over a Stream Sliding Window
2.6.1 Transaction Addition
2.6.2 Transaction Deletion
3. The Subset-Lattice Algorithm
3.1 Data Structure
3.2 The Proposed Algorithm
3.2.1 Data Insertion
3.2.2 Data Deletion
3.2.3 The Time-Sensitive Sliding Window
3.3 A Comparison
4. Performance
4.1 The Simulation Model
4.2 Experiment Results
5. Conclusion
5.1 Summary
5.2 Future Work
BIBLIOGRAPHY
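Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available

Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services (圖資處). We apologize for any inconvenience.
Available: available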
