博碩士論文 etd-0525115-120601 詳細資訊
Title page for etd-0525115-120601
論文名稱
Title
一個於資料串流移動視窗中以權重排序晶格來探勘最大權重頻繁樣式的方法
A Weight-Order-Based Lattice Algorithm for Mining Maximal Weighted Frequent Patterns over a Data Stream Sliding Window
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
79
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2015-06-05
繳交日期
Date of Submission
2015-06-25
關鍵字
Keywords
最大權重頻繁項目集、資料串流、項目集、移動式視窗模型、晶格
Sliding Window Model, Lattice, Weighted Maximal Frequent Itemset, Data Stream, Itemset
統計
Statistics
本論文已被瀏覽 5693 次,被下載 44 次
This thesis/dissertation has been browsed 5693 times and downloaded 44 times.
中文摘要
對於現實生活中,在資料串流(data stream)中對權重頻繁樣式探勘(Weighted frequent pattern mining)是一個非常重要領域,比如說,超市。其中探勘最大權重頻繁樣式(Weighted maximal frequent pattern)也是一個重要議題。一個最大權重頻繁樣式是指其不是其他權重頻繁樣式的子集且同時它的權重值是大於或等於門檻值。但很多之前方法是沒辦法使用在權重頻繁樣式探勘上,因為它們大部份採用了anti-monotone property,而這個特性在此並不成立:即使有一個樣式Y的子集X,此子集X不是weighted frequent pattern,但是樣式Y卻仍可能是weighted frequent pattern。此外,由於資料串流具有速度快、連續性、沒有限制性與即時性等特性,因此我們只可掃描資料庫一次。所以,之前在傳統資料庫探勘之演算法是不適用於資料串流。再則,很多應用都著重於距離現在時間較近的資料串流,而移動式視窗(sliding window model) 正是處理最近資料串流的模式。為了在移動式視窗中找出最大權重頻繁樣式,Ryu學者等人提出了WMFP-SW演算法。WMFP-SW演算法使用FP-tree去探勘最大權重頻繁樣式。它還使用了最大權重來減少候補樣式。但是它在視窗移動的時候花了太多時間。因為當新的交易資料到達的時候,WMFP-SW演算法必須重建整個FP-tree。除此之外,WMFP-SW演算法可能會有missing case。為了解決這些問題,在這個論文中,基於移動式視窗為模型的演算法,我們提出Weighted-Order-Based Lattice演算法。我們使用晶格(lattice)來儲存交易的資料。晶格資料結構會儲存父節點與子節點之間的關係。在每一個晶格節點,我們會儲存項目集與數量。當新的交易資料到達的時候,我們根據交易特性分成五種集合關係 : (1) equivalent、(2) subset、(3) intersection、(4) empty set與 (5) superset。根據這五種集合關係,我們要新增或更新資料的時候會更有效率。此外,我們使用global maximal weight pruning方式 與local maximal weight pruning 方式去避免不必要的運算。從效能成果研讀,我們顯示了在真實資料與在模擬資料之測試,我們的Weighted-Order-Based Lattice演算法都比WMFP-SW演算法更有效率。
Abstract
Weighted frequent pattern mining over data streams is an important problem in real-world settings such as supermarkets, and mining weighted maximal frequent patterns is an important part of it. A weighted maximal frequent pattern is a pattern whose weighted support meets the threshold and which is not a subset of any other weighted frequent pattern. Many earlier Apriori-like algorithms cannot be applied to weighted frequent pattern mining, because they rely on the anti-monotone property, which does not hold here: even though a subset X of a pattern Y is not a weighted frequent pattern, Y itself may still be one. In addition, because data streams are continuous, high-speed, unbounded, and real-time, the data can be scanned only once, so algorithms designed for traditional databases are not suitable for data streams. Furthermore, many applications are interested mainly in recent data, and the sliding window model deals with exactly the most recent part of the stream. To mine weighted maximal frequent patterns under the sliding window model, Ryu et al. proposed the WMFP-SW algorithm, which uses an FP-tree to mine the patterns and uses the maximal weight to prune candidates. However, it takes a long time, because it has to reconstruct the FP-tree whenever a new transaction arrives. Moreover, the WMFP-SW algorithm may miss some cases. To solve these problems, this thesis proposes the Weighted-Order-Based Lattice algorithm, based on the sliding window model. We use a lattice structure to store the information of the transactions. The lattice stores the relationship between each child node and its parent node, and each node records an itemset and its count. When a new transaction arrives, we consider five relations: (1) equivalent, (2) subset, (3) intersection, (4) empty set, and (5) superset. With these five relations, we can add new transactions and update the support efficiently. Moreover, we use a global maximal weight pruning strategy and a local maximal weight pruning strategy to avoid generating invalid candidate patterns. A performance study on both real and synthetic data shows that the Weighted-Order-Based Lattice algorithm performs better than the WMFP-SW algorithm.
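Two ideas in the abstract can be made concrete with a short Python sketch: why the anti-monotone property fails for weighted support, and how the five set relations can be told apart. This is an illustration only, not the thesis's implementation. The weighted-support formula (support count times the average item weight), the item weights, the threshold, the names `weighted_support` and `classify`, and the direction of the subset/superset relations (the new transaction relative to a lattice node's itemset) are all assumptions, since the abstract does not define them.

```python
from enum import Enum
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def weighted_support(pattern: Itemset, window: List[Itemset],
                     weights: Dict[str, float]) -> float:
    """Assumed definition: support count times the pattern's average item weight.

    The pattern must be non-empty.
    """
    support = sum(1 for transaction in window if pattern <= transaction)
    average_weight = sum(weights[item] for item in pattern) / len(pattern)
    return support * average_weight


class Relation(Enum):
    EQUIVALENT = 1
    SUBSET = 2
    INTERSECTION = 3
    EMPTY_SET = 4
    SUPERSET = 5


def classify(transaction: Itemset, node_itemset: Itemset) -> Relation:
    """Relate a new transaction to the itemset stored in a lattice node.

    Assumed direction: SUBSET means the transaction is a proper subset of
    the node's itemset, and SUPERSET means the reverse.
    """
    if transaction == node_itemset:
        return Relation.EQUIVALENT
    if transaction < node_itemset:
        return Relation.SUBSET
    if transaction > node_itemset:
        return Relation.SUPERSET
    if transaction & node_itemset:
        return Relation.INTERSECTION  # some shared items, neither contains the other
    return Relation.EMPTY_SET         # no shared items at all


# Hypothetical window of three transactions and hypothetical item weights.
window = [frozenset("ab"), frozenset("ab"), frozenset("a")]
weights = {"a": 0.2, "b": 0.9}
threshold = 1.0

ws_a = weighted_support(frozenset("a"), window, weights)    # support 3, weight 0.2
ws_ab = weighted_support(frozenset("ab"), window, weights)  # support 2, average weight 0.55

# {a} falls below the threshold while its superset {a, b} reaches it.
print(ws_a < threshold, ws_ab >= threshold)
print(classify(frozenset("ab"), frozenset("abc")))
```

Under these assumed numbers, {a} is not weighted frequent while its superset {a, b} is, which is the situation that defeats Apriori-style subset pruning.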
目次 Table of Contents
1. Introduction
1.1 Data Stream Mining
1.1.1 Window Models in Data Streams
1.2 Mining Weighted Maximal Frequent Itemsets
1.3 Applications
1.4 Related Work
1.5 Motivation
1.6 Organization of the Thesis
2. A Survey of Algorithms for Mining Weighted Frequent Patterns
2.1 The MWFIM Algorithm
2.2 The WMFP-SW Algorithm
2.2.1 Data Structures for WMFP-SW
2.2.2 Pruning Patterns by MaxW
2.2.3 Pruning Patterns in Single-Path
2.3 The MWS Algorithm
2.4 The Set-Checking Algorithm
2.4.1 Data Structure
2.4.2 The Set-Checking Algorithm
3. A Weight-Order-Based Lattice Algorithm
3.1 Data Structure
3.2 Our Proposed Algorithm
3.2.1 Data Initialization
3.2.2 The Pruning Strategy
3.2.3 Data Deletion
3.2.4 Data Insertion
3.2.5 A Comparison
4. Performance
4.1 The Performance Model
4.2 Experiment Results
4.2.1 Real Data
4.2.2 Simulation Data
5. Conclusion
5.1 Summary
5.2 Future Work
BIBLIOGRAPHY
參考文獻 References
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. of the 1993 ACM SIGMOD Int. Conf. on Management of Data, Vol. 22, No. 2, pp. 207-216, June 1993.
[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. of the 20th Int. Conf. Very Large Data Bases, VLDB, Vol. 1215, pp. 487-499, 1994.
[3] C. F. Ahmed, S. K. Tanbeer, and B. S. Jeong, “Efficient Mining of Weighted Frequent Patterns over Data Streams,” Proc. of the 11th Int. Conf. on High Performance Computing and Communications, pp. 400-406, 2009.
[4] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, Y. K. Lee, and H. J. Choi, “Single-Pass Incremental and Interactive Mining for Weighted Frequent Patterns,” Expert Systems with Applications, Vol. 39, No. 9, pp. 7976-7994, July 2012.
[5] V. Bogorny, J. Valiati, S. Camargo, P. Engel, B. Kuijpers, and L. O. Alvares, “Mining Maximal Generalized Frequent Geographic Patterns with Knowledge Constraints,” Proc. of the 6th Int. Conf. on Data Mining, pp. 813-817, 2006.
[6] D. Burdick, M. Calimlim, and J. Gehrke, “MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases,” Proc. of the 17th Int. Conf. on Data Engineering, pp. 443-452, 2001.
[7] L. Chang, T. Wang, D. Yang, H. Luan, and S. Tang, “Efficient Algorithms for Incremental Maintenance of Closed Sequential Patterns in Large Databases,” Data & Knowledge Engineering, Vol. 68, No. 1, pp. 68-106, Jan. 2009.
[8] Y. I. Chang, C. E. Li, and W. H. Peng, “An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams,” Proc. of the Int. Conf. on Technologies and Applications of Artificial Intelligence (TAAI), pp. 21-26, 2012.
[9] Y. I. Chang, M. H. Tsai, C. E. Li, and P. Y. Lin, “A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams,” Intelligent Technologies and Engineering Systems, Vol. 234, No. 1, pp. 235-241, Feb. 2013.
[10] Y. Chen, R. F. Bie, and C. Xu, “A New Approach for Maximal Frequent Sequential Patterns Mining over Data Streams,” Int. Journal of Digital Content Technology and Its Applications, Vol. 5, No. 6, June 2011.
[11] G. Fang, Z. Deng, and H. Ma, “Network Traffic Monitoring Based on Mining Frequent Patterns,” Proc. of the 6th Int. Conf. on Fuzzy Systems and Knowledge Discovery, Vol. 7, pp. 571-575, 2009.
[12] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Vol. 8, No. 1, pp. 53-87, Jan. 2004.
[13] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” ACM SIGMOD Record, Vol. 29, No. 2, pp. 1-12, 2000.
[14] G. Lee, U. Yun, and K. H. Ryu, “Sliding Window Based Weighted Maximal Frequent Pattern Mining over Data Streams,” Expert Systems with Applications, Vol. 41, No. 2, pp. 694-708, Feb. 2014.
[15] F. G. Li, Y. J. Sun, Z. W. Ni, Y. Liang, and X. M. Mao, “The Utility Frequent Pattern Mining Based on Slide Window in Data Stream,” Proc. of the 5th Int. Conf. on Intelligent Computation Technology and Automation (ICICTA), pp. 414-419, 2012.
[16] W. Liu, Y. Zheng, S. Chawla, J. Yuan, and X. Xing, “Discovering Spatio-Temporal Causal Interactions in Traffic Data Streams,” Proc. of the 17th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 1010-1018, 2011.
[17] R. T. Ng, L. V. Lakshmanan, J. Han, and A. Pang, “Exploratory Mining and Pruning Optimizations of Constrained Associations Rules,” ACM SIGMOD Record, Vol. 27, pp. 13-24, 1998.
[18] B. Vo, F. Coenen, and B. Le, “A New Method for Mining Frequent Weighted Itemsets based on WIT-trees,” Expert Systems with Applications, Vol. 40, No. 4, pp. 1256-1264, March 2013.
[19] J. Wang and Y. Zeng, “DSWFP: Efficient Mining of Weighted Frequent Pattern over Data Streams,” Proc. of the 8th Int. Conf. on Fuzzy Systems and Knowledge Discovery (FSKD), Vol. 2, pp. 942-946, 2011.
[20] U. Yun, “Mining Lossless Closed Frequent Patterns with Weight Constraints,” Knowledge-Based Systems, Vol. 20, No. 1, pp. 86-97, Feb. 2007.
[21] U. Yun, G. Lee, and K. H. Ryu, “Mining Maximal Frequent Patterns by Considering Weight Conditions over Data Streams,” Knowledge-Based Systems, Vol. 55, No. 1, pp. 49-65, Jan. 2014.
[22] U. Yun, H. Shin, K. H. Ryu, and E. Yoon, “An Efficient Mining Algorithm for Maximal Weighted Frequent Patterns in Transactional Databases,” Knowledge-Based Systems, Vol. 33, No. 1, pp. 53-64, Sept. 2012.
電子全文 Fulltext
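The electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.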
論文使用權限 Thesis access permission:自定論文開放時間 user-defined release time
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
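Public-availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available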
