National Sun Yat-sen University, thesis/dissertation, A Subset-Lattice Algorithm for Mining High Utility Patterns over the Data Stream Sliding Window

Title	A Subset-Lattice Algorithm for Mining High Utility Patterns over the Data Stream Sliding Window
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 105	Language	English
Degree	Master	Number of pages	87
Author	Rong-Fu Chen (陳榮富)
Advisor	Ye-In Chang (張玉盈)
Convenor	Gen-Huey Chen (陳健輝)
Advisory Committee	Chun-I Fan (范俊逸); Chien-I Lee (李建億)
Date of Exam	2017-06-16	Date of Submission	2017-06-20
Keywords	Data Stream, Utility Mining, High Utility Patterns, Sliding Window Model, Lattice
Statistics	The thesis/dissertation has been browsed 5712 times and downloaded 11 times.

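Chinese Abstract
In the real world, for example in supermarkets, high utility pattern mining over data streams is an important topic in data mining. Traditional association rule mining considers only whether an item is present or absent in a transaction and treats every item as having the same value. Utility mining was proposed to remove these limitations by letting each item carry its own value and by recording whether a transaction contains one or several units of an item. In some situations, the high utility patterns we look for may be infrequent patterns rather than the most frequent ones, because an infrequent pattern may contribute a large amount of value while the most frequent pattern may contribute only a little. A high utility pattern is a pattern whose contributed value is greater than or equal to a threshold. Moreover, because data streams are unbounded, continuous, and fast, we must retain the transaction data in order to mine it. Many applications also focus on the stream data closest to the present, and the sliding window model is designed mainly to handle the most recent data. To find high utility patterns in a sliding window, Ryang et al. proposed the SHU-Grow algorithm, which uses a tree-based data structure. Their structure records the estimated value contributed by each pattern, so in the end they must identify the actual high utility patterns among the candidate patterns. Because the number of candidates generated from estimated values is larger than the number generated from actual values, both running time and memory consumption increase. To solve the problem caused by using estimated values, this thesis proposes the Subset-Lattice algorithm, based on the sliding window model. Our algorithm uses a lattice data structure to record the transaction data and to store the relationships between child nodes and parent nodes. Each lattice node records an itemset and an array, QSRecords. When a new transaction is added, our algorithm considers five cases: (1) empty, (2) equivalent, (3) superset, (4) subset, and (5) intersection. When mining high utility patterns, the value we compute is the actual value contributed by a pattern, not an estimate. The Subset-Lattice algorithm therefore generates fewer candidate patterns than the SHU-Grow algorithm, and its structure needs fewer nodes. Our simulation results, based on measured running times, show that the Subset-Lattice algorithm is more efficient than the SHU-Grow algorithm in both running time and storage space (number of nodes).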
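The difference between an actual utility and an estimated one, as described in the abstract above, can be illustrated with a small sketch. The profit table, the transactions, the window size, and all function names here are hypothetical and are not taken from the thesis. The overestimate used is a transaction-weighted upper bound, a common kind of estimate in utility mining; the abstract does not say whether SHU-Grow uses exactly this bound.

```python
from collections import deque

# Hypothetical unit profit of each item (illustrative values only).
PROFIT = {"a": 5, "b": 1, "c": 10}

def transaction_utility(tx):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())

def actual_utility(itemset, window):
    """Exact utility of `itemset`, counted only in transactions containing all of it."""
    total = 0
    for tx in window:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * PROFIT[item] for item in itemset)
    return total

def overestimated_utility(itemset, window):
    """Upper bound: full utility of every transaction that contains `itemset`."""
    return sum(transaction_utility(tx) for tx in window
               if all(item in tx for item in itemset))

def is_high_utility(itemset, window, threshold):
    """A pattern is high utility when its actual utility is >= the threshold."""
    return actual_utility(itemset, window) >= threshold

# Sliding window that keeps only the 3 most recent transactions.
window = deque(maxlen=3)
stream = [
    {"a": 1, "b": 2},          # evicted once the fourth transaction arrives
    {"b": 4, "c": 1},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 2},
]
for tx in stream:
    window.append(tx)

MIN_UTILITY = 40  # hypothetical threshold
```

With these made-up numbers, the itemset {b} has an overestimated utility of 40, so an estimate-based method would keep it as a candidate, while its actual utility is only 5. The itemset {a, c} has an actual utility of 45 and is a genuine high utility pattern at this threshold.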
Abstract
High utility pattern mining over data streams is an important problem in the data mining field, with real-world applications such as supermarkets. Traditional association rule mining considers only the presence or absence of each item, in binary form, and treats all items as having the same profit and importance. Utility mining was proposed to address these limitations by taking into account the profit of each item and its non-binary quantity in a transaction. The difficulty is that infrequent patterns may contribute a large amount of profit, whereas frequent patterns may contribute only a small amount. A high utility pattern is a pattern whose utility is greater than or equal to the threshold. In addition, because data streams are unbounded, continuous, and high-speed, we must keep the information of the transactions in order to perform the mining process. Furthermore, many applications are interested mainly in recent data, and the sliding window model deals with the most recent part of the stream. To mine high utility patterns over data streams under the sliding window model, Ryang et al. proposed the SHU-Grow algorithm, which uses a tree-based data structure. Their structure records an estimated value for each pattern, so the actual high utility patterns must finally be identified among the candidate patterns. Since the number of candidates generated from estimated values is larger than the number generated from actual values, the processing time and storage cost increase. To solve this problem of relying on estimated values, in this thesis we propose the Subset-Lattice algorithm, based on the sliding window model. Our algorithm uses a lattice structure to record the information of the transactions and to store the relationships between child nodes and parent nodes. In each node, we record the itemset and an array, QSRecords.
When our algorithm inserts a new transaction, it considers five cases: (1) empty, (2) equivalent, (3) superset, (4) subset, and (5) intersection. Moreover, we calculate the actual value rather than the estimated value to find high utility patterns, so our algorithm generates fewer candidate patterns than the SHU-Grow algorithm. The data structure of our algorithm also requires fewer nodes than that of the SHU-Grow algorithm. Our simulation results show that the Subset-Lattice algorithm performs better than the SHU-Grow algorithm in both processing time and storage space (the number of nodes).
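The five insertion cases named above can be sketched as a set comparison. This is only one plausible reading: it assumes the cases describe the relationship between the new transaction's itemset and the itemset of an existing lattice node, with "empty" meaning no items in common, and "superset"/"subset" taken from the new transaction's point of view. The abstract does not define the cases, so the thesis may define them differently; the function name `classify` is hypothetical.

```python
def classify(new_items, node_items):
    """Classify how a new transaction's itemset relates to a node's itemset.

    Assumed reading of the five cases (not confirmed by the abstract):
      empty        - the two itemsets share no item
      equivalent   - the two itemsets are identical
      superset     - the new itemset strictly contains the node's itemset
      subset       - the new itemset is strictly contained in the node's itemset
      intersection - they overlap, but neither contains the other
    """
    new_items, node_items = frozenset(new_items), frozenset(node_items)
    if not (new_items & node_items):
        return "empty"
    if new_items == node_items:
        return "equivalent"
    if new_items > node_items:
        return "superset"
    if new_items < node_items:
        return "subset"
    return "intersection"
```

For example, under this reading a new transaction {a, b} compared with a node holding {b, c} falls into the intersection case, since the two share b but neither contains the other.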

Table of Contents
THESIS VALIDATION LETTER
ACKNOWLEDGEMENTS
ABSTRACT (CHINESE)
ABSTRACT (ENGLISH)
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 High Utility Pattern Mining
1.2 Data Stream
1.2.1 Landmark Window Model
1.2.2 Tilted-Time Window Model
1.2.3 Sliding Window Model
1.3 Applications
1.4 Related Works
1.5 Motivation
1.6 Organization of the Thesis
2. A Survey of Sliding Window-Based Algorithms for Mining Useful Information over Data Streams
2.1 TMoment Algorithm
2.1.1 Computing Supports Using Transactions
2.1.2 Building TCET
2.1.3 The Window Slides Forward
2.1.3.1 Eliminating the Oldest Transaction
2.1.3.2 Inserting a New Transaction
2.2 SHU-Grow Algorithm
2.2.1 Construction of SHU-Tree
2.2.2 The Mining Process
2.2.3 The Window Slides Forward
2.3 WMFP-SW Algorithm
2.3.1 Pruning Patterns by MaxW
2.3.2 Pruning Patterns in Single-Path
2.3.3 The Window Slides Forward
3. The Subset-Lattice Algorithm
3.1 Data Structure
3.2 The Proposed Algorithm
3.2.1 Data Initialization
3.2.2 Data Insertion
3.2.3 Mining High Utility Patterns
3.2.4 Data Deletion
3.2.5 Comparison
4. Performance
4.1 Performance Model
4.2 Experiments Results
5. Conclusion
5.1 Summary
5.2 Future Work
BIBLIOGRAPHY

References
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. of the 20th Int. Conf. on VLDB, pp. 490–501, 1994.
[2] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and H. J. Choi, “Interactive Mining of High Utility Patterns over Data Streams,” Expert Systems with Applications, Vol. 39, No. 15, pp. 11979–11991, Nov. 2012.
[3] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, “An Efficient Algorithm for Sliding Window-Based Weighted Frequent Pattern Mining over Data Streams,” IEICE Trans. on Information and Systems, Vol. 92, No. 7, July 2009.
[4] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, Y. K. Lee, and H. J. Choi, “Single-Pass Incremental and Interactive Mining for Weighted Frequent Patterns,” Expert Systems with Applications, Vol. 39, No. 9, pp. 7976–7994, July 2012.
[5] D. Burdick, M. Calimlim, and J. Gehrke, “MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases,” Proc. of the 17th Int. Conf. on Data Eng., pp. 443–452, 2001.
[6] Y. I. Chang, C. E. Li, W. H. Peng, and S. Y. Wang, “Efficient Subset-Lattice Algorithms for Mining Closed Frequent Itemsets and Maximal Frequent Itemsets in Data Streams,” Int. Journal of Electrical Eng., Vol. 20, No. 2, pp. 51–63, April 2013.
[7] Y. I. Chang, M. H. Tsai, C. E. Li, and P. Y. Lin, “A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams,” Intelligent Technologies and Eng. Systems, Vol. 20, No. 2, pp. 51–63, April 2013.
[8] H. Chen, L. C. Shu, J. L. Xia, and Q. S. Deng, “Mining Frequent Patterns in a Varying-Size Sliding Window of Online Transactional Data Streams,” Information Sciences, Vol. 215, pp. 15–36, Dec. 2012.
[9] Y. Chi, H. X. Wang, P. S. Yu, and R. R. Muntz, “Catch the Moment: Maintaining Closed Frequent Itemsets over a Data Stream Sliding Window,” Knowledge and Information Systems, Vol. 10, No. 3, pp. 265–294, Oct. 2003.
[10] C. J. Chu, V. S. Tseng, and T. Liang, “An Efficient Algorithm for Mining Temporal High Utility Itemsets from Data Streams,” Journal of Systems and Software, Vol. 81, No. 7, pp. 1105–1117, July 2008.
[11] C. Y. Dai and L. Chen, “An Algorithm for Mining Frequent Closed Itemsets in Data Stream,” Int. Conf. on Applied Physics and Industrial Eng., pp. 1722–1728, 2012.
[12] M. Deypir, M. H. Sadreddini, and M. Tarahomi, “An Efficient Sliding Window Based Algorithm for Adaptive Frequent Itemset Mining over Data Streams,” Journal of Information Science and Eng., pp. 1001–1020.
[13] Z. Farzanyar, M. Kangavari, and N. Cercone, “Max-FISM: Mining (recently) Maximal Frequent Itemsets over Data Streams Using the Sliding Window Model,” Information Sciences, Vol. 64, No. 6, pp. 1706–1718, Sept. 2012.
[14] K. Gouda and M. J. Zaki, “GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets,” Data Mining and Knowledge Discovery, Vol. 11, No. 3, pp. 223–242, Nov. 2005.
[15] G. Grahne and J. F. Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-Trees,” IEEE Trans. on Knowledge and Data Eng., Vol. 17, No. 10, pp. 1347–1362, Oct. 2005.
[16] C. S. Hemalatha, V. Vaidehi, and R. Lakshmi, “Minimal Infrequent Pattern Based Approach for Mining Outliers in Data Streams,” Expert Systems with Applications, Vol. 42, No. 4, pp. 1998–2012, March 2015.
[17] J. L. Koh and S. N. Shin, “An Approximate Approach for Mining Recently Frequent Itemset from Data Streams,” Computer Science Data Warehousing and Knowledge Discovery, Vol. 4081, No. 1, pp. 352–362, Sept. 2006.
[18] G. Lee, U. Yun, and K. H. Ryu, “Sliding Window Based Weighted Maximal Frequent Pattern Mining over Data Streams,” Expert Systems with Applications, Vol. 41, No. 2, pp. 694–708, Feb. 2014.
[19] H. F. Li, “MHUI-Max: An Efficient Algorithm for Discovering High Utility Itemsets from Data Streams,” Journal of Information Science, Vol. 37, No. 5, pp. 532–545, Sept. 2011.
[20] H. F. Li and H. Chen, “Mining Non-Derivable Frequent Itemsets over Data Stream,” Data and Knowledge Eng., Vol. 68, No. 5, pp. 481–498, May 2009.
[21] H. F. Li, C. C. Ho, and S. Y. Lee, “Incremental Updates of Closed Frequent Itemsets over Continuous Data Streams,” Expert Systems with Applications, Vol. 36, No. 2, pp. 2451–2458, March 2009.
[22] H. F. Li and S. Y. Lee, “Mining Frequent Itemsets over Data Streams Using Efficient Window Sliding Techniques,” Expert Systems with Applications, Vol. 36, No. 2, pp. 1466–1477, March 2009.
[23] H. F. Li and N. Zhang, “Mining Maximal Frequent Itemsets over a Stream Sliding Window,” Proc. of Information Computing and Telecommunication, pp. 110–113, 2010.
[24] C. H. Lin, D. Y. Chiu, Y. H. Wu, and A. L. P. Chen, “Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window,” Proc. of the SIAM Int. Conf. on Data Mining, pp. 68–79, 2005.
[25] M. Y. Lin, T. F. Tu, and S. C. Hsueh, “High Utility Pattern Mining Using the Maximal Itemset Property and Lexicographic Tree Structures,” Information Sciences, Vol. 215, pp. 1–14, Dec. 2012.
[26] F. Nori, M. Deypir, and M. H. Sadreddini, “A Sliding Window Based Algorithm for Frequent Closed Itemset Mining over Data Streams,” Journal of Systems and Software, Vol. 86, No. 3, pp. 615–623, March 2013.
[27] H. Ryang and U. Yun, “High Utility Pattern Mining over Data Streams with Sliding Window Technique,” Expert Systems with Applications, Vol. 57, pp. 215–231, Sept. 2016.
[28] B. E. Shie, P. S. Yu, and V. S. Tseng, “Efficient Algorithms for Mining Maximal High Utility Itemsets from Data Streams with Different Models,” Expert Systems with Applications, Vol. 39, No. 17, pp. 12947–12960, Dec. 2012.
[29] S. K. Tanbeer, C. F. Ahmed, B. S. Jeong, and Y. K. Lee, “Sliding Window-Based Frequent Pattern Mining over Data Streams,” Information Sciences, Vol. 179, No. 22, pp. 3843–3865, Nov. 2009.
[30] U. Yun and G. Lee, “Sliding Window Based Weighted Erasable Stream Pattern Mining for Stream Data Applications,” Future Generation Computer Systems, Vol. 59, pp. 1–20, June 2016.
[31] U. Yun, H. Shin, K. H. Ryu, and E. C. Yoon, “An Efficient Mining Algorithm for Maximal Weighted Frequent Patterns in Transactional Databases,” Knowledge-Based Systems, Vol. 33, pp. 53–64, Sept. 2012.

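Fulltext
This electronic full text is licensed to users solely for academic research purposes, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. Thesis access permission: user-defined release time. Available: Campus: available; Off-campus: available. etd-0519117-171805.pdf
Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available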


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS