National Sun Yat-sen University (國立中山大學), thesis/dissertation, An InvP-List Algorithm with the Local Estimated Factor for Mining Weighted Erasable Itemsets over the Incremental Database

Title	An InvP-List Algorithm with the Local Estimated Factor for Mining Weighted Erasable Itemsets over the Incremental Database (一個使用區域估計指標來探勘增量資料庫中可移除權重樣式集之InvP-List演算法)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 106	Language	English
Degree	Master	Number of pages	80
Author	Siang-Jia Du (杜祥嘉)
Advisor	Ye-In Chang (張玉盈)
Convenor	Chiang Lee (李強)
Advisory Committee	San-Yih Hwang (黃三益); Chien-I Lee (李建億); You-Chiun Wang (王友群)
Date of Exam	2018-06-29	Date of Submission	2018-07-04
Keywords	Local Estimated Factor, Weight Constraint, Erasable Itemset, Frequent Patterns, Itemset Pruning

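Chinese Abstract (translated)
An erasable itemset is an itemset that contributes relatively little value in a database; applications of this research include retail market analysis and cost saving. Previous erasable itemset algorithms consider only the profit of each product, ignore the individual weights of the product's components, and apply only to static databases. However, when new data is added to the database, the threshold for the original data changes, so the whole database must be scanned again. On the other hand, mining weighted erasable itemsets by taking the weight and value of each component into account brings the results closer to real life. But when item weights are considered in previous erasable itemset algorithms, the anti-monotone property is violated. That is, even though a pattern Y is an erasable itemset, its subset X is not necessarily an erasable itemset. Therefore, to broaden the applicability of earlier erasable itemset algorithms and overcome these limitations, many new algorithms have proposed methods for mining weighted erasable itemsets over dynamic databases, including the IWEI algorithm. The IWEI algorithm uses an estimated factor that overestimates itemset profit in order to satisfy the anti-monotone property, and uses the IWEI-Tree and OP-List to support dynamic databases. However, after the IWEI-Tree is built, it still has to be reconstructed once. If the database is updated frequently, this becomes very time-consuming. Moreover, although the IWEI algorithm uses a static estimated factor to prune candidate weighted erasable itemsets, the value of this static factor is still too large to prune candidates effectively. To solve this problem, in this thesis we propose a list structure based on an inverted index of products, together with a local estimated factor. We use the inverted product list to generate a Candidate-List, and then use the local estimated factor to prune candidates from it, reducing the size of the Candidate-List. The proposed local estimated factor is called LMAW; it is used to decide whether an itemset has a chance of becoming a candidate weighted erasable itemset. Our inverted-product-list-based algorithm, the InvP-List algorithm, needs to scan the database only once. In addition, the LMAW factor produces fewer candidates than the estimated factor of the IWEI algorithm. The performance results, on both real and synthetic data, show that the proposed InvP-List algorithm is more efficient than the IWEI algorithm.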
Abstract
An erasable itemset is a low-profit itemset in a product database, and mining such itemsets matters in applications such as retail market analysis and cost saving. Previous algorithms for mining erasable itemsets ignore the weight of each component of a product and consider only product profit, and they work only on static product databases. When new data is inserted into the database, the threshold value for the original data changes and the database must be scanned again. On the other hand, taking the individual price or weight of each item into account brings the mining results closer to the real world. However, when the weight of each component is considered, previous algorithms for mining erasable itemsets violate the anti-monotone property. That is, a subset X of an erasable pattern Y may not be an erasable pattern. To overcome these limitations, algorithms for mining weighted erasable itemsets over dynamic databases have been proposed, including the IWEI algorithm. The IWEI algorithm uses a static overestimated factor of itemset profits to satisfy the anti-monotone property of weighted erasable itemsets, and constructs the IWEI-Tree and OP-List data structures for the dynamic database. However, the IWEI-Tree has to be reconstructed once the whole product database has been read, so mining takes a long time if the database is updated frequently. Moreover, the static overestimated factor used by the IWEI algorithm is too loose to prune candidates effectively. To solve these problems, in this thesis we propose the Inverted-Product-List (InvP-List) algorithm, which uses a local estimated factor to identify weighted erasable itemset candidates from the Candidate-List generated from the InvP-List.
The proposed estimated factor, called LMAW, reduces the number of candidates. LMAW is a local estimated factor used to check whether an itemset can be a weighted erasable itemset. Our InvP-List algorithm requires only one database scan. Moreover, with the local estimated factor, our algorithm generates fewer candidates than the IWEI algorithm. Our performance study shows that the InvP-List algorithm is more efficient than the IWEI algorithm on both real and synthetic datasets.
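The ideas in the abstract can be illustrated with a small Python sketch. It is not the thesis's InvP-List algorithm, and it does not implement LMAW, whose definition is not given on this page. It assumes the usual erasable-itemset setting: the gain of an itemset is the total profit of the products that contain at least one of its items, and an itemset is erasable when its gain does not exceed a threshold ratio of the total profit. The weighted gain used here (average item weight times gain) is one possible formulation, chosen only to show how weights can break the anti-monotone property. The class `InvertedProductList`, the toy products, and the weights are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: product id -> (component items, profit).
PRODUCTS = {
    "P1": ({"a", "b"}, 100),
    "P2": ({"b", "c"}, 200),
    "P3": ({"c", "d"}, 50),
    "P4": ({"a", "d"}, 150),
}
# Hypothetical item weights.
WEIGHTS = {"a": 1.0, "b": 0.5, "c": 0.5, "d": 0.2}


class InvertedProductList:
    """Maps each item to the ids of the products that contain it."""

    def __init__(self):
        self.item_to_pids = {}
        self.profit = {}

    def insert(self, pid, items, profit):
        # Each product is read once; a later insert touches only its own items.
        self.profit[pid] = profit
        for item in items:
            self.item_to_pids.setdefault(item, set()).add(pid)

    def total_profit(self):
        return sum(self.profit.values())

    def gain(self, itemset):
        # Profit of every product that uses at least one item of the itemset.
        pids = set()
        for item in itemset:
            pids |= self.item_to_pids.get(item, set())
        return sum(self.profit[pid] for pid in pids)


def weighted_gain(inv, itemset, weights):
    # Assumed definition: average item weight times the plain gain.
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)
    return avg_weight * inv.gain(itemset)


def erasable_itemsets(inv, ratio, max_size=2):
    # Brute-force enumeration for illustration only; no pruning factor is used.
    limit = ratio * inv.total_profit()
    items = sorted(inv.item_to_pids)
    return [
        set(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(items, size)
        if inv.gain(combo) <= limit
    ]


inv = InvertedProductList()
for pid, (items, profit) in PRODUCTS.items():
    inv.insert(pid, items, profit)
```

With a threshold ratio of 0.5 (a limit of 250 on the toy data), `erasable_itemsets(inv, 0.5)` returns `{a}`, `{c}`, and `{d}`. Under the assumed weighted gain with a limit of 200, `{a, d}` scores about 180 and is erasable, while its subset `{a}` scores 250 and is not, which is the kind of anti-monotone violation the abstract describes. Calling `inv.insert(...)` for a new product updates the lists without rescanning earlier products.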

Table of Contents
[THESIS VALIDATION LETTER] [ACKNOWLEDGEMENTS] [ABSTRACT (CHINESE)] [ABSTRACT (ENGLISH)] [LIST OF FIGURES] [LIST OF TABLES] [Introduction] [Weighted Erasable Itemset Mining] [Applications] [Related Works] [Motivation] [Organization of the Thesis] [A Survey of Algorithms for Mining Weighted Erasable Itemset] [MERIT Algorithm] [The dMERIT+ Algorithm] [The MEI Algorithm] [The IWEI Algorithm] [Data Structures for the IWEI Algorithm] [The Construction of the IWEI-Tree] [The Construction of the OP-List] [The Mining Process] [The Inverted-Product List Based Algorithm] [Data Structure] [The Inverted-Product Table] [The Weight Table] [The Profit Table] [The 1-Candidate Table] [The Mining Algorithm with 1-Candidate-Table] [Insertion of the New Data] [Comparison] [Performance] [The Performance Model] [Experiments Results] [Real Dataset] [Synthetic Dataset] [Conclusion] [Summary] [Future Work] [BIBLIOGRAPHY]


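Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined availability. Campus: available. Off-campus: available. etd-0603118-162333.pdf
Printed copies
Public information on printed copies is relatively complete from Academic Year 102 onward. For information on printed copies from Academic Year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. Availability: available.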
