國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,具一致性關聯規則之有效探勘方法 ,Efficient Mining Approaches for Coherent Association Rules

論文名稱 Title	具一致性關聯規則之有效探勘方法 Efficient Mining Approaches for Coherent Association Rules
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	100 學年度第 2 學期 The spring semester of Academic Year 100	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	67
研究生 Author	林郁凱 Yui-Kai Lin
指導教授 Advisor	洪宗貝 Tzung-Pei Hong
召集委員 Convenor	林文揚 Wen-Yang Lin
口試委員 Advisory Committee	李宗南, 陳俊豪 Chung-Nan Lee; Chun Hao Chen
口試日期 Date of Exam	2012-07-24	繳交日期 Date of Submission	2012-08-29
關鍵字 Keywords	關聯式規則、推論邏輯、資料挖掘、高度一致性規則、投影技術 projection, coherent rules, propositional logic, association rules, data mining
統計 Statistics	本論文已被瀏覽 5656 次，被下載 522 次 The thesis/dissertation has been browsed 5656 times, has been downloaded 522 times.

中文摘要
資料挖掘技術主要目的是從大量的資料庫中找尋出各種不同商品之間的潛在關係，進而幫助市場經理人藉由此技術提升產品銷售量。Apriori演算法就是一種挖掘關聯式規則的技術。然而許多建立在Apriori演算法上的資料挖掘技術通常只專注於在找尋正相關的規則，像是“買了牛奶的人就會買麵包”。然而，如果只考慮正相關規則而忽略負相關規則的重要性，則可能會誤導人們做出錯誤的決策。例如，雖然找出“買了牛奶的人就會買麵包”的規則，然而交易資料中可能也會產生“不買牛奶的人就會買麵包”的規則，這個時候這兩條規則是相互產生牴觸的。換句話說，如果探勘出“買了牛奶的人就會買麵包”同時也找出“不買牛奶的人就不會買麵包”這兩條規則，那麼這條規則是具實際應用的參考價值。本論文中，為了解決以上所提出的問題，我們利用推論邏輯等價的概念，提出兩種挖掘高度一致性規則的演算法。第一個方法稱為Apriori為基礎的高度一致性規則的演算法；在此方法中，根據Apriori演算法加入邏輯等價的概念進行探勘高度一致性規則。同時，我們亦推導出商品集的上限與下限用於刪除不必要的檢查。接著，為提升探勘的效率，第二個方法則採取了投影的想法，提出一個以投影技術為基礎(Projection-based)的高度一致性規則探勘演算法。透過第二個方法，探勘的效率則可有效的提升。實驗部分，透過多組模擬資料與一組真實資料進行實驗後，實驗結果顯示所提方法可以找出可靠度較佳的高度一致性規則，且第二個方法中亦顯示探勘的效率可明顯提升。
Abstract
The goal of data mining is to help market managers find relationships among items in large datasets and thereby increase profits. Among mining techniques, the Apriori algorithm is the most basic and important one for association rule mining. Although many mining approaches based on the Apriori algorithm have been proposed, most of them focus on positive association rules, such as R1: “If milk is bought, then bread is bought”. However, rule R1 may confuse users and lead to wrong decisions if negative relations are not considered. For example, a rule such as R2: “If milk is not bought, then bread is bought” may also be found, and R2 conflicts with the positive rule R1. Conversely, if the two rules “If milk is bought, then bread is bought” and “If milk is not bought, then bread is not bought” are found at the same time, the resulting rule, called a coherent rule, may be more valuable. In this thesis, we therefore propose two algorithms for mining such rules. The first, named the Highly Coherent Rule Mining algorithm (HCRM), is based on the Apriori approach and takes the properties of propositional logic into consideration to find coherent rules. The lower and upper bounds of itemsets are also tightened to remove unnecessary checks. To improve the efficiency of the mining process, the second algorithm, the Projection-based Coherent Mining Algorithm (PCA), uses data projection to reduce execution time. Experiments on real and simulated datasets demonstrate the performance of the proposed approaches; the results show that both HCRM and PCA find more reliable rules and that PCA is more efficient.
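The milk/bread example in the abstract can be made concrete with a small sketch. The abstract does not state the exact test the thesis uses, so the condition below is an assumption: a pair of items is treated as coherent when the two supporting cells of the 2x2 contingency table (both bought, neither bought) each outnumber the two contradicting cells (only one bought). The function names `contingency` and `is_coherent` and the sample baskets are hypothetical and chosen for illustration only; this is not HCRM or PCA.

```python
from typing import Iterable, Set, Tuple


def contingency(transactions: Iterable[Set[str]], x: str, y: str) -> Tuple[int, int, int, int]:
    """Count transactions in the cells (x, y), (x, not y), (not x, y), (not x, not y)."""
    both = only_x = only_y = neither = 0
    for basket in transactions:
        has_x, has_y = x in basket, y in basket
        if has_x and has_y:
            both += 1
        elif has_x:
            only_x += 1
        elif has_y:
            only_y += 1
        else:
            neither += 1
    return both, only_x, only_y, neither


def is_coherent(transactions: Iterable[Set[str]], x: str, y: str) -> bool:
    """Assumed test: 'x -> y' and 'not x -> not y' both dominate their counterexamples."""
    both, only_x, only_y, neither = contingency(transactions, x, y)
    return both > only_x and both > only_y and neither > only_x and neither > only_y


# Milk and bread mostly appear together or not at all.
baskets = [
    {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
    {"milk"}, {"bread"}, {"eggs"}, {"eggs"}, set(),
]

# Bread is often bought without milk, so "not milk -> bread" competes with R1.
conflicting = [
    {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
    {"bread"}, {"bread"}, {"bread"}, {"milk"}, set(),
]

print(is_coherent(baskets, "milk", "bread"))
print(is_coherent(conflicting, "milk", "bread"))
```

In the second dataset the positive rule R1 still has some support, but the conflicting rule R2 has just as much, which is the situation the abstract warns about; a support/confidence filter that looks only at R1 would not notice it.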

目次 Table of Contents
誌謝 Acknowledgements i
摘要 Abstract (Chinese) ii
Abstract iii
Table of Contents iv
List of Tables vii
Chapter 1 Introduction 1
  1.1 Background 1
  1.2 Thesis organization 3
Chapter 2 Review of Related Mining Approaches 5
  2.1 Association Rule Mining Approaches 5
  2.2 Concept of Coherent Rules 7
Chapter 3 Derivations of Lower and Upper Bounds of Itemsets 10
Chapter 4 Highly Coherent Association Rule Mining (HCRM) Algorithm 13
  4.1 Proposed HCRM algorithm 13
  4.2 An Example 17
Chapter 5 Projection-Based Coherent Association Rule Mining Algorithm (PCA) 25
  5.1 Proposed projection-based coherent mining algorithm (PCA) 25
  5.2 An Example of PCA 31
Chapter 6 Experimental Results 43
  6.1 Experimental Results of the First Proposed Method 44
  6.2 Experimental Results of the Second Proposed Method 48
Chapter 7 Conclusion and Future Work 50
References 52


電子全文 Fulltext
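This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined release time
開放時間 Available: 校內 Campus: 已公開 available; 校外 Off-campus: 已公開 available
etd-0829112-165931.pdf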
紙本論文 Printed copies
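Public-access information for printed copies is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available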
