Title page for etd-0911112-051949
Title
增進利潤探勘隱私保護效能之研究
A Study on Improving Efficiency of Privacy-Preserving Utility Mining
Department
Year, semester
Language
Degree
Number of pages
148
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2012-07-24
Date of Submission
2012-09-11
Keywords
利潤探勘、遞減式、漸進式、準大項目集、基因演算法、隱私保護
Utility mining, Pre-large itemset, Incremental, Decremental, Privacy preserving, Genetic algorithm
Statistics
This thesis/dissertation has been viewed 5755 times and downloaded 518 times.
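Chinese Abstract (English translation)
In recent years, utility mining algorithms have been designed to measure the utility of items in quantitative databases in order to find high utility itemsets. By taking into account factors such as the profit and price of items, more useful knowledge can be mined and provided to managers. Most existing methods find high utility itemsets in batch mode. In real applications, however, transactions may be inserted, deleted, or modified at any time. Batch processing must then rescan the entire updated database to maintain the updated knowledge. In the first part of this thesis, based on pre-large concepts, we propose efficient incremental and decremental utility mining algorithms that update and maintain the discovered high utility itemsets when transactions are inserted or removed. We first determine whether each itemset in the original database is a large (high) transaction-weighted utilization itemset, a pre-large transaction-weighted utilization itemset, or a small transaction-weighted utilization itemset (one absent from the discovered knowledge). The itemsets are divided into three parts with nine cases, and each part is maintained by its own procedure. Based on the pre-large concepts, the database needs to be rescanned only for a small number of itemsets to maintain the high utility itemsets.
However, the collection and dissemination of data for utility mining may create a risk of privacy leakage. Sensitive data or personal information should be hidden and protected when it is shared or published. Privacy-preserving utility mining has therefore become an important research topic in recent years. In the second part of this thesis, we propose two methods that hide sensitive itemsets by inserting and deleting data. In these two methods, we propose an evolutionary privacy-preserving utility mining approach that uses a genetic algorithm to find the most suitable set of transactions to insert or delete, thereby hiding the sensitive itemsets. The approach builds a flexible evaluation function from three variables, whose weights can be assigned according to the user's preference. Using the concept of pre-large transaction-weighted utilization itemsets proposed in the first part, we reduce the time cost of rescanning the database when the genetic algorithm evaluates chromosomes, which speeds up the evaluation process. Finally, we evaluate the performance of the algorithms through experiments.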
Abstract
Utility mining algorithms have recently been proposed to discover high utility itemsets from quantitative databases. Factors such as profit and price are considered in measuring the utility of purchased items, revealing more useful knowledge to managers. Nearly all existing algorithms extract high utility itemsets in batch mode. In real-world applications, however, transactions may be inserted into, deleted from, or modified in a database. Batch mining then requires additional computation, because the whole updated database must be rescanned to keep the discovered knowledge up to date. In the first part of this thesis, two algorithms based on pre-large concepts are proposed, one for transaction insertion and one for transaction deletion, to efficiently update the discovered high utility itemsets. The proposed algorithms first partition itemsets into three parts with nine cases according to whether their transaction-weighted utilization in the original database is large (high), pre-large, or small. Each part is then handled by its own procedure to maintain and update the discovered high utility itemsets. Based on the pre-large concepts, the original database needs to be rescanned for far fewer itemsets when maintaining high utility itemsets.
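The three-way partition described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes the usual definitions from the utility mining literature (transaction utility as the sum of quantity times unit profit, and transaction-weighted utilization, TWU, as the sum of the utilities of the transactions containing an itemset), and it assumes the upper and lower thresholds are given as ratios of the total database utility. The toy data, the threshold values, and all function names are hypothetical. Pairing the class in the original database with the class in the changed transactions is one plausible reading of how three parts yield nine cases.

```python
# Hypothetical toy data: unit profit per item and purchased quantities per transaction.
PROFIT = {"A": 3, "B": 10, "C": 1}
ORIGINAL_DB = [
    {"A": 2, "B": 1},
    {"A": 1, "C": 4},
    {"B": 2, "C": 1},
    {"A": 1, "B": 1, "C": 2},
]

def transaction_utility(tx, profit):
    """Utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in tx.items())

def twu(itemset, db, profit):
    """Transaction-weighted utilization: total utility of the
    transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(transaction_utility(tx, profit) for tx in db if items.issubset(tx))

def classify(itemset, db, profit, upper, lower):
    """Label an itemset large, pre-large, or small (assumed ratio thresholds)."""
    total = sum(transaction_utility(tx, profit) for tx in db)
    value = twu(itemset, db, profit)
    if value >= upper * total:
        return "large"
    if value >= lower * total:
        return "pre-large"
    return "small"

def maintenance_case(itemset, original_db, changed_tx, profit, upper, lower):
    """Assumed 3 x 3 case: class in the original database paired with
    the class in the inserted (or deleted) transactions."""
    return (
        classify(itemset, original_db, profit, upper, lower),
        classify(itemset, changed_tx, profit, upper, lower),
    )
```

With the assumed thresholds `upper=0.6` and `lower=0.3`, the total utility of the toy database is 59, so `{A}` (TWU 38) is large, `{A, B}` (TWU 31) is pre-large, and `{A, B, C}` (TWU 15) is small. Only itemsets whose case cannot be settled from the stored TWU values would need a rescan of the original database.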
In addition, privacy threats often arise during data collection and dissemination. Sensitive or personal information must be kept private before data are shared or published. Privacy-preserving utility mining (PPUM) has thus become an important issue in recent years. In the second part of this thesis, two evolutionary PPUM algorithms are proposed to hide sensitive high utility itemsets through data sanitization, one by inserting dummy transactions and one by deleting transactions. Both algorithms use a genetic algorithm (GA) to find appropriate transactions for insertion or deletion in the sanitization process. They adopt a flexible evaluation function with three factors, whose weights are assigned according to the user's preference. The maintenance algorithms proposed in the first part of this thesis are also used in the GA-based approach to reduce the cost of rescanning databases, thus speeding up the evaluation of chromosomes. Experiments are conducted to evaluate the performance of the proposed algorithms.
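A weighted three-factor evaluation function of the kind described above might look like the following Python sketch. The abstract does not name the three factors, so the sketch assumes the three side effects commonly counted in data sanitization work: hiding failures (sensitive itemsets still discoverable), missing itemsets (non-sensitive high utility itemsets lost), and artificial itemsets (new high utility itemsets introduced). The function names, the default weights, and the convention that a lower score is better are all hypothetical.

```python
def side_effects(sensitive, high_before, high_after):
    """Count three assumed side effects of sanitization.

    Each argument is a set of frozensets (itemsets):
    sensitive   - itemsets that should be hidden
    high_before - high utility itemsets in the original database
    high_after  - high utility itemsets in the sanitized database
    """
    hiding_failures = len(sensitive & high_after)
    missing = len((high_before - sensitive) - high_after)
    artificial = len(high_after - high_before)
    return hiding_failures, missing, artificial

def fitness(sensitive, high_before, high_after, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the three side effects; lower is better (assumed)."""
    w1, w2, w3 = weights
    failures, missing, artificial = side_effects(sensitive, high_before, high_after)
    return w1 * failures + w2 * missing + w3 * artificial

# Hypothetical example: {A, B} is sensitive; sanitization hides it but
# also loses {B, C} and introduces {A, C}.
SENSITIVE = {frozenset("AB")}
BEFORE = {frozenset("A"), frozenset("B"), frozenset("AB"), frozenset("BC")}
AFTER = {frozenset("A"), frozenset("B"), frozenset("AC")}
SCORE = fitness(SENSITIVE, BEFORE, AFTER)
```

In a GA, each chromosome would encode a candidate set of transactions to insert or delete, and a function like `fitness` would score the database that results. Computing `high_after` for every chromosome is the expensive step, which is where the abstract says the maintenance algorithms reduce rescanning.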
Table of Contents
Thesis Approval Form i
Acknowledgments ii
Abstract (Chinese) iii
Abstract iv
Contents v
List of Figures viii
List of Tables xi
CHAPTER 1 Introduction 1
1.1 Motivation 1
1.2 Contributions 4
1.3 Organization of Thesis 5
CHAPTER 2 Related Work 7
2.1 Association Rule Mining 7
2.2 The Concept of Pre-large Itemsets 10
2.3 High Utility Mining 14
2.4 Genetic Algorithms 15
2.5 Data Sanitization 18
CHAPTER 3 The Maintenance Algorithms for Mining High Utility Itemsets in Dynamic Database 20
3.1 Introduction 20
3.2 The Maintenance Algorithm for Transaction Insertion 20
3.2.1 Notation 22
3.2.2 Theoretical Foundation 24
3.2.3 The Proposed Incremental High Utility Mining Algorithm 26
3.2.4 An Illustrated Example 32
3.2.5 Experimental Results 42
3.3 The Maintenance Algorithm for Transaction Deletion 52
3.3.1 Notation 53
3.3.2 Theoretical Foundation 55
3.3.3 The Proposed Decremental High Utility Mining Algorithm 57
3.3.4 An Illustrated Example 62
3.3.5 Experimental Results 71
CHAPTER 4 The GA-based Algorithms for Privacy Preserving Utility Mining 80
4.1 Introduction 80
4.2 The Representation of Chromosomes 80
4.3 Fitness Function 82
4.4 The Pre-large Concepts and Proposed Sliding Count 87
4.5 Genetic Operators 89
4.5.1 Crossover 89
4.5.2 Mutation 89
4.5.3 Selection 90
4.6 A GA-based Approach for Privacy Preserving Utility Mining through Transaction Insertion 90
4.6.1 Notation 91
4.6.2 The Flowchart of Proposed Algorithm 92
4.6.3 The Proposed PPUM Approach through Transactions Insertion 93
4.6.4 An Illustrated Example 97
4.6.5 Experimental Results 106
4.7 A GA-based Approach for Privacy Preserving in Utility Mining through Transaction Deletion 110
4.7.1 Notation 111
4.7.2 The Flowchart of Proposed Algorithm 112
4.7.3 The Proposed GA-based Approach through Transaction Deletion 113
4.7.4 An Illustrated Example 117
4.7.5 Experimental Results 123
CHAPTER 5 Conclusion and Future Work 129
References 131
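Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available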


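Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To inquire about public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available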
