National Sun Yat-sen University, thesis/dissertation, Parallel Genetic-Fuzzy Mining with MapReduce Architecture

Title	Parallel Genetic-Fuzzy Mining with MapReduce Architecture (運用MapReduce架構之平行遺傳模糊資料探勘)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 103	Language	English
Degree	Master	Number of pages	87
Author	劉育瑒 Yu-Yang Liu
Advisor	洪宗貝 Tzung-Pei Hong
Convenor	陳俊豪 Chun-Hao Chen
Advisory Committee	李宗南 Chung-Nan Lee; 蔡崇煒 Chun-Wei Tsai
Date of Exam	2015-07-24	Date of Submission	2015-08-24
Keywords	MapReduce, genetic algorithm, FP-growth, fuzzy mining, data preprocessing

Abstract
Fuzzy data mining can discover hidden linguistic association rules by transforming quantitative values into fuzzy membership values. In the derivation process, good membership functions play a key role in the quality of the final results. Several earlier studies trained membership functions with genetic algorithms (GAs) and did improve the quality of the rules found. Those methods, however, suffer from long execution times in the training phase. Moreover, once appropriate fuzzy membership functions have been found, mining the frequent itemsets is itself very time-consuming, as in traditional data mining. In this thesis, we therefore propose a series of approaches based on the MapReduce architecture to speed up the GA-fuzzy mining process. The contributions fall into three parts: data preprocessing, membership-function training by GA, and fuzzy association-rule derivation, all performed with MapReduce. For data preprocessing, the proposed approach not only transforms the original data into the key-value format that MapReduce requires but also reduces redundant database scans by joining the quantities of each item into lists. For membership-function training by GA, the fitness evaluation, which is the most time-consuming step, is distributed to shorten the execution time. Finally, a distributed fuzzy rule mining approach based on FP-growth is designed to improve the time efficiency of finding fuzzy association rules. Experiments compare the performance of a single processor with that of MapReduce, and the results show that the proposed approaches reduce the execution time of the whole process.
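The preprocessing idea in the abstract (key-value conversion, then joining each item's quantities into a list) can be illustrated with a minimal single-process sketch. It is not the thesis's EDPFM-MR algorithm: the function names, the sample transactions, the triangular membership function, and its parameters are all assumptions made for illustration, and the shuffle is simulated in memory rather than run on a MapReduce cluster.

```python
from collections import defaultdict

def map_transaction(transaction):
    """Map step (hypothetical): emit one (item, quantity) pair per item."""
    for item, quantity in transaction.items():
        yield item, quantity

def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_quantities(item, quantities):
    """Reduce step (hypothetical): join one item's quantities into a sorted list."""
    return item, sorted(quantities)

def preprocess(transactions):
    """Run map, shuffle, and reduce over all transactions in one process."""
    pairs = (pair for t in transactions for pair in map_transaction(t))
    return dict(reduce_quantities(k, v) for k, v in shuffle(pairs).items())

def triangular(x, a, b, c):
    """Membership of x in an assumed triangular fuzzy set (feet a and c, peak b)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_support(quantities, a, b, c, n_transactions):
    """Fuzzy support of one linguistic term, read from the item's list alone."""
    return sum(triangular(q, a, b, c) for q in quantities) / n_transactions

# Made-up sample data: three transactions with purchased quantities.
transactions = [{"milk": 2, "bread": 6}, {"milk": 4}, {"bread": 3, "milk": 6}]
item_lists = preprocess(transactions)
milk_support = fuzzy_support(item_lists["milk"], 0, 4, 8, len(transactions))
```

Because each item's quantities sit in one list, a candidate membership function can be scored by reading that list alone, without rescanning the original transactions. This is one way to read the abstract's claim about reducing redundant database scans.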

Table of Contents
Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
CHAPTER 1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Organization
CHAPTER 2 Related Works
  2.1 Genetic Fuzzy Mining
  2.2 MapReduce
  2.3 Parallel FP-Growth
CHAPTER 3 Efficient Data Preprocessing for Fuzzy Mining with MapReduce (EDPFM-MR)
  3.1 Problem Statement and Definitions
  3.2 Proposed Algorithm, EDPFM-MR
  3.3 An Example of Using EDPFM-MR
CHAPTER 4 Parallel Genetic Fuzzy Membership Function Training with MapReduce (PGFMFT-MR)
  4.1 Problem Statement and Definitions
    4.1.1 Chromosome Representation
    4.1.2 Initial Population
    4.1.3 Fitness Function
    4.1.4 Genetic Operators
  4.2 Proposed Algorithm, PGFMFT-MR
  4.3 An Example of Using PGFMFT-MR
CHAPTER 5 Parallel FP-Growth for Fuzzy Mining with MapReduce (PFPGFM-MR)
  5.1 Problem Statement and Definitions
  5.2 The Proposed PFPGFM-MR
  5.3 An Example of Using PFPGFM-MR
CHAPTER 6 Experimental Evaluation
  6.1 Experimental Datasets
  6.2 Experimental Results of EDPFM-MR
  6.3 Experimental Results of PGFMFT-MR
  6.4 Experimental Result of PFPGFM-MR
CHAPTER 7 Conclusion
References


