National Sun Yat-sen University: Thesis/Dissertation

Title	Continuous Space Pattern Reduction Enhanced Metaheuristics for Clustering (original Chinese title: 透過連續空間樣式歸納法來加強啟發式演算法應用在分群問題的效能)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 100 (ROC calendar; spring 2012)
Language	Chinese
Degree	Master
Number of pages	59
Author	Tzu-Yuan Lin (林子園)
Advisor	Ming-Chao Chiang (江明朝)
Convenor	Chung-Nan Lee (李宗南)
Advisory Committee	Tzung-Pei Hong (洪宗貝); Chun-Wei Tsai (蔡崇煒)
Date of Exam	2012-07-27
Date of Submission	2012-09-07
Keywords	pattern reduction over continuous space, clustering, metaheuristics

Abstract
The pattern reduction (PR) algorithm we proposed previously reduces the computation time of clustering algorithms by detecting patterns that are unlikely to change their cluster membership during the convergence process and eliminating the computations associated with them; it is one of the most efficient methods for speeding up clustering. However, PR is limited to problems whose solutions can be binary or integer encoded, such as combinatorial optimization problems, and does not apply to real-valued encodings. This study therefore develops a new pattern reduction algorithm, called pattern reduction over continuous space, to remove this limitation. Like PR, the proposed algorithm consists of two operators: detection and compression. Unlike PR, its detection operator is divided into two steps. The first step finds the subsolutions that are candidates for compression. The second step ensures that these candidate subsolutions have reached their final state, so that any further computation on them would be redundant and they can be compressed. To evaluate the performance of the proposed algorithm, we apply it to metaheuristics for clustering.
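The abstract describes the operators only at a high level. The following is a minimal Python/NumPy sketch of how a two-step detection followed by compression could be wired into an iterative clustering loop. Everything concrete here is an assumption rather than the thesis's actual method: each centroid coordinate is treated as one subsolution, step 1 marks a coordinate as a candidate when it moves less than a threshold `eps`, step 2 confirms it only after `patience` consecutive candidate iterations, and compression freezes it. The names `ContinuousPR`, `lloyd_step`, `cluster_with_pr`, `eps`, and `patience` are hypothetical, and a plain k-means (Lloyd) update stands in for a metaheuristic's search operator.

```python
import numpy as np


class ContinuousPR:
    """Hypothetical sketch of pattern reduction over continuous space.

    Every real-valued coordinate of a solution (here, every centroid
    coordinate) is treated as one subsolution.
    """

    def __init__(self, shape, eps=1e-3, patience=3):
        self.eps = eps            # assumed step-1 movement threshold
        self.patience = patience  # assumed step-2 confirmation window
        self.stable = np.zeros(shape, dtype=int)       # consecutive candidate count
        self.compressed = np.zeros(shape, dtype=bool)  # frozen subsolutions

    def detect(self, prev, curr):
        """Two-step detection; returns the mask of confirmed subsolutions."""
        # Step 1: a coordinate is a candidate if it moved less than eps.
        candidates = np.abs(curr - prev) < self.eps
        # Step 2: confirm it only after `patience` consecutive candidate iterations.
        self.stable = np.where(candidates, self.stable + 1, 0)
        return self.stable >= self.patience

    def compress(self, confirmed):
        """Freeze confirmed coordinates; compression is never undone."""
        self.compressed |= confirmed

    def apply(self, prev, proposed):
        """Compressed coordinates keep their old value; the rest take the update."""
        return np.where(self.compressed, prev, proposed)


def lloyd_step(X, C):
    """Stand-in search operator: one k-means (Lloyd) update of centroids C."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    new_c = C.copy()
    for k in range(C.shape[0]):
        members = X[labels == k]
        if len(members) > 0:
            new_c[k] = members.mean(axis=0)
    return new_c


def cluster_with_pr(X, C0, iters=50, eps=1e-3, patience=3):
    """Iterate the stand-in operator, detecting and compressing after each step."""
    C = np.array(C0, dtype=float)
    pr = ContinuousPR(C.shape, eps, patience)
    for _ in range(iters):
        new_c = pr.apply(C, lloyd_step(X, C))
        pr.compress(pr.detect(C, new_c))
        C = new_c
        if pr.compressed.all():  # every subsolution frozen: stop early
            break
    return C, pr.compressed
```

For example, with `X` holding the points (0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), calling `cluster_with_pr(X, X[[0, 4]])` returns centroids (0.5, 0.5) and (10.5, 10.5) with every coordinate compressed after three stable iterations. In a genetic algorithm the same mask could instead be applied to the genes of each real-valued chromosome so that crossover and mutation skip them. This sketch still recomputes all distances, so it illustrates the detect, confirm, and compress control flow rather than the actual time savings.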

Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Contributions
  1.4 Thesis Organization
Chapter 2 Related Work
  2.1 Problems
    2.1.1 Combinatorial Optimization
    2.1.2 Clustering Problem
  2.2 Related Algorithms
    2.2.1 k-means
    2.2.2 Metaheuristics
      2.2.2.1 Genetic Algorithm
    2.2.3 Hybrid Metaheuristics
  2.3 Performance Enhancement Methods
    2.3.1 Hybridization
    2.3.2 Reduction Strategies for Clustering
      2.3.2.1 Dimension Reduction
      2.3.2.2 Centroid Reduction
      2.3.2.3 Pattern Reduction
Chapter 3 Proposed Method
  3.1 What the Algorithm Compresses
  3.2 When the Algorithm Is Activated
  3.3 Algorithm Flow and Implementation
  3.4 Example
Chapter 4 Experimental Results
  4.1 Parameter Settings and Datasets
  4.2 Experiment 1
  4.3 Experiment 2
  4.4 Experiment 3
  4.5 Experiment 4
  4.6 Experiment 5
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Bibliography


Fulltext
Access permission: user-defined. On campus: not available. Off campus: not available.
Printed copies
Availability: available.
