博碩士論文 etd-0619117-161041 詳細資訊
Title page for etd-0619117-161041
論文名稱
Title
自建構分群演算法之研究
Some Variants of Self-Constructing Clustering
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
62
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2017-07-12
繳交日期
Date of Submission
2017-07-19
關鍵字
Keywords
資料探勘、自建構式分群、資料分群、二次式規劃、Z-距離、訓練週期、迭代運算
training cycle, iterative computing, quadratic programming, z-distance, data clustering, data mining, self-constructing clustering
統計
Statistics
本論文已被瀏覽 5692 次,被下載 27 次
The thesis/dissertation has been browsed 5692 times, has been downloaded 27 times.
中文摘要 Chinese Abstract
Lee and Ouyang developed the self-constructing clustering (SCC) algorithm in 2003. SCC does not require the user to specify the number of clusters in advance, and it makes only a single pass over the whole dataset. However, once an instance has been assigned to a cluster, the assignment is never revised, which can lead to assignment errors and makes the resulting clusters sensitive to the order in which the instances are fed in. In addition, every dimension carries the same weight in the distance computation, which may be unsuitable for certain applications.
This thesis proposes two improved versions of SCC, SCC-I and SCC-IW. SCC-I performs two or more training cycles, and within each cycle an instance is allowed to be re-assigned to another cluster. When a training cycle produces no further changes, a suitable number of clusters has been reached and the algorithm terminates; the resulting clusters are thus less likely to be affected by the input order of the instances. SCC-IW extends SCC-I by assigning a different weight to each data dimension during clustering, with the weights learned adaptively from the data. This is effective when the dimensions are correlated with one another. A series of experiments on various real-world datasets demonstrates the effectiveness of the proposed methods.
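To make the iterative re-assignment idea concrete, below is a minimal, hypothetical sketch in Python. It is not the thesis's exact algorithm: the Gaussian-like similarity function, the threshold rho, the initial deviation sigma0, and the once-per-cycle update of cluster statistics are simplified stand-ins for the SCC formulas. What it does preserve is the self-constructing behavior (a new cluster is created when no existing cluster is similar enough), re-assignment in every training cycle, and termination when a full cycle changes nothing.

```python
import numpy as np

def scc_i_sketch(X, rho=0.5, sigma0=0.2, max_cycles=20):
    """Illustrative SCC-I-style clustering: several training cycles with
    re-assignment allowed, stopping when a full cycle changes nothing.
    The similarity function and the update of cluster statistics are
    simplified stand-ins, not the thesis's exact formulas."""
    n, d = X.shape
    assign = np.full(n, -1)                      # -1 means "not yet assigned"
    means, devs = [X[0].copy()], [np.full(d, sigma0)]
    assign[0] = 0

    def sim(x, m, s):
        # Gaussian-like similarity of instance x to a cluster (mean m, deviation s).
        return np.exp(-np.sum(((x - m) / s) ** 2))

    for cycle in range(max_cycles):
        changed = False
        for i, x in enumerate(X):
            scores = [sim(x, m, s) for m, s in zip(means, devs)]
            best = int(np.argmax(scores))
            if scores[best] < rho:
                # No existing cluster is similar enough: self-construct a new one.
                means.append(x.copy())
                devs.append(np.full(d, sigma0))
                best = len(means) - 1
            if best != assign[i]:                # re-assignment is allowed in every cycle
                assign[i] = best
                changed = True
        # Refresh cluster statistics from the current assignment
        # (simplification: once per cycle rather than incrementally).
        for k in range(len(means)):
            members = X[assign == k]
            if len(members):
                means[k] = members.mean(axis=0)
                devs[k] = members.std(axis=0) + sigma0
        if not changed:                          # no re-assignment in this cycle: stop
            break
    return assign
```

For example, calling scc_i_sketch on two well-separated Gaussian blobs stacked into one array would typically yield two clusters without the number of clusters ever being supplied; rho and sigma0 control how readily new clusters are created.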
Abstract
Self-constructing clustering (SCC), proposed by Lee and Ouyang in 2003, does not require the number of clusters to be specified in advance by the user. For a given set of instances, SCC performs only one training cycle, so it is fast. However, once an instance has been assigned to a cluster, the assignment is never changed afterwards. The clusters produced may therefore depend on the order in which the instances are considered, and assignment errors are more likely to occur. Also, all dimensions are equally weighted, which may not be suitable in certain applications. In this thesis, two improved versions of SCC, SCC-I and SCC-IW, are proposed. SCC-I allows two or more training cycles to be performed on the instances, and an instance can be re-assigned to another cluster in each cycle. A desired number of clusters is obtained when no assignment has changed in the current cycle. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. SCC-IW is an extension of SCC-I that allows each dimension to be weighted differently in the clustering process, with the weight values adaptively learned from the data. This is useful when certain relevance exists among the dimensions. A number of experiments on real-world benchmark datasets are conducted, and the results demonstrate the effectiveness of the proposed ideas.
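For the weighted variant, the sketch below illustrates only the idea of per-dimension weighting: a weight vector enters the similarity computation, and a simple heuristic update favors dimensions with small within-cluster scatter. The thesis's actual weight learning (quadratic programming, per the keywords) is not reproduced here; update_weights and its eps parameter are hypothetical illustrations.

```python
import numpy as np

def weighted_similarity(x, mean, dev, w):
    """Dimension-weighted Gaussian-like similarity. With w = np.ones(d) / d
    this reduces to the equally weighted case of the SCC-I sketch above."""
    return np.exp(-np.sum(w * ((x - mean) / dev) ** 2))

def update_weights(X, assign, means, eps=1.0):
    """Hypothetical weight update: give larger weight to dimensions with
    small within-cluster scatter, normalized to sum to one. eps is a
    smoothing constant; this is not the thesis's quadratic-programming
    formulation."""
    d = X.shape[1]
    scatter = np.zeros(d)
    for k, m in enumerate(means):
        members = X[np.asarray(assign) == k]
        if len(members):
            scatter += np.sum((members - m) ** 2, axis=0)
    w = 1.0 / (scatter + eps)                    # small scatter -> large weight
    return w / w.sum()
```

In an SCC-IW-style run, such a weight update would be interleaved with the training cycles so that the dimension weights and the cluster assignments adapt to each other.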
目次 Table of Contents
Acknowledgements i
Chinese Abstract ii
Abstract iii
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
1.1. Background 1
1.2. Motivation and Objectives 3
1.3. Thesis Organization 4
Chapter 2 Literature Review 5
2.1. Self-Constructing Clustering (SCC) 5
2.2. K-means with Time Weights (TSKmeans) 7
Chapter 3 Iterative Self-Constructing Clustering 10
3.1. Description of the Method 10
3.2. Example 12
Chapter 4 Iterative Weighted Self-Constructing Clustering 21
4.1. Motivation 21
4.2. Description of the Method 22
4.3. Example 26
Chapter 5 Experimental Results and Discussion 30
5.1. Multi-class Datasets 31
5.2. Time Series Datasets 35
5.3. Effect of the Parameter α 39
Chapter 6 Conclusions and Future Work 42
References 43
Appendix 49
參考文獻 References
[1] D. L. Olson, Y. Shi, Introduction to business data mining, McGraw-Hill/Irwin Englewood Cliffs, 2007.
[2] S. Theodoridis, K. Koutroumbas, Pattern Recognition, Elsevier, 2008.
[3] W. Li, L. Jaroszewski, A. Godzik, Clustering of highly homologous sequences to reduce the size of large protein databases, Bioinformatics 17 (3) (2001) 282–283.
[4] S.-J. Lee, C.-S. Ouyang, S.-H. Du, A neuro-fuzzy approach for segmentation of human objects in image sequences, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 33 (3) (2003) 420–437.
[5] R. Filipovych, S. M. Resnick, C. Davatzikos, Semi-supervised cluster analysis of imaging data, NeuroImage 54 (3) (2011) 2185–2197.
[6] J.-Y. Jiang, R.-J. Liou, S.-J. Lee, A fuzzy self-constructing feature clustering algorithm for text classification, IEEE Transactions on Knowledge and Data Engineering 23 (3) (2011) 335–349.
[7] R.-F. Xu, S.-J. Lee, Dimensionality reduction by feature clustering for regression problems, Information Sciences 299 (2015) 42–57.
[8] M. Wang, Y. Yu, W. Lin, Adaptive neural-based fuzzy inference system approach applied to steering control, Proceedings of International Symposium on Neural Networks (2009) 1189–1196.
[9] Y. Xu, V. Olman, D. Xu, Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees, Bioinformatics 18 (4) (2002) 536–545.
[10] C.-C. Wei, T.-T. Chen, S.-J. Lee, K-nn based neuro-fuzzy system for time series prediction, Proceedings of 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (2013) 569–574.
[11] F. Can, E. A. Ozkarahan, Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases, ACM Transactions on Database Systems 15 (4) (1990) 483–517.
[12] R. Feldman, J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, 2007.
[13] S.-J. Lee, J.-Y. Jiang, Multilabel text categorization based on fuzzy relevance clustering, IEEE Transactions on Fuzzy Systems 22 (6) (2014) 1457–1471.
[14] C.-L. Liao, S.-J. Lee, A clustering based approach to improving the efficiency of collaborative filtering recommendation, Electronic Commerce Research and Applications 18 (2016) 1–9.
[15] F. M. Alvarez, A. Troncoso, J. C. Riquelme, J. S. A. Ruiz, Energy time series forecasting based on pattern sequence similarity, IEEE Transactions on Knowledge and Data Engineering 23 (8) (2011) 1230–1243.
[16] Z.-Y. Wang, S.-J. Lee, A neuro-fuzzy based method for TAIEX forecasting, Proceedings of International Conference on Machine Learning and Cybernetics (ICMLC) 1 (2) (2014) 579–584.
[17] B. Everitt, Cluster analysis, Chichester, West Sussex, UK: Wiley, 2011.
[18] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 1995.
[19] K. Alsabti, S. Ranka, V. Singh, An efficient k-means clustering algorithm, Electrical Engineering and Computer Science Paper 43.
[20] Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery 2 (1998) 283–304.
[21] S.-J. Lee, C.-S. Ouyang, A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning, IEEE Transactions on Fuzzy Systems 11 (3) (2003) 341–353.
[22] H.-S. Park, C.-H. Jun, A simple and fast algorithm for k-medoids clustering, Expert Systems with Applications 36 (2009) 3336–3341.
[23] D. Sculley, Web-scale k-means clustering, Proceedings of 19th International Conference on World Wide Web (2010) 1177–1178.
[24] A. Kraskov, H. Stogbauer, R. G. Andrzejak, P. Grassberger, Hierarchical clustering based on mutual information, arXiv:q-bio/0311039v2 [q-bio.QM].
[25] G. J. Szekely, M. L. Rizzo, Hierarchical clustering via joint between-within distances: Extending Ward’s minimum variance method, Journal of Classification 22 (2005) 151–183.
[26] E. Achtert, C. Bohm, P. Kroger, DeLi-Clu: Boosting robustness, completeness, usability, and efficiency of hierarchical clustering by a closest pair ranking, Lecture Notes in Computer Science 3918 (2006) 119–128.
[27] E. Achtert, C. Bohm, P. Kroger, A. Zimek, Mining hierarchies of correlation clusters, Proceedings of 18th International Conference on Scientific and Statistical Database Management (SSDBM) (2006) 119–128.
[28] W. Zhang, D. Zhao, X. Wang, Agglomerative clustering via maximum incremental path integral, Pattern Recognition 46 (11) (2013) 3056–3065.
[29] M. Gagolewski, M. Bartoszuk, A. Cena, Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences 363 (2016) 8–23.
[30] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39 (1) (1977) 1–38.
[31] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[32] M. A. T. Figueiredo, A. K. Jain, Unsupervised learning of finite mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (3) (2002) 381–396.
[33] N. R. Pal, K. Pal, J. M. Keller, J. C. Bezdek, A possibilistic fuzzy c-means clustering algorithm, IEEE Transactions on Fuzzy Systems 13 (4) (2005) 517–530.
[34] M. R. Fellows, J. Guo, C. Komusiewicz, R. Niedermeier, J. Uhlmann, Graph-based data clustering with overlaps, Discrete Optimization 8 (1) (2011) 2–17.
[35] A. Pérez-Suárez, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, J. E. Medina-Pagola, OClustR: A new graph-based algorithm for overlapping clustering, Neurocomputing 121 (2013) 234–247.
[36] S. Baadel, F. Thabtah, J. Lu, Multi-cluster overlapping k-means extension algorithm, Proceedings of International Conference on Machine Learning and Computing, 2015.
[37] C. Améndola, J.-C. Faugère, B. Sturmfels, Moment varieties of Gaussian mixtures, Journal of Algebraic Statistics 7 (1) (2016) 14–28.
[38] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large databases with noise, Proceedings of 2nd ACM International Conference on Knowledge Discovery and Data Mining (1996) 226–231.
[39] A. Hinneburg, D. A. Keim, An efficient approach to clustering in large multimedia databases with noise, Proceedings of 4th ACM International Conference on Knowledge Discovery and Data Mining (1998) 58–65.
[40] H.-P. Kriegel, P. Kroger, J. Sander, A. Zimek, Density-based clustering, WIREs Data Mining and Knowledge Discovery 1 (3) (2011) 231–240.
[41] R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of ACM International Conference on Management of Data (1998) 94–105.
[42] C.-H. Cheng, A. W. Fu, Y. Zhang, Entropy-based subspace clustering for mining numerical data, Proceedings of 5th ACM International Conference on Knowledge Discovery and Data Mining (1999) 84–93.
[43] K. Kailing, H.-P. Kriegel, P. Kroger, Density-connected subspace clustering for high-dimensional data, Proceedings of SIAM International Conference on Data Mining (SDM’04) (2004) 246–257.
[44] R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Automatic subspace clustering of high dimensional data, Data Mining and Knowledge Discovery 11 (2005) 5–33.
[45] E. Achtert, C. Bohm, H.-P. Kriegel, P. Kroger, I. Muller-Gorman, A. Zimek, Detection and visualization of subspace cluster hierarchies, Lecture Notes in Computer Science 4443 (2007) 152–163.
[46] H.-P. Kriegel, P. Kroger, A. Zimek, Subspace clustering, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2 (4) (2012) 351–364.
[47] B. J. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (5814) (2007) 972–976.
[48] C.-S. Ouyang, W.-J. Lee, S.-J. Lee, A TSK-type neuro-fuzzy network approach to system modeling problems, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics 35 (4) (2005) 751–767.
[49] X. Huang, Y. Ye, L. Xiong, R. Lau, N. Jiang, S. Wang, Time series k-means: A new k-means type smooth subspace clustering for time series data, Information Sciences 367 (2016) 1–13.
[50] X. Huang, Y. Ye, H. Guo, Y. Cai, H. Zhang, Y. Li, DSKmeans: a new k-means-type approach to discriminative subspace clustering, Knowledge-Based Systems 70 (2014) 293–300.
[51] A. Asuncion, D. Newman, The UCI machine learning repository.
[52] K-means, https://www.mathworks.com/help/stats/kmeans.html.
[53] Fuzzy c-means, https://www.mathworks.com/help/fuzzy/fcm.html.
[54] Gaussian mixture model, https://en.wikipedia.org/wiki/Mixture_model.
[55] Matlab, https://www.mathworks.com/products/matlab.html.
[56] GMM source code, http://blog.pluskid.org/?p=39.
[57] Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, G. Batista, The UCR time series classification archive.
電子全文 Fulltext
The electronic full text is licensed to users solely for personal, non-commercial searching, reading, and printing for the purpose of academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid infringement.
論文使用權限 Thesis access permission: 自定論文開放時間 user defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
