Thesis/Dissertation etd-0203109-233634: Detailed Record
Title: 以模糊集合理論改善支持向量機之增量學習演算法
(Enhancement of Incremental Learning Algorithm for Support Vector Machines Using Fuzzy Set Theory)
Department:
Year, semester:
Language:
Degree:
Number of pages: 76
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2008-07-29
Date of Submission: 2009-02-03
Keywords: Fuzzy Set Theory, Classification, SVM, Incremental Learning
Statistics: This thesis/dissertation has been viewed 5680 times and downloaded 0 times.
Abstract (Chinese)
In recent years there has been considerable research on Support Vector Machines (SVMs). SVMs are widely applied in many domains and achieve good classification (prediction) rates, but when a dataset contains a large amount of data they require long computation time and large amounts of memory. To reduce the computational complexity, researchers have proposed incremental learning algorithms to handle the data awaiting training. Some studies have pointed out that certain non-support-vector examples near the hyperplane can contribute to the learning process. This study therefore improves on N. A. Syed's method and proposes three new incremental learning algorithms: Mixed Incremental Learning (MIL), Half-Mixed Incremental Learning (HMIL), and Partition Incremental Learning (PIL). Fuzzy set theory is then incorporated to assist in classifying the test data, in the expectation of better classification accuracy. The experiments examine how three test methods and the number of learning iterations affect the classification results, and measure accuracy on five different datasets. The results show that, compared with the simulation results of other incremental or active learning algorithms, MIL consistently provides good classification accuracy, while HMIL and PIL achieve further performance improvements specifically on datasets for which other studies have already reported high accuracy.
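As a generic illustration of the fuzzy-set idea mentioned above (graded rather than crisp class membership, following Zadeh's fuzzy sets), the sketch below computes fuzzy class memberships of a test point from its distances to class centroids. This is only an assumed, fuzzy c-means style construction for illustration; the thesis's actual membership functions are not reproduced here, and the names `fuzzy_memberships`, `centroids`, and `m` are illustrative.

```python
import numpy as np

def fuzzy_memberships(x, centroids, m=2.0):
    """Graded (fuzzy) membership of point x in each class.

    Memberships are derived from inverse distances to per-class centroids,
    in the style of fuzzy c-means; m > 1 is the fuzzifier. This is a
    generic sketch, not the thesis's specific method.
    """
    d = np.array([np.linalg.norm(x - c) for c in centroids])
    d = np.maximum(d, 1e-12)              # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))         # closer centroid -> larger weight
    return inv / inv.sum()                # memberships sum to 1
```

A test point near one centroid then receives a membership close to 1 for that class and close to 0 for the others, which can be used to weight or break ties in the final classification decision.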
Abstract
Over the past few years, a considerable number of studies have applied Support Vector Machines (SVMs) in many domains to improve classification or prediction. However, SVMs require long computation time and large amounts of memory when the datasets are large. Incremental learning techniques are one solution developed to reduce the computational complexity of this scalability problem, but few studies have considered that some examples close to the decision hyperplane, other than the support vectors (SVs), might contribute to the learning process. Consequently, we propose three novel algorithms, named Mixed Incremental Learning (MIL), Half-Mixed Incremental Learning (HMIL), and Partition Incremental Learning (PIL), which improve Syed's incremental learning method using fuzzy set theory; we expect them to achieve better accuracy than existing methods. In the experiments, the proposed algorithms are evaluated on five standard machine learning benchmark datasets to demonstrate their effectiveness. Experimental results show that MIL achieves superior classification accuracy compared with other incremental or active learning algorithms. In particular, on datasets for which other research reports already achieve high accuracy, HMIL and PIL improve the performance even further.
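The baseline the abstract builds on is Syed's incremental SVM learning: train on one batch, keep only the support vectors, merge them with the next batch, and retrain. The sketch below illustrates that baseline idea only, not the thesis's MIL/HMIL/PIL variants; it assumes a simple Pegasos-style linear SVM and approximates "support vectors" as the examples satisfying the margin condition y·f(x) ≤ 1. All function names and parameters here are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Tiny linear SVM via Pegasos-style sub-gradient descent (bias folded in)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append constant bias feature
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)               # decaying step size
            w *= (1.0 - eta * lam)              # regularization shrink
            if y[i] * (Xb[i] @ w) < 1:          # hinge-loss sub-gradient step
                w += eta * y[i] * Xb[i]
    return w

def support_mask(X, y, w, tol=1.0):
    """Examples on or inside the margin (y * f(x) <= tol) approximate the SVs."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return y * (Xb @ w) <= tol

def incremental_train(batches, lam=0.01):
    """Syed-style incremental learning: after each batch, retain only the
    (approximate) support vectors and carry them into the next batch."""
    d = batches[0][0].shape[1]
    keep_X, keep_y = np.empty((0, d)), np.empty(0)
    w = None
    for Xc, yc in batches:
        X = np.vstack([keep_X, Xc])
        y = np.concatenate([keep_y, yc])
        w = train_linear_svm(X, y, lam)
        m = support_mask(X, y, w)
        keep_X, keep_y = X[m], y[m]             # only SVs survive the batch
    return w

def predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ w)
```

Because only the retained support vectors (rather than all past data) enter each retraining, the working set stays small, which is the memory and time saving the incremental approach targets.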
目次 Table of Contents
Chapter 1 Introduction 1
Chapter 2 Literature Reviews 3
2.1 Support Vector Machines 3
2.2 Fuzzy Set Theory 6
2.3 Related Work 15
Chapter 3 The Proposed Methods 18
3.1 Incremental Training 20
3.1.1 Mixed Incremental Training 20
3.1.2 Half-Mixed Incremental Training 23
3.1.3 Partitional Incremental Training 25
3.2 Extension Based on Fuzzy Set Theory 27
3.3 Test Methods 30
Chapter 4 Simulation 33
4.1 Datasets 34
4.2 Simulation 1 - Comparison of Proposed Test Methods with 10-Fold Cross Validation 36
4.3 Simulation 2 - Analysis on the Number of Iterations 40
4.4 Simulation 3 - Evaluation of Candidate Examples 51
4.5 Simulation 4 - Performance Measure of Learning Algorithms 58
Chapter 5 Conclusion 65
References 67
References
[1] C. Campbell, N. Cristianini, A. Smola, Query learning with large margin classifiers, Proceedings of the 17th International Conference on Machine Learning, Stanford University, CA, June 2000, pp. 111-118.
[2] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[3] C. Cheng, F. Y. Shih, An improved incremental training algorithm for support vector machines using active query, Pattern Recognition 40(3) (2007) 964-971.
[4] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273-297.
[5] H. Drucker, D. Wu, V. N. Vapnik, Support vector machines for spam categorization, IEEE Trans. Neural Networks 10(5) (1999) 1048-1054.
[6] R.-E. Fan, P.-H. Chen, C.-J. Lin, Working set selection using second order information for training support vector machines, Journal of Machine Learning Research 6 (2005) 1889-1918.
[7] J.-S. R. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Upper Saddle River, NJ, USA, 1997.
[8] T. Joachims, Text categorization with support vector machines: learning with many relevant features, Proceedings of the European Conference on Machine Learning (ECML), Springer, 1998.
[9] P. Mitra, C. A. Murthy, S. K. Pal, A probabilistic active support vector learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 26(3) (2004) 413-418.
[10] D. J. Newman, S. Hettich, C. L. Blake, C. J. Merz, UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California, Department of Information and Computer Science, Irvine, CA, 1998.
[11] H. T. Nguyen, N. R. Prasad, C. L. Walker, E. A. Walker, A First Course in Fuzzy and Neural Control, Chapman & Hall/CRC, Boca Raton, FL, 2003.
[12] H. T. Nguyen, A. Smeulders, Active learning using pre-clustering, Proceedings of the 21st International Conference on Machine Learning, Alberta, Canada, July 2004.
[13] E. Osuna, R. Freund, F. Girosi, An improved training algorithm for support vector machines, Proceedings of IEEE NNSP'97, Amelia Island, FL, 1997.
[14] G. Schohn, D. Cohn, Less is more: active learning with support vector machines, Proceedings of the 17th International Conference on Machine Learning, Stanford University, CA, June 2000, pp. 839-846.
[15] N. A. Syed, H. Liu, K. K. Sung, Incremental learning with support vector machines, Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, July 1999.
[16] S. Tong, D. Koller, Support vector machine active learning with applications to text classification, Journal of Machine Learning Research 2 (2001) 45-66.
[17] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[18] L. A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338-353.
Fulltext
The electronic fulltext is licensed only for individual, non-profit retrieval, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it, so as to avoid violating the law.
Thesis access permission: not available on campus or off campus
Available:
Campus: not available (permanently restricted)
Off-campus: not available (permanently restricted)

Printed copies
Public-access information for printed theses is relatively complete only from academic year 102 onward. To check the public-access status of a printed thesis from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: released
