Thesis/Dissertation etd-0810110-175700: Detailed Information



Name: 蔡憲奇 (Shian-Chi Tsai)    E-mail: not disclosed
Department: Graduate Institute of Electrical Engineering
Degree: Master    Graduation term: academic year 98 (ROC calendar), 2nd semester
Title (Chinese): 一個混合式多標籤文件分類方法
Title (English): A Mixed Approach for Multi-Label Document Classification
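Files
  • etd-0810110-175700.pdf
  • This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the text without authorization, to avoid violating the law.
    Access rights

    Electronic thesis: fully open both on and off campus

    Language / pages: Chinese / 54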
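    Abstract (translated from the Chinese)  In single-label document classification, each document belongs to exactly one category. A document that is assigned to two or more categories at once is called a multi-label document, and how to classify such documents accurately has become a popular research topic in recent years. In this thesis, we address the multi-label document classification problem by proposing fuzzy similarity measure multi-label K nearest neighbors (FSMLKNN), a classification method that combines a fuzzy similarity approach with the multi-label K nearest neighbors (MLKNN) algorithm. Our method uses a fuzzy similarity measure to compute the similarity between a test document and the center of each category cluster, and combines the result with MLKNN so that efficiency improves substantially and accuracy improves as well. In the experiments, we compare FSMLKNN with existing classification methods, including the C4.5 decision tree, the support vector machine (SVM), and MLKNN. The results show that FSMLKNN is more efficient than the other methods and achieves good accuracy.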
    Abstract (English)  In single-label document classification, each document belongs to exactly one category. A document that is classified into two or more categories is known as a multi-label document, and how to classify such documents accurately has become a hot research topic in recent years. In this thesis, we propose an algorithm named fuzzy similarity measure multi-label K nearest neighbors (FSMLKNN), which combines a fuzzy similarity measure with the multi-label K nearest neighbors (MLKNN) algorithm for multi-label document classification. The algorithm uses the fuzzy similarity measure to calculate the similarity between a test document and the center of each category cluster, and the combination with MLKNN significantly improves efficiency while also improving accuracy. In the experiments, we compare FSMLKNN with existing classification methods, including the C4.5 decision tree, the support vector machine (SVM), and MLKNN. The experimental results show that FSMLKNN is more efficient than the other methods and achieves good accuracy.
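    The two-stage idea described in the abstract (first compare a test document with category cluster centers, then run MLKNN) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the record does not give the fuzzy similarity measure, so plain cosine similarity to a category centroid stands in for it, and the class name FilteredMLKNN, the threshold parameter, and the toy documents are all hypothetical. The second stage follows the usual ML-KNN formulation: smoothed label priors plus neighbour-count likelihoods, combined by a maximum a posteriori rule. Training counts here use all documents, while prediction searches only the filtered pool; that simplification is also an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(docs):
    """Mean term-weight vector of a non-empty list of sparse documents."""
    total = {}
    for d in docs:
        for t, w in d.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(docs) for t, w in total.items()}

def nearest(query, pool, docs, k):
    """Indices of the k documents in `pool` most similar to `query`."""
    return sorted(pool, key=lambda i: cosine(query, docs[i]), reverse=True)[:k]

class FilteredMLKNN:
    """Hypothetical two-stage classifier: centroid filter, then ML-KNN."""

    def __init__(self, k=2, smooth=1.0, threshold=0.3):
        self.k, self.s, self.threshold = k, smooth, threshold

    def fit(self, docs, labels):
        """docs: list of {term: weight}; labels: list of sets of label names."""
        self.docs, self.labels = docs, labels
        self.names = sorted(set().union(*labels))
        n, k, s = len(docs), self.k, self.s
        # One center per category (stand-in for the fuzzy cluster centers).
        self.centers = {l: centroid([d for d, y in zip(docs, labels) if l in y])
                        for l in self.names}
        # Smoothed prior probability that a document carries label l.
        self.prior = {l: (s + sum(l in y for y in labels)) / (2 * s + n)
                      for l in self.names}
        # pos[l][j]: training docs WITH label l whose k neighbours include
        # exactly j docs carrying l; neg[l][j]: same for docs WITHOUT l.
        self.pos = {l: [0] * (k + 1) for l in self.names}
        self.neg = {l: [0] * (k + 1) for l in self.names}
        for i in range(n):
            others = [j for j in range(n) if j != i]
            neigh = nearest(docs[i], others, docs, k)
            for l in self.names:
                j = sum(l in labels[m] for m in neigh)
                (self.pos if l in labels[i] else self.neg)[l][j] += 1
        return self

    def predict(self, doc):
        k, s = self.k, self.s
        # Stage 1: keep only categories whose center is similar enough.
        cand = [l for l in self.names
                if cosine(doc, self.centers[l]) >= self.threshold]
        # Stage 2: ML-KNN restricted to documents from candidate categories.
        pool = [i for i, y in enumerate(self.labels) if any(l in y for l in cand)]
        neigh = nearest(doc, pool, self.docs, k)
        out = set()
        for l in cand:
            j = sum(l in self.labels[m] for m in neigh)
            p1 = self.prior[l] * (s + self.pos[l][j]) / (s * (k + 1) + sum(self.pos[l]))
            p0 = (1 - self.prior[l]) * (s + self.neg[l][j]) / (s * (k + 1) + sum(self.neg[l]))
            if p1 > p0:
                out.add(l)
        return out

# Toy corpus (invented for illustration): three sports documents, three
# politics documents, and three documents carrying both labels.
train_docs = [
    {"ball": 1, "team": 1}, {"ball": 1, "goal": 1}, {"team": 1, "goal": 1},
    {"vote": 1, "law": 1}, {"vote": 1, "party": 1}, {"law": 1, "party": 1},
    {"stadium": 1, "funding": 1, "team": 1, "vote": 1},
    {"stadium": 1, "funding": 1, "goal": 1, "law": 1},
    {"stadium": 1, "funding": 1, "ball": 1, "party": 1},
]
train_labels = [{"sports"}] * 3 + [{"politics"}] * 3 + [{"sports", "politics"}] * 3
model = FilteredMLKNN(k=2).fit(train_docs, train_labels)

print(sorted(model.predict({"ball": 1, "goal": 1, "team": 1})))
# prints ['sports']
print(sorted(model.predict({"stadium": 1, "funding": 1, "vote": 1, "goal": 1})))
# prints ['politics', 'sports']
```

    In this sketch the filter is where any efficiency gain would come from: the nearest-neighbour search only touches training documents from categories that pass the similarity threshold, rather than the whole training set.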
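    Keywords (Chinese)
  • 多標籤文件分類 (multi-label document classification)
  • 模糊相似度測量 (fuzzy similarity measure)
  • 相關分數 (relevance score)
  • 資訊檢索 (information retrieval)
    Keywords (English)
  • relevance score
  • information retrieval
  • multi-label document classification
  • fuzzy similarity measure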
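  • Table of Contents
    Abstract (Chinese) i
    Abstract ii
    List of Figures v
    List of Tables vi
    Chapter 1 Introduction 1
    1.1 Overview 1
    1.2 Research Motivation 2
    1.3 Thesis Organization 3
    Chapter 2 Literature Review 4
    2.1 Multi-Label Problem Transformation 4
    2.2 Classification Methods 6
    2.2.1 Decision Tree C4.5 6
    2.2.2 Support Vector Machine 7
    2.2.3 MLKNN 8
    2.2.4 Fuzzy Similarity Approach 11
    Chapter 3 System Overview 15
    3.1 Document Classification System Architecture 15
    3.2 Document Preprocessing 16
    3.3 Feature Selection 18
    3.4 Feature Selection Methods 20
    Chapter 4 Our Method 21
    4.1 Fuzzy Similarity Clustering 23
    4.2 MLKNN Classification 27
    Chapter 5 Experimental Results and Analysis 29
    5.1 Document Collections 29
    5.2 Evaluation Methods 30
    5.3 Experimental Results 31
    5.3.1 Experiment 1 31
    5.3.2 Experiment 2 38
    Chapter 6 Conclusions and Future Work 42
    References 43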
    Oral defense committee
  • 吳志宏 - Convener
  • 歐陽振森 - Committee member
  • 蔡賢亮 - Committee member
  • 賴智錦 - Committee member
  • 李錫智 - Advisor
  • Oral defense date: 2010-07-21    Submission date: 2010-08-10
