Thesis/Dissertation Detailed Record: etd-0808111-134811
Title page for etd-0808111-134811
Title
文件資料維度縮減與多標籤分類方法之研究
Feature Reduction and Multi-label Classification Approaches for Document Data
Department
Year, semester
Language
Degree
Number of pages
122
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2011-07-04
Date of Submission
2011-08-08
Keywords
multi-label document classification, self-constructing clustering, text classification, dimension reduction, feature clustering
Statistics
The thesis/dissertation has been browsed 5770 times and has been downloaded 1234 times.
Abstract (Chinese)
This thesis proposes new feature reduction and multi-label classification methods for document data. Document content is usually represented with the vector space model, which makes the feature dimensionality of a document very large and hinders classification of document content. To mitigate the effects of this high dimensionality, we propose a feature clustering method for reducing document dimensionality and design an efficient method for the document classification problem.

We apply our self-constructing clustering method to group the features of high-dimensional document data and combine the clustering results with weights to form a new, lower-dimensional dataset. During clustering, each feature is represented by its probability distribution over the classes, and a membership function is used to compute the similarity between features, so that similar features are grouped into the same cluster. The self-constructing clustering method processes the input incrementally, handling each item only once, and is therefore much faster than conventional iterative methods. Complexity analysis and experimental comparisons show that the proposed method offers both fast execution and higher accuracy. Reducing dimensionality through feature clustering saves a large amount of storage and shortens the training and testing time of classification. Experiments on real document data confirm that our method reduces document dimensionality faster and more effectively than other known methods, and helps classifiers achieve good classification results.

In this thesis we also propose a multi-label document classification method, where a single document may belong to several categories at once. Using fuzzy similarity computations, each document is represented as a fuzzy-similarity vector over the categories; the length of this vector equals the number of categories, which is far smaller than the document dimensionality, so the representation itself reduces dimensionality and speeds up subsequent processing. Once documents are represented as fuzzy-similarity vectors, the similarity between two documents can be measured by how their similarities to the categories are distributed. We apply an incremental clustering method to the fuzzy-similarity vectors, with each cluster representing a particular distribution pattern. The influence of each pattern on the multi-label categories is then estimated by the least-squares method, and finally a threshold for each category is obtained from the training samples and used to classify multi-label documents. Experiments on real document data confirm that our method classifies multi-label documents quickly and effectively.
Abstract (English)
This thesis proposes novel approaches for feature reduction and multi-label classification on text datasets. In text processing, the bag-of-words model is commonly used: each document is modeled as a vector in a high-dimensional space, a representation often called the vector-space model. The dimensionality of such document vectors is usually huge, and this high dimensionality can be a severe obstacle for text processing algorithms. To improve their performance, we propose a feature clustering approach that reduces the dimensionality of document vectors, and we also propose an efficient algorithm for text classification.
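
To make the vector-space representation above concrete, the following is a minimal sketch in Python (standard library only). The toy documents and the raw term-frequency weighting are illustrative assumptions, not the weighting scheme used in the thesis.

from collections import Counter

# Toy corpus; a real collection easily yields tens of thousands of distinct terms,
# i.e., tens of thousands of dimensions per document vector.
docs = [
    "feature clustering reduces the dimensionality of document vectors",
    "multi label classification assigns a document to several categories",
]

# One dimension per distinct word in the collection.
vocab = sorted({w for d in docs for w in d.split()})

def to_vector(text):
    """Represent a document as a term-frequency vector over the vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

vectors = [to_vector(d) for d in docs]
print(len(vocab))   # the dimensionality that the feature reduction methods try to shrink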

Feature clustering is a powerful method for reducing the dimensionality of feature vectors for text classification. We propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters based on a similarity test, so that words similar to one another end up in the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. When all the words have been fed in, a desired number of clusters is formed automatically, and one extracted feature is obtained for each cluster. The extracted feature corresponding to a cluster is a weighted combination of the words contained in that cluster. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the training data. Moreover, the user need not specify the number of extracted features in advance, so trial-and-error for determining the appropriate number of extracted features is avoided. Experimental results show that our method runs faster and obtains better extracted features than other methods.
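
The following is a minimal sketch (in Python) of the single-pass, self-constructing clustering idea described above. The Gaussian-style membership function, the similarity threshold RHO, the deviation floor SIGMA0, and the running mean/deviation updates are assumptions made for illustration; the exact definitions and update rules are those given in Chapter 4 of the thesis.

import math

RHO = 0.5      # similarity threshold (assumed value; controls how many clusters form)
SIGMA0 = 0.2   # deviation floor for a cluster (assumed)

class Cluster:
    """A word cluster characterized by a per-dimension statistical mean and deviation."""
    def __init__(self, dim):
        self.n = 0
        self.sums = [0.0] * dim      # running sums per dimension
        self.sq_sums = [0.0] * dim   # running sums of squares per dimension

    def mean_dev(self):
        mean = [s / self.n for s in self.sums]
        dev = [max(math.sqrt(max(q / self.n - m * m, 0.0)), SIGMA0)
               for q, m in zip(self.sq_sums, mean)]
        return mean, dev

    def membership(self, pattern):
        """Gaussian-style membership of a word's class-distribution pattern in this cluster."""
        mean, dev = self.mean_dev()
        return math.prod(math.exp(-((x - m) / d) ** 2)
                         for x, m, d in zip(pattern, mean, dev))

    def add(self, pattern):
        self.n += 1
        for i, x in enumerate(pattern):
            self.sums[i] += x
            self.sq_sums[i] += x * x

def self_constructing_clustering(word_patterns):
    """One pass over the words; each pattern is a word's distribution over the classes."""
    clusters = []
    for p in word_patterns:
        best = max(clusters, key=lambda c: c.membership(p), default=None)
        if best is not None and best.membership(p) >= RHO:
            best.add(p)                 # similar enough: join the most similar cluster
        else:
            new = Cluster(len(p))
            new.add(p)
            clusters.append(new)        # otherwise a new cluster is created
    # Each resulting cluster yields one extracted feature: a weighted
    # combination of the words assigned to it.
    return clusters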

We also propose a fuzzy similarity clustering scheme for multi-label text categorization, in which a document can belong to one or more categories. First, feature transformation is performed: an input document is transformed into a fuzzy-similarity vector. Next, the relevance degrees of the input document to a collection of clusters are calculated and combined to obtain the relevance degree of the input document to each participating category. Finally, the input document is assigned to a category if the associated relevance degree exceeds a threshold. In text categorization the number of terms involved is usually huge, so an automatic classification system may suffer from large memory requirements and poor efficiency; our scheme avoids these difficulties. Moreover, we allow the region covered by a category to be a combination of several sub-regions that are not necessarily connected. The effectiveness of the proposed scheme is demonstrated by the results of several experiments.
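
A minimal sketch of how such a scheme could be wired together is given below (Python with NumPy). The min/max fuzzy similarity, the least-squares combination of cluster relevances, and the per-category thresholds are assumptions standing in for the exact formulations of Chapter 5; category_centroids, cluster_relevance, and thresholds are hypothetical inputs used only for illustration.

import numpy as np

def category_similarity_vector(doc_vec, category_centroids):
    """Feature transformation: one fuzzy-similarity value per category, so the
    transformed vector is only as long as the number of categories."""
    sims = []
    for c in category_centroids:
        num = np.minimum(doc_vec, c).sum()
        den = np.maximum(doc_vec, c).sum()
        sims.append(num / den if den > 0 else 0.0)
    return np.array(sims)

def train_weights(cluster_relevance, labels):
    """Least-squares weights mapping cluster relevances to category relevances.
    cluster_relevance: (n_docs, n_clusters); labels: (n_docs, n_categories) in {0, 1}."""
    W, *_ = np.linalg.lstsq(cluster_relevance, labels, rcond=None)
    return W

def classify(doc_cluster_relevance, W, thresholds):
    """Assign every category whose relevance degree exceeds that category's threshold."""
    relevance = doc_cluster_relevance @ W
    return [k for k, (r, t) in enumerate(zip(relevance, thresholds)) if r >= t]
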
Table of Contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
1 Introduction
1.1 Text Classification
1.2 Document Feature Reduction
1.3 Multi-label Text Classification
1.4 Overview
2 Document Feature Reduction
2.1 Background
2.2 Related Work
2.2.1 Information Gain
2.2.2 Incremental Orthogonal Centroid Algorithm
2.2.3 Divisive Information-Theoretic Feature Clustering
3 Multi-label Text Categorization
3.1 Background
3.1.1 Problem Transformation
3.1.2 Algorithm Adaptation
3.2 Related Work
3.2.1 Fuzzy Similarity Measure
3.2.2 Rank-SVM
3.2.3 ML-RBF
3.2.4 ML-KNN
3.2.5 BoosTexter
4 A Fuzzy Self-Constructing Feature Clustering Algorithm
4.1 Self-Constructing Clustering
4.2 Feature Extraction
4.3 Text Classification
4.4 An Example
5 A Novel Similarity-Based Scheme for Multi-Label Text Categorization
5.1 Feature Transformation
5.2 Cluster-similarity
5.3 Category-similarity
5.4 Hard-limiting
5.5 Operation Phases
5.6 An Example
6 Experimental Results
6.1 Experimental Results for Document Feature Reduction
6.1.1 20 Newsgroups Dataset
6.1.2 REUTERS CORPUS VOLUME 1 (RCV1) Dataset
6.1.3 Cade12 Dataset
6.2 Experimental Results for Multi-label Classification
6.2.1 WebKB Dataset
6.2.2 Medical Dataset
6.2.3 YAHOO Web Page Dataset
6.2.4 RCV1 Dataset
7 Conclusion
Bibliography
Fulltext
The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for purposes of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: fully open on and off campus (unrestricted)
Available:
On campus: available
Off campus: available


Printed copies
Availability information for printed copies is relatively complete from academic year 102 (ROC calendar) onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
