National Sun Yat-sen University, thesis/dissertation, Machine Learning Approaches for the Protein and RNA Sequence Analysis

Title	Machine Learning Approaches for the Protein and RNA Sequence Analysis (蛋白質與核糖核酸序列的機器學習分析方法)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 102	Language	English
Degree	Ph.D.	Number of pages	136
Author	Chiou-Yi Hor (何秋誼)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Tzung-Pei Hong (洪宗貝)
Advisory Committee	Bang-Ye Wu (吳邦一); Y. L. Shiue (薛佑玲); Chungnan Lee (李宗南); Yaw-Ling Lin (林耀鈴); Chia-Ning Yang (楊佳寧)
Date of Exam	2014-07-08	Date of Submission	2014-08-09
Keywords	bioinformatics, machine learning, RNA secondary structure, essential protein, support vector machine, feature selection
Statistics	This thesis/dissertation has been browsed 5682 times and downloaded 332 times.

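Abstract (translated from the Chinese version)
Machine learning methods have been integrated into bioinformatics research for many years. A given sequence may be composed of amino acids or nucleotides: a sequence of amino acids is called a protein sequence, and a sequence of nucleotides is called an RNA sequence. With machine learning techniques, users can learn about a sequence, such as its folding state or the type of protein it belongs to, without performing experiments. In this dissertation, we focus on RNA secondary structure prediction and essential protein prediction. In RNA secondary structure prediction, the goal is to predict how an RNA sequence folds. Conventional prediction methods are usually based on thermodynamics or sequence alignment. In this dissertation, we use tools already developed by other researchers as the base prediction software, and we use trained support vector machines as a selector that chooses which base software to use for prediction, so that the overall secondary structure prediction accuracy is higher. To help the support vector machines choose the better base software, we propose an incremental method for feature selection and classifier combination. The experimental results show that the secondary structure prediction accuracy obtained in this way is significantly higher than that of any base software we adopted. In essential protein prediction, besides protein sequence information, we also use the protein interaction network and other protein properties. To find the important features, we propose a sequential backward feature selection method that considers classification performance and feature subset size at the same time. The experimental results show that support vector machines trained on the feature subsets found by our method predict protein essentiality significantly better than the results obtained by previous researchers.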
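The selector idea in the abstract (label each training sequence with the base tool that predicts it best, then train a classifier to pick a tool for new sequences) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the accuracy values and feature vectors are invented toy data, and a nearest-centroid classifier stands in for the SVM classifiers used in the thesis so that the example needs only the standard library. Only the three tool names come from the table of contents.

```python
from typing import Callable, Dict, List, Sequence

# Toy accuracies of each base tool on four training sequences.
# Invented for illustration; these are not results from the thesis.
ACCURACY: Dict[str, List[float]] = {
    "pknotsRG": [0.9, 0.8, 0.4, 0.3],
    "RNAStructure": [0.5, 0.6, 0.7, 0.8],
    "NUPACK": [0.2, 0.3, 0.5, 0.6],
}
# Hypothetical two-dimensional feature vectors, one per training sequence.
FEATURES: List[List[float]] = [[0.30, 0.1], [0.35, 0.2], [0.60, 0.8], [0.65, 0.9]]


def preference_labels(accuracy_by_tool: Dict[str, List[float]]) -> List[str]:
    """Label each training sequence with the tool that scored best on it."""
    tools = sorted(accuracy_by_tool)
    n = len(accuracy_by_tool[tools[0]])
    return [max(tools, key=lambda t: accuracy_by_tool[t][i]) for i in range(n)]


def train_nearest_centroid(
    X: Sequence[Sequence[float]], y: Sequence[str]
) -> Callable[[Sequence[float]], str]:
    """Stand-in classifier (NOT an SVM): predict the label of the closest class mean."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(row: Sequence[float]) -> str:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )

    return predict


LABELS = preference_labels(ACCURACY)
select_tool = train_nearest_centroid(FEATURES, LABELS)
print(LABELS)
print(select_tool([0.32, 0.12]))
```

In this sketch the selector never predicts a structure itself; it only decides which base tool to run, which is why its overall accuracy can exceed that of any single tool when the tools' strengths differ across sequences.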
Abstract
Machine learning approaches have been used in bioinformatics for several decades. Given a sequence, which may be composed of nucleotides or amino acids, the goal is to have a learning machine report the status of the sequence without performing experiments. In this dissertation, we focus on two problems of recent interest: the prediction of RNA secondary structure and the prediction of protein essentiality. An RNA secondary structure is the fold of a nucleotide sequence. Conventional methods usually address the structure prediction problem from a thermodynamic or comparative perspective. Instead of developing our prediction tool from scratch, we take advantage of state-of-the-art software tools. We adopt a tool preference choice approach to select a good software tool for prediction, in the hope that the resulting performance is better than that of any base prediction software. Our tool selector is built by incorporating various RNA sequence features and several SVM classifiers. To facilitate classifier combination and the identification of important features, we propose an incremental feature selection method for classifier ensemble construction. The experimental results show that the achieved prediction accuracy is significantly better than that of any base predictor. For the essential protein prediction problem, we also adopt various features, including sequence, protein property, topology, and other features. To identify features relevant to protein essentiality, we propose a modified sequential backward feature selection method that takes both feature subset size and prediction performance into consideration. The experimental results show that the achieved performance is significantly better than that of previous works.
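A backward feature selection that weighs subset size against prediction performance can be sketched as below. This is not the thesis's exact criterion, which the abstract does not state. The sketch assumes a simple objective, score minus a fixed penalty per feature, and a greedy elimination that at each step drops the feature whose removal leaves the highest score. The function `backward_select`, the parameter `size_penalty`, the feature names, and the toy scoring function are all hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical per-feature contributions used only by the toy score below.
GAIN = {"degree": 0.30, "pssm": 0.25}


def toy_score(subset: FrozenSet[str]) -> float:
    """Toy stand-in for a cross-validated classifier score on a feature subset."""
    return 0.5 + sum(GAIN.get(f, 0.0) for f in sorted(subset))


def backward_select(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    size_penalty: float = 0.01,
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination with an assumed size-penalized objective."""
    current = frozenset(features)
    best_subset = current
    best_value = score(current) - size_penalty * len(current)
    while len(current) > 1:
        # Try dropping each remaining feature; keep the drop that hurts least.
        candidates = [current - {f} for f in sorted(current)]
        current = max(candidates, key=score)
        value = score(current) - size_penalty * len(current)
        if value >= best_value:  # on ties, prefer the smaller subset
            best_subset, best_value = current, value
    return best_subset, best_value


SUBSET, VALUE = backward_select(["degree", "pssm", "noise_a", "noise_b"], toy_score)
print(sorted(SUBSET), round(VALUE, 2))
```

With the toy score, the two uninformative features are removed first because dropping them costs nothing, and the search keeps the two-feature subset because removing either informative feature lowers the penalized objective.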

Table of Contents
1 Introduction
  1.1 RNA Secondary Structure Prediction
  1.2 Essential Protein Prediction
  1.3 Summary and Organization
2 Prerequisite Knowledge
  2.1 Position Specific Scoring Matrix (PSSM)
  2.2 Support Vector Machine
  2.3 Hierarchical Clustering
  2.4 Cross-Validation Methods
    2.4.1 k-fold Cross-Validation
    2.4.2 Bootstrap Cross-Validation
  2.5 Information-Theoretic Feature Selection Methods
    2.5.1 Basic Information Theory and Feature Relevance
    2.5.2 Minimal Redundancy and Maximal Relevance (mRMR)
    2.5.3 Minimal Relevant Redundancy (mRR)
    2.5.4 Conditional Mutual Information Maximization (CMIM)
  2.6 Classifier Combination Methods
    2.6.1 Majority Vote
    2.6.2 Behavior Knowledge Space
    2.6.3 Adaboost
  2.7 Significance Tests
  2.8 Performance Evaluation Methods
  2.9 RNA Secondary Structure Prediction Softwares
    2.9.1 pknotsRG
    2.9.2 RNAStructure
    2.9.3 NUPACK
3 Feature Extraction
  3.1 Composition-Related Features
  3.2 RNA Features
    3.2.1 Transformed Protein Sequence Features
    3.2.2 Sequence Features
    3.2.3 Other Features
  3.3 Protein Features
    3.3.1 Sequence Features
    3.3.2 Protein Property Features
    3.3.3 Topology Features
    3.3.4 Other Features
4 RNA Secondary Structure Prediction
  4.1 Data sets and Features
  4.2 An Incremental Feature Selection Method
  4.3 Experimental Results
    4.3.1 Experiments for the Classification Accuracy
    4.3.2 Significance Tests for the Base-pair Accuracy and Feature Analysis
  4.4 Discussion
5 Essential Protein Prediction
  5.1 Data sets and Features
  5.2 The Feature Selection Method
  5.3 Experimental Results
    5.3.1 The Experimental Procedure
    5.3.2 Backward Feature Selection and mRMR/CMIM Feature Ranking
    5.3.3 Bootstrap Cross-Validations
    5.3.4 Performance Comparison and Significance Tests
    5.3.5 ROC Analysis
    5.3.6 Top Percentage Analysis
    5.3.7 Confidence Intervals of Performance Measures and Informational Odds
    5.3.8 Comparison with Other Feature Selection Methods
  5.4 Discussion
6 Conclusions

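Fulltext
This electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law. Thesis access permission: user-defined availability. Available: Campus: available; Off-campus: available. etd-0709114-104627.pdf
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available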

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 (Thesis Format Review Team), Mail │ Development and operations: Knowledge Innovation Division, LIS