Title page for thesis/dissertation etd-0730110-230815
Title
GAGS: A Novel Microarray Gene Selection Algorithm for Gene Expression Classification
Department
Year, semester
Language
Degree
Number of pages
57
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2010-07-01
Date of Submission
2010-07-30
Keywords
Microarray data analysis, Genetic algorithm, Gene selection
Feature selection, Gene expression data analysis, Genetic algorithm
Statistics
This thesis/dissertation has been browsed 5716 times and downloaded 946 times.
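Abstract (translated from Chinese)
Identifying the causes of cancer has long been a goal of many researchers, and microarray technology is one technique that can help find them. Analyzing microarray data with classification methods can reveal genes that may be associated with cancer. However, microarray data are very large, so a gene selection method must first remove the many noisy genes before classification can be carried out. Here we propose a new algorithm for the microarray data classification problem. The algorithm mainly comprises gene ranking, gene filtering, gene reduction, and classification pattern learning with a genetic algorithm, which finds the genes that are important for each type of cancer. The genes found for each cancer type are used to perform separate classifications, and the classification results are finally fused to improve classification accuracy.

In the experiments, we evaluated four microarray datasets using LOOCV and compared accuracy against 21 other methods. The proposed method outperformed the 21 methods from the recent literature on three of the datasets, reaching 100% accuracy on them. In addition, five more microarray datasets were added to the original four, and all nine datasets were evaluated with a 50% vs. 50% protocol. Compared with the results of five different methods from the prior literature, the proposed method obtained the highest accuracy on eight of the datasets.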
Abstract
In this thesis, we propose a novel microarray gene selection algorithm consisting of five processes for solving the gene expression classification problem. First, a normalization process removes the differences among the scales of different genes. Second, an efficient gene ranking process filters out unrelated genes. Third, a genetic algorithm is adopted to find the informative gene subsets for each class. Fourth, the informative gene subsets of each class are used to classify the testing dataset separately. Finally, the separate classification results are fused into one final classification result.
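
As a rough illustration of the first two processes, the sketch below rescales each gene to [0, 1] and then ranks genes for one class. The min-max scaling and the signal-to-noise style score are assumptions chosen for the example, since the abstract does not state which normalization or ranking criterion the thesis uses. The function names are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(samples):
    """Rescale each gene (column) to [0, 1]; an assumed scheme, not taken from the thesis."""
    scaled_columns = []
    for column in zip(*samples):
        low, high = min(column), max(column)
        span = high - low
        # A constant gene has no spread; map it to 0.0 instead of dividing by zero.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]


def rank_genes_for_class(samples, labels, target):
    """Return gene indices ordered from most to least discriminative for `target`.

    Score (an assumption): |mean_in - mean_out| / (std_in + std_out).
    Both the target class and the remaining classes need at least one sample.
    """
    scored = []
    for gene, column in enumerate(zip(*samples)):
        inside = [v for v, y in zip(column, labels) if y == target]
        outside = [v for v, y in zip(column, labels) if y != target]
        gap = abs(mean(inside) - mean(outside))
        spread = pstdev(inside) + pstdev(outside)
        if spread:
            score = gap / spread
        else:
            # No within-group variation: perfectly separating if the means differ.
            score = float("inf") if gap else 0.0
        scored.append((score, gene))
    # Highest score first; ties are broken by the lower gene index.
    return [gene for score, gene in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Toy data: 4 samples x 3 genes, two classes (values are made up for illustration).
expression = [[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [9.0, 15.0, 5.0], [10.0, 12.0, 5.0]]
classes = ["A", "A", "B", "B"]
normalized = min_max_normalize(expression)
top_genes = rank_genes_for_class(normalized, classes, "A")[:2]  # keep the two best-ranked genes
```

Ranking once per class mirrors the abstract's idea of finding informative genes for each class separately. The later genetic-algorithm search would then operate on the reduced gene list.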

In the first experiment, four microarray datasets are used to verify the performance of the proposed algorithm. The experiment is conducted using the leave-one-out cross-validation (LOOCV) resampling method. We compared the proposed algorithm with twenty-one existing methods. The proposed algorithm achieves the best result on three of the four datasets, and its accuracy on each of those three datasets reaches 100%. In the second experiment, nine microarray datasets are used to verify the proposed algorithm. The experiment is conducted using a 50% vs. 50% resampling method. The proposed algorithm achieves the best result on eight of the nine datasets against all competing methods.
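
The two resampling schemes named above can be sketched as follows. This is a minimal illustration, not the thesis's experimental code. The 1-nearest-neighbor classifier is an assumption used only to make the example runnable, and the names `loocv_accuracy` and `half_split` are hypothetical.

```python
import math
import random


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training sample closest to `query` (Euclidean distance)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def loocv_accuracy(samples, labels, predict=nearest_neighbor_predict):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def half_split(n_samples, seed=0):
    """Shuffle sample indices and cut them into (train, test) halves for a 50% vs. 50% run."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = n_samples // 2
    return indices[:cut], indices[cut:]


# Toy data: two well-separated classes (values are made up for illustration).
toy_samples = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
toy_labels = ["A", "A", "B", "B"]
accuracy = loocv_accuracy(toy_samples, toy_labels)
train_idx, test_idx = half_split(len(toy_samples))
```

In a full pipeline, gene selection would normally be repeated inside each fold or split so that held-out samples do not influence which genes are chosen. Whether the thesis does so is not stated in the abstract.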
Table of Contents
Chapter 1. Introduction 1
Chapter 2. Background materials and related work 3
2.1. Background materials 3
2.1.1. Genetic algorithm 3
2.1.2. Support vector machine 4
2.1.3. K-nearest-neighbor 4
2.2. Related work 5
Chapter 3. Proposed method 9
3.1. Normalization process 10
3.2. Gene ranking process 13
3.3. Classification pattern learning process 19
3.3.1. Classification pattern generation phase 23
3.3.2. Evaluation phase 24
3.3.3. Crossover phase 29
3.3.4. Mutation phase 30
3.3.5. Survival phase 32
3.4. Classification process 33
3.5. Fusion and verification process 34
Chapter 4. Experiments 36
4.1. Experimental environment 36
4.2. Comparisons of the classification results 37
4.3. Comparisons of the gene selection methods 41
4.4. Influences of different fitness functions in the gene selection process 43
Chapter 5. Conclusions 45
References 46
References
[1] Saeys, Y., Inza, I. and Larrañaga, P. (2007) A review of feature selection techniques in bioinformatics. Bioinformatics, 23, 2507-2517.
[2] Shamos, M.I. and Hoey, D. (1975) Closest-point problems. Proc. 16th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 151-162.
[3] Holland, J.H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press.
[4] Cortes, C. and Vapnik, V., (1995) Support-vector networks. Machine Learning, 20, 273-297.
[5] Chang, C.C. and Lin, C.J. (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm .
[6] Yeo, G. and Poggio, T. (2001) Multiclass classification of SRBCT tumors. Technical Report AI Memo 2001-018 CBCL Memo 206, MIT Press.
[7] Su, A.I., Welsh, J.B., Sapinoso, L.M., Kern, S.G., Dimitrov, P., Lapp, H., Schultz, P.G., Powell, S.M., Moskaluk, C.A., Frierson, H.F., Jr and Hampton, G.M. (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Research, 61, 7388-7393.
[8] Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J.P., Poggio, T., Gerald, W., Loda, M., Lander, E.S. and Golub, T.R. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences USA, 98, 15149-15154.
[9] Yeang, C.H., Ramaswamy, S., Tamayo, P., Mukherjee, S., Rifkin, M.R., Angelo, M., Lander, E., Mesirov, J. and Golub, T.R. (2001) Molecular classification of multiple tumor types. Bioinformatics, 17, S316-S322.
[10] Lee, Y. and Lee, C.K. (2003) Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics, 19, 1132-1139.
[11] Friedman, J.H. (1994) Flexible metric nearest neighbor classification. Technical report Department of Statistics, Stanford University.
[12] Duda, R.O., Hart, P.E. and Stork, D.G. (2000) Pattern Classification 2nd edition. A Wiley-Interscience Publication.
[13] Hong, J.H. and Cho, S.B. (2006) The classification of cancer based on DNA microarray data that uses diverse ensemble genetic programming. Artif. Intell. Med., 36, 43–58.
[14] Chen, Y.H. and Zhao, Y. (2008) A novel ensemble of classifiers for microarray data classification. Applied Soft Computing, 8, 1664-1669.
[15] Yu, L. and Liu, H. (2004) Redundancy based feature selection for microarray data. Proceeding of ACM Special Interest Group Discovery and Data Mining ’04, Research Track Poster.
[16] Wang, Z.Y., Palade, V. and Xu, Y. (2006) Neuro-fuzzy ensemble approach for microarray cancer gene expression data analysis. Proceedings of the 2006 International Symposium on Evolving Fuzzy Systems (IEEE), 241–246.
[17] Tan, A.C., Naiman, D.Q., Xu, L., Winslow, R.L. and Geman, D. (2005) Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics, 21, 3896–3904.
[18] Xiong, H. and Chen, X. (2006) Kernel-based distance metric learning for microarray data classification. BMC Bioinformatics, 7, 299.
[19] Kwok, Y.K. and Ahmad, I. (1997) Efficient scheduling of arbitrary task graphs to multiprocessors using a parallel genetic algorithm. Journal of Parallel and Distributed Computing, 47, no.1, 58-77.
[20] Java-ML: Java Machine Learning. Software available at http://java-ml.sourceforge.net/ .
[21] Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D. and Lander E.S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
[22] Armstrong, S.A., Staunton, J.E., Silverman, L.B., Pieters, R., den Boer, M.L., Minden, M.D., Sallan, S.E., Lander, E.S., Golub, T.R. and Korsmeyer, S.J. (2001) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30, 41-47.
[23] Pomeroy, S.L., Tamayo, P., Gaasenbeek, M., Sturla, L.M., Angelo, M., McLaughlin, M.E., Kim, J.Y.H., Goumnerova, L.C., Black, P.M., Lau, C., Allen, J.C., Zagzag, D., Olson, J.M., Curran, T., Wetmore, C., Biegel, J.A., Poggio, T., Mukherjee, S., Rifkin, R., Califano, A., Stolovitzky, G., Louis, D.N., Mesirov, J.P., Lander, E.S. and Golub, T.R. (2002) Prediction of central nervous system embryonal tumor outcome based on gene expression. Nature, 415, 436-442.
[24] Petricoin, E.F., Ardekani, A.M., Hitt, B.A., Levine, P.J., Fusaro, V.A., Steinberg, S.M., Mills, G.B., Simone, C., Fishman, D.A., Kohn, E.C. and Liotta, L.A. (2002) Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359, 572-577.
[25] Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D. and Levine, A.J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissue probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 96, 6745-6750.
[26] Gordon, G.J., Jensen, R.V., Hsiao, L.L., Gullans, S.R., Blumenstock, J.E., Ramaswamy, S., Richards, W.G., Sugarbaker, D.J. and Bueno, R. (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Research, 62, 4963-4967.
[27] Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D'Amico, A.V., Richie, J.P., Lander, E.S., Loda, M., Kantoff, P.W., Golub, T.R. and Sellers, W.R. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203-209.
[28] Shipp, M.A., Ross, K.N., Tamayo, P., Weng, A.P., Kutok, J.L., Aguiar, R.C.T., Gaasenbeek, M., Angelo, M., Reich, M., Pinkus, G.S., Ray, T.S., Koval, M.A., Last, K.W., Norton, A., Lister, T.A., Mesirov, J., Neuberg, D.S., Lander, E.S., Aster, J.C. and Golub, T.R. (2002) Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning. Nature Medicine, 8, 68-74.
[29] St. Jude Research (http://www.stjuderesearch.org/data/).
[30] Wu, J.S. (2010) MARS: A microarray attributes reduction scheme for microarray cancer classification problem. CIIS lab technical report, Department of Computer Science and Engineering, National Sun Yat-sen University.
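Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available

Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available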
