Thesis access permission: available immediately on campus; available off campus after one year
Available:
Campus: available
Off-campus: available
Title: An Efficient Algorithm for Determining Protein Structure Similarity
Number of pages: 49
Date of exam: 2006-07-06
Date of submission: 2006-08-27
Keywords: Protein Structure, Similarity, B-Spline, RMSD, SVM
Statistics: this thesis has been viewed 5733 times and downloaded 2109 times.
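Abstract (translated from the Chinese version)
Living organisms contain a wide variety of proteins, each with its own specific function and each taking part in various processes within the organism. Understanding the functions of proteins is therefore an important research topic. Because protein structure and function are closely linked, many current studies aim to predict the tertiary structure of proteins. For protein structure similarity, the traditional standard for comparing two proteins is the distance RMSD (Root Mean Square Deviation). We propose an alternative way to compare protein similarity, which uses B-Spline curve fitting to compare proteins. To validate our method, we randomly selected families from CATH (a protein structure classification database) for our experiments. The experimental results show that our method outperforms distance RMSD. In addition, we use an SVM (Support Vector Machine) to obtain better classification results.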
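This page does not define the distance RMSD baseline. A common formulation, assumed here, compares the intramolecular Cα distance matrices of two structures of equal length n: dRMSD = sqrt( (1/P) * sum over i<j of (dA_ij - dB_ij)^2 ), where P = n(n-1)/2 is the number of residue pairs. Because only internal distances are compared, no superposition is needed. The sketch below assumes each structure is an (n, 3) NumPy array of Cα coordinates with residues already in correspondence. The function names are illustrative and are not taken from the thesis.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Return the n x n matrix of Euclidean distances between C-alpha atoms."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Distance RMSD between two equal-length (n, 3) C-alpha traces.

    Compares intramolecular distance matrices, so the result does not
    depend on how either structure is rotated or translated.
    """
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (n, 3)")
    n = a.shape[0]
    if n < 2:
        raise ValueError("need at least two residues")
    da, db = pairwise_distances(a), pairwise_distances(b)
    iu = np.triu_indices(n, k=1)  # count each residue pair i < j once
    return float(np.sqrt(np.mean((da[iu] - db[iu]) ** 2)))
```

For example, a rigidly rotated and translated copy of a structure has a distance RMSD of (numerically) zero. A uniformly scaled copy does not.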
Abstract |
Proteins are fundamental materials of life, and the body contains many kinds of them. If one of them malfunctions, physiological problems can result. Therefore, many scientists try to analyze the functions of proteins. It is widely believed that a protein's structure determines its function: the more similar two structures are, the more similar their functions tend to be. The prediction and comparison of protein structures are therefore important topics in bioinformatics. The distance RMSD (Root Mean Square Deviation) is the measure most scientists use to quantify the difference between two structures. In this thesis, we propose a new algorithm for comparing two protein structures that is based on comparing curves in space. To test and verify our method, we randomly chose several families from the CATH database and attempted to identify them. Experimental results show that our method outperforms RMSD. Furthermore, we use the SVM (Support Vector Machine) tool to obtain better classification results.
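The abstract does not spell out how two curves are compared, so the following is only one plausible sketch of B-spline-based structure comparison, not the thesis's algorithm. Each Cα trace is fitted with a cubic parametric B-spline using SciPy's `scipy.interpolate.splprep` and `splev`, both curves are resampled at the same m parameter values, and the resampled points are compared by RMSD after Kabsch superposition. The Kabsch step, the resampling size m, the smoothing default, and the helper names (`resample_bspline`, `kabsch_rmsd`, `bspline_curve_distance`) are all assumptions made for illustration. One convenient property of resampling is that chains of different lengths can be compared point by point.

```python
import numpy as np
from scipy.interpolate import splev, splprep


def resample_bspline(coords: np.ndarray, m: int = 50, smoothing: float = 0.0) -> np.ndarray:
    """Hypothetical helper: fit a cubic parametric B-spline to an (n, 3)
    C-alpha trace and return m points at evenly spaced parameter values.

    smoothing=0.0 interpolates every C-alpha; a positive value lets splprep
    fit a smoothing spline instead.
    """
    if coords.ndim != 2 or coords.shape[1] != 3 or coords.shape[0] < 4:
        raise ValueError("expected an (n, 3) array with n >= 4 for a cubic spline")
    # By default splprep parameterizes the curve by normalized cumulative chord length.
    tck, _ = splprep([coords[:, 0], coords[:, 1], coords[:, 2]], k=3, s=smoothing)
    u = np.linspace(0.0, 1.0, m)
    return np.column_stack(splev(u, tck))


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (m, 3) point sets after optimal rigid superposition (Kabsch)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def bspline_curve_distance(a: np.ndarray, b: np.ndarray, m: int = 50,
                           smoothing: float = 0.0) -> float:
    """Hypothetical curve-based dissimilarity between two C-alpha traces."""
    return kabsch_rmsd(resample_bspline(a, m, smoothing),
                       resample_bspline(b, m, smoothing))
```

Setting `smoothing` to a positive value makes `splprep` fit a smoothing spline rather than pass through every Cα position. How much smoothing suits protein backbones is a tuning choice this sketch does not settle.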
Table of Contents
Abstract
Chapter 1. Introduction
Chapter 2. Previous Works
Chapter 3. The B-Spline Curve
Chapter 4. Support Vector Machine
Chapter 5. Structure Comparison with the B-Spline Curve
Chapter 6. Experimental Results
Chapter 7. Conclusion
Bibliography
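Full text
This electronic full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid legal liability.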
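Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. For public-access information on printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available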