Title page for thesis etd-0827106-160923
Title
An Efficient Algorithm for Determining Protein Structure Similarity
Department
Year, semester
Language
Degree
Number of pages
49
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2006-07-06
Date of Submission
2006-08-27
Keywords
Protein Structure, Similarity, B-Spline, SVM, RMSD
Statistics
The thesis/dissertation has been browsed 5733 times and downloaded 2109 times.
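Abstract (Chinese version, translated)
Organisms contain a great variety of proteins, each with its own specific function and each involved in various processes in the body. Understanding the functions of these proteins is therefore an important current research topic. Because a protein's structure is inseparable from its function, much current research aims to predict protein tertiary structure. For structural similarity, the traditional approach uses distance RMSD (Root Mean Square Deviation) as the standard measure. We propose an alternative method that compares the similarity of proteins by means of B-Spline curve fitting. To validate the method, we ran experiments on families randomly selected from CATH, a protein structure classification database. The experimental results show that our method outperforms distance RMSD. We also use an SVM (Support Vector Machine) to obtain better classification results.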
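Distance RMSD, the baseline both abstracts compare against, can be illustrated with a minimal sketch. It assumes two structures already reduced to equal-length, residue-aligned C-alpha coordinate lists (an assumption; the thesis may select atoms and handle alignment differently) and averages the squared differences of all corresponding intra-structure pairwise distances. Because only internal distances are compared, no superposition is needed, and a rotated and translated copy scores zero. The function name `distance_rmsd` is illustrative, not taken from the thesis.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def distance_rmsd(a: list[Point], b: list[Point]) -> float:
    """Distance RMSD between two equal-length, residue-aligned traces.

    Hypothetical sketch: every pair (i, j) contributes (d_ij - d'_ij)^2,
    where d_ij is measured in `a` and d'_ij in `b`.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("structures must have the same length (>= 2 points)")
    total = 0.0
    pairs = 0
    for i, j in combinations(range(len(a)), 2):
        diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
        total += diff * diff
        pairs += 1
    return math.sqrt(total / pairs)


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
# Same shape rotated 90 degrees about z and shifted by (5, 5, 5).
b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 6.0, 5.0)]
print(distance_rmsd(a, b))  # 0.0
```

Comparing internal distances rather than superposed coordinates keeps the measure invariant to rigid motion, at the cost of O(n^2) pairs per comparison.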
Abstract
Proteins are fundamental materials of life, and the body contains many kinds of them. If one of them malfunctions, physiological problems can result. Therefore, many scientists try to analyze the functions of proteins. It is believed that a protein's structure determines its function: the more similar two structures are, the more similar their functions. Therefore, the prediction and comparison of protein structures are important topics in bioinformatics. Typically, most scientists use distance RMSD (Root Mean Square Deviation) to measure the distance between two structures. In this thesis, we propose a new algorithm for comparing two protein structures, based on comparing curves in space. To test and verify our method, we randomly choose some families in the CATH database and try to identify them. Experimental results show that our method outperforms distance RMSD. Furthermore, we use the SVM (Support Vector Machine) tool to obtain better classification.
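The abstract describes comparing structures as curves in space. The thesis's exact fitting and scoring procedure is not given here, so the following is only a hypothetical sketch of the general idea: each C-alpha trace is treated as the control polygon of a clamped cubic B-spline (a simplification; a true least-squares fit would first solve for control points), both curves are evaluated with the Cox-de Boor recursion at the same number of evenly spaced parameter values, and the resampled points are compared with distance RMSD. Resampling to a fixed count lets chains of different lengths be compared without a residue-level alignment. All function names and the default of 20 samples are assumptions.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def clamped_knots(n_ctrl: int, degree: int) -> list[float]:
    """Clamped, uniformly spaced knot vector with n_ctrl + degree + 1 knots."""
    interior = n_ctrl - degree - 1
    inner = [j / (interior + 1) for j in range(1, interior + 1)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i: int, k: int, t: float, knots: list[float]) -> float:
    """Cox-de Boor recursion for the i-th B-spline basis function of degree k."""
    if k == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that t == 1.0 reaches the end point.
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    span = knots[i + k] - knots[i]
    if span > 0:
        value += (t - knots[i]) / span * basis(i, k - 1, t, knots)
    span = knots[i + k + 1] - knots[i + 1]
    if span > 0:
        value += (knots[i + k + 1] - t) / span * basis(i + 1, k - 1, t, knots)
    return value


def curve_point(ctrl: list[Point], t: float, degree: int = 3) -> Point:
    """Evaluate the clamped B-spline with control points `ctrl` at t in [0, 1]."""
    knots = clamped_knots(len(ctrl), degree)
    weights = [basis(i, degree, t, knots) for i in range(len(ctrl))]
    return tuple(sum(w * p[axis] for w, p in zip(weights, ctrl)) for axis in range(3))


def resample(ctrl: list[Point], m: int = 20, degree: int = 3) -> list[Point]:
    """Sample the curve at m evenly spaced parameter values, ends included."""
    if len(ctrl) <= degree:
        raise ValueError("need more control points than the spline degree")
    return [curve_point(ctrl, j / (m - 1), degree) for j in range(m)]


def curve_distance(a: list[Point], b: list[Point], m: int = 20) -> float:
    """Hypothetical score: distance RMSD between the two resampled curves."""
    sa, sb = resample(a, m), resample(b, m)
    sq = [(math.dist(sa[i], sa[j]) - math.dist(sb[i], sb[j])) ** 2
          for i, j in combinations(range(m), 2)]
    return math.sqrt(sum(sq) / len(sq))


trace = [(0.0, 0.0, 0.0), (1.5, 0.5, 0.0), (2.0, 2.0, 0.5),
         (3.5, 2.5, 1.0), (4.0, 4.0, 1.0), (5.5, 4.5, 2.0)]
# Rotate 90 degrees about z, (x, y) -> (-y, x), then shift by (10, 0, 1).
moved = [(10.0 - y, x, z + 1.0) for x, y, z in trace]
print(curve_distance(trace, moved) < 1e-9)  # True
print(curve_distance(trace, trace[:5]))     # traces of different lengths
```

Because B-splines are affine invariant, rigidly moving the control points moves the sampled curve the same way, so the score stays near zero up to rounding. The naive recursion costs O(2^degree) per basis call, which is acceptable for cubic curves in a sketch.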
Table of Contents
ABSTRACT
Chapter 1. Introduction
Chapter 2. Previous Works
Chapter 3. The B-Spline Curve
Chapter 4. Support Vector Machine
Chapter 5. Structure Comparison with the B-Spline Curve
Chapter 6. Experimental Results
Chapter 7. Conclusion
BIBLIOGRAPHY
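Fulltext
This electronic full text is licensed to users only for searching, reading, and printing for personal, non-profit academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: available immediately on campus; available off campus after one year
Available:
Campus: available
Off-campus: available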


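Printed copies
Public availability information for printed theses is more complete from academic year 102 (2013/14) onward. For the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available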
