博碩士論文 etd-0725109-190346 詳細資訊
Title page for etd-0725109-190346
論文名稱
Title
雙硫鍵預測之多階段方法
A Multi-phase Approach for Disulfide Bond Prediction
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
67
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2009-07-07
繳交日期
Date of Submission
2009-07-25
關鍵字
Keywords
多階段、預測、雙硫鍵、半胱氨酸
disulfide bond, prediction, multi-phase, cysteine
統計
Statistics
本論文已被瀏覽 5674 次,被下載 1443
The thesis/dissertation has been browsed 5674 times, has been downloaded 1443 times.
中文摘要
雙硫鍵的資訊可以用來幫助蛋白質二級結構、三級結構以及全原子座標的預測。前人的研究大多只著重在半胱氨酸的狀態分類或是雙硫鍵的鍵結預測,其中鍵結預測的部份更藉由加入些許的限制讓整個過程能在現實中達成。在這篇論文中,我們提出了一個多階段的方法來解決這些問題。在半胱氨酸狀態分類的部份,我們的方法除了可以輸出鍵結的數量外,更達到了 90.7 % 的準確率。我們利用這些預測的狀態資訊,在鍵結預測的階段選取適當的配對組合。而為了解決資料分布的不均衡,我們也提出了縮減採樣率(down-sample)的方法以達到降低處理時間的要求。最後,我們使用最大權重配對問題(weighted graph matching)的演算法來完成鍵結的配對方式,達到了63.5% 的準確率。在針對整體系統的效能方面,更獲得了近 48% 的準確率。我們使用了來自 SWISS-PROT 與 PDB 等著名資料庫的資料集作為驗證,並證明效能都優於前人的方法。
Abstract
Disulfide bond information can aid the prediction of protein secondary structure, tertiary structure, and all-atom coordinates. Most previous work focused on either cysteine state classification or connectivity prediction, and the connectivity methods typically imposed additional constraints to make the problem tractable in practice. In this thesis, we propose a multi-phase approach that addresses both problems. In the state classification phase, our method outputs the number of bonds and achieves 90.7% accuracy. In the connectivity prediction phase, we use the predicted number of bonds as the basis for selecting candidate bond pairs. To overcome the class imbalance among samples, we propose a down-sampling method, which also reduces processing time. Finally, we apply a weighted graph matching algorithm to obtain the bonding pattern, achieving 63.5% accuracy. For the overall system, we achieve nearly 48% accuracy. Our method is validated on datasets derived from SWISS-PROT and PDB, and the results are better than those of previous methods.
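The final phase described in the abstract, choosing a bonding pattern by weighted graph matching, can be illustrated with a minimal sketch. This is not the thesis implementation: it enumerates every perfect matching by brute force instead of using a dedicated maximum-weight matching routine, and the function names, cysteine positions, and pair scores below are hypothetical values chosen only for illustration. It assumes an even number of bonded cysteines, as would follow from a predicted number of bonds.

```python
def perfect_matchings(items):
    """Yield every way to split an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_bonding_pattern(cysteines, score):
    """Return the pairing with the highest total pair score, and that total.

    `score` maps frozenset({a, b}) to a bonding score for cysteines a and b.
    """
    best, best_total = None, float("-inf")
    for pattern in perfect_matchings(list(cysteines)):
        total = sum(score[frozenset(pair)] for pair in pattern)
        if total > best_total:
            best, best_total = pattern, total
    return best, best_total


# Hypothetical sequence positions of four bonded cysteines.
cysteines = [5, 18, 30, 44]

# Hypothetical pairwise bonding scores (for example, classifier outputs).
score = {
    frozenset({5, 18}): 0.2,
    frozenset({5, 30}): 0.9,
    frozenset({5, 44}): 0.1,
    frozenset({18, 30}): 0.3,
    frozenset({18, 44}): 0.8,
    frozenset({30, 44}): 0.4,
}

pattern, total = best_bonding_pattern(cysteines, score)
print(pattern, round(total, 2))
```

With these made-up scores, the pairing 5-30 and 18-44 has the largest total. Brute-force enumeration is only practical for a handful of cysteines, since the number of perfect matchings grows very quickly; a polynomial-time matching algorithm would be the natural choice at scale.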
目次 Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
  2.1 Amino Acids
  2.2 The Support Vector Machine
  2.3 Position-Specific Score Matrix
  2.4 Previous Works
    2.4.1 Martelli's Method for State Classification
    2.4.2 Chen's Method for State Classification
    2.4.3 Tsai's Method for Connectivity Prediction
Chapter 3. Algorithms for Disulfide Bond Prediction
  3.1 Motivation
  3.2 System Overview
    3.2.1 Feature Extraction
    3.2.2 State Classification
    3.2.3 Connectivity Prediction
Chapter 4. Experimental Results
  4.1 Datasets
  4.2 Performance Evaluation
  4.3 State Classification
  4.4 Connectivity Prediction
  4.5 Overall System
Chapter 5. Conclusions
BIBLIOGRAPHY
參考文獻 References
[1] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller,
and D. J. Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs," Nucleic Acids Research, Vol. 25, No. 17,
pp. 3389-3402, 1997.
[2] P. Baldi, J. Cheng, and A. Vullo, "Large-scale prediction of disulphide bond
connectivity," Advances in Neural Information Processing Systems 17,
Cambridge, MA, pp. 97-104, MIT Press, 2005.
[3] A. Ceroni, P. Frasconi, A. Passerini, and A. Vullo, "Predicting the disulfide
bonding state of cysteines with combinations of kernel machines," Journal of
VLSI Signal Processing, Vol. 35, pp. 287-295, 2003.
[4] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines,"
2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[5] B.-J. Chen, C.-H. Tsai, C.-H. Chan, and C.-Y. Kao, "Disulfide connectivity
prediction with 70% accuracy using two-level models," PROTEINS: Structure,
Function, and Genetics, Vol. 64, pp. 246-252, 2006.
[6] G. Chen, H. Deng, Y. Gui, Y. Pan, and X. Wang, "Cysteine separations profiles
on protein secondary structure infer disulfide connectivity," Granular
Computing, 2006 IEEE International Conference on, pp. 663-665, May 2006.
[7] Y.-C. Chen and J.-K. Hwang, "Prediction of disulfide connectivity from protein
sequences," PROTEINS: Structure, Function, and Genetics, Vol. 61,
pp. 507-512, 2005.
[8] Y.-C. Chen, Y.-S. Lin, C.-J. Lin, and J.-K. Hwang, "Prediction of the bonding
states of cysteines using the support vector machines based on multiple feature
vectors and cysteine state sequences," PROTEINS: Structure, Function, and
Genetics, Vol. 55, pp. 1036-1042, 2004.
[9] J. Cheng, H. Saigo, and P. Baldi, "Large-scale prediction of disulphide bridges
using kernel methods, two-dimensional recursive neural networks, and weighted
graph matching," PROTEINS: Structure, Function, and Genetics, Vol. 62,
pp. 617-629, 2006.
[10] C.-C. Chuang, C.-Y. Chen, J.-M. Yang, P.-C. Lyu, and J.-K. Hwang,
"Relationship between protein structures and disulfide-bonding patterns," PROTEINS:
Structure, Function, and Genetics, Vol. 53, pp. 1-5, 2003.
[11] M. O. Dayhoff, R. M. Schwartz, and B. C. Orcutt, "A model of evolutionary
change in proteins," Atlas of Protein Sequence and Structure (M. O. Dayhoff,
ed.), pp. 345-352+, Nat. Biomed. Research Foundation, 1978.
[12] R. Ed, wmatch: a C Program to solve maximum weight matching, 1999.
Software available at http://elib.zib.de/pub/Packages/mathprog/.
[13] P. Fariselli and R. Casadio, "Prediction of disulfide connectivity in proteins,"
Bioinformatics, Vol. 17, No. 10, pp. 957-964, 2001.
[14] P. Fariselli, P. Riccobelli, and R. Casadio, "Role of evolutionary information
in predicting the disulfide-bonding state of cysteine in proteins," PROTEINS:
Structure, Function, and Genetics, Vol. 36, pp. 340-346, 1999.
[15] F. Ferre and P. Clote, "Disulfide connectivity prediction using secondary
structure information and diresidue frequencies," Bioinformatics, Vol. 21, No. 10,
pp. 2336-2346, 2005.
[16] A. Fiser and I. Simon, "Predicting the oxidation state of cysteines by multiple
sequence alignment," Bioinformatics, Vol. 16, No. 3, pp. 251-256, 2000.
[17] P. Frasconi, A. Passerini, and A. Vullo, "A two-stage svm architecture for
predicting the disulfide bonding state of cysteines," Neural Networks for Signal
Processing, 2002. Proceedings of the 2002 12th IEEE Workshop on, pp. 25-34,
2002.
[18] H. N. Gabow, "An efficient implementation of Edmonds' algorithm for
maximum matching on graphs," Journal of the ACM, Vol. 23, No. 2, pp. 221-234,
1976.
[19] M. Gribskov, A. D. Mclachlan, and D. Eisenberg, "Profile analysis: detection
of distantly related proteins," Proceedings of the National Academy of Sciences
of the United States of America, Vol. 84, No. 13, pp. 4355-4358, 1987.
[20] P. M. Harrison and M. J. E. Sternberg, "Analysis and classification of
disulphide connectivity in proteins: The entropic effect of cross-linkage," Journal
of Molecular Biology, Vol. 244, No. 4, pp. 448-463, 1994.
[21] S. Henikoff and J. G. Henikoff, "Amino acid substitution matrices from protein
blocks," Proceedings of the National Academy of Sciences of the United States
of America, Vol. 89, No. 22, pp. 10915-10919, 1992.
[22] L. Holm and C. Sander, "Mapping the protein universe," Science, Vol. 273,
No. 5275, pp. 595-602, 1996.
[23] D. T. Jones, "Protein secondary structure prediction based on position-specific
scoring matrices," Journal of Molecular Biology, Vol. 292, No. 2, pp. 195-202,
1999.
[24] J. R. G. L., A. P. Shilton, M. M. Parker, and M. Palaniswami, "Prediction
of cystine connectivity using svm," Bioinformation, Vol. 1, No. 2, pp. 69-74,
2005.
[25] H.-L. Liu and S.-C. Chen, "Prediction of disulfide connectivity in proteins with
support vector machine," Journal of the Chinese Institute of Chemical
Engineers, Vol. 38, No. 1, pp. 63-70, 2007.
[26] C.-H. Lu, Y.-C. Chen, C.-S. Yu, and J.-K. Hwang, "Predicting disulfide
connectivity patterns," PROTEINS: Structure, Function, and Genetics, Vol. 67,
pp. 262-270, 2007.
[27] P. L. Martelli, P. Fariselli, L. Malaguti, and R. Casadio, "Prediction of the
disulfide-bonding state of cysteines in proteins at 88% accuracy," Protein
Science, Vol. 11, pp. 2735-2739, 2002.
[28] J. Meiler, M. Muller, A. Zeidler, and F. Schmaschke, "Generation and
evaluation of dimension-reduced amino acid parameter representations by artificial
neural networks," Journal of Molecular Modeling, Vol. 7, pp. 360-369, 2001.
[29] L. A. Mirny and E. I. Shakhnovich, "How to derive a protein folding potential?
a new approach to an old problem," Journal of Molecular Biology, Vol. 264,
No. 5, pp. 1164-1179, 1996.
[30] S. M. Muskal, S. R. Holbrook, and S.-H. Kim, "Prediction of the
disulfide-bonding state of cysteine in proteins," Protein Engineering, Vol. 3, No. 8,
pp. 667-672, 1990.
[31] M. Mucchielli-Giorgi, S. Hazout, and P. Tuffery, "Predicting the disulfide
bonding state of cysteines using protein descriptors," PROTEINS: Structure,
Function, and Genetics, Vol. 46, pp. 243-249, 2002.
[32] R. Rubinstein and A. Fiser, "Predicting disulfide bond connectivity in proteins
by correlated mutations analysis," Bioinformatics, Vol. 24, No. 4, pp. 498-504,
2008.
[33] R. Singh, "A review of algorithmic techniques for disulfide-bond
determination," Brief Funct Genomic Proteomic, Vol. 7, No. 2, pp. 157-172, 2008.
[34] C.-H. Tsai, B.-J. Chen, C.-H. Chan, H.-L. Liu, and C.-Y. Kao, "Improving
disulfide connectivity prediction with sequential distance between oxidized
cysteines," Bioinformatics, Vol. 21, No. 24, pp. 4416-4419, 2005.
[35] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1999.
[36] M. Vincent, A. Passerini, M. Labbe, and P. Frasconi, "A simplified approach to
disulfide connectivity prediction from protein sequences," BMC Bioinformatics,
Vol. 9, No. 1, p. 20, 2008.
[37] A. Vullo and P. Frasconi, "Disulfide connectivity prediction using recursive
neural networks and evolutionary information," Bioinformatics, Vol. 20, No. 5,
pp. 653-659, 2004.
[38] E. Zhao, H.-L. Liu, C.-H. Tsai, H.-K. Tsai, C.-H. Chan, and C.-Y. Kao,
"Cysteine separations profiles (csp) on protein sequences infer disulfide connectivity,"
Bioinformatics, Vol. 21, No. 8, pp. 1415-1420, 2004.
電子全文 Fulltext
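This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.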
論文使用權限 Thesis access permission: available on campus immediately; available off campus after one year
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
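Public-availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.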
開放時間 available 已公開 available
