Thesis access permission: not available on or off campus
Available:
Campus: never available
Off-campus: never available
Title |
Multiple Sequence Alignment Using the Clustering Method |
||
Department |
|||
Year, semester |
Language |
||
Degree |
Number of pages |
88 |
|
Author |
|||
Advisor |
|||
Convenor |
|||
Advisory Committee |
|||
Date of Exam |
2001-07-10 |
Date of Submission |
2001-08-23 |
2001-08-23 |
Keywords |
Affine gap penalty, Bioinformatics, Multiple Sequence Alignment |
||
Statistics |
This thesis has been browsed 5671 times and downloaded 0 times. |
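Abstract (translated from Chinese) |
Multiple sequence alignment is an important topic in computational biology: it helps predict protein secondary structure, supports the analysis of evolutionary trees, and reveals functions and structures shared among several sequences. The complexity of the problem, however, is discouraging. With dynamic programming, comparing two sequences of length $n$ takes time proportional to $n^2$, whereas comparing $k$ sequences of length $n$ takes time proportional to $n^k$. In this thesis, we propose a method for aligning alignments, that is, for aligning two already-aligned groups of sequences with each other, and a clustering method based on sequence similarity. According to our experimental results, on similar sequences our method produces better alignments than the Clustal W program and runs faster. |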
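The quadratic cost of pairwise dynamic programming mentioned in the abstract can be illustrated with a minimal sketch. This is a plain Needleman-Wunsch scoring routine with a linear gap penalty, not the thesis's own implementation; the function name `global_alignment_score` and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen only for illustration. The thesis itself works with substitution matrices and affine gap penalties, which are not reproduced here.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the optimal global alignment score of a and b.

    Hypothetical scoring scheme: +1 per match, -1 per mismatch,
    -2 per gap character (linear gap penalty).
    """
    n, m = len(a), len(b)
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    # Initially no characters of a are used, so b[:j] is aligned against j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                          prev[j] + gap,        # a[i-1] against a gap
                          curr[j - 1] + gap)    # b[j-1] against a gap
        prev = curr
    return prev[m]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The two nested loops visit each of the (n+1)(m+1) table cells once, which is where the $n^2$ bound for two sequences of length $n$ comes from. Extending the same table to $k$ sequences gives one dimension per sequence, hence the $n^k$ growth.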
Abstract |
Multiple sequence alignment (MSA) is a fundamental technique in molecular biology. Biological sequences are aligned with each other vertically in order to show the similarities and differences among them. Because of its importance, many algorithms have been proposed. With dynamic programming, the optimal alignment of a pair of sequences can be found in $O(n^2)$ time, where $n$ is the length of each sequence. Unfortunately, the general optimization problem of aligning $k$ sequences of length $n$ requires $O(n^k)$ time. In this thesis, we first propose an efficient group alignment method that aligns two groups of sequences with each other. We then propose a clustering method that builds the tree topology used for merging. The clustering method is based on the idea that the two sequences with the longest distance between them should be split into two different clusters. In our experiments, our algorithm achieves both better alignment quality and shorter running time than the NJ (neighbor joining) algorithm and the Clustal W algorithm. |
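The splitting idea in the abstract can be sketched as follows. The abstract only states that the two most distant sequences should end up in different clusters; everything else here is an assumption made for illustration: the remaining sequences are assigned to whichever of the two seeds is nearer, the split is applied recursively, and the names `split_farthest` and `build_tree` are hypothetical. The input `d` is assumed to be a symmetric matrix of pairwise distances indexed by sequence number.

```python
from itertools import combinations


def split_farthest(indices, d):
    """Split indices into two clusters seeded by the most distant pair.

    Assumption: every other item joins the nearer seed (ties go left).
    """
    if len(indices) < 2:
        raise ValueError("need at least two items to split")
    s, t = max(combinations(indices, 2), key=lambda p: d[p[0]][p[1]])
    left, right = [s], [t]
    for x in indices:
        if x in (s, t):
            continue
        (left if d[x][s] <= d[x][t] else right).append(x)
    return left, right


def build_tree(indices, d):
    """Recursively split until single sequences remain.

    Returns a nested tuple; each leaf is a sequence index.
    """
    if len(indices) == 1:
        return indices[0]
    left, right = split_farthest(indices, d)
    return (build_tree(left, d), build_tree(right, d))


if __name__ == "__main__":
    # Toy distances: four points on a line at positions 0, 1, 10, 11.
    pos = [0, 1, 10, 11]
    d = [[abs(p - q) for q in pos] for p in pos]
    print(build_tree(list(range(len(pos))), d))
```

A tree built this way could then be merged bottom-up, aligning the two child groups at each internal node with a group alignment step; how the thesis performs that merge is not specified in the abstract.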
Table of Contents |
List of Figures
List of Tables
Abstract
Chapter 1. Introduction
Chapter 2. Multiple Sequence Alignment
  2.1 Multiple Sequence Alignment Problem
  2.2 Scoring Functions
  2.3 Affine Gap Penalty
  2.4 Complexity of Multiple Sequence Alignment
Chapter 3. Previous Algorithms
  3.1 Tree Alignment
  3.2 Star Alignment
  3.3 Progressive Alignment
Chapter 4. The Clustering Method for Multiple Sequence Alignment
Chapter 5. Experiment Results and Performance Analysis
Chapter 6. Conclusion
Appendixes
  A. Blosum Score Matrices
  B. PAM Score Matrices
  C. Gonnet Score Matrices
Bibliography |
Fulltext |
The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization. |
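Printed copies |
Availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: already available |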