Thesis/Dissertation etd-0823101-124209 Details
Title page for etd-0823101-124209
Title: Multiple Sequence Alignment Using the Clustering Method
Department:
Year, semester:
Language:
Degree:
Number of pages: 88
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2001-07-10
Date of Submission: 2001-08-23
Keywords: Affine gap penalty, Bioinformatics, Multiple Sequence Alignment
Statistics: This thesis has been viewed 5671 times and downloaded 0 times.
Chinese Abstract (translated)
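Multiple sequence alignment is an important problem in computational biology. It helps, for example, to predict protein secondary structure, to analyze phylogenetic trees, and to find functions and structures shared by multiple sequences. The complexity of the problem, however, is daunting. Comparing two sequences of length n with dynamic programming takes time proportional to n^2, whereas comparing k sequences of length n takes time proportional to n^k.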
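The following worked count illustrates the n^k growth. It assumes sum-of-pairs scoring and the straightforward k-dimensional generalization of the pairwise dynamic program. This is an assumption, because the abstract does not state the exact recurrence. Each cell of the (n+1)^k table has 2^k - 1 predecessor cells, since every nonempty subset of the sequences can advance by one residue. Scoring a candidate column costs one substitution or gap lookup per pair of sequences:

```latex
\begin{aligned}
\text{cells} &= (n+1)^k, \\
\text{predecessors per cell} &= 2^k - 1, \\
\text{pair scores per column} &= \binom{k}{2}, \\
T(n,k) &= O\!\left(\binom{k}{2}\,(2^k - 1)\,(n+1)^k\right) = O\!\left(k^2\, 2^k\, n^k\right).
\end{aligned}
```

For k = 2 this reduces to the familiar O(n^2) bound. For k = 5 and n = 100, the table alone already has 101^5 ≈ 1.05 × 10^10 cells.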
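In this thesis, we propose a method that aligns separately computed sequence alignments with one another. We also propose a clustering method based on sequence similarity. In our experiments on similar sequences, our method produced better alignments than the Clustal W program and ran faster.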
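The abstract does not spell out the group alignment procedure. The following is therefore only a generic sketch of the standard way to align two existing alignments:

- Treat each column of a group as a single unit.
- Score a pair of columns by the sum-of-pairs over all cross-group residue pairs.
- Run the pairwise dynamic program over columns instead of residues.

Several details are illustrative assumptions, not the thesis's method: the names `align_groups` and `column_score`, the linear gap cost, and the choice to score gap/gap pairs as zero.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

GAP = "-"


def column_score(col_a: Sequence[str], col_b: Sequence[str],
                 sub: Callable[[str, str], int], gap: int) -> int:
    """Sum-of-pairs score of one column of group A against one column of group B.

    Assumption: gap/gap pairs are free and residue/gap pairs cost `gap` (linear gaps).
    """
    total = 0
    for x, y in product(col_a, col_b):
        if x == GAP and y == GAP:
            continue
        total += gap if GAP in (x, y) else sub(x, y)
    return total


def align_groups(group_a: List[str], group_b: List[str],
                 sub: Callable[[str, str], int], gap: int = -2) -> Tuple[int, List[str]]:
    """Hypothetical column-by-column DP that merges two alignments into one."""
    cols_a, cols_b = list(zip(*group_a)), list(zip(*group_b))
    gaps_a, gaps_b = (GAP,) * len(group_a), (GAP,) * len(group_b)
    la, lb = len(cols_a), len(cols_b)
    # Move -> offset of the predecessor cell; dict order makes ties prefer "diag".
    prev = {"diag": (-1, -1), "up": (-1, 0), "left": (0, -1)}

    def step(i: int, j: int, move: str):
        # The pair of columns emitted by `move` when it ends at cell (i, j).
        if move == "diag":
            return cols_a[i - 1], cols_b[j - 1]
        if move == "up":
            return cols_a[i - 1], gaps_b
        return gaps_a, cols_b[j - 1]

    score = [[0] * (lb + 1) for _ in range(la + 1)]
    moves = [[""] * (lb + 1) for _ in range(la + 1)]
    for i in range(la + 1):
        for j in range(lb + 1):
            if i == 0 and j == 0:
                continue
            best, choice = None, ""
            for m, (di, dj) in prev.items():
                pi, pj = i + di, j + dj
                if pi < 0 or pj < 0:
                    continue
                cand = score[pi][pj] + column_score(*step(i, j, m), sub, gap)
                if best is None or cand > best:
                    best, choice = cand, m
            score[i][j], moves[i][j] = best, choice

    # Trace back, inserting all-gap columns into whichever group skips a column.
    rows: List[List[str]] = [[] for _ in range(len(group_a) + len(group_b))]
    i, j = la, lb
    while i > 0 or j > 0:
        m = moves[i][j]
        col_a, col_b = step(i, j, m)
        for row, ch in zip(rows, col_a + col_b):
            row.append(ch)
        di, dj = prev[m]
        i, j = i + di, j + dj
    return score[la][lb], ["".join(reversed(r)) for r in rows]


if __name__ == "__main__":
    same = lambda x, y: 1 if x == y else -1  # toy substitution score
    print(align_groups(["ACGT", "AC-T"], ["ACT"], same))  # (4, ['ACGT', 'AC-T', 'AC-T'])
```

Gap columns are inserted into a whole group at once. As a result, residues that are already aligned within a group stay aligned, and this is what makes such a step usable for merging clusters bottom-up.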
Abstract
Multiple sequence alignment (MSA) is a fundamental technique in molecular biology. Biological sequences are aligned vertically so that the similarities and differences among them become visible. Because of its importance, many algorithms have been proposed. With dynamic programming, an optimal alignment of a pair of sequences can be found in O(n^2) time, where n is the length of each sequence. Unfortunately, extending dynamic programming to the general problem of aligning k sequences of length n requires O(n^k) time.
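The standard three-state affine-gap recurrence for global alignment illustrates both the O(n^2) pairwise bound and the affine gap penalty listed in the keywords. In this minimal sketch, a gap of length L costs `gap_open + gap_extend * L`. The function name `gotoh_score` and the match, mismatch, and gap values are illustrative assumptions. They are not the thesis's scoring scheme, whose appendices list BLOSUM, PAM, and Gonnet matrices.

```python
NEG_INF = float("-inf")


def gotoh_score(s: str, t: str, match: int = 1, mismatch: int = -1,
                gap_open: int = 3, gap_extend: int = 1) -> float:
    """Optimal global alignment score with affine gap cost gap_open + gap_extend * L."""
    n, m = len(s), len(t)
    # M: s[i-1] aligned to t[j-1]; X: s[i-1] against a gap; Y: t[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)  # leading gap in t of length i
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)  # leading gap in s of length j
    open_cost = gap_open + gap_extend  # cost of the first position of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap or extend the gap already in progress.
            X[i][j] = max(M[i - 1][j] - open_cost, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - open_cost)
            Y[i][j] = max(M[i][j - 1] - open_cost, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - open_cost)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(gotoh_score("ACGT", "AT"))  # -3: two matches and one gap of length 2 costing 5
```

Each of the (n+1)(m+1) cells in the three tables is filled in constant time. The total cost is therefore O(nm), which is O(n^2) for two sequences of length n. A single gap of length 2 costs 5, whereas two separate gaps of length 1 would cost 8. This is why the affine scheme favors one longer gap.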
In this thesis, we first propose an efficient group alignment method that aligns two groups of sequences (two existing alignments) with each other. We then propose a clustering method that builds the tree topology determining the order in which groups are merged. The clustering method is based on the idea that the two sequences with the largest distance between them should be split into different clusters. In our experiments, our algorithm outperformed both the NJ (neighbor-joining) algorithm and Clustal W in alignment quality and running time.
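The split rule described above can be read as a divisive (top-down) clustering procedure. The sketch below is one plausible reading, built on these assumptions:

- A precomputed pairwise distance matrix is given.
- The farthest pair seeds two clusters.
- Every other sequence joins the seed it is closer to.
- Clusters are split recursively until each holds a single sequence.

The resulting nested tuples give a tree topology whose leaves would then be merged bottom-up by group alignment. The function names and the nearest-seed assignment rule are hypothetical, and the thesis's actual rule may differ.

```python
from typing import List, Sequence, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]


def split_by_farthest_pair(members: List[int],
                           dist: Sequence[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Split `members` into two clusters seeded by its two most distant sequences."""
    a, b = max(
        ((x, y) for k, x in enumerate(members) for y in members[k + 1:]),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    left, right = [a], [b]
    # Assumption: each remaining sequence joins the closer seed (ties go left).
    for x in members:
        if x not in (a, b):
            (left if dist[x][a] <= dist[x][b] else right).append(x)
    return left, right


def build_guide_tree(members: List[int], dist: Sequence[Sequence[float]]) -> Tree:
    """Recursively split until every cluster is a single sequence index."""
    if len(members) == 1:
        return members[0]
    left, right = split_by_farthest_pair(members, dist)
    return (build_guide_tree(left, dist), build_guide_tree(right, dist))


if __name__ == "__main__":
    # Toy distances: 0 and 1 are close, 2 and 3 are close, 0 and 3 are farthest apart.
    DIST = [[0, 1, 8, 9],
            [1, 0, 7, 8],
            [8, 7, 0, 2],
            [9, 8, 2, 0]]
    print(build_guide_tree([0, 1, 2, 3], DIST))  # ((0, 1), (3, 2))
```

Each split produces two nonempty, strictly smaller clusters, so the recursion always terminates after at most k - 1 splits for k sequences.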
Table of Contents
List of Figures
List of Tables
Abstract
Chapter 1. Introduction
Chapter 2. Multiple Sequence Alignment
  2.1 Multiple Sequence Alignment Problem
  2.2 Scoring Functions
  2.3 Affine Gap Penalty
  2.4 Complexity of Multiple Sequence Alignment
Chapter 3. Previous Algorithms
  3.1 Tree Alignment
  3.2 Star Alignment
  3.3 Progressive Alignment
Chapter 4. The Clustering Method for Multiple Sequence Alignment
Chapter 5. Experiment Results and Performance Analysis
Chapter 6. Conclusion
Appendixes
  A. BLOSUM Score Matrices
  B. PAM Score Matrices
  C. Gonnet Score Matrices
Bibliography
References
[1] S. Altschul and B. W. Erickson, "Optimal sequence alignment using affine gap costs," Bulletin of Mathematical Biology, Vol. 48, pp. 603–616, 1986.
[2] S. F. Altschul, "Gap costs for multiple sequence alignment," Journal of Theoretical Biology, Vol. 138, pp. 297–309, 1989.
[3] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, Vol. 215, pp. 403–410, 1990.
[4] S. F. Altschul and D. J. Lipman, "Trees, stars, and multiple sequence alignment," SIAM Journal on Applied Mathematics, Vol. 49, No. 1, pp. 197–209, 1989.
[5] D. J. Bacon and W. F. Anderson, "Multiple sequence alignment," Journal of Molecular Biology, Vol. 191, pp. 153–161, 1986.
[6] V. Bafna, E. L. Lawler, and P. Pevzner, "Approximation algorithms for multiple sequence alignment," in Proc. 5th Ann. Symp. on Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 807, pp. 43–53, 1994.
[7] G. J. Barton and M. J. E. Sternberg, "A strategy for rapid multiple alignment of protein sequences," Journal of Molecular Biology, Vol. 198, pp. 327–337, 1987.
[8] S. A. Benner, M. A. Cohen, and G. H. Gonnet, "Empirical and structural models for insertions and deletions in the divergent evolution of proteins," Journal of Molecular Biology, Vol. 229, pp. 1065–1082, 1993.
[9] M. P. Berger and P. J. Munson, "A novel randomized iterative strategy for aligning multiple protein sequences," Computer Applications in the Biosciences, Vol. 7, pp. 479–484, 1991.
[10] H. Carrillo and D. J. Lipman, "The multiple sequence alignment problem in biology," SIAM Journal on Applied Mathematics, Vol. 48, pp. 1073–1082, 1988.
[11] S. C. Chan, A. K. C. Wong, and D. K. Y. Chiu, "A survey of multiple sequence comparison methods," Bulletin of Mathematical Biology, Vol. 54, pp. 563–598, 1992.
[12] M. O. Dayhoff, Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington, DC, 1978.
[13] D. F. Feng and R. F. Doolittle, "Progressive sequence alignment as a prerequisite to correct phylogenetic trees," Journal of Molecular Evolution, Vol. 25, pp. 351–360, 1987.
[14] L. R. Foulds and R. L. Graham, "The Steiner problem in phylogeny is NP-complete," Advances in Applied Mathematics, Vol. 3, pp. 43–49, 1982.
[15] O. Gotoh, "Optimal alignment between groups of sequences and its application to multiple sequence alignment," Computer Applications in the Biosciences, Vol. 9, pp. 361–370, 1993.
[16] S. K. Gupta, J. D. Kececioglu, and A. A. Schäffer, "Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment," Journal of Computational Biology, Vol. 2, No. 3, pp. 459–472, 1995.
[17] D. Gusfield, "Efficient methods for multiple sequence alignment with guaranteed error bounds," Bulletin of Mathematical Biology, Vol. 55, pp. 141–154, 1993.
[18] M. Hirosawa, M. Hoshida, M. Ishikawa, and T. Toya, "MASCOT: multiple alignment system for protein sequences based on three-way dynamic programming," Computer Applications in the Biosciences, Vol. 9, pp. 161–167, 1993.
[19] M. Hirosawa, Y. Totoki, M. Hoshida, and M. Ishikawa, "Comprehensive study on iterative algorithms of multiple sequence alignment," Computer Applications in the Biosciences, Vol. 11, No. 1, pp. 13–18, 1995.
[20] T. Ikeda and H. Imai, "Fast A* algorithms for multiple sequence alignment," Proceedings of the Genome Informatics Workshop 1994, pp. 90–99, 1994.
[21] T. Jiang, E. L. Lawler, and L. Wang, "Aligning sequences via an evolutionary tree: complexity and approximation," in Proceedings of the Symposium on the Theoretical Aspects of Computer Science, pp. 760–769, 1994.
[22] T. Jiang and L. Wang, "On the complexity of multiple sequence alignment," Journal of Computational Biology, Vol. 1, No. 4, pp. 337–348, 1994.
[23] T. Jiang, L. Wang, and E. L. Lawler, "Approximation algorithms for tree alignment with a given phylogeny," Algorithmica, Vol. 16, pp. 302–315, 1996.
[24] J. Kececioglu, "The maximum weight trace problem in multiple sequence alignment," in Proc. 4th Ann. Symp. on Combinatorial Pattern Matching, Springer-Verlag Lecture Notes in Computer Science, Vol. 684, pp. 106–119, 1993.
[25] C. Korostensky and G. H. Gonnet, "Near optimal multiple sequence alignments using a traveling salesman problem approach," SPIRE 1999, pp. 105–114, 1999.
[26] A. M. Lesk, M. Levitt, and C. Chothia, "Alignment of the amino acid sequences of distantly related proteins using variable gap penalties," Protein Engineering, Vol. 1, pp. 77–78, 1986.
[27] D. J. Lipman, S. F. Altschul, and J. D. Kececioglu, "A tool for multiple sequence alignment," Proceedings of the National Academy of Sciences of the United States of America, Vol. 86, pp. 4412–4415, 1989.
[28] P. K. Mehta, J. Heringa, and P. Argos, "A fast and simple approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70 percent," Protein Science, Vol. 4, pp. 2517–2525, 1995.
[29] D. Naor and D. L. Brutlag, "On near-optimal alignments of biological sequences," Journal of Computational Biology, Vol. 1, pp. 349–366, 1994.
[30] S. B. Needleman and C. D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequences of two proteins," Journal of Molecular Biology, Vol. 48, pp. 443–453, 1970.
[31] P. A. Pevzner, "Multiple alignment, communication cost, and graph matching," SIAM Journal on Applied Mathematics, Vol. 52, pp. 1763–1779, 1992.
[32] B. Rost and C. Sander, "Prediction of protein secondary structure at better than 70% accuracy," Journal of Molecular Biology, Vol. 232, pp. 584–599, 1993.
[33] N. Saitou and M. Nei, "The neighbor-joining method: a new method for reconstructing phylogenetic trees," Molecular Biology and Evolution, Vol. 4, pp. 406–425, 1987.
[34] A. A. Salamov and V. V. Solovyev, "Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments," Journal of Molecular Biology, Vol. 247, pp. 11–15, 1995.
[35] D. Sankoff, "Minimal mutation trees of sequences," SIAM Journal on Applied Mathematics, Vol. 28, pp. 443–453, 1975.
[36] R. M. Schwartz and M. O. Dayhoff, "Matrices for detecting distant relationships," in M. O. Dayhoff, editor, Atlas of Protein Sequence and Structure, Vol. 5, pp. 353–358, National Biomedical Research Foundation, Washington, DC, 1979.
[37] J. Stoye, S. W. Perrey, and A. W. M. Dress, "Improving the divide-and-conquer approach to sum-of-pairs multiple sequence alignment," Applied Mathematics Letters, Vol. 10, No. 2, pp. 67–73, 1997.
[38] J. Stoye, S. W. Perrey, and A. W. M. Dress, "The number of standard and of effective multiple alignments," Applied Mathematics Letters, Vol. 11, No. 4, pp. 43–49, 1998.
[39] W. R. Taylor, "Multiple sequence alignment by a pairwise algorithm," Computer Applications in the Biosciences, Vol. 3, pp. 81–87, 1987.
[40] J. D. Thompson, D. G. Higgins, and T. J. Gibson, "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice," Nucleic Acids Research, Vol. 22, No. 22, pp. 4673–4680, 1994.
[41] U. Tönges, S. W. Perrey, J. Stoye, and A. W. M. Dress, "A general method for fast multiple sequence alignment," Gene, Vol. 172, No. 1, pp. 33–41, 1996.
[42] M. Vingron and M. S. Waterman, "Sequence alignment and penalty choice. Review of concepts, case studies and implications," Journal of Molecular Biology, Vol. 235, pp. 1–12, 1994.
[43] L. Wang and D. Gusfield, "Improved approximation algorithms for tree alignment," Proc. 7th Symp. on Combinatorial Pattern Matching, pp. 220–233, 1996.
[44] M. Waterman and M. Perlwitz, "Line geometries for sequence comparison," Bulletin of Mathematical Biology, Vol. 46, pp. 567–577, 1984.
Full Text
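This electronic full text is licensed only for personal, non-commercial searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.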
Thesis access permission: not available on or off campus
Available:
Campus: never available
Off-campus: never available

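Printed Copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar, i.e. 2013–14) onward. For the public-access status of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: publicly available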
