博碩士論文 etd-0727104-171533 詳細資訊
Title page for etd-0727104-171533
論文名稱
Title
單一核苷酸多型性序列區塊預測及其標籤篩選演算法
SNP Haplotype Block Inference and Tag Selection Algorithm
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
50
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2004-07-09
繳交日期
Date of Submission
2004-07-27
關鍵字
Keywords
單一核苷酸多型性、變異量、單套
diversity, haplotype, SNP
統計
Statistics
本論文已被瀏覽 5685 次,被下載 2280
The thesis/dissertation has been browsed 5685 times, has been downloaded 2280 times.
中文摘要
單一核苷酸多型性是由於在人體基因中單一個核苷酸改變所造成的。這些單一的核苷酸變異大約每一千個鹼基對就會有一個這樣的現象發生。在這些核苷酸位置上只會有二種可能的核苷酸會顯現出來。由於單一核苷酸多型性序列資料的變異有限加上其資料量十分的豐富,因此很適合拿來當做人類疾病特徵的標誌。
在近期的研究結果中曾指出人類基因體中有區塊狀的結構產生,而且在每個區塊中的變異是有限的。因此,在每個區塊中我們可以利用少部份的單一核苷酸多型性資料來表示這個區塊的變異情形。而這少部份的單一核苷酸多型性資料即被稱之為單一核苷酸多型性標籤。
我們提出了定量變異方法去求得資料內部的變異值。用此方法切割單一核苷酸多型性區塊之後,我們提出一些客觀的評估法來衡量切割的區塊是否恰當。從這個演算法我們求得人類第二十一號染色體的變異值為0.5。切割出來的區塊與NCBI網頁上的haplotype資料有著共同的性質。最後,我們發展標籤篩選演算法去挑選每個區塊中所需要的標籤為何,根據此演算法去挑選標籤我們得到資料壓縮的比率為0.78。
Abstract
A SNP (single nucleotide polymorphism, pronounced "snip") is a difference at a single nucleotide position within the human population.
These differences can be detected in the human genome and occur about once every 1000 base pairs. Only two nucleotides are possible at each SNP position. As genetic markers, SNPs can be used to capture human disease traits because of their abundance and low diversity.
Recent research has shown that the human genome has a block-like structure and that only limited haplotype diversity is observed within each block. Consequently, a small fraction of the SNPs suffices to capture the haplotype diversity in each block; these SNPs are called tagSNPs.
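To illustrate the tagSNP idea, the following Python sketch searches for the smallest set of SNP positions that still tells apart every distinct haplotype in one block. It is only an illustration under stated assumptions, not the thesis's selection algorithm: haplotypes are assumed to be equal-length strings of '0' and '1' with no missing data, the function name select_tags is hypothetical, and the exhaustive search is practical only for small blocks.

```python
from itertools import combinations
from typing import List


def select_tags(block: List[str]) -> List[int]:
    """Return a smallest set of column indices that distinguishes
    all distinct haplotypes in the block (brute-force illustration)."""
    distinct = sorted(set(block))
    n_snps = len(block[0])
    # Try subsets in order of increasing size, so the first hit is minimal.
    for size in range(n_snps + 1):
        for cols in combinations(range(n_snps), size):
            projected = {tuple(h[c] for c in cols) for h in distinct}
            if len(projected) == len(distinct):
                return list(cols)
    return list(range(n_snps))  # unreachable: all columns always suffice


# Four distinct haplotypes over four SNPs; two positions are enough.
tags = select_tags(["0000", "0011", "1100", "1111"])
print(tags)  # [0, 2]
```

In this toy block, SNPs 0 and 1 always carry the same allele, as do SNPs 2 and 3, so one SNP from each pair is enough to identify all four haplotypes.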
We propose a fixed-diversity approach to capture the diversity of the entire data set. After partitioning the data into haplotype blocks, we provide objective measures for evaluating the result. Using our algorithm, we find that the diversity of the chromosome 21 SNPs is 0.5. The resulting blocks share common properties with the haplotype data downloaded from the NCBI web site. Finally, we develop an algorithm for tagSNP selection within each block and obtain a compression ratio of 0.78.
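The general shape of a threshold-based block partition can be sketched in Python as follows. The abstract does not define the diversity measure or the partitioning procedure, so everything here is an assumption made for illustration: diversity is taken to be the number of distinct haplotype patterns in a block divided by the number of haplotypes, blocks are grown greedily from left to right, and the names diversity and partition_blocks are hypothetical.

```python
from typing import List, Tuple


def diversity(haplotypes: List[str], start: int, end: int) -> float:
    """Assumed measure: distinct patterns in columns [start, end)
    divided by the number of haplotypes."""
    patterns = {h[start:end] for h in haplotypes}
    return len(patterns) / len(haplotypes)


def partition_blocks(haplotypes: List[str], threshold: float) -> List[Tuple[int, int]]:
    """Greedily extend each block while its diversity stays at or
    below the threshold; returns half-open (start, end) column ranges."""
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    while start < n_snps:
        end = start + 1  # every block holds at least one SNP
        while end < n_snps and diversity(haplotypes, start, end + 1) <= threshold:
            end += 1
        blocks.append((start, end))
        start = end
    return blocks


haplotypes = ["0000", "0011", "1100", "1111"]
print(partition_blocks(haplotypes, 0.5))  # [(0, 2), (2, 4)]
```

With the threshold set to 0.5, the first two SNPs form one block because they show only two patterns among four haplotypes; adding the third SNP would raise the assumed diversity to 1.0, so a new block starts there.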
目次 Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
1.1 Introduction to SNP
1.2 Characteristics of the Blocks
1.3 The Need of tagSNP Selection
Chapter 2. Previous Works
2.1 Different Criteria for Block and tagSNP Selection
2.2 The Fixed-diversity Approach
2.3 The NP-Complete Property
2.4 SCP Approaches
Chapter 3. Main Ideas
3.1 Data Format
3.2 Problem Definition
3.3 Diversity Calculation within One Block
3.3.1 Classification
3.3.2 Calculation for Diversity
3.4 Setting Fixed-diversity Threshold
Chapter 4. Our Methods
4.1 Diversity Calculation
4.2 Dealing with Missing Data
4.2.1 Method 1
4.2.2 Method 2
4.2.3 Method 3
4.2.4 Method 4
4.3 Tag Selection Idea
Chapter 5. Results and Discussion
5.1 The Power of Fixed-diversity Threshold
5.1.1 Random Data Testing
5.1.2 Properties of Block Length
5.1.3 Implication of Block Number Variance between Diversities
5.1.4 Secondary Block Boundary Effects
5.2 Adopting Haplotype Data in Verification
5.3 Evaluation of the Partition Results
5.3.1 Definitions of Penalty Function
5.3.2 Statistics of Partition Results
5.4 The Number of Required Tags
Chapter 6. Conclusion
BIBLIOGRAPHY
電子全文 Fulltext
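This electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.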
論文使用權限 Thesis access permission: available on campus immediately; available off campus after one year
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
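Availability information for printed theses is relatively complete from ROC academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.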
開放時間 available 已公開 available
