博碩士論文 etd-0819103-130449 詳細資訊
Title page for etd-0819103-130449
論文名稱
Title
DNA序列之啟動區預測方法
Promoter Prediction in DNA Sequences
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
33
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2003-06-20
繳交日期
Date of Submission
2003-08-19
關鍵字
Keywords
transcriptional element, TATA-box, promoter, promoter prediction, CpG island
CpG島, 啟動區預測, 轉錄元素, TATA盒, 啟動區
統計
Statistics
本論文已被瀏覽 5714 次,被下載 7553
The thesis/dissertation has been browsed 5714 times, has been downloaded 7553 times.
中文摘要
近年來,啟動區的預測吸引了許多研究專家的注意。不幸的是,大部份的預測演算法並不能得到足夠好的sensitivity和specificity。本篇論文的目的是在發展一個有效率的預測演算法來增加預測能力。我們不試著去一個一個找出啟動區較為顯著的特徵,例如和轉錄相關的元素。我們的主要目的是使用電腦強大的運算能力來一次計算出所有可能的字串,這些字串需為啟動區的顯著特徵。此外,我們必須對測試字串定義一些評分方法,包含啟動區和非啟動區字串。然後,我們可以得到一個適當的門檻值來決定是否測試字串為啟動區。透過最後的實驗結果,我們方法的預測結果和之前其他人的方法比起來得到了較好的預測準確率。
Abstract
Promoter prediction has recently attracted the attention of many
researchers. Unfortunately, most previous prediction algorithms do
not achieve sufficiently high sensitivity and specificity. The goal
of this thesis is to develop an efficient prediction algorithm that
increases detection power (power = 1 - false negative rate). Rather
than searching one by one for distinctive promoter features, such as
transcriptional elements, our main idea is to use computing power to
enumerate all possible patterns that may be features of promoters.
Accordingly, we define scoring methods that are trained on a given
set of sequences containing both promoter and non-promoter
sequences. We then obtain a threshold value for deciding whether a
test sequence is a promoter. The experimental results show that our
prediction achieves higher accuracy than previous methods.
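The record does not describe the thesis's actual scoring methods, so the following is only a minimal sketch of the general idea in the abstract: enumerate every pattern of a fixed length, score each one from promoter and non-promoter training sequences, and pick a threshold. The log-odds weighting, the pattern length k = 3, the pseudocount, the toy sequences, and all function names are illustrative assumptions, not the author's method.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count every length-k substring across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def train_log_odds(promoters, non_promoters, k=3, pseudo=1.0):
    """Assumed scoring: log-odds of each pattern, promoter vs. non-promoter."""
    pos = kmer_counts(promoters, k)
    neg = kmer_counts(non_promoters, k)
    # Enumerate all 4**k possible patterns instead of hand-picking features.
    patterns = ["".join(p) for p in product("ACGT", repeat=k)]
    pos_total = sum(pos[p] for p in patterns) + pseudo * len(patterns)
    neg_total = sum(neg[p] for p in patterns) + pseudo * len(patterns)
    return {
        p: math.log(((pos[p] + pseudo) / pos_total)
                    / ((neg[p] + pseudo) / neg_total))
        for p in patterns
    }


def score(seq, weights, k=3):
    """Average pattern weight over all length-k windows of a sequence."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not windows:
        return 0.0
    return sum(weights.get(w, 0.0) for w in windows) / len(windows)


def choose_threshold(pos_scores, neg_scores):
    """Pick the observed score that classifies the most training items correctly."""
    best_t, best_correct = None, -1
    for t in sorted(set(pos_scores) | set(neg_scores)):
        correct = (sum(s >= t for s in pos_scores)
                   + sum(s < t for s in neg_scores))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def sensitivity_specificity(pos_scores, neg_scores, t):
    """Sensitivity = TP / positives (the 'power'); specificity = TN / negatives."""
    tp = sum(s >= t for s in pos_scores)
    tn = sum(s < t for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


# Toy data, invented purely for illustration.
promoters = ["TATAAAGG", "CCTATAAA", "TATATAAG"]
non_promoters = ["GGCCGGCC", "CGCGGCGC", "GCCGCGGG"]

weights = train_log_odds(promoters, non_promoters, k=3)
pos_scores = [score(s, weights) for s in promoters]
neg_scores = [score(s, weights) for s in non_promoters]
threshold = choose_threshold(pos_scores, neg_scores)
sens, spec = sensitivity_specificity(pos_scores, neg_scores, threshold)
print(f"threshold={threshold:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this sketch, sensitivity plays the role of the detection power named in the abstract, since it equals one minus the false negative rate. Measuring it on the training sequences, as done here, would overstate real performance; a held-out test set would be needed for a fair estimate.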
目次 Table of Contents
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Definition of the Promoter
2.2 Significance of the Promoter Prediction
2.3 The Features of Promoter Sequences
2.3.1 TATA-Box and TTG-Box
2.3.2 CpG Islands
2.4 Artificial Neural Network
2.5 Hidden Markov Model
2.6 Graph-based Induction Method
2.7 Predicting Pol II Promoter Sequences Using Transcription Factor Binding Sites
Chapter 3. Material and Methods
3.1 Datasets
3.2 Method 1
3.3 Method 2
3.4 Method 2 with Frame Constraints
Chapter 4. Experimental Results and Accuracy Analysis
Chapter 5. Conclusion
BIBLIOGRAPHY
電子全文 Fulltext
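The electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.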
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
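Public-access information for printed copies is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.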
開放時間 available 已公開 available
