博碩士論文 etd-0719109-185811 詳細資訊
Title page for etd-0719109-185811
論文名稱
Title
互補迴文的統計檢定:尋找病毒複製源的應用
Statistical tests of complementary palindromes: An application of searching virus origin of replication
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
61
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2009-06-19
繳交日期
Date of Submission
2009-07-19
關鍵字
Keywords
卡方檢定、概似比檢定、互補迴文、複製源、巨細胞病毒
Pearson's chi-squared test, likelihood ratio test, human cytomegalovirus, origin of replication, complementary palindromes
中文摘要
人類巨細胞病毒(CMV)的傳播範圍遍佈全球,它會侵入特定的細胞組織並加以控制,進而達到生存繁殖的目的。在巨細胞病毒的DNA基因組中,「複製源」是一段記錄複製病毒的基因序列。本文中我們推導出一些互補迴文檢定統計方法,以縮短找尋複製源位置所需要的時間。
定義X_(2k)為DNA序列中長度2k的互補迴文個數,Y_(2k)則為長度2k且剔除覆蓋情況的互補迴文個數。以DNA序列中四種核苷酸出現機率相同與否,定為虛無假設及對立假設,並利用Y_(2k)條件分佈所組成的聯合機率分佈進行概似比檢定。在虛無假設下,檢定統計量近似一個乘上係數的卡方分配,其係數與自由度則用動差法加以估計。利用X_(2k)的邊際分佈,我們也可以做Pearson卡方檢定,其虛無假設下的檢定統計量也近似一個乘上係數的卡方分配。此外,X_(2k)/X_(2(k+1))和Y_(2k)/Y_(2(k+1))的比值在虛無假設下會近似特殊值。本文最後,再以模擬研究方式,對其理論加以驗證。
Abstract
Human cytomegalovirus (CMV) infects people throughout the world. To grow and reproduce, CMV invades specific cells and takes control of them. The origin of replication (also called the replication origin) is the particular sequence in the CMV DNA genome at which replication is initiated. In this study, we develop statistical tests based on complementary palindromes that can be applied to narrow the search for the replication origin in the CMV DNA sequence.
Let X_(2k) be the number of complementary palindromes of length 2k and Y_(2k) be the number of non-covered complementary palindromes of length 2k in a given DNA sequence. Consider the null hypothesis that the marginal probabilities of the four nucleotides are all equal to 1/4 over the given sequence versus the alternative hypothesis that the marginal probabilities differ. The likelihood ratio test based on the joint distribution of Y_(18) and Y_(2k) | (Y_(2(k+1)), ..., Y_(18)), k = 1, ..., 8, is derived under the null and the alternative hypotheses. The null distribution of the test statistic is approximated by a scaled chi-squared distribution, whose scale parameter and degrees of freedom are estimated by the method of moments. Pearson's chi-squared test based on the marginal distributions of X_(2k), k = 1, ..., 9, is also derived; the null distribution of its test statistic is likewise approximated by a scaled chi-squared distribution. We also study the ratio statistics X_(2k)/X_(2(k+1)) and Y_(2k)/Y_(2(k+1)), which approximate specific values under the null hypothesis. Simulation studies are performed to confirm the theoretical findings.
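The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's own searching algorithm or estimator, and it rests on several assumptions. A complementary palindrome is taken to be a window that equals its own reverse complement (for example GAATTC). "Non-covered" is read as "cannot be extended by one base on each side into a longer complementary palindrome with the same centre". The scaled chi-squared fit matches the first two sample moments, using E[c·χ²_ν] = cν and Var[c·χ²_ν] = 2c²ν. The function names `count_x`, `count_y`, and `fit_scaled_chi2` are hypothetical.

```python
from statistics import mean, variance

# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_complementary_palindrome(window: str) -> bool:
    """True if the window has even length and equals its reverse complement."""
    if len(window) % 2:
        return False
    return all(COMPLEMENT.get(a) == b for a, b in zip(window, reversed(window)))


def count_x(seq: str, k: int) -> int:
    """X_(2k): number of length-2k windows that are complementary palindromes."""
    n = 2 * k
    return sum(
        is_complementary_palindrome(seq[i:i + n])
        for i in range(len(seq) - n + 1)
    )


def count_y(seq: str, k: int) -> int:
    """Y_(2k) under the assumed reading of 'non-covered': length-2k
    complementary palindromes that do not sit inside a length-2(k+1)
    complementary palindrome with the same centre."""
    n = 2 * k
    total = 0
    for i in range(len(seq) - n + 1):
        if not is_complementary_palindrome(seq[i:i + n]):
            continue
        covered = (
            i > 0
            and i + n < len(seq)
            and is_complementary_palindrome(seq[i - 1:i + n + 1])
        )
        if not covered:
            total += 1
    return total


def fit_scaled_chi2(samples):
    """Moment-matching fit of T ~ c * chi2(nu) from simulated statistics:
    c = Var / (2 * Mean), nu = 2 * Mean**2 / Var."""
    m = mean(samples)
    v = variance(samples)  # sample variance (n - 1 denominator)
    return v / (2 * m), 2 * m * m / v


if __name__ == "__main__":
    seq = "GAATTC"
    print([count_x(seq, k) for k in (1, 2, 3)])
    print([count_y(seq, k) for k in (1, 2, 3)])
    print(fit_scaled_chi2([1.0, 3.0]))
```

If, in addition, the bases are assumed independent under the equal-probability null, a window of length 2k is a complementary palindrome with probability (1/4)^k, so the expected values of X_(2k) and X_(2(k+1)) differ by a factor of roughly 4. This is offered only as an illustration of why such a ratio can settle near a fixed value, not as the value derived in the thesis.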
目次 Table of Contents
1 Introduction . . . . . . 1
1.1 CMV . . . . . . 1
1.2 Research Issues . . . . . . 3
2 Data Description . . . . . . 7
2.1 CMV DNA . . . . . . 7
2.2 Searching Algorithm of Complementary Palindrome . . . . . . 9
3 Distributions of the Complementary Palindrome Numbers . . . . . . 12
3.1 The Complementary Palindrome Number X_(2k) . . . . . . 12
3.1.1 A, T, G, C with Equal Probability . . . . . . 12
3.1.2 A, T, G, C with Unequal Probabilities . . . . . . 13
3.2 The Complementary Palindrome Number Y_(2k) . . . . . . 13
3.2.1 A, T, G, C with Equal Probability . . . . . . 13
3.2.2 A, T, G, C with Unequal Probabilities . . . . . . 15
3.3 Likelihood Ratio Test . . . . . . 15
3.4 Pearson's Chi-square Test . . . . . . 17
3.4.1 A, T, G, C with Equal Probability . . . . . . 18
3.4.2 A, T, G, C with Unequal Probabilities . . . . . . 19
3.5 Ratio Statistics X_(2k)/X_(2(k+1)) and Y_(2k)/Y_(2(k+1)) . . . . . . 20
3.5.1 A, T, G, C with Equal Probability . . . . . . 20
3.5.2 A, T, G, C with Unequal Probabilities . . . . . . 20
4 Simulation Study . . . . . . 22
4.1 Simulation Algorithm . . . . . . 22
4.2 Tests of the CMV Complementary Palindrome Numbers . . . . . . 24
4.3 Simulating the Complementary Palindrome Number Scenario of the CMV . . . . . . 25
4.3.1 A, T, G, C with Equal Probability . . . . . . 25
4.3.2 A, T, G, C with Unequal Probabilities . . . . . . 26
5 Conclusions and Future Work . . . . . . 27
6 Appendix . . . . . . 28
6.1 Figures . . . . . . 28
6.2 Tables . . . . . . 39
References . . . . . . 50
參考文獻 References
[1] Casella, G. and Berger, R.L. (2002). Statistical Inference (2nd ed.). USA: Duxbury.
[2] Chew, D.S.H., Choi, K.P. and Leung, M.Y. (2005). Scoring schemes of palindrome clusters
for more sensitive prediction of replication origins in herpesviruses, Nucleic Acids
Research, 33, e134.
[3] Chew, D.S.H., Leung, M.Y. and Choi, K.P. (2007). AT excursion: a new approach to predict
replication origins in viral genomes by locating AT-rich regions, BMC Bioinformatics, 8,
e163.
[4] Hastings, K.J. (1997). Probability and Statistics. USA: Addison-Wesley.
[5] Leung, M.Y., Blaisdell, E., Burge, C. and Karlin, S. (1991). An efficient algorithm for identifying
matches with errors in multiple long molecular sequences, Journal of Molecular
Biology, 221, 1367-1378.
[6] Leung, M.Y., Choi, K.P., Xia, A. and Chen, L.H.Y. (2005). Nonrandom clusters of palindromes
in herpesvirus genomes, Journal of Computational Biology, 12, 331-354.
[7] Masse, M.J., Karlin, S., Schachtel, G.A. and Mocarski, E.S. (1992). Human cytomegalovirus
origin of DNA replication (oriLyt) resides within a highly complex repetitive region,
Proc. Natl. Acad. Sci. USA, 89, 5246-5250.
[8] Nolan, D. and Speed, T. (2000). Stat Labs: Mathematical Statistics Through Applications (ch.4).
NY: Springer-Verlag.
[9] Plackett, R.L. (1983). Karl Pearson and the Chi-Squared Test. International Statistical Review,
51, 59-72.
電子全文 Fulltext
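This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.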
論文使用權限 Thesis access permission:校內公開,校外永不公開 restricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus:永不公開 not available

紙本論文 Printed copies
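Public-access information for printed theses is relatively complete from academic year 102 onward. To check public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.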
開放時間 available 已公開 available
