Thesis/dissertation etd-0631118-141016: detailed information
Title page for etd-0631118-141016
Title
篩選有效基因特徵框架用於多位點基因序列分型研究
Selection of the most informative schemes for multi-locus sequence typing
Department
Year, semester
Language
Degree
Number of pages
57
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2018-07-26
Date of Submission
2018-07-31
Keywords
Feature selection, Molecular subtyping, Typing scheme selection, Foodborne disease, Whole-genome multilocus sequence typing (wgMLST), Next generation sequencing (NGS)
Statistics
This thesis/dissertation has been browsed 5608 times and downloaded 13 times.
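Chinese abstract (English translation)
Investigating and monitoring outbreaks of foodborne disease caused by bacteria requires molecular subtyping methods. In this study, we use whole-genome sequences to build a pan-genome allele database and analyze the pan-genome sequences with multilocus sequence typing (MLST). Genetic relatedness trees among strains of the same bacterial species are constructed graphically, and the clusters of outbreak strains are marked. The output of this molecular subtyping method can be compared and applied across laboratories. Using whole-genome sequences as the data source for MLST better represents the completeness of the bacterial data, but the very large set of loci makes the typing process time-consuming. We therefore use Python scripts and two feature selection methods, variance threshold and feature importance, to remove loci (features) that are uninformative or only weakly associated with disease outbreaks from the original pan-genome sequences, and to select the most compact and accurate gene scheme as the input for bacterial molecular subtyping. We expect this reduced scheme to yield the same subtyping results as the original dataset, addressing a shortcoming of existing techniques. In the future it may also be applied to research on horizontal gene transfer (HGT) of bacterial antibiotic resistance. We developed a web tool, wgMLST-BacCompare (http://bactyper.imst.nsysu.edu.tw), and tested it on four bacteria: Campylobacter jejuni, Escherichia coli, Listeria monocytogenes, and Salmonella enterica. The results were compared with existing bacterial benchmark datasets to verify the accuracy and stability of the tool. The results show that a very small number of loci chosen by feature selection can achieve good typing performance, improving the efficiency of the previous typing approach while maintaining its accuracy. We believe that wgMLST-BacCompare will benefit research in epidemiology.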
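The variance threshold step named in the abstract can be illustrated with a minimal sketch. It assumes that each isolate is a row of numeric allele identifiers and each locus is a column. The locus names, the toy data, and the helper `variance_threshold` are hypothetical and are not taken from the thesis. The rule used here (keep a locus only if its variance across isolates is strictly greater than the threshold) mirrors the usual definition of a variance threshold filter and may differ from the thesis's implementation.

```python
from statistics import pvariance


def variance_threshold(profiles, loci, threshold=0.0):
    """Keep loci whose population variance across isolates exceeds threshold.

    profiles: list of rows, one per isolate, holding numeric allele IDs.
    loci: list of locus names, one per column (hypothetical names).
    """
    columns = list(zip(*profiles))  # transpose: one tuple per locus
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_loci = [loci[i] for i in kept]
    reduced = [[row[i] for i in kept] for row in profiles]
    return kept_loci, reduced


# Toy allele profiles (assumed data): 4 isolates x 4 loci.
loci = ["locus_A", "locus_B", "locus_C", "locus_D"]
profiles = [
    [1, 1, 3, 2],
    [1, 2, 3, 2],
    [1, 1, 3, 5],
    [1, 2, 3, 2],
]

kept_loci, reduced = variance_threshold(profiles, loci)
print(kept_loci)  # ['locus_B', 'locus_D']
```

Loci that carry the same allele in every isolate (here `locus_A` and `locus_C`) cannot separate any two isolates, so dropping them shrinks the scheme without changing the typing result.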
Abstract
To investigate and monitor outbreaks of foodborne disease caused by bacteria, molecular subtyping methods are employed. In this study, we build an open-access web platform, wgMLST-BacCompare, for molecular subtyping, which applies the multilocus sequence typing (MLST) approach to whole-genome sequence data. We construct visualized genetic relatedness trees of the uploaded isolates and indicate the outbreak subgroups. In general, using pan-genome sequences as the data source for MLST makes the analysis more complete and exhaustive, but it also enlarges the dataset and makes subtyping time-consuming. We therefore use Python scripts and two feature selection methods, variance threshold and feature importance, to remove features (gene loci) that are uninformative or have little influence on outbreak investigation from the raw whole-genome sequence data. We tested the wgMLST-BacCompare web service (http://bactyper.imst.nsysu.edu.tw) on four bacterial species: Campylobacter jejuni, Escherichia coli, Listeria monocytogenes, and Salmonella enterica, and compared the results with existing bacterial benchmark datasets to assess the accuracy and robustness of the web tool. Good subtyping results can be achieved with a small number of gene loci selected by the feature selection methods, so efficiency improves while accuracy is maintained. We believe that wgMLST-BacCompare will be a useful online tool for epidemiological studies.
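The feature importance idea can be sketched in the same spirit. This record does not specify how the thesis scores importance, so the example below swaps in a simple stand-in: it ranks loci by information gain with respect to an outbreak label. The labels, locus names, toy data, and the helpers `entropy`, `information_gain`, and `rank_loci` are all hypothetical and serve only as an illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by grouping isolates by their allele at one locus."""
    total = len(labels)
    groups = {}
    for allele, label in zip(column, labels):
        groups.setdefault(allele, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_loci(profiles, labels, loci):
    """Return (locus, score) pairs sorted from most to least informative."""
    columns = zip(*profiles)
    scores = {locus: information_gain(col, labels) for locus, col in zip(loci, columns)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumed): 4 isolates x 3 loci, with hypothetical outbreak labels.
loci = ["locus_B", "locus_D", "locus_E"]
profiles = [
    [1, 2, 7],
    [2, 2, 7],
    [1, 5, 8],
    [2, 2, 8],
]
labels = ["outbreak", "outbreak", "sporadic", "sporadic"]

ranking = rank_loci(profiles, labels, loci)
for locus, score in ranking:
    print(f"{locus}: {score:.3f}")
```

In this toy example `locus_E` separates outbreak from sporadic isolates perfectly and ranks first, while `locus_B` carries no information about the label and ranks last. Keeping only the top-ranked loci is one way to arrive at a compact typing scheme.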
Table of Contents
Thesis Approval Form i
Abstract (Chinese) ii
Abstract iii
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Research motivation and background 1
1.1.1 Pathogenic bacteria 1
1.1.2 Molecular subtyping 2
1.1.3 Feature selection 4
1.1.4 Research objectives 5
Chapter 2 Methods 6
2.1 Experimental framework 6
2.1.1 Default analysis 7
2.1.1.1 Genome contigs annotation 7
2.1.1.2 Building the pan-genome allele database (PGAdb) 8
2.1.1.3 Building the genetic relatedness tree 9
2.1.1.4 Filtering the original dataset with feature selection 12
2.1.1.5 Feature selection: variance threshold 12
2.1.2 Advanced analysis 13
2.1.2.1 Feature selection: feature importance 13
Chapter 3 Results and Discussion 17
3.1 Pan-genome allele database and goals of the feature selection algorithms 17
3.2 Web service deployment 17
3.3 Using the web service 17
3.3.1 Input format 17
3.3.2 Output format: Build_PGAdb (Default analysis) 18
3.3.3 Output format: Locus_refinement (Advanced analysis) 19
3.4 Web service testing 20
3.4.1 Campylobacter jejuni as the test species 20
3.4.2 Escherichia coli (E. coli) as the test species 24
3.4.3 Listeria monocytogenes as the test species 28
3.4.4 Salmonella enterica as the test species 32
3.4.5 Salmonella typhimurium as the test species 36
3.4.6 Feature selection strategy 40
Chapter 4 Conclusions and Future Work 41
4.1 Conclusions 41
4.2 Future work 42
References 43
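Fulltext
The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available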


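Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available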
