Title page for etd-0913112-052307
Title
在雲端運算環境下使用分散式演化式演算法推導大型基因調控網路
Applying MapReduce Island-based Genetic Algorithm-Particle Swarm Optimization to the inference of large Gene Regulatory Network in Cloud Computing environment
Department
Year, semester
Language
Degree
Number of pages
74
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2012-07-20
Date of Submission
2012-09-13
Keywords
Cloud Computing, Gene Regulatory Networks, Particle Swarm Optimization, Hadoop, MapReduce
Statistics
This thesis has been viewed 5894 times and downloaded 734 times.
Chinese Abstract
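In bioinformatics today, building large gene regulatory networks carries a high computational cost. Under resource and cost constraints, many researchers have turned to distributed computing, combining the computing power of many personal computers to complete time-consuming calculations. Cloud computing has matured in recent years, and both academia and industry now apply it widely to large-scale data computation. Hadoop is currently the best-known and most reliable open-source cloud computing framework. It supports the MapReduce distributed computing mechanism and can run distributed jobs on a cluster built from any virtual machines or physical computers. As a highly abstracted framework, it lets users develop MapReduce programs, run large-scale data analysis in any Hadoop environment, and rely on comprehensive data backup and recovery mechanisms.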
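Particle swarm optimization (PSO), proposed by Kennedy and Eberhart in 1995, is a population-based optimization search method. GAPSO is an improved PSO that incorporates the selection, crossover, and mutation mechanisms of genetic algorithms, which gives it a better ability to search for the optimum and to escape local optima. It is commonly used as the parameter optimization method when inferring gene regulatory networks.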
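The combination described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's GAPSO: the toy `sphere` fitness, the coefficients `w`, `c1`, `c2`, the replace-the-worse-half selection scheme, and the one-point crossover are all assumptions chosen for brevity.

```python
import random


def sphere(x):
    """Toy fitness to minimize; stands in for a model-fitting error (assumption)."""
    return sum(v * v for v in x)


def gapso(fitness, dim=5, swarm=20, iters=60, w=0.7, c1=1.5, c2=1.5,
          mutation_rate=0.1, seed=0):
    """Hypothetical GA+PSO hybrid: a PSO move followed by GA operators."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    initial_fit = gbest_fit

    for _ in range(iters):
        # PSO step: velocity and position update toward personal and global bests.
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]

        # GA step: rank particles, then overwrite the worse half with
        # mutated one-point-crossover children of the better half.
        order = sorted(range(swarm), key=lambda i: fitness(pos[i]))
        half = swarm // 2
        for k in range(half):
            parent_a = pos[order[rng.randrange(half)]]
            parent_b = pos[order[rng.randrange(half)]]
            cut = rng.randrange(1, dim)
            child = parent_a[:cut] + parent_b[cut:]
            for d in range(dim):
                if rng.random() < mutation_rate:
                    child[d] += rng.gauss(0, 0.5)
            pos[order[half + k]] = child

        # Update personal and global bests; the global best never gets worse.
        for i in range(swarm):
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit, initial_fit


best, best_fit, initial_fit = gapso(sphere)
```

The GA operators act on current positions only, so the remembered bests are preserved and the returned fitness is never worse than the initial one.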
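A gene regulatory network uses nodes and links to express the interactions among genes. Inferring such a network from experimentally observed time-series gene expression levels is one of the important topics in bioinformatics today. The S-System gene network model, built on nonlinear differential equations, can describe a biological network system and analyze its internal dynamics, and it is currently the most widely used method. Its improved variant, De.S-System, greatly reduces the dimensionality of the parameters and is suited to inferring large gene regulatory networks. Building a regulatory network of N genes requires handling a system of nonlinear differential equations with 2N(N+1) parameters. This is a large-scale parameter optimization problem with a high computational cost.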
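The parameter count follows from the standard S-system form, in which each gene i has a production term with rate constant alpha_i and kinetic orders g_ij, and a degradation term with rate constant beta_i and kinetic orders h_ij. That gives 2N rate constants plus 2N² kinetic orders, or 2N(N+1) in total. The sketch below shows the standard coupled form only; the De.S-System variant is not specified in this record, and the function names are illustrative.

```python
import math


def s_system_param_count(n_genes):
    """Parameters of a standard S-system: 2N rate constants + 2N^2 kinetic orders."""
    return 2 * n_genes * (n_genes + 1)


def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij."""
    n = len(x)
    dx = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        dx.append(production - degradation)
    return dx


# A 125-gene network has 2 * 125 * 126 parameters to estimate.
n_params = s_system_param_count(125)  # 31500

# Two-gene toy example with made-up parameter values.
dx = s_system_rhs(
    x=[1.0, 2.0],
    alpha=[1.0, 1.0],
    beta=[1.0, 1.0],
    g=[[0.0, 1.0], [0.0, 0.0]],
    h=[[1.0, 0.0], [0.0, 1.0]],
)  # [1.0, -1.0]
```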
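This study proposes the MapReduce Island-based GAPSO algorithm, which efficiently performs MapReduce distributed computation in a Hadoop cloud environment to optimize the De.S-System parameters and build large gene regulatory networks. When inferring a large gene regulatory network of 125 genes on a cluster of 26 computers, it cuts computation time by 90% and runs 9.7 times faster than a single machine.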
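One way to picture an island model on top of MapReduce is to treat each round as one job: the map phase evolves every island independently, and the reduce phase exchanges individuals between islands. The plain-Python simulation below is only a sketch under those assumptions. It does not use the Hadoop API, a simple accept-if-better perturbation stands in for GAPSO, and the ring migration policy and all function names are hypothetical rather than taken from the thesis.

```python
import random


def fitness(x):
    """Toy objective to minimize (assumption); the thesis fits network parameters."""
    return sum(v * v for v in x)


def map_evolve(island_id, population, generations, seed):
    """'Map' task: evolve one island on its own and emit (island_id, population)."""
    rng = random.Random(seed)
    pop = [ind[:] for ind in population]
    for _ in range(generations):
        for k, ind in enumerate(pop):
            trial = [v + rng.gauss(0, 0.3) for v in ind]
            if fitness(trial) < fitness(ind):  # keep only improvements
                pop[k] = trial
    return island_id, pop


def reduce_migrate(results):
    """'Reduce' task: ring migration, each island's best replaces the next island's worst."""
    islands = dict(results)
    ids = sorted(islands)
    bests = {i: min(islands[i], key=fitness)[:] for i in ids}
    for pos, i in enumerate(ids):
        source = ids[pos - 1]  # previous island in the ring
        worst = max(range(len(islands[i])), key=lambda k: fitness(islands[i][k]))
        islands[i][worst] = bests[source][:]
    return islands


def run(n_islands=4, island_size=6, dim=3, rounds=5, generations=10, seed=1):
    rng = random.Random(seed)
    islands = {
        i: [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(island_size)]
        for i in range(n_islands)
    }
    start = min(fitness(ind) for pop in islands.values() for ind in pop)
    for r in range(rounds):
        # Each round models one iterative MapReduce job.
        results = [map_evolve(i, pop, generations, seed * 1000 + r * 100 + i)
                   for i, pop in islands.items()]
        islands = reduce_migrate(results)
    end = min(fitness(ind) for pop in islands.values() for ind in pop)
    return start, end, islands


start_fit, end_fit, final_islands = run()
```

In a real cluster the map calls would run in parallel on different machines; here they run sequentially so the flow of data between the two phases is easy to follow.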
Abstract
The construction of Gene Regulatory Networks (GRNs) is one of the most important issues in systems biology. Inferring a large-scale GRN with a nonlinear mathematical model is time-consuming because of the large number of network parameters involved. In recent years, cloud computing techniques have been widely used to solve large-scale problems. Among them, Hadoop is currently the best-known and most reliable cloud computing framework; it allows users to analyze large amounts of data in a distributed environment through MapReduce. It also supports data backup and data recovery mechanisms.
This study proposes an Island-based GAPSO algorithm under the Hadoop cloud computing environment to infer large-scale GRNs. GAPSO exploits the position and velocity functions of PSO and integrates the operations of the Genetic Algorithm. This approach is often used to derive the optimal solution in nonlinear mathematical models. Several sets of experiments were conducted in which the number of network nodes varied from 50 to 125. The experiments were executed in Hadoop distributed environments with 10, 20, and 26 computers, respectively. When inferring the network with 125 gene nodes on the largest Hadoop cluster (26 computers), the proposed framework ran up to 9.7 times faster than a stand-alone computer, which corresponds to a reduction of about 90% in the computation time of a single experimental run.
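The two headline figures are the same measurement stated two ways: a speedup of S leaves 1/S of the original runtime, so the time saved is 1 - 1/S. The short check below works this out for the reported 9.7 times speedup. The parallel-efficiency line is an extra illustration that assumes all 26 computers act as workers, which this record does not state.

```python
def time_reduction(speedup):
    """Fraction of runtime saved when a job runs `speedup` times faster."""
    return 1.0 - 1.0 / speedup


def parallel_efficiency(speedup, n_machines):
    """Speedup per machine; 1.0 would be perfect linear scaling."""
    return speedup / n_machines


reduction = time_reduction(9.7)
efficiency = parallel_efficiency(9.7, 26)  # assumes 26 worker machines

print(f"time saved: {reduction:.1%}")           # time saved: 89.7%
print(f"parallel efficiency: {efficiency:.1%}")  # parallel efficiency: 37.3%
```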
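Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
2. Literature Review
2.1 Gene Regulatory Network Inference
2.1.1 Inferring Gene Regulatory Networks with the De.S-System Method
2.2 Genetic Algorithm-Particle Swarm Optimization (GAPSO)
2.3 Cloud Computing
2.3.1 Hadoop Cloud Computing Technology
2.3.2 MapReduce Computing Mechanism
2.3.2.1 General MapReduce Computation
2.3.2.2 Iterative MapReduce Computation
3. Research Method and Architecture
3.1 Inferring Gene Regulatory Networks with GAPSO
3.2 Island-based Computing Model
3.3 MR_IGAPSO (MapReduce Island-based GAPSO) Algorithm
3.3.1 MR_IGAPSO Architecture
3.3.2 MR_IGAPSO Workflow
3.3.3 MR_IGAPSO Algorithm
3.4 Experimental Design
4. Experimental Results and Discussion
4.1 Experimental Environment
4.2 Experimental Results
4.2.1 Simulation Results of Stand-alone IGAPSO and GAPSO
4.2.2 Comparison of MR_IGAPSO Computation Time
4.2.2.1 25 Genes
4.2.2.2 50 Genes
4.2.2.3 100 Genes
4.2.2.4 125 Genes
4.2.2.5 Overall Comparison
4.2.3 Hadoop Cluster Size versus Speedup
4.2.3.1 25 Genes
4.2.3.2 50 Genes
4.2.3.3 100 Genes
4.2.3.4 125 Genes
4.2.3.5 Overall Comparison
4.2.4 Overall Comparison of Inference Results
5. Conclusion
5.1 Research Results and Discussion
5.2 Future Work
6. References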
References
1. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System. 19th ACM Symposium on Operating Systems Principles (SOSP), 2003.
2. Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters. Operating Systems Design and Implementation (OSDI), 2004.
3. Hadoop, http://hadoop.apache.org/
4. Applications powered by Hadoop: http://wiki.apache.org/hadoop/PoweredBy
5. Nutch, http://nutch.apache.org/
6. de Jong, H., Modeling and Simulation of Genetic Regulatory Systems: A Literature Review. Journal of Computational Biology, 2002. 9(1): p. 67-103.
7. Sima, C., J. Hua, and S. Jung, Inference of Gene Regulatory Networks Using Time-Series Data: A Survey. Current Genomics, 2009. 10: p. 416-429.
8. Cho, K.H., et al., Reverse engineering of gene regulatory networks. Systems Biology, IET, 2007. 1(3): p. 149-163.
9. Hecker, M., et al., Gene regulatory network inference: Data integration in dynamic models--A review. Biosystems, 2009. 96(1): p. 86-103.
10. Noman, N. and H. Iba, Inference of gene regulatory networks using s-system and differential evolution, in Proceedings of the 2005 conference on Genetic and evolutionary computation. 2005, ACM: Washington DC, USA. p. 439-446.
11. Yeh, W.-C., et al., Feasible prediction in S-system models of genetic networks. Expert Systems with Applications, 2011. 38(1): p. 193-197.
12. Noman, N. and H. Iba, Inferring Gene Regulatory Networks using Differential Evolution with Local Search Heuristics. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2007. 4: p. 634-647.
13. Schlitt, T. and A. Brazma, Current approaches to gene regulatory network modelling. BMC Bioinformatics, 2007. 8(Suppl 6): p. S9.
14. Lee, W.-P. and W.-S. Tzou, Computational methods for discovering gene networks from expression data. Briefings in Bioinformatics, 2009. 10(4): p. 408-423.
15. Chou, I.C. and E.O. Voit, Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical Biosciences, 2009. 219(2): p. 57-83.
16. Savageau, M.A., Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology. 1976: Addison-Wesley.
17. Voit, E.O., Computational Analysis of Biochemical Systems. 2000: Cambridge University Press.
18. Wang, Y., et al., Reconstruct gene regulatory network using slice pattern model. BMC Genomics, 2009. 10(Suppl 1): p. S2.
19. Kennedy, J. and R. Eberhart. Particle swarm optimization. in Neural Networks, 1995. Proceedings., IEEE International Conference on. 1995.
20. Eberhart, R. and Y. Shi. Particle swarm optimization: developments, applications and resources. in Evolutionary Computation, 2001. Proceedings of the 2001 Congress on. 2001.
21. Kojima, K., Matsuo, H., Ishigame, M., Asynchronous Parallel Distributed GA using Elite Server, Congress on Evolutionary Computation, 2003, Vol. 4, pp. 2603-2610.
22. Yi, W., Liu, Q., He, Y., Dynamic Distributed Genetic Algorithms, Evolutionary Computation, 2000, Vol. 2, pp. 1132-1136.
23. Shinn-Ying Ho, Chih-Hung Hsieh, An Intelligent Two-Stage Evolutionary Algorithm for Dynamic Pathway Identification from Gene Expression Profiles, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 4, No. 4, October-December 2007.
24. Satish Narayana Srirama, Pelle Jakovits, Eero Vainikko, Adapting scientific computing problems to clouds using MapReduce, Future Generation Computer Systems, Vol. 28, 2012, pp. 184-192.
25. Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst, The HaLoop approach to large-scale iterative data analysis, The VLDB Journal, vol. 21, no. 2, 2012, pp. 169-190
26. Yanfeng Zhang, Qixin Gao, Lixin Gao, Cuirong Wang, iMapReduce: A Distributed Computing Framework for Iterative Computation, Journal of Grid Computing, Vol. 10, No. 1, March 2012, pp. 47-68.
27. G. Sudha Sadasivam, Dharini Selvaraj, A Novel Parallel Hybrid PSO-GA using MapReduce to Schedule Jobs in Hadoop Data Grids, 2010 Second World Congress on Nature and Biologically Inspired Computing, Dec. 15-17, 2010.
28. Chao Jin, Vecchiola, C., Buyya, R., MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms, eScience '08, IEEE Fourth International Conference on eScience, pp. 214-221, 2008.
29. Abhishek Verma, Xavier Llorà, David E. Goldberg, and Roy H. Campbell, Scaling Genetic Algorithms using MapReduce, University of Illinois at Urbana-Champaign, IL, US 61801, 2009.
30. McNabb, A.W., Monson, C.K., Seppi, K.D., Parallel PSO using MapReduce, IEEE Congress on Evolutionary Computation (CEC 2007), 2007.
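Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available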


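Printed copies
Public information on printed theses is relatively complete from ROC academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available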
