論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available
論文名稱 Title |
透過unix-like特性在雲端的環境中減少溝通的負擔 Reducing communication overheads in a cloud environment through unix-like features |
系所名稱 Department |
畢業學年期 Year, semester |
語文別 Language |
學位類別 Degree |
頁數 Number of pages |
59 |
研究生 Author |
指導教授 Advisor |
召集委員 Convenor |
口試委員 Advisory Committee |
口試日期 Date of Exam |
2011-01-18 |
繳交日期 Date of Submission |
2011-02-17 |
關鍵字 Keywords |
雲端 cloud computing |
統計 Statistics |
本論文已被瀏覽 5602 次,被下載 0 次 The thesis/dissertation has been browsed 5602 times, has been downloaded 0 times. |
中文摘要 Chinese Abstract |
本論文描述了一種方法來增加功能和提高性能,以Hadoop 的雲計算基礎設施。特別是,我們增加了Hadoop 的源代碼文件,從hadoop 執行時mapper的內部,允許 UNIX腳本運行在雲端環境中的各個任務節點。我們的研究結果證明,我們所使用的方法相較於其他的替代方法更容易進行,更容易讓有經驗的UNIX程式設計者所理解,更強大的計算能力是可能的,或相較於其它的替代方法具有更快的速度。 |
Abstract |
This thesis describes an approach that adds functionality and improves performance in the Hadoop infrastructure for cloud computing. In particular, we have added code to the Hadoop source files to allow UNIX scripts to run on the task nodes of the cloud from within the mapper phase of Hadoop execution. Our results show that the new approach is easier to program than the alternatives, easier for experienced UNIX programmers to understand, more powerful in terms of the kinds of computations that are possible, and as fast as or faster than the alternatives. |
目次 Table of Contents |
Thesis Approval Form
Acknowledgments
Chinese Abstract
English Abstract
List of Figures
Chapter 1 Introduction
Chapter 2 Background Information about Hadoop
  2.1 Cloud Computing
  2.2 An Overview of Hadoop
  2.3 The Hadoop Distributed File System
  2.4 MapReduce
Chapter 3 Related Work
  3.1 Hadoop's master/slave structure
  3.2 The HDFS architecture
  3.3 Hadoop Read/Write operations
  3.4 MapReduce
  3.5 Varying the number of reduce tasks
  3.6 Pig Latin
  3.7 Dryad
Chapter 4 Method Overview
  4.1 How Pig Latin works
  4.2 Modifying the Local Combiner
Chapter 5 Experiments and Discussion
  5.1 Experiment 1: Finding unique words following a pattern
  5.2 Experiment 2: Advanced search for a pattern
  5.3 Experiment 3: Replacing a pattern
Chapter 6 Conclusions and Future Work
Bibliography |
電子全文 Fulltext |
This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as not to break the law. |
紙本論文 Printed copies |
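Public-availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: already open to the public. |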