論文使用權限 Thesis access permission:自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
藉由早期部分結果結合降低通訊成本和運算代價 Reducing Communication Overhead and Computation Costs in a Cloud Network by Early Combination of Partial Results |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
46 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2011-06-29 |
繳交日期 Date of Submission |
2011-08-22 |
關鍵字 Keywords |
雲端計算、雲端運算 cloud computing, MapReduce, Hadoop |
||
統計 Statistics |
本論文已被瀏覽 5701 次,被下載 155 次 This thesis has been browsed 5701 times and downloaded 155 times. |
中文摘要 |
本論文介紹一個方法用於降低雲端環境下MapReduce 架構通訊的成本。MapReduce 是個框架,為大量儲存在分散式計算機網路的資料提供平行運算。MapReduce其中一個優點是通常會在資料儲存的電腦(節點)端執行運算。這方法不僅會達到平行處理的效果,也會使許多擁有運算程式小於輸入資料的應用程式受益。 我們的方法也因這個特性受惠。我們延後任一個給定節點結果的傳送,這樣才能使這些結果先在本地端結合。這樣做會有兩項優勢。首先,會降低最終傳輸的資料量。第二項,允許額外跨文件的運算(例如,merge-sort)。 然而,延後傳送結果會有一個限制,因為MapReduce的Reduce階段必須等到所有的節點都發送結果。因此,我們設計一個機制讓使用者可以自行調整延後資料發送的時間。 |
Abstract |
This thesis describes a method of reducing communication overhead within the MapReduce infrastructure of a cloud computing environment. MapReduce is a framework for parallelizing the processing of massive data sets stored across a distributed computer network. One of the benefits of MapReduce is that the computation is usually performed on the computer (node) that holds the data file. Not only does this approach achieve parallelism, but it also benefits from a characteristic common to many applications: the answer derived from a computation is often smaller than the input file. Our new method also benefits from this feature. We delay the transmission of individual answers out of a given node so that these answers can first be combined locally. This combination has two advantages. First, it further reduces the amount of data that must ultimately be transmitted. Second, it allows additional computation across files (such as a merge-sort). There is a limit to the benefit of delaying transmission, however, because the reduce stage of MapReduce cannot begin its work until all nodes have transmitted their answers. We therefore consider a mechanism that lets the user adjust the amount of delay before data is transmitted out of each node. |
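The trade-off described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thesis implementation: the class `DelayedCombiner`, the parameter `max_delay`, the helper `run_node`, the word-count workload, and the simulated clock are all hypothetical names chosen for illustration, and no Hadoop API is used.

```python
from collections import Counter


class DelayedCombiner:
    """Hypothetical per-node buffer: combine partial results locally, send after a delay."""

    def __init__(self, max_delay, send, clock):
        self.max_delay = max_delay   # user-adjustable delay in seconds (assumed knob)
        self.send = send             # callback standing in for the network transfer
        self.clock = clock           # injected time source, keeps the sketch deterministic
        self.buffer = Counter()
        self.first_arrival = None

    def add(self, partial):
        """Merge one map task's partial result into the local buffer."""
        if self.first_arrival is None:
            self.first_arrival = self.clock()
        self.buffer.update(partial)
        # Transmit once the oldest buffered result has waited long enough.
        if self.clock() - self.first_arrival >= self.max_delay:
            self.flush()

    def flush(self):
        """Transmit the combined result (if any) and reset the buffer."""
        if self.buffer:
            self.send(dict(self.buffer))
        self.buffer = Counter()
        self.first_arrival = None


def word_count(text):
    """Toy map task: count the words in one file's contents."""
    return Counter(text.split())


def run_node(files, max_delay, seconds_per_file=1.0):
    """Process files on one simulated node; return the list of transmitted messages."""
    now = [0.0]
    sent = []
    combiner = DelayedCombiner(max_delay, sent.append, lambda: now[0])
    for text in files:
        now[0] += seconds_per_file   # simulated time spent on this map task
        combiner.add(word_count(text))
    combiner.flush()                 # send whatever is still buffered at the end
    return sent
```

In this sketch, `max_delay=0` sends every partial result as soon as it is produced, while a larger value lets results from several files merge into fewer messages with fewer total entries. The cost is that nothing leaves the node until the delay expires, which is the waiting time the reduce stage has to absorb.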
目次 Table of Contents |
論文審定書
摘要
Abstract
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Background
  3.1 Hadoop Core (Common)
  3.2 HDFS
  3.3 MapReduce
  3.4 HBase
  3.5 Pig
  3.6 ZooKeeper
Chapter 4 Methodology
Chapter 5 Experimental Results
Chapter 6 Conclusion
Bibliography |
參考文獻 References |
[1] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters." Google, Inc. Communications of the ACM, 50th anniversary issue: 1958 – 2008, Volume 51, Issue 1, January 2008.
[2] David Pan, "Reducing communication overheads in a cloud environment through unix-like features." National Sun Yat-sen University, February 2011.
[3] Shadi Ibrahim, Hai Jin, Bin Cheng, Haijun Cao, Song Wu, "CLOUDLET: towards mapreduce implementation on virtual machines." Cluster and Grid Computing Lab, Services Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan, 430074, China.
[4] Diana Moise, Gabriel Antoniu, Luc Bouge, "Improving the Hadoop Map/Reduce Framework to Support Concurrent Appends through the BlobSeer BLOB management system." Proceedings of HPDC '10, the 19th ACM International Symposium on High Performance Distributed Computing.
[5] Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica, "Improving MapReduce Performance in Heterogeneous Environments." USENIX Association, Berkeley, CA, USA, 2008.
[6] Yen-Liang Su, Po-Cheng Chen, Jyh-Biau Chang, Ce-Kuen Shieh, "Variable-sized map and locality-aware reduce on public-resource grids." Advances in Grid and Pervasive Computing, Lecture Notes in Computer Science, Volume 6104, 2010. National Cheng Kung University, Taiwan.
[7] Hadoop. http://hadoop.apache.org/.
[8] Tom White, foreword by Doug Cutting, Hadoop: The Definitive Guide. O'Reilly: Beijing, Cambridge, Farnham, Koln, Sebastopol, Taipei, Tokyo.
[9] Naushad UzZaman, instructed by Sandhya Dwarkadas, "Survey on Google File System." Survey paper for CSC 456 (Operating Systems), University of Rochester, Fall 2007. |
電子全文 Fulltext |
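This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law. |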
紙本論文 Printed copies |
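Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available |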