Title page for etd-0822111-220155
Title
藉由早期部分結果結合降低通訊成本和運算代價
Reducing Communication Overhead and Computation Costs in a Cloud Network by Early Combination of Partial Results
Department
Year, semester
Language
Degree
Number of pages
46
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2011-06-29
Date of Submission
2011-08-22
Keywords
雲端計算、雲端運算
cloud computing, MapReduce, Hadoop
Statistics
This thesis/dissertation has been browsed 5701 times and downloaded 155 times.
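Chinese Abstract (English translation)
This thesis presents a method for reducing the communication cost of the MapReduce framework in a cloud environment. MapReduce is a framework that provides parallel computation over large amounts of data stored across a distributed computer network. One advantage of MapReduce is that computation is usually performed on the computer (node) where the data is stored. This not only achieves parallel processing but also benefits the many applications whose computing program is smaller than their input data.
Our method also benefits from this property. We delay the transmission of the results from any given node so that those results can first be combined locally. Doing so has two advantages. First, it reduces the amount of data ultimately transmitted. Second, it allows additional computation across files (for example, merge-sort).
However, delaying the transmission of results has a limit, because the Reduce phase of MapReduce must wait until all nodes have sent their results. We therefore design a mechanism that lets the user adjust how long data transmission is delayed.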
Abstract
This thesis describes a method for reducing communication overhead within the MapReduce infrastructure of a cloud computing environment. MapReduce is a framework for parallelizing the processing of massive data sets stored across a distributed computer network. One of the benefits of MapReduce is that a computation is usually performed on the computer (node) that holds the data file. Not only does this approach achieve parallelism, but it also benefits from a characteristic common to many applications: the answer derived from a computation is often smaller than the input file.
Our new method also benefits from this characteristic. We delay the transmission of individual answers out of a given node so that these answers can first be combined locally. This combination has two advantages. First, it further reduces the amount of data that must ultimately be transmitted. Second, it allows additional computation across files (such as a merge sort).
There is a limit to the benefit of delaying transmission, however, because the reduce stage of MapReduce cannot begin its work until the nodes have transmitted their answers. We therefore consider a mechanism that allows the user to adjust the amount of delay before data is transmitted out of each node.
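The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's implementation and does not use Hadoop. It assumes a word-count workload, and it models the user-adjustable delay as a count threshold, `max_pending`, rather than as a time interval. The names `DelayedCombiner`, `map_file`, and `run_node` are hypothetical and exist only for this illustration.

```python
from collections import Counter


def map_file(text):
    """Map step for one file: the per-file 'answer' is a word count."""
    return Counter(text.split())


class DelayedCombiner:
    """Holds per-file answers on a node and combines them before sending.

    max_pending is a hypothetical knob standing in for the adjustable
    delay: the node transmits once this many answers have accumulated.
    """

    def __init__(self, max_pending, send):
        if max_pending < 1:
            raise ValueError("max_pending must be at least 1")
        self.max_pending = max_pending
        self.send = send            # callback that 'transmits' one message
        self.combined = Counter()   # locally combined answers so far
        self.pending = 0            # answers combined since the last send

    def add(self, answer):
        """Combine one answer locally; transmit if the threshold is hit."""
        self.combined.update(answer)
        self.pending += 1
        if self.pending >= self.max_pending:
            self.flush()

    def flush(self):
        """Transmit whatever has been combined so far, if anything."""
        if self.pending:
            self.send(dict(self.combined))
            self.combined = Counter()
            self.pending = 0


def run_node(files, max_pending):
    """Process every file on one node; return the messages it would send."""
    sent = []
    combiner = DelayedCombiner(max_pending, sent.append)
    for text in files:
        combiner.add(map_file(text))
    combiner.flush()  # send any remainder once the node runs out of files
    return sent


if __name__ == "__main__":
    files = ["a b a", "b c", "a c c"]
    for delay in (1, 2, 3):
        messages = run_node(files, delay)
        pairs = sum(len(m) for m in messages)
        print(delay, len(messages), pairs)
```

With `max_pending=1` every per-file answer is sent immediately. Larger values send fewer and smaller messages in total, but the first message leaves the node later, which is the cost that the reduce stage has to absorb.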
Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Background
3.1 Hadoop Core (Common)
3.2 HDFS
3.3 MapReduce
3.4 HBase
3.5 Pig
3.6 ZooKeeper
Chapter 4 Methodology
Chapter 5 Experimental Results
Chapter 6 Conclusion
Bibliography
References
[1] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Google, Inc. Communications of the ACM, 50th anniversary issue: 1958–2008, Volume 51, Issue 1, January 2008.
[2] David Pan. Reducing communication overheads in a cloud environment through unix-like features. National Sun Yat-sen University, February 2011.
[3] Shadi Ibrahim, Hai Jin, Bin Cheng, Haijun Cao, Song Wu. CLOUDLET: towards mapreduce implementation on virtual machines. Cluster and Grid Computing Lab, Services Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan, 430074, China.
[4] Diana Moise, Gabriel Antoniu, Luc Bougé. Improving the Hadoop Map/Reduce Framework to Support Concurrent Appends through the BlobSeer BLOB management system. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10).
[5] Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica. Improving MapReduce Performance in Heterogeneous Environments. USENIX Association, Berkeley, CA, USA, 2008.
[6] Yen-Liang Su, Po-Cheng Chen, Jyh-Biau Chang, Ce-Kuen Shieh. Variable-sized map and locality-aware reduce on public-resource grids. Advances in Grid and Pervasive Computing, Lecture Notes in Computer Science, 2010, Volume 6104/2010. National Cheng Kung University, Taiwan.
[7] Hadoop. http://hadoop.apache.org/.
[8] Tom White, foreword by Doug Cutting. Hadoop: The Definitive Guide. O'Reilly (Beijing, Cambridge, Farnham, Köln, Sebastopol, Taipei, Tokyo).
[9] Naushad UzZaman, instructed by Sandhya Dwarkadas. Survey on Google File System. Survey paper for CSC 456 (Operating Systems), University of Rochester, Fall 2007.
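Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available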


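Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available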
