博碩士論文 etd-0217111-001208 詳細資訊
Title page for etd-0217111-001208
論文名稱
Title
透過unix-like特性在雲端的環境中減少溝通的負擔
Reducing communication overheads in a cloud environment through unix-like features
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
59
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2011-01-18
繳交日期
Date of Submission
2011-02-17
關鍵字
Keywords
雲端
cloud computing
統計
Statistics
本論文已被瀏覽 5602 次,被下載 0
The thesis/dissertation has been browsed 5602 times, has been downloaded 0 times.
中文摘要
本論文描述了一種方法來增加功能和提高性能,以Hadoop 的雲計算基
礎設施。特別是,我們增加了Hadoop 的源代碼文件,從hadoop 執行
時mapper的內部,允許 UNIX腳本運行在雲端環境中的各個任務節點。
我們的研究結果證明,我們所使用的方法相較於其他的替代方法更容
易進行,更容易讓有經驗的UNIX程式設計者所理解,更強大的計算
能力是可能的,或相較於其它的替代方法具有更快的速度。
Abstract
This thesis describes an approach that adds functionality and improves performance in the
Hadoop infrastructure for cloud computing. In particular, we have added code to the Hadoop
source files that allows UNIX scripts to run on the task nodes of the cloud from within the
mapper phase of Hadoop execution.
Our results show that the new approach is easier to program than the alternatives, easier
for experienced UNIX programmers to understand, more powerful in the kinds of computations
it makes possible, and as fast as or faster than the alternatives.
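The mechanism the abstract describes, running a UNIX command over each task's input during the map phase, can be illustrated outside Hadoop with a small sketch. This is not the thesis's modified Hadoop code: the function names `run_unix_mapper` and `reduce_unique`, the sample splits, and the shell pipeline are all hypothetical, and the splits are processed one after another on a single machine rather than on cluster nodes.

```python
import subprocess


def run_unix_mapper(split_lines, shell_command):
    """Pipe one input split through a shell command; return its output lines.

    Assumption: stands in for a mapper that hands its records to a UNIX
    script. A POSIX shell and the named utilities must be installed.
    """
    result = subprocess.run(
        shell_command,
        shell=True,                       # let the shell build the pipeline
        input="\n".join(split_lines) + "\n",
        capture_output=True,
        text=True,
        check=True,                       # raise if the pipeline's last command fails
    )
    return result.stdout.splitlines()


def reduce_unique(mapper_outputs):
    """Merge the per-split outputs into one sorted list of distinct values."""
    merged = set()
    for lines in mapper_outputs:
        merged.update(lines)
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical input splits; on a cluster each would live on a different node.
    splits = [
        ["the cloud node", "the task node"],
        ["a task tracker", "the cloud"],
    ]
    # One word per line, keep words starting with "t", drop duplicates per split.
    command = "tr ' ' '\\n' | grep '^t' | sort -u"
    outputs = [run_unix_mapper(split, command) for split in splits]
    print(reduce_unique(outputs))
```

Each split is filtered independently by the shell pipeline, and only the already de-duplicated words reach the merge step, which is the kind of per-node work the abstract attributes to scripts run inside the mapper.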
目次 Table of Contents
論文審定書 Thesis Approval Form i
誌謝 Acknowledgements ii
中文摘要 Chinese Abstract iii
英文摘要 English Abstract iv
List of Figures vi
Chapter 1 Introduction 1
Chapter 2 Background Information about Hadoop 9
2.1 Cloud Computing 9
2.2 An Overview of Hadoop 11
2.3 The Hadoop Distributed File System 12
2.4 MapReduce 13
Chapter 3 Relate Work 15
3.1 Hadoop's master/slave structure 15
3.2 The HDFS architecture 16
3.3 HADOOP Read/Write operations 18
3.4 MapReduce 18
3.5 Varying the number of reduce tasks 20
3.6 Pig Latin 23
3.7 Dryad 25
Chapter 4 Method Overview 29
4.1 How Pig Latin works 29
4.2 Modifying the Local Combiner 35
Chapter 5 Experimental and Discussion 36
5.1 Experiment 1: Finding unique words following a pattern 36
5.2 Experiment 2: Advanced search for a pattern 39
5.3 Experiment 3: Replacing a pattern 43
Chapter 6 Conclusions and Future Work 47
Bibliography 48
參考文獻 References
[1] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In
Communications of the ACM, 51 (1): 107-113, 2008.
[2] Hadoop, http://lucene.apache.org/hadoop
[3] Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Michael
Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. European Conference on
Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007.
[4] Yahoo! Launches World’s Largest Hadoop Production Application,
http://tinyurl.com/2hgzv7
[5] Applications powered by Hadoop: http://wiki.apache.org/hadoop/PoweredBy
[6] Presentations by S. Schlosser and J. Lin at the 2008 Hadoop Summit. tinyurl.com/4a6lza
[7] D. Gottfrid, Self-service, Prorated Super Computing Fun, New York Times Blog,
tinyurl.com/2pjh5n
[8] Figure from slide deck on MapReduce from Google academic cluster, tinyurl.com/4zl6f5.
Available under Creative Commons Attribution 2.5 License.
[9] R. Pike, S. Dorward, R. Griesemer, S. Quinlan. Interpreting the Data: Parallel Analysis
with Sawzall, Scientific Programming Journal, 13 (4): 277-298, Oct. 2005.
[10] C. Olston, B. Reed, U. Srivastava, R. Kumar and A. Tomkins. Pig Latin: A
Not-So-Foreign Language for Data Processing. ACM SIGMOD 2008, June 2008.
[11] E.B. Nightingale, P.M. Chen, and J. Flinn. Speculative execution in a distributed file
system. ACM Trans. Comput. Syst., 24 (4): 361-392, November 2006.
[12] Amazon EC2 Instance Types, tinyurl.com/3zjlrd
[13] B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, I. Pratt, A. Warfield, P. Barham, and
R. Neugebauer. Xen and the art of virtualization. ACM SOSP 2003.
[14] Personal communication with the Yahoo! Hadoop team and with Joydeep Sen Sarma
from Facebook.
[15] J. Bernardin, P. Lee, J. Lewis, DataSynapse, Inc. Using Execution statistics to select
tasks for redundant assignment in a distributed computing platform. Patent number 7093004,
filed Nov 27, 2002, issued Aug 15, 2006.
[16] G. E. Blelloch, L. Dagum, S. J. Smith, K. Thearling, M. Zagha. An evaluation of sorting
as a supercomputer benchmark. NASA Technical Reports, Jan 1993.
[17] EC2 Case Studies, tinyurl.com/46vyut
[18] Mor Harchol-Balter, Task Assignment with Unknown Duration. Journal of the ACM, 49
(2): 260-288, 2002.
[19] M. Crovella, M. Harchol-Balter, and C.D. Murta. Task assignment in a distributed system:
Improving performance by unbalancing load. In Measurement and Modeling of Computer
Systems, pp. 268-269, 1998.
[20] B. Ucar, C. Aykanat, K. Kaya, and M. Ikinci. Task assignment in heterogeneous computing
systems. J. of Parallel and Distributed Computing, 66 (1): 32-46, Jan 2006.
[21] S. Manoharan. Effect of task duplication on the assignment of dependency graphs.
Parallel Comput., 27 (3): 257-268, 2001.
[22] Y. Su, M. Attariyan, J. Flinn. AutoBash: improving configuration management with
operating system causality analysis. ACM SOSP 2007.
[23] G. Barish. Speculative plan execution for information agents. PhD dissertation,
University of Southern California, Dec 2003.
電子全文 Fulltext
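This electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.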
論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available


紙本論文 Printed copies
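Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available:已公開 available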
