Thesis record for etd-0129113-125059
Title
Hadoop雲端計算平台之海量資料服務分析與安全考量
Big Data Analysis and Security Consideration with Hadoop-based Cloud Platform
Department
Year, semester
Language
Degree
Number of pages
78
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-01-18
Date of Submission
2013-01-29
Keywords
Cloud computing, Hadoop, MapReduce, Network traffic, Security, Authentication
Statistics
This thesis has been viewed 5792 times and downloaded 213 times.
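Chinese Abstract (English translation)
Cloud computing architectures offer large-scale data storage, dynamic scalability, and efficient distributed computation, which makes them suitable for applications such as traffic analysis and monitoring on high-speed networks. We tested the feasibility of real-time packet analysis and traffic statistics on a Hadoop-based system. Besides proposing two MapReduce strategies to compare how well-designed and poorly designed processing procedures differ, we also tested and compared the effects of the file storage block size and of the number of Map and Reduce tasks, along with the degree of performance improvement. The results show that network traffic analysis applications (for example, IDS/IPS network intrusion detection and prevention systems) are very well suited to development on the Hadoop architecture.
We also design and implement a secure Hadoop cluster. We propose a solution to the user authentication problem and analyze the security issues that can arise during Hadoop data transmission, so that sensitive information (Block ID, Job ID, username) is not exposed on untrusted networks. The study uses IPSec to implement transmission encryption and packet authentication for Hadoop. On this basis we propose a secure Hadoop cluster architecture that addresses Hadoop's existing security problems. Finally, we measure and analyze the operating performance of HDFS and MapReduce on this system.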
Abstract
We use Hadoop to build a trial cloud platform that addresses the challenges of heavy network traffic, and we use it to perform network traffic analysis and targeted packet dissection. Our testbed evaluates performance under different processing strategies and reports the resulting statistics. The experimental results indicate that a Hadoop-based platform is well suited to processing massive traffic data and is appropriate for managing today's data centers that offer cloud computing services.
We also build a highly secure Hadoop platform that aims for low deployment cost, robust attack prevention, and little performance degradation. Simulation results demonstrate the feasibility of the security mechanisms and show that, when building a cloud platform, the most important factor in choosing security mechanisms is the application's requirements, since matching them yields a better trade-off between security and user needs.
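As an illustration only, the kind of traffic-statistics job the abstract describes can be pictured as a map step that emits one key-value pair per packet and a reduce step that aggregates per key. The sketch below is plain Python rather than Hadoop code, and it is not taken from the thesis: the record layout (source IP, destination IP, packet size) and all function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (source IP, destination IP, packet size in bytes).
Packet = Tuple[str, str, int]


def map_packet(packet: Packet) -> Iterator[Tuple[str, int]]:
    """Map step: emit (source IP, bytes) for one packet record."""
    src, _dst, size = packet
    yield src, size


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle phase."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_traffic(src: str, sizes: List[int]) -> Tuple[str, int, int]:
    """Reduce step: total bytes and packet count for one source IP."""
    return src, sum(sizes), len(sizes)


def traffic_stats(packets: Iterable[Packet]) -> List[Tuple[str, int, int]]:
    """Run map, shuffle, and reduce in-process; results sorted by source IP."""
    pairs = (pair for packet in packets for pair in map_packet(packet))
    return sorted(reduce_traffic(src, sizes) for src, sizes in shuffle(pairs).items())
```

Calling `traffic_stats` on a list of packet tuples returns one `(source IP, total bytes, packet count)` tuple per source. On a real cluster the grouping would be done by the framework across nodes rather than by the in-process `shuffle` helper used here.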
Table of Contents
Acknowledgements
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 Organization of the Dissertation
Chapter 2 Big Data Era
2.1 Data Deluge
2.2 Cloud computing
2.2.1 IaaS
2.2.2 PaaS
2.2.3 SaaS
2.2.4 The Features of Cloud Computing
2.3 Cloud Technologies
2.3.1 Virtualization
2.3.2 Hadoop
Chapter 3 Hadoop Overview
3.1 Hadoop Computing Framework and Daemons
3.1.1 System Architecture
3.1.2 HDFS
3.1.3 MapReduce
3.1.4 HBase
3.2 The Features of Hadoop
3.2.1 Advantages of Hadoop
3.3 Hadoop Security Issues
3.3.1 Authentication Mechanism
3.3.2 Security Framework
Chapter 4 Unusual Traffic Detection Framework
4.1 Novel Cloud Application
4.1.1 Traffic Analysis
4.1.2 Packet Dissection (HTTP Packet Dissection)
4.2 Experimental Environment
4.2.1 Traffic Analysis
4.2.2 HTTP Packet Dissection
4.3 Experimental Result and Analysis
Chapter 5 High Security Cloud Platform Design and Performance Evaluation
5.1 Secure Architecture Overview
5.2 High Secure Hadoop Testbed Framework
5.3 Experiments Results
5.4 Platform Security Analysis and Deployment
Chapter 6 Conclusions and Future Works
References
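Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available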


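Printed copies
Public information about printed theses is comparatively complete from ROC academic year 102 onward. To look up public information about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.
Available: available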
