National Sun Yat-sen University, thesis/dissertation: Improving Detection Efficiency using Cloud Computing

Title	Improving Detection Efficiency using Cloud Computing (運用雲端運算改善偵測效能)
Department	Department of Information Management
Year, semester	Fall semester of Academic Year 105 (ROC calendar)	Language	Chinese
Degree	Master	Number of pages	58
Author	Yu-han Chang (張育涵)
Advisor	Chia-Mai Chen (陳嘉玫)
Convenor	Gu-Hsin Lai (賴谷鑫)
Advisory Committee	Hui-Tang Lin (林輝堂); Chih-Hung Wang (王智弘)
Date of Exam	2016-07-25	Date of Submission	2016-09-05
Keywords	Distributed File System, Cloud Computing, Intrusion Detection System

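Chinese Abstract
In recent years, the convenience and rapid spread of the Internet have exposed information security to unprecedented challenges and threats, and the network has become one of the profitable avenues for attackers. To defend against complex and constantly changing network attacks, many organizations, enterprises, and government agencies purchase information security appliances such as firewalls, intrusion detection systems, and intrusion prevention systems. As organizations and enterprises acquire more and more of these appliances, new problems arise: logs come from sources with many different formats, log files grow very large, and intrusion detection systems built on a traditional architecture cannot store long-term log data in a distributed fashion or analyze it with parallel computation. As a result, the data-processing capacity of organizations and enterprises faces unprecedented challenges. The big data computing framework proposed in this study evaluates different combinations and settings of storage space, number of data nodes, CPU, memory, and network bandwidth, giving organizations and enterprises a reference for configuring cloud environments; through a comprehensive analysis of the current data, it also assesses the use of cloud computing and cloud storage. The proposed anomaly detection architecture for the cloud not only uses a multi-node Hadoop cluster for distributed storage of massive log files, but also exploits the strengths of Spark to overcome the inability of traditional intrusion detection systems to perform Event Correlation over long time windows, thereby establishing an intrusion detection architecture based on cloud computing. In addition, this study collected big data from a real enterprise as the experimental dataset and processed it in both the traditional architecture and the cloud environment to compare detection performance and system efficiency. The results show that processing big data on a clustered cloud system yields better detection performance.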
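The long-window Event Correlation described above can be illustrated with a small, stdlib-only Python sketch. It is not the author's implementation: the two raw log formats (a firewall-style line and an IDS-style line), the field names, the 7-day window, and the rule "flag a source IP that shows at least three distinct event types within one window" are all hypothetical. In the proposed architecture this kind of grouping would presumably run as a distributed Spark job over logs stored in HDFS; here it runs in memory on a plain list.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Optional, Set


class Event(NamedTuple):
    ts: datetime
    src_ip: str
    event_type: str


# Hypothetical formats; real firewall and IDS logs vary by vendor.
FIREWALL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) FW (?P<action>\w+) src=(?P<src>[\d.]+)")
IDS_RE = re.compile(
    r"^\[(?P<ts>\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2})\] IDS \[(?P<sig>[^\]]+)\] (?P<src>[\d.]+) ->")


def parse(line: str) -> Optional[Event]:
    """Normalize one raw log line into a common Event record (None if unrecognized)."""
    m = FIREWALL_RE.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        return Event(ts, m["src"], "fw_" + m["action"].lower())
    m = IDS_RE.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%m/%d/%Y-%H:%M:%S")
        return Event(ts, m["src"], "ids_" + m["sig"].lower().replace(" ", "_"))
    return None


def correlate(events: Iterable[Event],
              window: timedelta = timedelta(days=7),
              min_types: int = 3) -> Set[str]:
    """Return source IPs showing >= min_types distinct event types within one window."""
    by_src = defaultdict(list)
    for ev in events:
        by_src[ev.src_ip].append(ev)
    flagged = set()
    for src, evs in by_src.items():
        evs.sort(key=lambda e: e.ts)
        start = 0
        for end in range(len(evs)):
            # Advance the left edge until the window spans at most `window`.
            while evs[end].ts - evs[start].ts > window:
                start += 1
            if len({e.event_type for e in evs[start:end + 1]}) >= min_types:
                flagged.add(src)
                break
    return flagged


raw = [
    "2016-03-01 10:00:00 FW DENY src=10.0.0.5 dst=192.168.1.10",
    "[03/04/2016-09:30:00] IDS [Port Scan] 10.0.0.5 -> 192.168.1.0",
    "2016-03-06 23:10:00 FW ALLOW src=10.0.0.5 dst=192.168.1.20",
    "2016-03-01 11:00:00 FW DENY src=10.0.0.9 dst=192.168.1.10",
    "[04/20/2016-08:00:00] IDS [Port Scan] 10.0.0.9 -> 192.168.1.0",
    "unparseable line",
]
events = [e for e in map(parse, raw) if e is not None]
print(correlate(events))  # {'10.0.0.5'}
```

Host 10.0.0.5 is flagged because a deny, a port scan, and an allow all fall within about five and a half days, whereas a per-event detector with a short window would see each one in isolation. Grouping by source IP first is what would let the same logic map onto a key-based shuffle across a cluster; that mapping is an assumption, not a description of the thesis code.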
Abstract
With the popularity and convenience of the Internet, it has become a profitable avenue for attackers to enter local area networks. Most organizations, companies, and government agencies purchase firewalls, intrusion detection systems, intrusion prevention systems, or other information security systems to protect and defend their networks. As this security infrastructure grows, it creates problems of its own that can significantly affect organizations: raw log messages arrive in many different formats, and the resulting volume of data is difficult to store. A traditional data analysis architecture built around a single powerful server suffers serious performance problems when processing such big data. This study proposes a cloud computing architecture and tunes the storage space, the number of NameNodes and DataNodes, CPU, memory, and network bandwidth to make the cloud computing system more efficient. It offers an open-source cloud computing solution for storing and analyzing big data. The clustered, distributed storage provided by the open-source platform Hadoop relieves the processing-time and storage problems faced by traditional centralized architectures. To ease the read/write bottleneck during big data processing, the in-memory processing framework Spark is adopted to reduce the number of disk accesses. Experimental results show that the proposed cloud platform substantially improves performance.
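The abstract ties efficiency to settings such as storage space and the number of DataNodes. As a rough illustration of how those settings interact, the back-of-the-envelope Python sketch below estimates the HDFS block count, the replicated disk footprint, and a minimum DataNode count for a given log volume. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the function name `plan_hdfs`, the 25% free-space headroom, and the per-node disk size are hypothetical planning values, not figures from the thesis.

```python
import math
from dataclasses import dataclass

MB = 1024 ** 2
TB = 1024 ** 4


@dataclass
class HdfsPlan:
    blocks: int      # number of HDFS blocks (the last block may be partial)
    raw_bytes: int   # logical data size multiplied by the replication factor
    datanodes: int   # minimum DataNodes that can hold raw_bytes with headroom


def plan_hdfs(data_bytes: int,
              node_disk_bytes: int,
              block_size: int = 128 * MB,   # Hadoop 2.x default dfs.blocksize
              replication: int = 3,         # default dfs.replication
              headroom: float = 0.25) -> HdfsPlan:
    """Estimate blocks, replicated storage, and DataNode count for one dataset.

    `headroom` is a hypothetical fraction of each disk kept free for
    intermediate and temporary data.
    """
    blocks = math.ceil(data_bytes / block_size)
    raw_bytes = data_bytes * replication
    usable_per_node = node_disk_bytes * (1 - headroom)
    # The default placement policy keeps replicas of one block on different
    # DataNodes, so full replication needs at least `replication` nodes.
    datanodes = max(replication, math.ceil(raw_bytes / usable_per_node))
    return HdfsPlan(blocks, raw_bytes, datanodes)


print(plan_hdfs(data_bytes=2 * TB, node_disk_bytes=4 * TB))
# HdfsPlan(blocks=16384, raw_bytes=6597069766656, datanodes=3)
```

For 2 TB of logs on 4 TB nodes, capacity alone would need only two nodes, but replication sets the floor at three; for 10 TB the estimate rises to ten nodes. CPU, memory, and network bandwidth, the other settings the study varies, are not modeled here.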

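Table of Contents
Chinese Abstract
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation
  1.3 Research Objectives
Chapter 2 Literature Review
  2.1 Cloud Computing
  2.2 Hadoop
  2.3 Distributed File Systems
  2.4 Parallel Computing
  2.5 Spark
  2.6 Intrusion Detection Methods
  2.7 Performance Gains from Cloud Computing
Chapter 3 System Design
  3.1 System Architecture
  3.2 System Parameters
  3.3 System Component Description
  3.4 Cloud Cluster Environment
Chapter 4 System Evaluation
  4.1 Dataset
  4.2 Experimental Parameters
  4.3 System Evaluation
  4.4 System Performance
  4.5 System Comparison
Chapter 5 Conclusions and Future Work
References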

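Fulltext
This electronic full text is licensed to users solely for academic research, for personal, non-commercial searching, reading, and printing. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringing the law. Thesis access permission: user-defined release date. Availability: on campus, available; off campus, available. File: etd-0730116-223811.pdf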
Printed copies
Availability: available

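Office of Library and Information Services, National Sun Yat-sen University │ Inquiries: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS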
