National Sun Yat-sen University, thesis/dissertation: Incremental Clustering Malware from Honeypots

Title	Incremental Clustering Malware from Honeypots (漸進式分群誘捕系統惡意軟體)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 101	Language	Chinese
Degree	Master	Number of pages	69
Author	Chiao-Wei Huang (黃僑偉)
Advisor	Chia-Mei Chen (陳嘉玫)
Convenor	Bing-Chiang Jeng (鄭炳強)
Advisory Committee	Han-wei Hsiao (蕭漢威); Gu-Hsin Lai (賴谷鑫); Yen-Hsien Lee (李彥賢)
Date of Exam	2013-07-24	Date of Submission	2013-09-02
Keywords	Incremental clustering, Source code similarity, Static analysis, Honeypot, Malware

Abstract
In recent years, cybercriminals have continually created new malware or variants of existing malware to evade security mechanisms. Honeypots can capture the malware that cybercriminals currently use, but the number of captured samples keeps growing. If security analysts cannot distinguish known variants from new malware for follow-up analysis, government organizations and enterprises cannot respond quickly to new attack patterns. Although researchers have proposed many malware analysis methods, most target malware of a single file type and do not suit honeypot malware, which is mostly a mixture of source code and binary files. An effective and fast analysis tool for honeypot malware is therefore still lacking. This study proposes a clustering system for honeypot malware that combines the analysis of source code files and binary files. As features for measuring malware similarity, it uses the syntax structure of source code files that carries malicious behavior, vector features of binary files converted into images, and the file names and file structure that similar malware tends to share. It adopts incremental clustering to quickly assign unknown honeypot malware to known malware groups and to separate out new types. Experimental evaluation shows that the system clusters honeypot malware effectively and quickly. A comparison with the VirusTotal platform and with related research indicates that the system achieves better clustering efficiency.
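The incremental clustering idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not give the algorithm's details, so the sketch assumes a simple leader-style scheme in which each new sample is compared with one representative per existing cluster and starts a new cluster when no similarity reaches a threshold. The names `jaccard` and `incremental_cluster`, the threshold of 0.5, the sample file names, and the use of file-name sets as the only feature are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def incremental_cluster(
    samples: List[Set[str]],
    similarity: Callable[[Set[str], Set[str]], float],
    threshold: float,
) -> List[List[Set[str]]]:
    """Assign samples one at a time (assumed leader-style scheme).

    Each sample joins the most similar existing cluster if the similarity
    to that cluster's representative is at least `threshold`; otherwise
    it starts a new cluster.
    """
    clusters: List[List[Set[str]]] = []
    for sample in samples:
        best_index, best_score = -1, 0.0
        for index, members in enumerate(clusters):
            # The first member serves as the cluster representative.
            score = similarity(sample, members[0])
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0 and best_score >= threshold:
            clusters[best_index].append(sample)
        else:
            clusters.append([sample])
    return clusters


# Hypothetical honeypot samples, each reduced to its set of file names.
samples = [
    {"bot.c", "Makefile", "scan.sh"},
    {"bot.c", "Makefile", "scan.sh", "README"},
    {"miner", "start.sh"},
    {"bot.c", "Makefile", "scan2.sh"},
]
clusters = incremental_cluster(samples, jaccard, threshold=0.5)
for number, members in enumerate(clusters):
    print(number, [sorted(member) for member in members])
```

Because each sample is compared only with cluster representatives, previously seen samples never need to be re-clustered when new ones arrive, which is the property that makes this style of clustering attractive for a growing honeypot collection. A fuller version would replace `jaccard` with a weighted combination of the source code, binary image, file name, and file structure similarities that the abstract lists.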

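Table of Contents
Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  Section 1 Research Background
  Section 2 Research Motivation and Objectives
Chapter 2 Literature Review
  Section 1 Malware Classification
    1. Dynamic analysis
    2. Static analysis
  Section 2 Source Code Similarity Comparison
    1. Token based
    2. Tree based
    3. Metrics based
    4. PDG based
  Section 3 String Similarity Calculation
    1. Hamming distance
    2. Levenshtein distance
    3. Longest Common Subsequence (LCS)
    4. Damerau–Levenshtein distance
Chapter 3 Research Method
  Section 1 System Architecture and Workflow
  Section 2 Incremental Clustering
  Section 3 Similarity Formulas
  Section 4 Weight Calculation
Chapter 4 System Evaluation
  Section 1 Sample Collection
  Section 2 Experiment 1: Clustering of Malicious Binary Files
  Section 3 Experiment 2: Clustering of Open-Source Source Code Files
  Section 4 Experiment 3: Clustering of Samples Collected by Honeypots
Chapter 5 Conclusions and Future Work
References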


Fulltext
Thesis access permission: user-defined availability. Campus: not available. Off-campus: not available.
Printed copies
Availability: not available.
