Title page for thesis/dissertation etd-0713113-152613
Title
漸進式分群誘捕系統惡意軟體
Incremental Clustering Malware from Honeypots
Department
Year, semester
Language
Degree
Number of pages
69
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-07-24
Date of Submission
2013-09-02
Keywords
Incremental clustering, Source code similarity, Static analysis, Honeypot, Malware
Statistics
This thesis/dissertation has been browsed 5920 times and downloaded 0 times.
Abstract
In recent years, cybercriminals have continually created new malware or variants of existing malware to evade inspection by security mechanisms. Honeypots can capture the malware that cybercriminals currently use, but the number of captured samples keeps growing. If security personnel cannot distinguish known older variants from new malware to support further analysis, governments and enterprises cannot quickly defend against the attack patterns of new malware types.
Although researchers have proposed many methods for analyzing malware, most of them target malware of a single file type. They are not suited to honeypot malware, which mostly mixes source code and binary files. An effective and fast analysis tool for honeypot malware is therefore still lacking.
We propose a clustering system for honeypot malware that combines the analysis of source code files and binary files. As features for measuring malware similarity, it uses the syntax structure of source code files that carries malicious behavior, vector features of binary files converted into images, and the file names and file structure, which similar malware tends to share. We adopt incremental clustering as the clustering algorithm for unknown honeypot malware, so that known older malware is grouped quickly and new types of malware are separated out. Experimental evaluation shows that the system clusters honeypot malware effectively and quickly. Finally, a comparison with the VirusTotal platform and other related research indicates that our system achieves better clustering efficiency.
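The abstract does not give the clustering procedure or the similarity formula, so the following is only a minimal sketch of one common form of incremental clustering, not the thesis's implementation. It assumes each sample is a dictionary of feature sets, that per-feature similarity is Jaccard similarity, that the overall score is a weighted sum with weights adding up to 1, and that a new sample is compared against the first member of each cluster. The names `jaccard`, `similarity`, `assign`, the feature keys, and the threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x, y, weights):
    """Weighted sum of per-feature Jaccard similarities (assumed formula)."""
    return sum(w * jaccard(x[name], y[name]) for name, w in weights.items())


def assign(sample, clusters, weights, threshold):
    """Place one sample into the best matching cluster or open a new one.

    Each cluster is a list of samples; its first element serves as the
    representative. Returns the index of the cluster the sample joined.
    """
    best_index, best_score = None, 0.0
    for i, cluster in enumerate(clusters):
        score = similarity(sample, cluster[0], weights)
        if score > best_score:
            best_index, best_score = i, score
    if best_index is not None and best_score >= threshold:
        clusters[best_index].append(sample)
        return best_index
    clusters.append([sample])  # nothing similar enough: treat as a new type
    return len(clusters) - 1


# Hypothetical feature keys and weights, for illustration only.
weights = {"file_names": 0.5, "tokens": 0.5}
clusters = []
for s in (
    {"file_names": {"bot.c", "Makefile"}, "tokens": {"socket", "fork"}},
    {"file_names": {"bot.c", "Makefile"}, "tokens": {"socket", "exec"}},
    {"file_names": {"scan.pl"}, "tokens": {"print"}},
):
    assign(s, clusters, weights, threshold=0.6)
```

Because each arriving sample is compared only with existing cluster representatives, previously clustered samples never need to be reprocessed, which is the property that makes this style of clustering suitable for a continuously growing honeypot collection.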
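Table of Contents
Acknowledgments ii
Abstract (Chinese) iii
Abstract iv
Table of Contents v
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
Section 1 Research Background 1
Section 2 Research Motivation and Objectives 2
Chapter 2 Literature Review 4
Section 1 Malware Classification 4
1. Dynamic Analysis 4
2. Static Analysis 5
Section 2 Source Code Similarity Comparison 7
1. Token based 7
2. Tree based 8
3. Metrics based 9
4. PDG based 9
Section 3 String Similarity Calculation 10
1. Hamming distance 11
2. Levenshtein distance 11
3. Longest Common Subsequence (LCS) 11
4. Damerau–Levenshtein distance 12
Chapter 3 Research Method 13
Section 1 System Architecture and Workflow 14
Section 2 Incremental Clustering 27
Section 3 Similarity Formula 31
Section 4 Weight Calculation 35
Chapter 4 System Evaluation 39
Section 1 Sample Collection 39
Section 2 Experiment 1: Clustering of Malicious Binary Files 42
Section 3 Experiment 2: Clustering of Open Source Code Files 45
Section 4 Experiment 3: Clustering of Samples Collected by Honeypots 48
Chapter 5 Conclusion and Future Work 57
References 58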
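Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined availability period
Available:
Campus: never publicly available
Off-campus: never publicly available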

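Printed copies
Public information on printed theses is relatively complete from academic year 102 (ROC calendar) onward. To inquire about public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: never publicly available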