Master's and doctoral theses: detailed record for etd-0713113-152613
Title page for etd-0713113-152613
Incremental Clustering Malware from Honeypots
Keywords: Incremental clustering, Source code similarity, Static analysis, Honeypot, Malware
This thesis/dissertation has been viewed 5889 times and downloaded 0 times.
In recent years, cybercriminals have used new malware or variants to evade inspection by security mechanisms. Honeypots can capture the malware that cybercriminals are using. As the amount of malware captured by honeypots grows, IT security staff who cannot tell old, variant, and new malware apart cannot select samples for further analysis, and government organizations and enterprises cannot respond quickly to new types of attack.
Although many researchers have proposed methods for analyzing malware, most of them focus on a single file type. Such methods are not suitable for honeypot malware, which is mostly a mix of source code and binary files. An effective and fast analysis tool for honeypot malware is therefore still lacking.
We propose a honeypot malware analysis system that handles both source files and binary files. We use the syntax structure of source code files, the image vectors of binary files, file names, and file structure as features to measure malware similarity. We adopt incremental clustering to quickly separate known malware from new types of malware. In our experimental evaluations, the system clusters honeypot malware effectively and quickly. Finally, we compare its performance with VirusTotal and other research, and the results indicate that our system achieves better clustering efficiency.
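The incremental clustering idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the Jaccard measure, the feature names (`syntax`, `file_names`, `file_structure`), the weights, and the 0.6 threshold are all assumptions chosen for illustration, and the binary image-vector feature is left out. Each new sample is compared with one representative per existing cluster; it joins the most similar cluster if the weighted similarity reaches the threshold, and otherwise starts a new cluster.

```python
from dataclasses import dataclass, field


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical feature weights (they sum to 1); the thesis derives its own.
WEIGHTS = {"syntax": 0.5, "file_names": 0.3, "file_structure": 0.2}


def similarity(x: dict, y: dict) -> float:
    """Weighted sum of per-feature Jaccard similarities."""
    return sum(w * jaccard(x[name], y[name]) for name, w in WEIGHTS.items())


@dataclass
class IncrementalClusterer:
    threshold: float = 0.6  # assumed cut-off, not taken from the thesis
    representatives: list = field(default_factory=list)
    members: list = field(default_factory=list)

    def add(self, sample: dict) -> int:
        """Assign the sample to the most similar cluster or start a new one."""
        best_id, best_sim = -1, 0.0
        for cid, rep in enumerate(self.representatives):
            sim = similarity(sample, rep)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id >= 0 and best_sim >= self.threshold:
            self.members[best_id].append(sample)
            return best_id
        # No cluster is similar enough: the sample becomes a new representative.
        self.representatives.append(sample)
        self.members.append([sample])
        return len(self.representatives) - 1


# Toy samples: two variants of one bot and an unrelated scanner script.
bot_v1 = {"syntax": {"if", "while", "socket", "connect"},
          "file_names": {"bot.c", "Makefile"},
          "file_structure": {"src/", "bin/"}}
bot_v2 = {"syntax": {"if", "while", "socket", "connect", "fork"},
          "file_names": {"bot.c", "Makefile"},
          "file_structure": {"src/", "bin/"}}
scanner = {"syntax": {"for", "printf"},
           "file_names": {"scan.pl"},
           "file_structure": {"tools/"}}

clusterer = IncrementalClusterer()
labels = [clusterer.add(s) for s in (bot_v1, bot_v2, scanner)]
print(labels)  # [0, 0, 1]
```

Because each arriving sample is compared only with the cluster representatives, earlier samples never need to be re-clustered, which is what makes this style of clustering suitable for a steady stream of honeypot captures.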
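Table of Contents
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation and Objectives
Chapter 2 Literature Review
Section 1 Malware Classification
1. Dynamic Analysis
2. Static Analysis
Section 2 Source Code Similarity Comparison
1. Token based
2. Tree based
3. Metrics based
4. PDG based
Section 3 String Similarity Calculation
1. Hamming distance
2. Levenshtein distance
3. Longest Common Subsequence (LCS)
4. Damerau–Levenshtein distance
Chapter 3 Research Method
Section 1 System Architecture and Workflow
Section 2 Incremental Clustering
Section 3 Similarity Formula
Section 4 Weight Calculation
Chapter 4 System Evaluation
Section 1 Sample Collection
Section 2 Experiment 1: Clustering of Malicious Binary Files
Section 3 Experiment 2: Clustering of Open Source Code Files
Section 4 Experiment 3: Clustering of Samples Collected by Honeypots
Chapter 5 Conclusion and Future Work
References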
Fulltext
Thesis access permission: user-defined availability
Available:
Campus: not available
Off-campus: not available

Printed copies
Available: not available
