Thesis/Dissertation etd-0914112-155523: Detailed Information
Title page for etd-0914112-155523
Code Classification Based on Structure Similarity
Year, semester
Number of pages
Advisory Committee
Date of Exam
Date of Submission
Malware Classification, Source Code, Static Analysis, Structure Similarity
The thesis/dissertation has been browsed 5851 times and downloaded 382 times.
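As honeypot systems mature, the amount of malware source code they collect keeps growing. Analyzing malware source code yields the most accurate malware classification, so this thesis proposes an automated malware classification mechanism. Using malware source code captured by honeypots, the approach combines the similarity of malware file structure with the similarity of source code files and applies a hierarchical clustering algorithm. It can correctly assign newly captured malware to the right class and can also quickly identify new types of malware. The proposed method substantially reduces the need for digital forensic analysts to repeat costly analysis of malware of the same type, and it helps them understand an attacker's behavior and intent in the shortest possible time. Experiments show that, besides correctly classifying malware source code, the system can be applied to other fields that need source code classification.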
Automatically classifying the source code of malware variants is among the most important research issues in digital forensics. Malware classification gives a complete picture of a sample's behavior, which simplifies the forensic task. Previous research has worked from malware binaries, performing either dynamic analysis or static analysis after reverse engineering. Malware developers, on the other hand, use anti-VM and obfuscation techniques to deceive malware classifiers.
As honeypots are increasingly used, researchers can obtain more and more malware source code, and analyzing this source code may be the best way to classify malware. This thesis proposes a novel classification approach based on the logic similarity and directory structure similarity of malware. The collected source code is classified by a hierarchical clustering algorithm. The proposed system not only classifies known malware correctly but also finds new types of malware. It also spares forensic staff from spending time reanalyzing known malware, and it helps reveal an attacker's behavior and purpose. The experimental results demonstrate that the system classifies malware correctly and can be applied to other source code classification tasks.
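The following is a minimal sketch of the kind of pipeline the abstract describes, not the thesis's actual implementation. This record does not give the similarity definitions, so everything specific here is an assumption made for illustration: the Jaccard measure over relative file paths (`structure_similarity`), the `difflib`-based text score (`content_similarity`), the equal weighting in `similarity`, the average linkage, and the 0.6 merge threshold in `cluster`. Each sample is modeled as a hypothetical mapping from file path to source text.

```python
from difflib import SequenceMatcher
from itertools import combinations


def structure_similarity(paths_a, paths_b):
    """Jaccard similarity of two collections of relative file paths (assumed measure)."""
    a, b = set(paths_a), set(paths_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _directed_content(files_a, files_b):
    # For every file in A, take its best text match among the files in B.
    best = [
        max(SequenceMatcher(None, text_a, text_b).ratio() for text_b in files_b.values())
        for text_a in files_a.values()
    ]
    return sum(best) / len(best)


def content_similarity(files_a, files_b):
    """Symmetric average of best-match text similarity (assumed measure)."""
    if not files_a or not files_b:
        return 0.0
    return (_directed_content(files_a, files_b) + _directed_content(files_b, files_a)) / 2


def similarity(sample_a, sample_b, weight=0.5):
    """Weighted mix of structure and content similarity; the weight is an assumption."""
    structure = structure_similarity(sample_a.keys(), sample_b.keys())
    content = content_similarity(sample_a, sample_b)
    return weight * structure + (1 - weight) * content


def cluster(samples, threshold=0.6):
    """Average-linkage agglomerative clustering that stops merging below `threshold`."""
    names = list(samples)
    sim = {
        frozenset((x, y)): similarity(samples[x], samples[y])
        for x, y in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters


# Hypothetical toy samples: two variants of one bot and an unrelated scanner.
samples = {
    "bot_v1": {
        "src/main.c": "int main(){connect_irc();return 0;}",
        "src/irc.c": "void connect_irc(){}",
    },
    "bot_v2": {
        "src/main.c": "int main(){connect_irc();flood();return 0;}",
        "src/irc.c": "void connect_irc(){}",
    },
    "scanner": {"scan.py": "import socket\nprint('scan')"},
}

print(cluster(samples))  # [['bot_v1', 'bot_v2'], ['scanner']]
```

In this toy run the two bot variants share their directory layout and most of their text, so they merge, while the scanner shares no paths with them and stays in its own cluster. A sample that ends up alone in this way is what the abstract would treat as a candidate new type.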
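Table of Contents
Acknowledgments
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1: Introduction
Section 1: Research Background
Section 2: Research Motivation
Section 3: Research Objectives
Chapter 2: Related Literature
Section 1: Malware Classification
Section 2: Source Code Comparison
Section 3: Similarity Computation
Chapter 3: Problem Definition and Research Method
Section 1: Problem Definition
Section 2: System Architecture and Workflow
Section 3: Similarity Definition
Chapter 4: System Evaluation
Section 1: Sample Collection
Section 2: Experiment 1: Self-Modified Source Code, Individual Files, Input in Order of Variation Stage
Section 3: Experiment 2: Self-Modified Source Code, Individual Files, Random Input Order
Section 4: Experiment 3: Self-Modified Source Code, Compressed Archives, Random Input Order
Section 5: Experiment 4: Suspicious Downloads Collected by Honeypots
Chapter 5: Conclusion and Future Work
Chapter 6: Related Literature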
Fulltext
Thesis access permission: user-defined release date
Availability:
Campus: publicly available
Off-campus: publicly available

Printed copies
Availability: publicly available
