National Sun Yat-sen University (國立中山大學), thesis/dissertation, Multi-layer Incremental Clustering for Malware Analysis (基於多層漸進式分群之惡意程式分析)

Title	Multi-layer Incremental Clustering for Malware Analysis (基於多層漸進式分群之惡意程式分析)
Department	Department of Information Management
Year, semester	Fall semester of Academic Year 104	Language	Chinese
Degree	Master	Number of pages	68
Author	I-Kai Wang (王奕凱)
Advisor	Chia-Mai Chen (陳嘉玫)
Convenor	Gu-Hsin Lai (賴谷鑫)
Advisory Committee	Chu-Sing Yang (楊竹星); Keng-Pei Lin (林耕霈)
Date of Exam	2015-08-04	Date of Submission	2015-09-16
Keywords	incremental clustering, malware detection, static analysis, malware family (Chinese keywords: malware, incremental clustering, static analysis, malware family, multi-layer analysis)

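Chinese Abstract (English translation)
Malware is one of today's most troublesome information security problems. As malware grows and mutates rapidly, defensive measures must keep pace. Security personnel, however, can usually only react: once a new piece of malware launches an attack, they must work out how to detect it, then repair the damage, and finally build an effective defense. It is a race against time, and every step must be completed as efficiently as possible. Many methods for detecting malware exist today, and many vendors offer defensive software, but most focus only on identifying malware. Few look for relationships among malware samples, which is the key to faster detection. Even when we meet malware we have never seen before, finding its relationship to previously seen malware lets us devise a response strategy sooner and reduce losses. This study proposes a multi-layer clustering system focused on detecting relationships among malware. It extracts representative features from malware, including instruction structures in various source code files, vector features of binary files, and structural features of the malware. It then uses two different clustering algorithms, improved incremental clustering and extended 1-NN, to classify unknown malware, and analyzes the family relationships among malware through multi-layer clustering. Finally, the system is compared with VirusTotal, which the author regards as the most impartial online scanning site, and with the well-known antivirus vendor Avira; the comparison confirms that this study achieves better detection results.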
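The clustering step described in the abstract can be illustrated with a generic single-pass (incremental) scheme. This is only a sketch under stated assumptions: this record does not specify the thesis's improved incremental clustering or its similarity formula, so the Jaccard similarity, the 0.5 threshold, the use of the first member as cluster representative, and the sample names and features below are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def incremental_cluster(samples, threshold=0.5):
    """Single-pass clustering (assumed scheme, not the thesis's algorithm).

    Each sample joins the most similar existing cluster when its similarity
    to that cluster's representative reaches `threshold`; otherwise it
    starts a new cluster. The representative is the cluster's first member.
    """
    clusters = []  # each cluster: {"rep": frozenset, "members": [names]}
    for name, features in samples:
        feats = frozenset(features)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(feats, cluster["rep"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(name)
        else:
            clusters.append({"rep": feats, "members": [name]})
    return [c["members"] for c in clusters]


# Hypothetical samples: two share most features, the third shares none.
samples = [
    ("a1", {"push", "mov", "call", "xor"}),
    ("a2", {"push", "mov", "call", "ret"}),
    ("b1", {"eval", "unescape", "document.write"}),
]
groups = incremental_cluster(samples, threshold=0.5)
```

Because samples are processed one at a time, a scheme of this kind depends on arrival order, which is presumably why the thesis devotes an experiment to reordering the samples.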
Abstract
Malware is among the most pressing threats in internet security. As malware grows and mutates ever faster, defensive methods must evolve with it. Unfortunately, security staff can usually begin to respond only after new malware has already invaded a system. The usual response is to collect evidence first, analyze that evidence to find a solution, and finally harden the system against further attacks. In this thesis, we propose a malware analysis system that accurately clusters new malware. We extract significant features from each malware sample: for source code files, we extract syntax strings as features; for binary files, we transform the file into an image and extract a matrix vector from the image as the feature. We then apply two clustering algorithms, advanced incremental clustering and extended 1-NN, to cluster the samples. Finally, the system produces a detailed report on malware family relationships. Four experiments verify the system. We compare the performance and accuracy of the two clustering algorithms, and we test the system's maturity by analyzing the samples in random order. We also compare the system with VirusTotal.com and Avira software; the results confirm that our system clusters more effectively.
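The binary-to-image feature mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's procedure: the image width, the block-mean feature, and the function names are assumptions introduced here, since the record does not say how the matrix vector is computed.

```python
def bytes_to_image(data, width=16):
    """Lay bytes out row by row as 8-bit grayscale pixels.

    The last row is zero-padded to `width`. The width is an assumed
    parameter, not a value taken from the thesis.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


def block_means(image, grid=2):
    """Split the image into grid x grid blocks and return each block's mean
    intensity as a fixed-length feature vector (an assumed feature)."""
    if not image:
        return [0.0] * (grid * grid)
    height, width = len(image), len(image[0])
    vector = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            vector.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return vector


# Hypothetical 16-byte "binary": eight 0x00 bytes followed by eight 0xFF bytes.
image = bytes_to_image(bytes([0] * 8 + [255] * 8), width=4)
vector = block_means(image, grid=2)
```

A fixed-length vector of this kind lets binaries of different sizes be compared with an ordinary distance or similarity measure before clustering.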

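Table of Contents
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation and Objectives
Chapter 2 Literature Review
  2.1 Malware Analysis
  2.2 Detection Methods for Malware Families
Chapter 3 Research Method
  3.1 System Architecture and Workflow
  3.2 Similarity Formula
  3.3 Clustering Algorithms
Chapter 4 System Experiments
  4.1 Sample Collection and Observation
  4.2 Experiment 1: Improved Incremental Clustering and Extended 1-NN
  4.3 Experiment 2: Reordering Samples to Verify the System's Clustering Efficiency
  4.4 Experiment 3: Comparison of the System with VirusTotal
  4.5 Experiment 4: Comparison of the System with Avira Antivirus Software
Chapter 5 Conclusion
References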


