National Sun Yat-sen University (國立中山大學), thesis/dissertation, Detecting Drive-by Download Based on Reputation System

Title	Detecting Drive-by Download Based on Reputation System (基於信譽系統偵測路過式下載攻擊)
Department	Department of Information Management
Year, semester	Fall semester of Academic Year 100	Language	Chinese
Degree	Master	Number of pages	52
Author	Jhe-Jhun Huang (黃哲諄)
Advisor	Chia-Mei Chen (陳嘉玫)
Convenor	D. J. Guan (官大智)
Advisory Committee	Hui-Tang Lin (林輝堂); Han-Wei Hsiao (蕭漢威)
Date of Exam	2011-12-28	Date of Submission	2012-01-10
Keywords	Honeypot, Machine Learning, Drive-by Download, Reputation System, DNS
Statistics	This thesis has been browsed 5855 times and downloaded 692 times.

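Chinese Abstract (English translation)
Drive-by download attacks infect a visitor as soon as a page is browsed, use a growing range of obfuscation techniques and reverse connections, and exploit the characteristics of the HTTP protocol, which makes them hard for traditional intrusion detection systems (IDS) and firewalls to detect. Many studies have therefore proposed crawler-based methods that continuously search the Internet for drive-by download sites and report them to blacklist sites. Crawler-based research, however, cannot faithfully show whether anyone actually visits a given drive-by download site, so this study proposes a sniffer-based method that applies execution-based detection to HTTP traffic. On the assumption that most of the websites users browse are benign, the study uses a reputation system to filter the traffic and single out suspicious domains, which greatly reduces the load on the client honeypot and makes the client honeypot usable on HTTP traffic data. In an experiment in a real network environment, the study obtained an average of 560,000 successful HTTP access records per day. With reputation-system filtering, the client honeypot needed no more than 22 hours of processing time even at peak traffic, and it detected records of users browsing drive-by download sites. The reputation system does not require a WHOIS database and is applicable to all kinds of domain names. It uses 3 feature sets with 12 features in total to build a classification model based on machine learning, and its final correct classification rate reaches 90.9%. In addition to features extracted from DNS A-type records, the reputation system extracts features from DNS NS-type records. The experimental results show that the features extracted from DNS NS-type records have an Error Rate of only 19.03%, and this new feature can be used in future research on reputation systems.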
Abstract
Drive-by download is a type of network attack that uses a variety of techniques to plant malicious code on the computers of users who merely visit a web page. Traditional intrusion detection systems and firewalls are largely ineffective against it because they cannot detect such web-based threats. Many studies have proposed crawler-based approaches to discover drive-by download sites. However, a crawler-based approach cannot reflect whether real users actually browse the drive-by download sites it finds. This study therefore proposes a new approach that detects drive-by downloads by sniffing HTTP traffic. It uses a reputation system to improve the efficiency of client honeypots and adapts the client honeypot to process raw HTTP traffic data. In an experiment conducted in a real network environment, the study collected an average of 560,000 successful HTTP access records per day. Even at peak traffic, the client honeypot behind the reputation system needed no more than 22 hours of processing time, and it detected drive-by download sites that users were actually browsing. The reputation system is applicable to a wide variety of domain names because it does not rely on an online WHOIS database. Its classification model is built with machine learning on 12 features, and its correct classification rate is 90.9%. Compared with other reputation system studies, this study extracts features not only from DNS A-type records but also from DNS NS-type records. The experimental results show that the Error Rate of the new features from DNS NS-type records is only 19.03%.
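The filtering step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `HttpRecord` type, the functions `domain_of` and `suspicious_urls`, and the `toy_model` stand-in are all hypothetical names. The actual system uses a machine-learning classifier trained on 12 DNS-derived features, which this record does not list, so the sketch takes the classifier as an opaque callable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator
from urllib.parse import urlsplit


@dataclass(frozen=True)
class HttpRecord:
    """One successful HTTP access seen by the sniffer (hypothetical shape)."""
    client_ip: str
    url: str


def domain_of(url: str) -> str:
    """Return the lower-cased host part of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def suspicious_urls(
    records: Iterable[HttpRecord],
    is_suspicious: Callable[[str], bool],
) -> Iterator[str]:
    """Yield each distinct URL whose domain the reputation model flags.

    Verdicts are cached per domain, so the model is consulted only once
    for every distinct domain in the traffic.
    """
    verdicts = {}  # domain -> bool
    seen = set()   # URLs already handed to the honeypot queue
    for record in records:
        domain = domain_of(record.url)
        if not domain:
            continue
        if domain not in verdicts:
            verdicts[domain] = is_suspicious(domain)
        if verdicts[domain] and record.url not in seen:
            seen.add(record.url)
            yield record.url


# Assumption: a fixed set stands in for the trained reputation classifier.
FLAGGED = {"bad.example"}


def toy_model(domain: str) -> bool:
    return domain in FLAGGED


log = [
    HttpRecord("10.0.0.1", "http://www.example.org/index.html"),
    HttpRecord("10.0.0.2", "http://bad.example/landing.php"),
    HttpRecord("10.0.0.3", "http://bad.example/landing.php"),
    HttpRecord("10.0.0.1", "http://BAD.example/other.js"),
]

# Only these URLs would be queued for the client honeypot.
queue = list(suspicious_urls(log, toy_model))
```

Under the abstract's assumption that most browsed sites are benign, a filter of this kind passes only a small share of the observed records to the much slower execution-based check, which is the load reduction the abstract describes.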

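Table of Contents
Acknowledgements
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Honeypot
2.2 Reputation System
2.3 Drive-by Download Attacks
2.4 Malware Distribution Network
Chapter 3 System Design
3.1 System Overview
3.2 Path Filtering
3.2 Feature Extraction
3.3 Model Building
Chapter 4 Performance Evaluation
4.1 Reputation System Validation
4.2 Client Honeypot Efficiency Experiment
4.3 Real Network Environment Experiment
Chapter 5 Conclusions and Future Work
References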

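Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release date
Available: on campus: available; off campus: available
etd-0110112-180904.pdf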
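Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. For public-access information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available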
