Thesis/dissertation details for etd-0110112-180904
Title page for etd-0110112-180904
Detecting Drive-by Download Based on Reputation System
Year, semester
Number of pages
Advisory Committee
Date of Exam
Date of Submission
Honeypot, Machine Learning, Drive-by Download, Reputation System, DNS
This thesis has been viewed 5813 times and downloaded 692 times.
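Drive-by download attacks infect a visitor as soon as a page is browsed, use increasingly varied obfuscation techniques and reverse connections, and exploit characteristics of the HTTP protocol, all of which make them hard for traditional intrusion detection systems (IDS) and firewalls to detect. Many studies have therefore proposed crawler-based methods that continuously search the Internet for drive-by download sites and report them to blacklist sites. However, crawler-based studies cannot faithfully reflect whether anyone actually visits a given drive-by download site, so this study proposes a sniffer-based method that performs execution-based detection on HTTP traffic.
Under the assumption that most websites users visit are benign, this study uses a reputation system to filter traffic and identify suspicious domains, which greatly reduces the load on the client honeypot and makes the client honeypot practical for HTTP traffic data. In an experiment in a real network environment, the study obtained an average of 560,000 successful HTTP access records per day. With the reputation system filtering its input, the client honeypot needed no more than 22 hours of processing time even at peak traffic, and it detected records of users browsing drive-by download sites.
The reputation system in this study does not need a WHOIS database and applies to all kinds of domain names. It uses 3 feature sets with 12 features in total and builds a classification model based on machine learning; the final correct classification rate reaches 90.9%. Besides extracting features from DNS A-type records, the reputation system also extracts features from DNS NS-type records. The experimental results show that the features extracted from DNS NS-type records have an error rate of only 19.03%, and these new features can be used in future reputation system research.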
Drive-by download is a type of network attack that uses various techniques to plant malicious code on users' computers. It renders traditional intrusion detection systems and firewalls ineffective because those devices cannot detect web-based threats.
Many studies have proposed crawler-based approaches to discover drive-by download sites. However, a crawler-based approach cannot reflect real users' browsing behavior, that is, whether anyone actually visits a given drive-by download site. Therefore, this study proposes a new approach that detects drive-by downloads by sniffing HTTP traffic.
This study uses a reputation system to improve the efficiency of client honeypots and adapts client honeypots to process raw HTTP traffic data. In an experiment conducted in a real network environment, the study collected an average of 560,000 successful HTTP access records per day. Even at peak traffic, a client honeypot fed by the reputation system needed no more than 22 hours of processing time, and it detected drive-by download sites that users were actually browsing.
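The filtering step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record format `(status, url)`, the function name `select_for_honeypot`, the score range, and the threshold are all assumptions made for illustration. The idea is only that successful HTTP accesses are grouped by domain, each domain is scored once, and only URLs on low-reputation domains are queued for the client honeypot.

```python
from urllib.parse import urlsplit


def select_for_honeypot(records, reputation, threshold=0.5):
    """Return the distinct URLs that should be sent to the client honeypot.

    records    -- iterable of (status, url) pairs (hypothetical log format)
    reputation -- callable mapping a domain to a score in [0, 1], where a
                  higher score means more trustworthy (an assumption)
    threshold  -- domains scoring below this value count as suspicious
    """
    scores = {}    # cache: each domain is scored only once
    queued = set() # URLs already placed in the queue
    queue = []
    for status, url in records:
        if not 200 <= status < 300:
            continue  # keep only successful accesses
        host = urlsplit(url).hostname
        if host is None:
            continue  # malformed or relative URL
        if host not in scores:
            scores[host] = reputation(host)
        if scores[host] < threshold and url not in queued:
            queued.add(url)
            queue.append(url)
    return queue


# Toy data: example domains and scores are invented for this sketch.
records = [
    (200, "http://news.example.com/index.html"),
    (404, "http://bad.example.net/missing"),
    (200, "http://bad.example.net/landing"),
    (200, "http://bad.example.net/landing"),
    (200, "http://news.example.com/sports"),
]
toy_scores = {"news.example.com": 0.9, "bad.example.net": 0.1}
queue = select_for_honeypot(records, lambda d: toy_scores.get(d, 0.0))
```

In this toy run only the successful visit to the low-scoring domain is queued, and it is queued once even though it was visited twice. Reducing the honeypot's input in this way is what the abstract credits for keeping processing time within a day.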
The reputation system in this study applies to a wide variety of domain names because it does not rely on an online WHOIS database. It builds a machine-learning classification model on 12 features and reaches a correct classification rate of 90.9%. Compared with other reputation system studies, this study extracts features not only from DNS A-type records but also from DNS NS-type records. The experimental results show that the error rate of the new features from DNS NS-type records is only 19.03%.
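The abstract does not list the 12 features, so the following Python sketch uses four invented ones purely to show what extracting features from A-type and NS-type answers could look like. The feature names, the IPv4-only handling, and the naive "last two labels" rule for name server domains are all assumptions. The helper `correct_rate` also assumes that the error rate is simply the complement of the correct classification rate, which the abstract does not state explicitly.

```python
import ipaddress


def dns_features(a_records, ns_records):
    """Build an illustrative feature dict from resolved DNS answers.

    a_records  -- list of IPv4 address strings from A-type answers
    ns_records -- list of name server host names from NS-type answers
    """
    ips = {ipaddress.ip_address(ip) for ip in a_records}
    # Distinct /16 networks the addresses fall into (host bits are masked).
    prefixes = {ipaddress.ip_network(f"{ip}/16", strict=False) for ip in ips}
    servers = {name.lower().rstrip(".") for name in ns_records}
    # Naive registered-domain guess: last two labels (wrong for e.g. co.uk).
    server_domains = {".".join(name.split(".")[-2:]) for name in servers}
    return {
        "a_count": len(ips),
        "a_prefix16_count": len(prefixes),
        "ns_count": len(servers),
        "ns_domain_count": len(server_domains),
    }


def correct_rate(predicted, actual):
    """Fraction of labels that the classifier predicted correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equally long")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Toy inputs using documentation-reserved addresses and example domains.
features = dns_features(
    ["192.0.2.1", "192.0.2.7", "198.51.100.3"],
    ["ns1.example.com.", "NS2.example.com", "ns.example.org"],
)
rate = correct_rate([1, 0, 1, 1], [1, 0, 0, 1])
error = 1 - rate
```

Under that complement assumption, the reported 90.9% correct classification rate would correspond to a 9.1% error rate for the full model, against the 19.03% error rate reported for the NS-type features on their own.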
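Table of Contents
Acknowledgments
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.3 Drive-by Download Attacks
Chapter 3 System Design
3.1 System Overview
3.2 Path Filtering
3.2 Feature Extraction
3.3 Model Building
Chapter 4 Performance Evaluation
4.1 Reputation System Validation
4.3 Real Network Environment Experiment
Chapter 5 Conclusion and Future Work
References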

Full text
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available

Printed copies
Available: available
