National Sun Yat-sen University, Thesis/Dissertation: Malicious URL Detection in Social Network

Title	Malicious URL Detection in Social Network (社交網路之惡意 URL 偵測)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 99 (ROC calendar; academic year 2010–2011)	Language	Chinese
Degree	Master	Number of pages	57
Author	Qun-kai Su (蘇群凱)
Advisor	D. J. Guan (官大智)
Convenor	Chia-Mei Chen (陳嘉玫)
Examination Committee	Chun-I Fan (范俊逸)
Date of Exam	2011-07-20	Date of Submission	2011-08-15
Keywords	Malicious URL, Social network, Machine learning, Social engineering, Malware

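Chinese Abstract
With the recent popularity of social networking sites, users can register accounts and establish connections with other users' accounts; these connections in turn form a social network through which users can quickly share information, photos, and videos. However, social networking sites also host social network worms. These worms post text messages containing malicious URLs and combine them with social engineering techniques to trick users into clicking the URLs and installing malware. They also exploit the connections between accounts to attack other users, spreading the malware rapidly. Because most users cannot easily judge whether such URLs are safe, this study proposes a malicious URL detection method for the wall environment of the social networking site Facebook. The method uses highly discriminative heuristic features together with a machine learning algorithm to predict the safety of URL messages. Experimental results show that the proposed method achieves high detection rates: about 96.3% on malicious samples, about 95.4% on benign samples, and an overall accuracy of about 95.7%.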
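The abstract combines heuristic features with a machine learning classifier, and the table of contents lists a Bayesian classification model (Section 3.4). The sketch below illustrates that general idea as a Bernoulli naive Bayes classifier over binary heuristic features, with Laplace smoothing. It is an assumption-laden illustration, not the thesis's method: the three features (`extract_features`), the domain and word lists, and the training posts are all hypothetical.

```python
import math
import re
from urllib.parse import urlparse

# Hypothetical feature lists for illustration; NOT the thesis's actual feature set.
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "goo.gl", "ow.ly"}
LURE_WORDS = {"free", "omg", "video", "click", "wow"}
URL_RE = re.compile(r"https?://\S+")
IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def extract_features(message):
    """Map a wall message to a tuple of binary heuristic features (hypothetical)."""
    hosts = [urlparse(u).hostname or "" for u in URL_RE.findall(message)]
    words = set(re.findall(r"[a-z]+", message.lower()))
    return (
        int(any(h in SHORTENER_DOMAINS for h in hosts)),  # link uses a URL shortener
        int(bool(words & LURE_WORDS)),                    # text contains lure words
        int(any(IPV4_RE.fullmatch(h) for h in hosts)),    # host is a raw IPv4 address
    )


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.p_one = {}  # p_one[c][j] = P(feature j = 1 | class c)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Add-one smoothing keeps every probability strictly between 0 and 1.
            self.p_one[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict(self, x):
        def log_score(c):
            s = self.log_prior[c]
            for xj, p in zip(x, self.p_one[c]):
                s += math.log(p if xj else 1 - p)
            return s
        return max(self.classes, key=log_score)


# Hypothetical labelled wall posts (toy data, not from the thesis).
TRAIN = [
    ("OMG watch this video http://bit.ly/abc123", "malicious"),
    ("Free gift, click http://198.51.100.7/win", "malicious"),
    ("wow http://tinyurl.com/xyz", "malicious"),
    ("Photos from our trip http://www.example.com/album", "benign"),
    ("Meeting notes http://www.example.org/notes", "benign"),
    ("Happy birthday! http://www.example.net/card", "benign"),
]
X = [extract_features(m) for m, _ in TRAIN]
y = [label for _, label in TRAIN]
model = BernoulliNaiveBayes().fit(X, y)

print(model.predict(extract_features("Click for a free video http://bit.ly/q9")))  # malicious
```

Working in log space avoids numeric underflow when many features are multiplied; a real feature set would be chosen by how well each feature separates the two classes, as the abstract's "highly discriminative" wording suggests.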
Abstract
Social networking websites have become very popular. Users can establish connections with other users, forming a social network, and quickly share information, photographs, and videos with friends. Malware known as social network worms can post text messages containing malicious URLs, using social engineering techniques to lure users into clicking the URLs and infecting their machines. These worms can also spread quickly by attacking other users through infected accounts. Out of curiosity, most users click such links without verifying them. This thesis proposes a malicious URL detection method for the Facebook wall that uses highly discriminative heuristic features and a machine learning algorithm to predict the safety of URL messages. Experiments show that the proposed approach achieves a True Positive Rate of about 96.3%, a True Negative Rate of about 95.4%, and an accuracy of about 95.7%.
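The three figures in the abstract follow the standard confusion-matrix definitions, treating malicious URLs as the positive class (consistent with the Chinese abstract, which attributes 96.3% to malicious and 95.4% to benign samples). The sketch below computes them; the counts are hypothetical, since the sample sizes are not stated in this record.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # malicious URLs correctly flagged as malicious
    fn: int  # malicious URLs missed (classified as benign)
    tn: int  # benign URLs correctly classified as benign
    fp: int  # benign URLs wrongly flagged as malicious

    def true_positive_rate(self):
        return self.tp / (self.tp + self.fn)

    def true_negative_rate(self):
        return self.tn / (self.tn + self.fp)

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=90, fn=10, tn=190, fp=10)
print(f"TPR={cm.true_positive_rate():.3f} TNR={cm.true_negative_rate():.3f} "
      f"ACC={cm.accuracy():.3f}")  # TPR=0.900 TNR=0.950 ACC=0.933
```

Accuracy is the class-weighted average of the two rates, ACC = TPR·P/(P+N) + TNR·N/(P+N); here 0.900·(100/300) + 0.950·(200/300) ≈ 0.933. This is why the reported accuracy of 95.7% necessarily lies between the 95.4% and 96.3% rates.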

Table of Contents
Front matter: Thesis Approval Form, Acknowledgments, Chinese Abstract, English Abstract, Table of Contents, List of Figures, List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2 Literature Review
2.1 Network Topology
2.2 Network Worms
2.3 Malicious URL Detection Methods
2.4 Machine Learning Algorithms
Chapter 3 Research Method
3.1 Data Collection Method
3.2 System Architecture and Workflow
3.3 Heuristic Features
3.4 Bayesian Classification Model
Chapter 4 Experimental Results
4.1 Sample Collection
4.2 Experimental Evaluation
4.3 Phase 1: Experiments with Combinations of Malicious URL Features
4.4 Phase 2: Experiments Adding Spam Features
4.5 Phase 3: Experiments Excluding Shortened-URL Samples
4.6 Phase 4: Experiments on Sample Sets from Different Time Periods
4.7 Discussion
Chapter 5 Conclusions and Future Work
References


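Full text
This electronic full text is licensed only for personal, non-commercial searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release date. Availability: on campus, available; off campus, available.
etd-0815111-155110.pdf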
Printed copies
Availability: available

