National Sun Yat-sen University (國立中山大學), thesis/dissertation: An Adaptive Server-Side Anti-Spam System

Title	An Adaptive Server-Side Anti-Spam System (適應性之伺服器端垃圾郵件過濾系統之研究)
Department	Department of Information Management (資訊管理學系)
Year, semester	Spring semester of Academic Year 97
Language	English
Degree	Ph.D.
Number of pages	76
Author	Gu-Hsin Lai (賴谷鑫)
Advisor	Bing-Chiang Jeng (鄭炳強); Chia-Mei Chen (陳嘉玫)
Convenor	Dah-Jyh Guan (官大智)
Advisory Committee	Chi-Sung Laih (賴溪松); Jung-Shian Li (李忠憲)
Date of Exam	2009-06-09
Date of Submission	2009-07-27
Keywords	Data mining, Statistical testing, Spam mail
Statistics	This thesis/dissertation has been browsed 5932 times and downloaded 1257 times.

Abstract
Spam mail has become a serious threat on the Internet. Beyond commercial messages, malicious content such as phishing, fraud, pornography and malicious code is spread via spam. Spam significantly affects individuals, organizations and society, so addressing it is urgent. A practical server-side anti-spam system should be able to (1) correctly filter a growing volume of spam; (2) recognize new types of spam; and (3) automatically manage an increasing number of spam rules. Most existing work focuses on a single aspect, usually spam rule generation. In practice, however, spam prevention involves more than applying data mining algorithms to generate rules; filtering spam correctly and efficiently requires addressing other issues as well. In this research, we propose and integrate three sub-systems to form a practical anti-spam system: a spam rule generation sub-system, a spam rule sharing sub-system and a spam rule management sub-system. A rule-based data mining approach is used to generate manageable and shareable spam rules. The latest spam rules are shared among servers in a machine-readable XML format. Spam rules stored on mail servers are managed using a statistical testing approach: the rule management sub-system automatically enables high-performance rules and disables out-of-date, inaccurate rules, reducing the miss rate and improving the efficiency of the spam filter. This research will develop and integrate the three sub-systems to achieve the goal of spam prevention.
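
As a rough illustration of how XML rule sharing and test-based rule management could fit together, the sketch below parses a shared rule file and enables or disables each rule with a one-sided z-test on its observed precision. Everything specific here is an assumption made for illustration: the XML layout, the attribute names `hits` and `spam_hits`, the 0.9 precision threshold, the 0.05 significance level and the choice of a z-test are hypothetical, and this record does not state the thesis's actual schema or statistical model.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical XML layout for shared rules (not the thesis's actual schema).
SHARED_RULES_XML = """
<rules>
  <rule id="r1" hits="200" spam_hits="196">
    <condition feature="subject_contains" value="viagra"/>
  </rule>
  <rule id="r2" hits="50" spam_hits="30">
    <condition feature="has_html_only" value="true"/>
  </rule>
</rules>
"""


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rule_is_reliable(hits, spam_hits, min_precision=0.9, alpha=0.05):
    """One-sided z-test of H0: precision <= min_precision.

    Returns True only when the observed precision is significantly
    above the (assumed) threshold, so rules with little or poor
    evidence stay disabled.
    """
    if hits == 0:
        return False  # no evidence yet
    p_hat = spam_hits / hits
    std_err = math.sqrt(min_precision * (1.0 - min_precision) / hits)
    z = (p_hat - min_precision) / std_err
    p_value = 1.0 - normal_cdf(z)
    return p_value < alpha


def manage_shared_rules(xml_text):
    """Parse shared rules and decide which ones to enable."""
    root = ET.fromstring(xml_text.strip())
    status = {}
    for rule in root.findall("rule"):
        hits = int(rule.get("hits"))
        spam_hits = int(rule.get("spam_hits"))
        reliable = rule_is_reliable(hits, spam_hits)
        status[rule.get("id")] = "enabled" if reliable else "disabled"
    return status


if __name__ == "__main__":
    print(manage_shared_rules(SHARED_RULES_XML))
```

In this sketch a rule that matched 200 messages, 196 of them spam, is enabled, while a rule with 30 spam matches out of 50 is disabled. A real deployment would re-run the test as new counts arrive so that rules which go out of date are switched off.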

Table of Contents
1. Introduction
2. Literature Review
2.1 Overview of Anti-Spam Solutions
2.2 Mail feature selection review
2.3 Mail filter review
3. The Proposed Approach
3.1 Rules Generation Sub-system
3.2 Spam rule sharing
3.3 Spam Rule Management
3.4 Statistical Model
3.5 Rule Conflict
4. System Demonstration
5. Performance Evaluation
5.1 Performance Metrics
5.2 Experiments environment
5.3 Evaluation of rule sharing
5.4 Evaluation of rule management
5.5 Evaluation of proposed approach
6. Conclusion
7. Reference


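Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost or broadcast it without authorization.
Thesis access permission: unrestricted (fully open on and off campus)
Available: Campus: available; Off-campus: available
etd-0727109-223459.pdf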
