Title page for etd-0727109-223459
An Adaptive Server-Side Anti-Spam System
Year, semester
Number of pages
Advisory Committee
Date of Exam
Date of Submission
Data mining, Statistical Testing, Spam mail
The thesis/dissertation has been browsed 5899 times and downloaded 1257 times.
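A practical server-side mail filter needs three capabilities: (1) accurately filtering large volumes of spam mail; (2) recognizing new types of spam mail; and (3) automatically managing the ever-growing number of spam rules on the mail server. Current research on spam mail mostly focuses on a single aspect (the construction of spam rules). In the real world, however, spam prevention is more than applying data mining techniques to generate spam rules for filtering. Real-world spam prevention must also consider issues beyond spam rule generation.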
The spread of spam mail has become a serious threat on the Internet. In addition to commercial messages, malicious content such as phishing, pornographic and fraudulent messages, and malicious code is spread via spam.
A practical server-side anti-spam system should be able to (1) correctly filter out a growing volume of spam mail; (2) recognize new types of spam mail; and (3) manage the increasing number of spam rules automatically. Most prior work focuses on a single aspect of spam prevention, especially spam rule generation. In the real world, however, spam prevention is not just a matter of applying a data mining algorithm to generate rules. To filter spam correctly and efficiently, many issues beyond spam rule generation must be considered.
In this research, we propose and integrate three sub-systems to form a practical anti-spam system: a spam rule generation sub-system, a spam rule sharing sub-system, and a spam rule management sub-system. A rule-based data mining approach is used to generate manageable and shareable spam rules. The latest spam rules are shared in a machine-readable XML format. Spam rules stored on mail servers are managed using a statistical testing approach. The rule management sub-system automatically enables high-performance rules and disables out-of-date rules to reduce the miss rate and improve the efficiency of the spam filter. This research develops and integrates the three sub-systems to achieve the goal of spam prevention.
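The abstract does not state which statistical test the rule management sub-system uses, so the following is only an illustrative sketch of the general idea of enabling and disabling rules by statistical testing. It assumes a one-sample z-test on each rule's precision (the fraction of matched messages that were actually spam). The names `RuleStats`, `z_score` and `manage`, and the thresholds `p0`, `z_crit` and `min_hits`, are hypothetical and not taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Hit counts for one spam rule over a recent window (hypothetical structure)."""
    name: str
    spam_hits: int    # matched messages that were actually spam
    total_hits: int   # all messages the rule matched
    enabled: bool = True


def z_score(successes: int, trials: int, p0: float) -> float:
    """One-sample z statistic for H0: precision == p0 (normal approximation)."""
    p_hat = successes / trials
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / trials)


def manage(rule: RuleStats, p0: float = 0.9, z_crit: float = 1.645,
           min_hits: int = 30) -> RuleStats:
    """Enable or disable a rule based on its recent performance.

    Assumed policy (not from the thesis):
    - a rule that rarely fires is treated as out-of-date and disabled;
    - otherwise the rule is disabled only if its precision is
      significantly below the target p0 (one-sided test).
    """
    if rule.total_hits < min_hits:
        rule.enabled = False
        return rule
    z = z_score(rule.spam_hits, rule.total_hits, p0)
    rule.enabled = z > -z_crit
    return rule


if __name__ == "__main__":
    rules = [
        RuleStats("subject_contains_offer", 98, 100),
        RuleStats("old_pharma_keyword", 70, 100),
        RuleStats("rarely_seen_pattern", 5, 5),
    ]
    for r in rules:
        manage(r)
        print(r.name, "enabled" if r.enabled else "disabled")
```

Under these assumptions, a rule is kept while its observed precision is statistically compatible with the target. A production system would need a test suited to small samples and a policy for re-enabling rules, both of which are outside this sketch.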
Table of Contents
1. Introduction
2. Literature Review
2.1 Overview of Anti-Spam Solutions
2.2 Mail feature selection review
2.3 Mail filter review
3. The Proposed Approach
3.1 Rules Generation Sub-system
3.2 Spam rule sharing
3.3 Spam Rule Management
3.4 Statistical Model
3.5 Rule Conflict
4. System Demonstration
5. Performance Evaluation
5.1 Performance Metrics
5.2 Experiments environment
5.3 Evaluation of rule sharing
5.4 Evaluation of rule management
5.5 Evaluation of proposed approach
6. Conclusion
7. Reference
Fulltext
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available

Printed copies
Available: available
