An Adaptive Server-Side Anti-Spam System
Data mining, Statistical Testing, Spam mail
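A practical server-side mail filter needs three capabilities: (1) filtering large volumes of spam accurately; (2) recognizing new types of spam; and (3) letting the mail server automatically manage the ever-growing set of spam rules. Current research on spam mostly concentrates on a single aspect, namely the construction of spam rules. In the real world, however, spam prevention involves more than applying data mining techniques to generate filtering rules; issues beyond rule generation must also be taken into account.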
The spread of spam mail has become a serious threat on the Internet. In addition to commercial messages, malicious content such as phishing, pornographic and fraudulent messages, and malicious code is spread via spam.
A practical server-side anti-spam system should be able to (1) correctly filter out a growing volume of spam mail; (2) recognize new types of spam mail; and (3) manage the increasing number of spam rules automatically. Most prior work focuses on a single aspect of spam prevention, especially spam rule generation. In the real world, however, spam prevention is more than applying a data mining algorithm to generate rules. To filter spam correctly and efficiently in practice, many issues beyond spam rule generation must be considered.
In this research, we propose and integrate three sub-systems to form a practical anti-spam system: a spam rule generation sub-system, a spam rule sharing sub-system, and a spam rule management sub-system. A rule-based data mining approach is used to generate manageable and shareable spam rules. The latest spam rules are shared in a machine-readable XML format. Spam rules stored on mail servers are managed using a statistical testing approach. The rule management sub-system automatically enables high-performance rules and disables out-of-date rules to reduce the miss rate and improve the efficiency of the spam filter. This research develops and integrates the three sub-systems to achieve the goal of spam prevention.
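The interplay between XML rule sharing and statistically driven rule management can be illustrated with a minimal sketch. The abstract does not specify the XML schema or the statistical test used in the thesis, so everything below is an assumption made for illustration: the `<rule>` element with `hits` and `spam_hits` attributes, the function names, the one-sided z-test on a rule's precision, and the thresholds `p0`, `z_crit`, and `min_hits` are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical XML layout for shared rules; the real schema is not given
# in the abstract. "hits" = mails matched, "spam_hits" = matched mails
# that were actually spam.
RULES_XML = """
<rules>
  <rule id="r1" hits="200" spam_hits="196">subject contains "free offer"</rule>
  <rule id="r2" hits="150" spam_hits="120">body contains "meeting"</rule>
  <rule id="r3" hits="5" spam_hits="5">sender domain is "example.invalid"</rule>
</rules>
"""


def parse_rules(xml_text):
    """Parse shared rules from the (assumed) XML format into dicts."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": node.get("id"),
            "hits": int(node.get("hits")),
            "spam_hits": int(node.get("spam_hits")),
            "pattern": (node.text or "").strip(),
        }
        for node in root.findall("rule")
    ]


def z_score(spam_hits, hits, p0):
    """One-sided z statistic for H0: precision <= p0 (normal approximation)."""
    p_hat = spam_hits / hits
    std_err = math.sqrt(p0 * (1.0 - p0) / hits)
    return (p_hat - p0) / std_err


def should_enable(rule, p0=0.9, z_crit=1.645, min_hits=30):
    """Enable a rule only if its precision is significantly above p0.

    Rules with too few observations stay disabled because the normal
    approximation is unreliable for small samples. All thresholds here
    are illustrative assumptions, not values from the thesis.
    """
    if rule["hits"] < min_hits:
        return False
    return z_score(rule["spam_hits"], rule["hits"], p0) > z_crit


if __name__ == "__main__":
    for rule in parse_rules(RULES_XML):
        state = "enable" if should_enable(rule) else "disable"
        print(rule["id"], state, "-", rule["pattern"])
```

In this sketch a rule is switched on only when the evidence for its precision is statistically significant, so a rule with few matches or a drifting hit ratio is disabled rather than trusted on raw counts. A production system would need to choose the test, significance level, and minimum sample size to suit its own traffic.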
1. Introduction
2. Literature Review
2.1 Overview of Anti-Spam Solutions
2.2 Mail feature selection review
2.3 Mail filter review
3. The Proposed Approach
3.1 Rules Generation Sub-system
3.2 Spam rule sharing
3.3 Spam Rule Management
3.4 Statistical Model
3.5 Rule Conflict
4. System Demonstration
5. Performance Evaluation
5.1 Performance Metrics
5.2 Experiments environment
5.3 Evaluation of rule sharing
5.4 Evaluation of rule management
5.5 Evaluation of proposed approach
6. Conclusion
7. Reference
