Title page for etd-0912106-173411 (thesis/dissertation details)
Title
Anti-Spam Study: an Alliance-based Approach (Chinese title: 以區域聯防為基礎之垃圾郵件防治研究)
Department
Year, semester
Language
Degree
Number of pages
91
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2006-07-25
Date of Submission
2006-09-12
Keywords
Rough set theory, Reinforcement learning, XCS classifier system, Spam, Text classification
Statistics
This thesis/dissertation has been browsed 5929 times and downloaded 20 times.
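Chinese Abstract (English translation)
The threat posed by spam is growing steadily, which underscores the value of spam filtering techniques. Most current filters combine machine learning with data mining. These techniques emphasize very high accuracy, but their false positive rate is not necessarily low, and in practice the losses caused by false positives are usually hard to recover. Many anti-spam proposals only refine a particular existing technique, and studies that combine several algorithms are rare. This study therefore proposes an alliance-based architecture that combines rough set theory, a genetic algorithm, and the XCS classifier system, with the aim of spreading up-to-date information about spam widely so that mail servers can work together to block the flood of spam.
Rough set theory excels at handling imprecise and incomplete data, and it is a rule-based method whose output is easy to exchange and share. Because computing the optimal set of reducts in rough set theory is an NP-hard problem, a genetic algorithm, which can quickly search, compare, and evolve toward the best solution in large volumes of data, is used to generate the spam filtering rules. Reinforcement learning in XCS helps each mail server learn the classification criteria that suit it best. The filtering results of the alliance-based approach were evaluated with several statistical methods and shown to perform well, leading to two conclusions:
(1) Filtering rules exchanged from other mail servers do help block more spam.
(2) An anti-spam scheme that combines several algorithms can improve accuracy and the false positive rate at the same time.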
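The notion of a reduct mentioned in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical toy decision table of six emails with three made-up binary attributes (`has_link`, `all_caps_subject`, `known_sender`), none of which come from the thesis. It also swaps in an exhaustive search over attribute subsets, which is only feasible for tiny tables; the thesis uses a genetic algorithm for this search precisely because the exhaustive approach does not scale.

```python
from itertools import combinations

# Hypothetical attribute names and data, for illustration only.
ATTRS = ("has_link", "all_caps_subject", "known_sender")
TABLE = [
    # has_link, all_caps_subject, known_sender, spam (decision)
    (1, 1, 0, 1),
    (1, 0, 0, 1),
    (0, 0, 1, 0),
    (1, 0, 1, 0),
    (0, 1, 0, 0),
    (1, 1, 1, 0),
]


def partition(rows, attr_idx):
    """Group row indices into indiscernibility classes over the chosen attributes."""
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[j] for j in attr_idx)
        classes.setdefault(key, []).append(i)
    return list(classes.values())


def dependency(rows, attr_idx):
    """Fraction of rows whose indiscernibility class has a single decision value."""
    consistent = 0
    for cls in partition(rows, attr_idx):
        decisions = {rows[i][-1] for i in cls}
        if len(decisions) == 1:
            consistent += len(cls)
    return consistent / len(rows)


def minimal_reducts(rows, n_attrs):
    """Smallest attribute subsets that preserve the dependency of the full set.

    Exhaustive search: exponential in n_attrs, so only usable on toy tables.
    """
    full = dependency(rows, tuple(range(n_attrs)))
    for size in range(1, n_attrs + 1):
        found = [c for c in combinations(range(n_attrs), size)
                 if dependency(rows, c) == full]
        if found:
            return found
    return []


reducts = minimal_reducts(TABLE, len(ATTRS))
print([tuple(ATTRS[i] for i in r) for r in reducts])
# → [('has_link', 'known_sender')]
```

In this toy table `all_caps_subject` is redundant: the remaining two attributes classify every row as consistently as all three do, so rules built on the reduct are shorter and cheaper to exchange.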
Abstract
The growing problem of spam has created a need for reliable anti-spam filters. Many filtering techniques, combined with machine learning and data mining, are used to reduce the amount of spam. Such algorithms can achieve very high accuracy, but at the cost of some false positives, and false positives are generally prohibitively expensive in the real world. Much work has been done to improve specific algorithms for detecting spam, but less has been reported on leveraging multiple algorithms in email analysis. This study presents an alliance-based approach to classifying, discovering, and exchanging useful information on spam. The spam filter in this study is built on a combination of rough set theory (RST), a genetic algorithm (GA), and the XCS classifier system.
RST can process imprecise and incomplete data such as spam. The GA speeds up the search for the optimal solution (i.e., the rules used to block spam). The reinforcement learning in XCS is a good mechanism for suggesting the appropriate classification for each email. The spam filtering results of the alliance-based approach are evaluated with several statistical methods, and the filter performs well. Two main conclusions can be drawn from this study: (1) rules exchanged from other mail servers do help the filter block more spam than before; (2) a combination of algorithms both improves accuracy and reduces false positives in spam detection.
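The distinction the abstract draws between accuracy and false positives can be made concrete with a short sketch. It uses the standard textbook definitions of accuracy and false positive rate, which may differ from the performance criteria defined in the thesis, and the counts for the two filters are hypothetical numbers chosen only to show that a more accurate filter can still block more legitimate mail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a spam filter (spam is the positive class)."""
    tp: int  # spam correctly blocked
    fp: int  # legitimate mail wrongly blocked
    tn: int  # legitimate mail correctly delivered
    fn: int  # spam wrongly delivered

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)


# Hypothetical results on 500 spam and 500 legitimate messages.
filter_a = Confusion(tp=495, fp=15, tn=485, fn=5)
filter_b = Confusion(tp=470, fp=2, tn=498, fn=30)

for name, c in (("A", filter_a), ("B", filter_b)):
    print(f"filter {name}: accuracy={c.accuracy:.3f}, "
          f"false positive rate={c.false_positive_rate:.3f}")
```

Here filter A has the higher accuracy (0.980 against 0.968) but also the higher false positive rate (0.030 against 0.004), so it loses far more legitimate mail.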
Table of Contents
Chapter 1 Introduction....................................1
1.1 Recent Reports on Spam................................2
1.2 Problem Definition and Motivation.....................4
1.3 Reader’s Guide.......................................7
Chapter 2 Related Works...................................9
2.1 Spam Filtering Techniques Review......................9
2.2 Rough Sets Theory....................................17
2.3 Genetic Algorithm....................................22
2.4 XCS Classifier System................................25
Chapter 3 Alliance-based Approach........................30
3.1 Single-server System.................................32
3.2 System Architecture..................................39
3.3 Performance Criteria.................................46
Chapter 4 Evaluation and Validation......................49
4.1 Design of Experiments................................49
4.2 Steps of Experiments.................................60
4.3 The Respective Performances..........................63
4.4 The Overall Performance..............................67
Chapter 5 Conclusions and Future Work....................74
Appendix A–The Configuration of .procmailrc.............76
Appendix B–Miscellaneous Notation and System Parameters.77
Bibliography.............................................78
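Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: available on campus after one year; never available off campus
Available:
Campus: available
Off-campus: not available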

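Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the library and information office. We apologize for any inconvenience.
Availability: available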
