國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,以資料探勘方法協助偵測網路服務不當使用之研究,Network Service Misuse Detection: A Data Mining Approach

論文名稱 Title	以資料探勘方法協助偵測網路服務不當使用之研究 Network Service Misuse Detection: A Data Mining Approach
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	92 學年度第 2 學期 The spring semester of Academic Year 92	語文別 Language	英文 English
學位類別 Degree	博士 Ph.D.	頁數 Number of pages	99
研究生 Author	蕭漢威 Han-wei Hsiao
指導教授 Advisor	魏志平 Chih-Ping Wei
召集委員 Convenor	楊竹星
口試委員 Advisory Committee	陳嘉玫, 曾新穆, 林福仁
口試日期 Date of Exam	2004-07-05	繳交日期 Date of Submission	2004-09-01
關鍵字 Keywords	網路流量分析、地下 FTP 伺服程式偵測、部份樣本空間分類分析、網路服務不當使用偵測、網路管理、交互式後門程式偵測、資料探勘 Interactive Backdoor Detection, Underground FTP Server Detection, Classification with Partial Training Space, Network Management, Network Service Misuse Detection, Network Traffic Analysis, Data Mining
統計 Statistics	本論文已被瀏覽 5699 次，被下載 3327 次 The thesis/dissertation has been browsed 5699 times, has been downloaded 3327 times.

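中文摘要 Chinese Abstract
The rapid growth of Internet applications has led to the emergence and widespread use of many kinds of network services. Beyond strengthening network management to maintain a stable and secure network environment, the impact of users misusing network services has become an important challenge that network management must now face. Network service misuse refers to abusive, unethical, unauthorized, or illegal use of network services by users on the network. Such misuse often deliberately evades the monitoring of network administrators and is carried out covertly. Given the importance of detecting network service misuse, we developed a misuse detection technique based on router network traffic data. In this study we propose a cross-training classification learning method that builds a classification model of network service types from router traffic data, and we use it to detect two kinds of network service misuse: underground FTP servers and interactive backdoors. In our evaluation, the cross-training method (especially NN -> C4.5) produced better classification results than traditional classification techniques (C4.5, backpropagation neural network, and Naïve Bayes classifier). In an empirical evaluation on a real network, the underground FTP server detection system (using the NN -> C4.5 cross-training method) achieved a recall rate of 95% and a precision rate of 34%. For interactive backdoor detection, we found several highly suspicious interactive backdoors on a real network.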
Abstract
As network services progressively become essential communication and information delivery mechanisms for business operations and individuals’ activities, a challenging network management issue emerges: network service misuse. Network service misuse is formally defined as “abuses or unethical, surreptitious, unauthorized, or illegal uses of network services by those who attempt to mask their uses or presence that evade the management and monitoring of network or system administrators.” Misuse of network services can inappropriately consume the resources of network service providers (i.e., server machines), compromise the confidentiality of information maintained by network service providers, and/or prevent other users from using the network normally and securely. Motivated by the importance of network service misuse detection, we exploit router-based network traffic data to facilitate the detection of network service misuse. Specifically, in this thesis, we propose a cross-training method for learning and predicting network service types from router-based network traffic data. We also propose two network service misuse detection systems, one for detecting underground FTP servers and one for detecting interactive backdoors. Our evaluations suggest that the proposed cross-training method (specifically, NN->C4.5) outperforms traditional classification analysis techniques (namely C4.5, backpropagation neural network, and Naïve Bayes classifier). In addition, our empirical evaluation conducted in a real-world setting suggests that the proposed underground FTP server detection system can effectively identify underground FTP servers, achieving a recall rate of 95% and a precision rate of 34% (with the NN->C4.5 cross-training technique). Moreover, our empirical evaluation suggests that the proposed interactive backdoor detection system is capable of capturing “true” (or, more precisely, highly suspicious) interactive backdoors existing in a real-world network environment.
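
The recall and precision rates reported in the abstract follow the standard definitions: recall is the share of actual underground FTP servers that the system flags, and precision is the share of flagged servers that are actually underground FTP servers. The sketch below only illustrates that arithmetic. The function name and the counts are hypothetical, chosen to reproduce the reported rates, and are not taken from the thesis.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw detection counts."""
    flagged = true_positives + false_positives   # everything the detector reported
    actual = true_positives + false_negatives    # everything that was really there
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (NOT from the thesis): 19 servers correctly flagged,
# 37 false alarms, 1 server missed.
precision, recall = precision_recall(19, 37, 1)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these illustrative counts, a high recall paired with a low precision means the detector misses few underground servers but produces many false alarms that an administrator would still need to review.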

目次 Table of Contents
Chapter 1 Introduction
  1.1 Background
  1.2 Definition of Network Service Misuse
  1.3 Research Motivation
  1.4 Research Objectives
  1.5 Organization of the Dissertation
Chapter 2 Literature Review and Formulation of Research Questions
  2.1 Literature Review
  2.2 Research Framework
  2.3 Research Questions
Chapter 3 Aggregation of Network Traffic Data and Network Server Identification
  3.1 Format and Characteristics of NetFlow Traffic Data
  3.2 Network Environment Concerning in This Study
  3.3 Network Server Identification
  3.4 Aggregation of Network Traffic Data
Chapter 4 Classification with Partial Training Space: Technique Development and Empirical Evaluations
  4.1 Definition
  4.2 Cross-Training Method for Classification with Partial Training Space
    4.2.1 Traditional Classification Analysis Techniques
    4.2.2 Learning Bias of Traditional Classification Analysis Techniques
    4.2.3 Design Principle of Cross-Training Method
    4.2.4 Process and Algorithmic Details of Cross-Training Method
  4.3 Empirical Evaluations of the Proposed Cross-Training Method
    4.3.1 Data Collection
    4.3.2 Evaluation Design
    4.3.3 Benchmark Techniques and Specific Cross-training Techniques
    4.3.4 Evaluation Result for FTP Service Prediction Task
    4.3.5 Data Size Sensitivity Analysis for FTP Service Prediction Task
    4.3.6 Evaluation Result for Interactive Service Prediction Task
    4.3.7 Data Size Sensitivity Analysis for Interactive Service Prediction Task
Chapter 5 Empirical Evaluations of Network Service Misuse Detection Systems
  5.1 Underground FTP Server Detection System
  5.2 Interactive Backdoor Detection System
  5.3 Empirical Evaluations of the Proposed Detection Systems
    5.3.1 Data Collection
    5.3.2 Evaluation Criteria
    5.3.3 Evaluation Results of Underground FTP Server Detection System
    5.3.4 Evaluation Results of Interactive Backdoor Detection System
    5.3.5 Limitations of Our Empirical Evaluations
Chapter 6 Conclusion and Future Research Directions
References


電子全文 Fulltext
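This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
論文使用權限 Thesis access permission: 校內校外完全公開 unrestricted
開放時間 Available: 校內 Campus: 已公開 available; 校外 Off-campus: 已公開 available
etd-0901104-100029.pdf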
紙本論文 Printed copies
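Availability information for printed copies is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available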

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS