National Sun Yat-sen University, thesis/dissertation, Multi-Factor Android Malware Detection System

Title	Multi-Factor Android Malware Detection System
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 102	Language	English
Degree	Master	Number of pages	63
Author	Feng Chiao (喬峯)
Advisor	Chun-I Fan (范俊逸); Chih-Hung Wang (王智弘)
Convenor	D. J. Guan (官大智)
Advisory Committee	Chia-Mei Chen (陳嘉玫)
Date of Exam	2014-06-12	Date of Submission	2014-07-21
Keywords	Malicious App Detection, Data Mining, Classifying, System Call, Permission
Statistics	The thesis/dissertation has been browsed 5640 times and downloaded 414 times.

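Chinese Abstract (English translation)
Since Apple's iPhone and Google's Android smartphones came to market in 2007 and 2008, the market share of smartphones has risen steadily, and Android smartphones have shown the most significant growth. One of the main reasons smartphones have won users over is the wealth of Apps available on the official markets (App Store and Google Play). Because Android is an open system, and because some enthusiastic users repackage paid Apps and offer them for others to download, ordinary users can easily install Apps downloaded from third-party markets on their own smartphones. However, Apps on third-party markets do not have to pass official review, so malicious Apps are more likely to appear there. This research first records the system calls an App uses while executing and while idle, together with the permissions the App requests. It then uses data mining techniques to compare the records of official-market Apps with those of known malicious Apps, and uses machine learning techniques to build a detection model that can later be used to detect unknown malicious Apps. Finally, it applies attribute selection algorithms to reduce the number of dimensions and therefore the time needed for detection. The experimental results show that the proposed method classifies Apps correctly in more than 96% of cases and that, according to the model's predictions, about 20% of the Apps on the third-party market exhibit malicious behavior.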
Abstract
Since Apple and Google introduced iPhone and Android smartphones in 2007 and 2008, the market share of smartphones has been increasing, and Android devices have shown the most significant growth. One of the reasons smartphones became so successful is the wealth of Apps on the official application stores (App Store and Google Play). In this research, we first recorded the system calls an App uses while executing and while idle, as well as the permissions it requests. We then used data mining techniques to find the differences between the records of malicious Apps and benign Apps, and machine learning techniques to build a model for detecting unknown malicious Apps. Finally, with the help of attribute selection methods, we reduced the detection time by using fewer attributes. The experimental results show that the proposed scheme achieves an accuracy of more than 96%, and that approximately 20% of the Apps on the third-party market act maliciously.
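The feature-building and attribute-ranking steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's own code: the strace-style trace format, the function names, and the use of information gain as the ranking criterion are all assumptions made for the example, since the abstract only says that system calls and permissions are recorded and that attribute selection methods are applied.

```python
import math
import re
from collections import Counter

# Assumed strace-style line, e.g. '1234 open("/data/x", O_RDONLY) = 3'.
# The leading process id is optional; lines without a call are skipped.
SYSCALL_RE = re.compile(r"^\s*(?:\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscall_counts(trace_lines):
    """Count how often each system call name appears in a trace."""
    counts = Counter()
    for line in trace_lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def feature_vector(trace_lines, permissions, syscall_names, permission_names):
    """Concatenate system-call counts (dynamic) with 0/1 permission flags (static)."""
    counts = syscall_counts(trace_lines)
    requested = set(permissions)
    return ([counts[name] for name in syscall_names]
            + [int(name in requested) for name in permission_names])


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one discrete attribute with respect to the labels.

    Assumption: information gain stands in here for whichever attribute
    selection method the thesis actually used.
    """
    total = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [lab for v, lab in zip(values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder
```

Each App would contribute one such vector plus a malicious/benign label; attributes with low scores could then be dropped before training a classifier, which is the sense in which fewer attributes reduce detection time.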

Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
List of Listings
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Organization
Chapter 2 Background
  2.1 MalApp
  2.2 Static Analysis
  2.3 Dynamic Analysis
  2.4 Related Works
Chapter 3 The Proposed Method
  3.1 System Overview
  3.2 System Call Recording
  3.3 Feature Extracting
  3.4 Model Building
Chapter 4 Experiment and Evaluation
  4.1 Sample Apps Collecting
  4.2 Data Collecting
    4.2.1 Dynamic Feature Extraction
    4.2.2 Static Features Extraction
  4.3 Evaluation
  4.4 Experimental Results
  4.5 Attribute Reduction
  4.6 Case Study
  4.7 Third Party Market Apps Diagnosis
  4.8 Comparison
Chapter 5 Conclusion and Future Works
Bibliography
Appendix A Source Codes

List of Figures
  3.1 System Architecture
  4.1 Key Attributes
  4.2 Scatter Plots of Samples

List of Tables
  4.1 Datasets for Classification
  4.2 RedFlags
  4.3 Confusion Matrix
  4.4 Result of Experiment 1
  4.5 Result of Experiment 2
  4.6 Top 5 Weighted and Selected Attributes for Experiment 1
  4.7 Top 5 Weighted and Selected Attributes for Experiment 2
  4.8 Important Attributes
  4.9 Wrongly Classified Samples
  4.10 Attributes of Wrongly Classified Samples
  4.11 Estimations of imobile Samples
  4.12 Comparisons

List of Listings
  A.1 Set Up Emulator and System Call Record
  A.2 Transform Data to Acceptable Format
  A.3 Retrieve Static and Dynamic Features

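Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release time. Available: Campus: available. Off-campus: available. etd-0621114-120229.pdf
Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available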

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS