National Sun Yat-sen University, thesis/dissertation: A Model-based Collaborative Filtering Approach to Handling Data Reliability and Ordinal Data Scale

Title	A Model-based Collaborative Filtering Approach to Handling Data Reliability and Ordinal Data Scale (處理資料可靠性與次序尺度之模式型協同推薦)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 98 (ROC calendar, 2009-2010)	Language	English
Degree	Master	Number of pages	51
Author	Shih-hui Tseng (曾詩惠)
Advisor	Te-Min Chang (張德民)
Convenor	San-Yih Hwang (黃三益)
Advisory Committee	Wen-Feng Hsiao (蕭文峰); Chien-Chin Chen (陳建錦)
Date of Exam	2010-07-27	Date of Submission	2010-08-16
Keywords	collaborative filtering, recommender system, model-based CF, data reliability, data scale
Statistics	This thesis has been browsed 5836 times and downloaded 6 times.

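Chinese Abstract
The rapid growth of the Internet has made information easy to obtain, but the sheer volume of data also causes information overload when people try to search for and retrieve what they need. Information retrieval and information filtering techniques have been developed to support our reading and comprehension. Recommender systems built on information filtering have emerged in turn, and they suit situations where users' needs are too vague to be expressed as keywords. Collaborative filtering is commonly used in recommender systems; it makes recommendations from the opinions of other users whose interests are similar to those of the target user. One family of collaborative filtering techniques is model-based: it builds a learning model from users' past opinions and uses that model to predict recommendations. Model-based collaborative filtering, however, must address two issues. First, data reliability (whether the data contain noise or redundancy) affects the prediction results. Second, most current models treat the output as nominal-scale data and ignore the fact that ratings are on an ordinal scale. This research therefore aims to propose a collaborative filtering model that improves data reliability and takes the data scale into account, in the hope of producing better recommendations. We conduct three experiments for verification and comparison. The results show that the proposed method performs well, particularly when sparsity is not too high or the dataset is large. These results in turn support the feasibility of the proposed method in practical applications.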
Abstract
With the explosive growth of the Internet, more and more information is disseminated on the Web. This volume of information, however, causes an information overload problem that hampers users who want to search for and find useful information online. Information retrieval and information filtering have arisen to compensate for the limits of users' searching and comprehension abilities. Recommender systems, one family of information filtering techniques, are useful when users cannot describe their requirements precisely as keywords. Collaborative filtering (CF) compares novel information with the common interests shared by a group of people to make recommendations. One of its variants, model-based CF, generates predicted recommendations from a model learned from users' past opinions. However, two issues in model-based CF should be addressed. First, the data quality of the input rating matrix can affect prediction performance. Second, most current models treat the rating classes as nominal rather than respecting the ordinal nature of ratings. The objective of this research is thus to propose a model-based CF algorithm that accounts for data reliability and data scale. Three experiments are conducted accordingly, and the results show that the proposed method outperforms its counterparts, especially on data with mild sparsity and on large-scale data. These results support the feasibility of the proposed method in real applications.
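The first issue raised in the abstract, unreliable rating data, is often handled by reconstructing the rating matrix at low rank so that minor components, which tend to carry noise, are discarded. The sketch below is only an illustration under stated assumptions: it uses NumPy's batch SVD as a stand-in, whereas the thesis's table of contents refers to an SVD trained with a generalized Hebbian learning rule, which this code does not implement. The function name `low_rank_denoise`, the column-mean imputation of missing ratings, and the toy matrix are all hypothetical.

```python
import numpy as np


def low_rank_denoise(ratings, k):
    """Return the rank-k SVD reconstruction of a user-by-item rating matrix.

    Missing ratings are given as NaN and are filled with the item (column)
    mean before factorization. This imputation is an assumption of the sketch.
    """
    ratings = np.asarray(ratings, dtype=float)
    mask = ~np.isnan(ratings)

    # Column means over observed entries; fall back to the global mean
    # for items that nobody has rated.
    global_mean = ratings[mask].mean()
    col_sum = np.where(mask, ratings, 0.0).sum(axis=0)
    col_cnt = mask.sum(axis=0)
    col_mean = np.where(col_cnt > 0, col_sum / np.maximum(col_cnt, 1), global_mean)
    filled = np.where(mask, ratings, col_mean)

    # Keep only the k largest singular triplets.
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Toy 4-user by 4-item matrix on a 1-5 scale (hypothetical data).
toy = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [np.nan, 1.0, 4.0, 5.0],
])
smoothed = low_rank_denoise(toy, k=2)
```

The smoothed matrix can then serve as the input to whatever model is trained next; how many components `k` to keep is a tuning choice that this sketch does not address.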

Table of Contents
CHAPTER 1 Introduction
  1.1 Overview
  1.2 Objective of the research
  1.3 Organization of the research
CHAPTER 2 Literature Review
  2.1 Recommender systems
    Content-based methods
    Collaborative methods
    Hybrid methods
  2.2 Collaborative Filtering
  2.3 Singular Vector Decomposition Analysis
    2.3.1 SVD Factorization
    2.3.2 SVD with generalized Hebbian learning rule
  2.4 Support Vector Machine
CHAPTER 3 Proposed Approach
  Step 1: Data processing with Hebbian learning SVD
  Step 2: Building the SVOR-based model
  Step 3: Predicting the ratings
CHAPTER 4 Experiments and Results
  4.1 Experimental Design
    Dataset Descriptions
    Objectives of the Experiments
    Performance Measures
    Evaluation Scheme
    Platform
  4.1 Experiment I
  4.2 Experiment II
  4.3 Experiment III
CHAPTER 5 Conclusions
  5.1 Concluding remarks
  5.2 Future work
REFERENCE
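The outline's third step predicts ratings from an ordinal model. As a rough sketch of how an ordinal model differs from a nominal classifier, threshold-based ordinal regression (presumably the family that the SVOR model named in the outline belongs to) maps a single real-valued score to a rank through ordered thresholds, so that neighbouring ratings stay neighbours. The function `predict_rank`, the scores, and the threshold values below are hypothetical; learning the score function and the thresholds is not shown.

```python
import numpy as np


def predict_rank(scores, thresholds):
    """Map latent scores to ordinal ranks 1..r using r-1 ordered thresholds.

    A score below the first threshold gets rank 1; each threshold the score
    meets or exceeds raises the rank by one.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(thresholds) < 0):
        raise ValueError("thresholds must be non-decreasing")
    scores = np.asarray(scores, dtype=float)
    # searchsorted with side="right" counts thresholds <= score.
    return 1 + np.searchsorted(thresholds, scores, side="right")


# Four thresholds split the real line into five rating levels (1-5).
example_thresholds = [-1.5, -0.5, 0.5, 1.5]
example_ranks = predict_rank([-2.0, 0.0, 2.0], example_thresholds)
```

Because all ranks share one score axis, a prediction that misses by one level is closer to the truth than one that misses by three, which a nominal classifier has no way to express.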


Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: available on campus one year after submission; never available off campus. Availability: on campus, available; off campus, not available.
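Printed copies
Public information on printed copies is relatively complete from Academic Year 102 onward. To look up printed theses from Academic Year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. Availability: available.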

