National Sun Yat-sen University, thesis/dissertation, A Content via Collaboration Approach to Text Filtering Recommender Systems (以協同過濾輔助內容分析之文件推薦系統)

Title	A Content via Collaboration Approach to Text Filtering Recommender Systems (以協同過濾輔助內容分析之文件推薦系統)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 94	Language	English
Degree	Master	Number of pages	53
Author	Hsin-Chieh Huang (黃信傑)
Advisor	Te-Min Chang (張德民)
Convenor	Wen-Feng Hsiao (蕭文峰)
Advisory Committee	Pei-Chen Sun (孫培真)
Date of Exam	2006-07-13	Date of Submission	2006-08-01
Keywords	recommender systems, collaborative filtering, content-based filtering, latent semantic indexing (LSI)
Statistics	This thesis has been browsed 5863 times and downloaded 2045 times.

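Abstract (Chinese)
With the rise of the Internet and e-commerce, the Web is flooded with information, and users need suitable tools to cope with information overload. Just as people rely on recommendations in everyday decision making, online users can find the information they need faster and more accurately through recommendations from other users who share their interests or through recommendations based on their own past preferences. Traditional recommender systems fall into two approaches, collaborative filtering and content-based filtering. Because each has its own drawbacks, recommender systems have moved toward hybrid designs that aim to keep the strengths of each approach while resolving its problems. The purpose of this research is therefore to propose a hybrid document recommendation approach that combines the preferences of other like-minded users with the user's own preferences. The approach has two stages. The first stage uses collaborative filtering to extend the user's original preferences. The second stage builds the user's preferences over document terms from the extended preferences and then applies latent semantic indexing to improve the accuracy of the recommendations. Two experiments are conducted to compare the proposed approach with two other recommendation approaches. The results show that the proposed approach can distinguish different degrees of user preference: it recommends documents the user likes and avoids recommending documents the user dislikes. This property makes the proposed approach more useful in practice.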
Abstract
With the rapid growth of the Internet, recommender systems have become essential in helping online users search for and retrieve the information they need. Just as people rely heavily on recommendations in everyday decision making, online users can identify desired documents more effectively and efficiently through recommendations from other users with similar interests, through features extracted from their own past preferences, or both. Typical recommendation approaches are classified as either collaborative filtering or content-based filtering, and each has its own drawbacks. The purpose of this research is therefore to propose a hybrid approach to text recommendation. We combine collaborative input with document content to create extended content-based user profiles, which are then re-expressed using latent semantic indexing. Two experiments are conducted to verify the proposed approach by comparing its recommendations with those of two other approaches. The results show that the proposed approach can distinguish different degrees of document preference: it recommends documents that interest users and withholds recommendations for documents that do not. These results support the practical applicability of the proposed approach.
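The two-stage idea in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the binary toy ratings, the cosine item similarity, the rule of adding each user's top-N items as a rating of 1, the profile built as a sum of term vectors, and the function names `item_cosine_similarity`, `extend_ratings`, and `lsi_relevance` are all assumptions made for illustration.

```python
import numpy as np

def item_cosine_similarity(ratings):
    """Cosine similarity between the item columns of a user-by-item matrix."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0              # avoid division by zero for unrated items
    unit = ratings / norms
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)           # an item should not recommend itself
    return sim

def extend_ratings(ratings, top_n=1):
    """Stage 1 (assumed rule): add each user's top-N unrated items as rating 1."""
    sim = item_cosine_similarity(ratings)
    scores = ratings @ sim               # sum of similarities to the user's rated items
    scores[ratings > 0] = -np.inf        # never re-recommend an already rated item
    extended = ratings.copy()
    for u in range(ratings.shape[0]):
        for i in np.argsort(scores[u])[::-1][:top_n]:
            if np.isfinite(scores[u, i]) and scores[u, i] > 0:
                extended[u, i] = 1.0
    return extended

def lsi_relevance(extended, doc_term, new_doc, k=2):
    """Stage 2 (assumed form): cosine of profile and document in a rank-k LSI space."""
    profiles = extended @ doc_term       # user-by-term content profiles
    U, _, _ = np.linalg.svd(doc_term.T, full_matrices=False)
    Uk = U[:, :k]                        # term-by-k basis of the latent space
    prof_k, doc_k = profiles @ Uk, new_doc @ Uk
    denom = np.linalg.norm(prof_k, axis=1) * np.linalg.norm(doc_k)
    denom[denom == 0] = 1.0
    return (prof_k @ doc_k) / denom

# Toy data (hypothetical): 3 users x 4 documents, 4 documents x 4 terms.
ratings = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 0, 0, 1]], dtype=float)
doc_term = np.array([[2, 1, 0, 0],
                     [1, 2, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1]], dtype=float)
new_doc = np.array([1, 1, 0, 0], dtype=float)

extended = extend_ratings(ratings, top_n=1)
scores = lsi_relevance(extended, doc_term, new_doc, k=2)
print(extended)
print(scores)
```

In this toy setting the first user gains the third document through its similarity to the two documents already rated, and the new document scores high for users whose profiles share its terms and near zero for the user whose profile does not. A real system would need weighted terms, graded ratings, and a tuned rank `k`.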

Table of Contents
CHAPTER 1 Introduction
  1.1 Overview
  1.2 Objective of the research
  1.3 Organization of the Thesis
CHAPTER 2 Literature Review
  2.1 Information Retrieval
    Vector space models
    Latent semantic indexing
  2.2 Text mining
    Novelty Detection
    Concept Extraction
  2.3 Content-Based Filtering
    Content limitation
    Over-specialization
  2.4 Collaborative Filtering
    User-based collaborative filtering
    Item-based collaborative filtering
    First-rater Problem
    Sparsity
    Other Issues
  2.5 Hybrid Filtering Approaches
CHAPTER 3 Proposed Approach
  3.1 Stage 1: Item-based CF
    Step 1: Building Item-to-Item Similarity Matrix
    Step 2: Generating top-N Recommendation List
    Step 3: Adding top-N Recommendation to Original Ratings
  3.2 Stage 2: Collaborative-Incorporated Content-based Filtering
    Step 1: Building profile-construction matrix
    Step 2: Creating content-based user profiles
    Step 3: Applying LSI
    Step 4: Determining relevance of new documents
CHAPTER 4 Experiments and Results
  4.1 Dataset Descriptions
  4.2 Experimental Design
  4.3 Experiment I
  4.4 Experiment II
CHAPTER 5 Conclusions
  5.1 Concluding remarks
  5.2 Future Work
References


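Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. Thesis access permission: immediately available on campus; available off campus after one year. Available: Campus: available. Off-campus: available. etd-0801106-224333.pdf
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience. Available: available.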

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS