Detailed record (title page) for thesis etd-0813112-201905
Title
利用Latent Dirichlet Allocation之個人化文章推薦
Personalized Document Recommendation by Latent Dirichlet Allocation
Department
Year, semester
Language
Degree
Number of pages
77
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2012-07-31
Date of Submission
2012-08-13
Keywords
recommender systems, content-based filtering, collaborative filtering, hidden topic analysis, latent Dirichlet allocation
Statistics
This thesis has been viewed 5907 times and downloaded 1343 times.
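Chinese Abstract (translated)
With the emergence and rapid growth of the Internet, more and more users obtain and share information through this new medium. Because users' capacity to process information is limited, however, the sheer volume of information causes information overload. Recommender systems have arisen in response: when users' needs are vague and cannot be expressed precisely, a recommender system can help them obtain the information they need.
Information filtering methods in recommender systems fall into two main types: content-based filtering and collaborative filtering. Although the literature indicates that collaborative filtering performs better than content-based filtering, personalized document recommendation tends to adopt content-based filtering because of the textual nature of documents. Seen from another angle, this recommendation task offers a good opportunity to develop a hybrid filtering method that keeps the strengths of both and yields better recommendations.
The purpose of this research is therefore to propose a hybrid filtering method for personalized document recommendation. We apply the latent Dirichlet allocation (LDA) model to find the latent topic distribution of documents, and use the result either with collaborative filtering to compute document similarity or with content-based filtering to explore user profiles. We then conduct two experiments to validate the proposed method. The results show that it performs well and outperforms traditional user-based and item-based collaborative filtering, which supports its feasibility in practical applications.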
Abstract
With the rapid growth of the Internet, people around the world can easily distribute, browse, and share information online. The enormous amount of information, however, causes an information overload problem that exceeds users' limited information-processing ability. Recommender systems have therefore arisen to help users find useful information when they cannot describe their requirements precisely.
Filtering techniques in recommender systems can be divided into content-based filtering (CBF) and collaborative filtering (CF). Although CF has been shown in the literature to be superior to CBF, personalized document recommendation relies more on CBF because documents are textual in nature. Nevertheless, the document recommendation task provides a good opportunity to integrate both techniques into a hybrid one and enhance overall recommendation performance.
The objective of this research is thus to propose a hybrid filtering approach for personalized document recommendation. In particular, latent Dirichlet allocation (LDA), which uncovers the latent semantic structure in documents, is incorporated either to obtain robust document similarity in CF or to explore user profiles in CBF. Two experiments are conducted accordingly. The results show that the proposed approach outperforms its counterparts in recommendation performance, which supports the feasibility of the approach in real applications.
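The two uses of topic distributions mentioned in the abstract (document similarity and user profiles) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the per-document topic distributions have already been produced by a latent Dirichlet allocation model, and the names `theta`, `cosine`, `user_profile`, and `recommend`, the toy numbers, the choice of cosine similarity, and the averaging of read documents into a profile are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def user_profile(theta, read_ids):
    """Assumed profile: the mean topic distribution of the documents a user has read."""
    k = len(next(iter(theta.values())))
    return [sum(theta[d][t] for d in read_ids) / len(read_ids) for t in range(k)]

def recommend(theta, read_ids, n=2):
    """Rank unread documents by similarity to the user's topic profile."""
    profile = user_profile(theta, read_ids)
    candidates = [d for d in theta if d not in read_ids]
    ranked = sorted(candidates, key=lambda d: cosine(theta[d], profile), reverse=True)
    return ranked[:n]

# Hypothetical document-topic distributions (3 topics), standing in for LDA output.
theta = {
    "doc_a": [0.8, 0.1, 0.1],
    "doc_b": [0.7, 0.2, 0.1],
    "doc_c": [0.1, 0.8, 0.1],
    "doc_d": [0.2, 0.1, 0.7],
}

top = recommend(theta, read_ids={"doc_a"}, n=2)
print(top)
```

Comparing documents in topic space rather than by raw term overlap means two documents can be judged similar even when they share few words, which is one plausible reading of the "robust document similarity" the abstract refers to.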
Table of Contents
Thesis Approval Form
Acknowledgements
Chinese Abstract
English Abstract
CHAPTER 1 Introduction
1.1 Overview
1.2 Objective of the research
1.3 Organization of the research
CHAPTER 2 Literature Review
2.1 Filtering Approach
2.1.1 Content-based Filtering
2.1.2 Collaborative Filtering
2.1.3 Hybrid Filtering
2.2 Hidden Topic Analysis Model
2.2.1 Latent Semantic Analysis
2.2.2 Probabilistic Latent Semantic Analysis
2.2.3 Latent Dirichlet Allocation
CHAPTER 3 Proposed Approach
3.1 Semantic-based Collaborative Filtering
Step 1: Building LDA Model
Step 2: Measuring Similarity between Documents
Step 3: Expanding Active User's Preferences
Step 4: Predicting Top-N Recommendation
3.2 Collaborative-based Profile Filtering
Step 2: Discovering Topic Preferences in User Profiles
Step 3: Measuring Similarity between Document and User Profile
CHAPTER 4 Experiments and Results
4.1 Experimental Design
Data Collection
Objective of the Experiments
Performance Measures
Evaluation Scheme
4.2 Experiment I
4.3 Experiment II
CHAPTER 5 Conclusions
5.1 Concluding remarks
5.2 Future work
REFERENCE
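Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available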


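Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available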
