國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,整合主題模型與合著網路進行學術文獻的推薦,Integrating Topic Model into Co-authorship Network for Recommending Academic Literature

論文名稱 Title	整合主題模型與合著網路進行學術文獻的推薦 Integrating Topic Model into Co-authorship Network for Recommending Academic Literature
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	102 學年度第 2 學期 The spring semester of Academic Year 102	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	68
研究生 Author	陳裕翔 Yu-Siang Chen
指導教授 Advisor	黃三益 San-Yia Hwang
召集委員 Convenor	江祥立 Hsiang-Li Chiang
口試委員 Advisory Committee	范俊逸 Chun-I Fan
口試日期 Date of Exam	2012-07-02	繳交日期 Date of Submission	2014-06-26
關鍵字 Keywords	合著網路、推薦系統、主題模型、潛藏狄利克里分配、學術文獻 recommender system, topic model, latent Dirichlet allocation, coauthorship network, academic literature
統計 Statistics	本論文已被瀏覽 6024 次，被下載 399 次 This thesis has been viewed 6024 times and downloaded 399 times.

中文摘要
許多文獻資料庫系統使用內容導向技術(content-based)擷取文章給使用者，內容導向技術是根據使用者提供的關鍵字來搜尋文章。另一方面，許多的推薦系統技術根據使用者的長期瀏覽或交易歷史記錄來推薦符合使用者長期的愛好，然而在文獻資料庫系統，通常只擁有使用者短期的愛好並且感興趣的文章通常數量不多。在過去研究已經使用，例如：文章內容、使用者記錄檔與合著網路(coauthorship network)，推薦文章給使用者以滿足短期的愛好。在本研究，我們整合學者之間所合作文章的主題資訊至合作網路來擴展整個共同作者網路。更具體地說明，我們提出以潛藏狄利克里分配為基礎的合著網路 (LDA-coauthorship-network-based)，此技術使用潛藏狄利克里分配(latent Dirichlet allocation, LDA)與任務導向(task-focused)技術做為文獻的推薦技術。實驗結果顯示我們的方法比傳統的合著網路在所有的實驗環境都更有效，與內容導向技術相比，當每個任務檔(task profile)包含的內容相似度非常相近時，我們的方法比內容導向技術好，但任務檔的內容相似度低時，我們的方法結果較差。因此我們進一步發展一套混合方法，可自動切換內容導向與潛藏狄利克里分配為基礎的合著網路。此方法可根據任務檔中內容相似程度來進行切換至最適合的方法。實驗結果顯示了混合方法在所有實驗環境都表現最優。
Abstract
Most literature database systems use content-based techniques to retrieve articles for users. These techniques, however, rely on exact keywords supplied by users to find the articles they are interested in. Most recommender system techniques, on the other hand, draw on a user’s long-term browsing or transaction history to recommend items that match the user’s long-term interests, whereas in a literature database system users’ information needs are often short-term. Previous work on recommending articles to satisfy users’ short-term interests has used article content, usage logs, and coauthorship networks. In this study, we extend the coauthorship network method by incorporating scholars’ collaboration topics into the coauthorship network. Specifically, we propose an LDA-coauthorship-network-based technique that uses latent Dirichlet allocation (LDA) to integrate topic information into the links of the coauthorship network, together with a task-focused (short-term) technique for recommending literature articles. Experimental results show that the proposed approach is more effective than the traditional coauthorship network method under all operating regions. Compared with the content-based technique, it performs better when each task profile contains articles that are similar in content, but it is less effective otherwise. We therefore develop a hybrid method that switches between the content-based technique and the LDA-coauthorship-network-based technique according to the content coherence of a task profile. Experimental results show that the hybrid method outperforms all the other methods under all operating regions.
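The two ideas in the abstract, attaching topic information to coauthorship links and switching methods by the content coherence of a task profile, can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names, the averaging of article topic vectors onto each link, the use of mean pairwise cosine similarity as the coherence measure, and the threshold value of 0.5 are all assumptions made for illustration. Topic vectors are assumed to come from an LDA model trained elsewhere.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def edge_topic_profiles(articles):
    """Average the topic vectors of co-authored articles onto each coauthor link.

    `articles` is a list of (authors, topic_vector) pairs. The topic vectors
    are assumed to be per-article LDA topic distributions computed elsewhere.
    """
    sums, counts = {}, {}
    for authors, topics in articles:
        for pair in combinations(sorted(set(authors)), 2):
            acc = sums.setdefault(pair, [0.0] * len(topics))
            for k, p in enumerate(topics):
                acc[k] += p
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: [s / counts[pair] for s in acc] for pair, acc in sums.items()}


def coherence(task_profile):
    """Mean pairwise cosine similarity of the article vectors in a task profile."""
    pairs = list(combinations(task_profile, 2))
    if not pairs:
        return 1.0  # assumption: a single-article profile counts as coherent
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def choose_method(task_profile, threshold=0.5):
    """Pick a recommender for a task profile; 0.5 is a placeholder threshold."""
    if coherence(task_profile) >= threshold:
        return "lda-coauthorship"
    return "content-based"
```

In this sketch a coherent task profile is routed to the LDA-coauthorship recommender and an incoherent one to the content-based recommender, which mirrors the behavior the abstract reports. A real threshold would have to be chosen empirically.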

目次 Table of Contents
Thesis Approval Form
Acknowledgements
Abstract
Chinese Abstract
CHAPTER 1 – Introduction
  1.1 Background
  1.2 Motivation
  1.3 Thesis Organization
CHAPTER 2 – Literature Review
  2.1 Recommender Systems
    2.1.1 Content-Based Recommendation
    2.1.2 Collaborative Recommendation
  2.2 Social Network Analysis
  2.3 Topic Model
    2.3.1 Latent Dirichlet Allocation
    2.3.2 Author-Topic Model
  2.4 Social Network-Based Recommendation
CHAPTER 3 – The Approach
  3.1 Architecture
  3.2 Topic Model Construction
  3.3 Constructing LDA Coauthorship Network
    3.3.1 Definition
    3.3.2 Extending Authors for Each Article
CHAPTER 4 – Evaluations
  4.1 Data Collection
    4.1.1 Training Data
    4.1.2 Testing Data
  4.2 Performance Benchmarks
    4.2.1 Content-Based
    4.2.3 LDA-Coauthorship
  4.3 Evaluation Design
    4.3.1 Evaluation Scenarios
    4.3.2 Performance Metric
  4.4 Preliminary Experiment
    4.4.1 Selection of Threshold
    4.4.2 Selection of the Number of Topics in LDA
  4.5 Experiments and Results
    4.5.1 The Original Methods
    4.5.2 The Hybrid Methods
    4.5.3 Increase of Recommended Articles
CHAPTER 5 – Conclusions
References


電子全文 Fulltext
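This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law. Thesis access permission: user-defined release date. Available: Campus: available; Off-campus: available. etd-0526114-172606.pdf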
紙本論文 Printed copies
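Availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about the availability of printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available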



Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS