National Sun Yat-sen University, thesis/dissertation, A Query Dependent Ranking Approach for Information Retrieval

Title	A Query Dependent Ranking Approach for Information Retrieval (一個查詢相關的資訊擷取評等方法)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 97	Language	Chinese
Degree	Master	Number of pages	64
Author	Lian-Wang Lee (李連旺)
Advisor	Shie-Jue Lee (李錫智)
Convenor	Been-Chian Chien (錢炳全)
Advisory Committee	Chang-Shing Lee (李健興); Shing-Tai Pan (潘欣泰); Jeng-Shyang Pan (潘正祥)
Date of Exam	2009-07-24	Date of Submission	2009-08-28
Keywords	information retrieval, ranking model, model combination, query similarity, learning to rank, query-dependent ranking

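Abstract (translated from the Chinese)
Building ranking models is an important topic in information retrieval. In recent years, many methods based on learning to rank have been proposed, and most of them try to use a single function that applies to all queries and assigns a score to each document. In this thesis, we propose a new query-dependent ranking framework in which a separate ranking model is built for each training query and its corresponding documents. When a user issues a new test query, suitable ranking models are selected according to the similarity between the test query and the training queries, the selected models are combined, and the retrieved documents are scored by the combined model. The mechanism also provides the weights used to combine the models. Experimental results show that the query-dependent ranking approach performs well and outperforms other methods.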
Abstract
Ranking model construction is an important topic in information retrieval. Recently, many approaches based on the idea of “learning to rank” have been proposed for this task, and most of them attempt to score the documents of all queries with a single function. In this thesis, we propose a novel framework for query-dependent ranking. An individual ranking model is constructed for each training query and its corresponding documents, and a simple similarity measure is used to calculate similarities between queries. When a new query is issued, the documents retrieved for it are ranked by the scores of a model combined from the models of similar training queries. A mechanism for determining the combining weights is also provided. Experimental results show that this query-dependent ranking approach is more effective than other approaches.
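The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the model type, the similarity measure, or the weighting rule, so everything concrete below is an assumption made for illustration only: each per-query model is a linear weight vector, a query is represented by the mean feature vector of its documents, similarity is cosine similarity, the top `k` models are selected, and combining weights are proportional to similarity. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_vector(docs):
    """Assumed query representation: mean of the document feature vectors."""
    n = len(docs)
    return [sum(column) / n for column in zip(*docs)]


def combining_weights(test_docs, training, k):
    """Pick the k most similar training queries and return (weight, model) pairs.

    `training` is a list of (query_vector, linear_model) pairs. Weights are
    assumed to be proportional to similarity and normalised to sum to 1.
    """
    q = query_vector(test_docs)
    scored = [(max(cosine(q, qv), 0.0), model) for qv, model in training]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total <= 0.0:
        # No similar query found: fall back to uniform weights.
        return [(1.0 / len(top), model) for _, model in top]
    return [(sim / total, model) for sim, model in top]


def combine_models(test_docs, training, k=2):
    """Weighted average of the selected linear models."""
    pairs = combining_weights(test_docs, training, k)
    dim = len(pairs[0][1])
    return [sum(w * model[i] for w, model in pairs) for i in range(dim)]


def rank_documents(test_docs, training, k=2):
    """Return document indices sorted by descending combined-model score."""
    model = combine_models(test_docs, training, k)
    scores = [sum(w * x for w, x in zip(model, doc)) for doc in test_docs]
    return sorted(range(len(test_docs)), key=lambda i: scores[i], reverse=True)
```

In this sketch, `training` would hold one `(query_vector, model)` pair per training query, with each model learned separately on that query's documents; `rank_documents(test_docs, training, k)` then orders the documents retrieved for a new query. Replacing the linear average with another combination rule would not change the overall structure.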

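Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background
  1.2 Problem Definition
  1.3 Research Objectives
  1.4 Thesis Organization
Chapter 2 Literature Review
  2.1 Traditional Information Retrieval Methods
  2.2 Learning to Rank
  2.3 Ranking SVM
  2.4 Evaluation Measures
    2.4.1 Mean Average Precision (MAP)
    2.4.2 Normalized Discounted Cumulative Gain (NDCG)
Chapter 3 Methodology
  3.1 Motivation
  3.2 Our Method (Query Dependent Ranking, QDR)
    3.2.1 Overview
    3.2.2 Ranking Model Construction
    3.2.3 Query Representation
    3.2.4 Model Selection
    3.2.5 Model Combination
Chapter 4 Experimental Results and Analysis
  4.1 Experimental Data
  4.2 Feature Selection
  4.3 Results and Analysis
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Research Directions
References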


Fulltext
Thesis access permission: not available, either on campus or off campus.
Printed copies
Availability: available.
