博碩士論文 etd-0828109-151321 詳細資訊
Title page for etd-0828109-151321
論文名稱
Title
一個查詢相關的資訊擷取評等方法
A Query Dependent Ranking Approach for Information Retrieval
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
64
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2009-07-24
繳交日期
Date of Submission
2009-08-28
關鍵字
Keywords
模型結合、查詢相似度、查詢相關的排序、排序學習、資訊擷取、排序模型
information retrieval, ranking model, model combination, query similarity, learning to rank, query-dependent ranking
統計
Statistics
本論文已被瀏覽 5787 次,被下載 0
This thesis/dissertation has been browsed 5787 times and downloaded 0 times.
中文摘要
建立排序模型在資訊擷取領域上是一個很重要的議題。最近幾年,基於排序學習的想法,許多關於這個主題的方法被提出,而大多數方法企圖利用單個函數,希望可以通用所有查詢並給每篇文件一個分數。在本篇論文裡,我們提出一個新的查詢相關排序架構,將每個訓練查詢和它對應的文件,都分別建立各自的排序模型,當一個新的測試查詢被使用者要求,所擷取到的文件會根據與訓練查詢相似度,挑出一些較合適的排序模型,並且將它們作模型結合,經由這個組合的排序模型得到分數。而此機制也提供了結合模型的權重值。實驗結果可以證明查詢相關的排序方法具有不錯的效果,優於其他方法。
Abstract
Ranking model construction is an important topic in information retrieval. In recent years, many approaches based on the idea of “learning to rank” have been proposed for this task, and most of them attempt to score the documents of all queries with a single function. In this thesis, we propose a novel framework for query-dependent ranking. An individual ranking model is constructed for each training query from its corresponding documents, and a simple similarity measure is used to calculate similarities between queries. When a new query is issued, the documents retrieved for it are ranked by the scores of a combined ranking model built from the models of similar training queries. A mechanism for determining the combining weights is also provided. Experimental results show that this query-dependent ranking approach is more effective than other approaches.
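The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: each per-query model is assumed to be a linear weight vector that has already been trained (the thesis's table of contents mentions Ranking SVM, which is not reproduced here), a query is represented by the mean feature vector of its documents, similarity is cosine similarity, and combining weights are similarities normalized over the top-k most similar training queries. All function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def query_vector(docs):
    """Assumption: represent a query by the mean feature vector of its documents."""
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]

def combine_models(new_docs, train_queries, k=2):
    """Build one linear model for a new query from the k most similar training queries.

    train_queries is a list of (query_vector, model_weights) pairs; the models
    are assumed to be pre-trained linear rankers, one per training query.
    """
    q = query_vector(new_docs)
    scored = [(max(cosine(q, qv), 0.0), w) for qv, w in train_queries]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    # Combining weights: normalized similarities (uniform if all similarities are zero).
    alphas = [sim / total if total > 0 else 1.0 / len(top) for sim, _ in top]
    dim = len(top[0][1])
    return [sum(a * w[i] for a, (_, w) in zip(alphas, top)) for i in range(dim)]

def rank_documents(docs, weights):
    """Return document indices ordered by descending linear score."""
    scores = [sum(w * x for w, x in zip(weights, d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Toy data (hypothetical): three training queries, each with a 2-feature linear model.
train_queries = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
    ([1.0, 1.0], [0.5, 0.5]),
]
new_docs = [[0.9, 0.1], [0.2, 0.1], [0.6, 0.3]]

combined = combine_models(new_docs, train_queries, k=2)
ranking = rank_documents(new_docs, combined)
print(ranking)
```

Combining the weight vectors directly is equivalent to taking the same weighted sum of the individual models' scores only because the models are linear; a non-linear model family would need score-level combination instead.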
目次 Table of Contents

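Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Problem Definition 4
1.3 Research Objectives 5
1.4 Thesis Organization 5
Chapter 2 Literature Review 7
2.1 Traditional Information Retrieval Methods 7
2.2 Learning to Rank 10
2.3 Ranking SVM 14
2.4 Evaluation Measures 15
2.4.1 Mean Average Precision (MAP) 15
2.4.2 Normalized Discounted Cumulative Gain (NDCG) 17
Chapter 3 Methodology 19
3.1 Motivation 19
3.2 Our Approach (Query Dependent Ranking, QDR) 21
3.2.1 Overview 21
3.2.2 Ranking Model Construction 23
3.2.3 Query Representation 24
3.2.4 Model Selection 28
3.2.5 Model Combination 30
Chapter 4 Experimental Results and Analysis 32
4.1 Experimental Data 32
4.2 Feature Selection 34
4.3 Results and Analysis 39
Chapter 5 Conclusions and Future Work 52
5.1 Conclusions 52
5.2 Future Research Directions 52
References 53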
參考文獻 References
[1] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.
[2] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, “Learning to Rank Using Gradient Descent,” 22nd International Conference on Machine Learning, pages 89-96, 2005.
[3] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li, “Learning to Rank: From Pairwise Approach to Listwise Approach,” 24th International Conference on Machine Learning, pages 129-136, 2007.
[4] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, “An Efficient Boosting Algorithm for Combining Preferences,” Journal of Machine Learning Research, Vol. 4, pages 933-969, 2003.
[5] X.-B. Geng, T.-Y. Liu, T. Qin, H. Li, and H.-Y. Shum, “Query-Dependent Ranking Using K-Nearest Neighbor,” 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 115-122, 2008.
[6] R. Herbrich, T. Graepel, and K. Obermayer, “Large Margin Rank Boundaries for Ordinal Regression,” Advances in Large Margin Classifiers. MIT Press, 2000.
[7] W. Hersh, C. Buckley, T. J. Leone, and D. Hickam, “OHSUMED: An Interactive Retrieval Evaluation and New Large Test Collection for Research,” 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 192-201, 1994.
[8] K. Järvelin and J. Kekäläinen, “Cumulated Gain-based Evaluation of IR Techniques,” ACM Transactions on Information Systems, Vol. 20, No. 4, pages 422-446, 2002.
[9] T. Joachims, “Optimizing Search Engines Using Clickthrough Data,” ACM Conference on Knowledge Discovery and Data Mining, pages 133-142, 2002.
[10] J. Kleinberg, “Authoritative Sources in a Hyperlinked Environment,” Journal of the ACM, Vol. 46, No. 5, pages 604-622, 1999.
[11] S.-J. Lee and C.-S. Ouyang, “A Neuro-Fuzzy System Modeling with Self-Constructing Rule Generation and Hybrid SVD-Based Learning,” IEEE Transactions on Fuzzy Systems, Vol. 11, No. 3, pages 341-353, 2003.
[12] P. Li, C. Burges, and Q. Wu, “McRank: Learning to Rank Using Multiple Classification and Gradient Boosting,” 21st Annual Conference on Neural Information Processing Systems, pages 845-852, 2007.
[13] R. Nallapati, “Discriminative Models for Information Retrieval,” 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 64-71, 2004.
[14] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Technical report, Stanford University, 1998.
[15] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill Book Company, 1983.
[16] S. E. Robertson and K. S. Jones, “Relevance Weighting of Search Terms,” Journal of the American Society for Information Science, Vol. 27, No. 3, pages 129-146, 1976.
[17] S. E. Robertson, “Overview of the Okapi Projects,” Journal of Documentation, Vol. 53, No. 1, pages 3-7, 1997.
[18] M.-F. Tsai, T.-Y. Liu, T. Qin, H.-H. Chen, and W.-Y. Ma, “FRank: A Ranking Method with Fidelity Loss,” 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 383-390, 2007.
[19] J. Xu and H. Li, “AdaRank: A Boosting Algorithm for Information Retrieval,” 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 391-398, 2007.
[20] Y. Yue, T. Finley, F. Radlinski, and T. Joachims, “A Support Vector Method for Optimizing Average Precision,” 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 271-278, 2007.
[21] C. Zhai and J. Lafferty, “A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval,” 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 334-342, 2001.
[22] http://research.microsoft.com/en-us/people/tyliu/.
[23] http://research.microsoft.com/en-us/um/beijing/projects/letor/index.html.
[24] http://searchenginewatch.com/3632382.
[25] http://svmlight.joachims.org/.
[26] http://www.find.org.tw/0105/howmany/howmany_disp.asp?id=219.
電子全文 Fulltext
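This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.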
論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available


紙本論文 Printed copies
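Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.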
開放時間 Available:已公開 available
