National Sun Yat-sen University (國立中山大學), thesis/dissertation: A Keyword-Based Association Rule Mining Method for Personal Document Query (關鍵字關聯採礦式個人化文獻搜尋方法之研究)

Title	A Keyword-Based Association Rule Mining Method for Personal Document Query (關鍵字關聯採礦式個人化文獻搜尋方法之研究)
Department	Department of Mechanical and Electro-Mechanical Engineering
Year, semester	Spring semester, Academic Year 91 (ROC calendar)	Language	Chinese
Degree	Master	Number of pages	60
Author	Chien-Ming Tseng (曾健銘)
Advisor	Ping-Yi Chao (趙平宜)
Convenor	San-Yi Huang (黃三益)
Advisory Committee	Fen-Hui Lin (林芬慧)
Date of Exam	2003-07-25	Date of Submission	2003-08-29
Keywords	Data Mining, Query Expansion, Digital Library
Statistics	This thesis has been browsed 5678 times and downloaded 20 times.

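Chinese Abstract (English translation)
With the growth of the Internet and advances in information technology, information is produced, copied, stored, and disseminated in enormous quantities. This speeds up the sharing and use of information, but people increasingly struggle to keep up with the volume that is generated and circulated, and cannot quickly and effectively digest it to obtain the valuable information and knowledge they want. Many people therefore face the problem of information overload: attention is limited, but information is endless. The same problem arises when searching for documents in literature digital libraries. A literature digital library is a website that provides online document search and full-text download services. Its purpose is to preserve documents permanently once they are digitized and, through the connectivity and reach of the Internet, to give researchers around the world a convenient and fast environment for finding research results in related fields, thereby supporting the sharing and transfer of knowledge. Such collections can easily reach millions of documents and only grow over time, so developing suitable document search and recommendation mechanisms has become a very important issue. This study proposes a personalized document recommendation mechanism that incorporates data mining techniques, using the mined association rules among keywords together with each user's recorded preferences as the basis for recommending documents. Preliminary experimental validation shows that the proposed mechanism avoids shortcomings of some existing methods, performs well on both the Precision and Recall metrics, and resolves part of the information overload problem in literature digital libraries.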
Abstract
The rapid growth of the Internet and information technology has surrounded us with far more information than we can absorb: attention is limited, but information is effectively unlimited. As a result, almost everyone now faces information overload, meaning that attention, our scarcest resource, is insufficient to digest all the information we encounter. The same problem arises in literature digital libraries, where a single library may hold more than one million documents. Users therefore need an effective search or recommendation mechanism, but traditional mechanisms fall short: users must still spend considerable effort sifting the results to find what they actually need. This study proposes a new personalized document recommendation mechanism to address the problem. The mechanism uses a keyword-based association rule mining method to find association rules among the keywords of documents. Then, based on these rules and each user's recorded preferences, it recommends the documents the user is likely to want. A preliminary evaluation shows that the mechanism alleviates part of the information overload problem and performs well on both the “Precision” and “Recall” indices.
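The following is a minimal sketch of the general idea described in the abstract, not the thesis's actual algorithm. It mines association rules between pairs of keywords using support count and confidence thresholds, expands a query with the right-hand sides of matching rules, and scores the result with precision and recall. The toy index `DOCS`, the function names, and the threshold values are all hypothetical, and the user-preference component of the mechanism is omitted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy index: document id -> set of keywords (not data from the thesis).
DOCS = {
    "d1": {"data mining", "association rule", "digital library"},
    "d2": {"data mining", "association rule"},
    "d3": {"data mining", "query expansion"},
    "d4": {"digital library", "query expansion"},
}


def mine_pair_rules(docs, min_support_count=2, min_confidence=0.6):
    """Return {(lhs, rhs): confidence} for keyword rules lhs -> rhs.

    Only keyword pairs are considered; a full Apriori run would also
    grow larger keyword sets level by level.
    """
    single = Counter(k for kws in docs.values() for k in kws)
    pairs = Counter(
        pair for kws in docs.values() for pair in combinations(sorted(kws), 2)
    )
    rules = {}
    for (a, b), count in pairs.items():
        if count < min_support_count:
            continue  # the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = confidence
    return rules


def recommend(docs, rules, query_keyword):
    """Expand the query with associated keywords, then match documents."""
    expanded = {query_keyword} | {
        rhs for (lhs, rhs) in rules if lhs == query_keyword
    }
    return {doc_id for doc_id, kws in docs.items() if kws & expanded}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


rules = mine_pair_rules(DOCS)
recommended = recommend(DOCS, rules, "association rule")
print(sorted(recommended))  # ['d1', 'd2', 'd3']
print(precision_recall(recommended, {"d1", "d2"}))
```

In this toy run the query "association rule" is expanded with "data mining", so `d3` is also recommended. Treating only `d1` and `d2` as relevant, recall is 1.0 while precision drops to 2/3, which illustrates the trade-off that the mining thresholds control.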

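Table of Contents
Contents I
List of Figures IV
List of Tables V
Abstract (Chinese) VI
Abstract VII
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 2
1.3 Research Method 3
Chapter 2 Literature Review 5
2.1 Collaborative Recommendation 5
2.2 Content-Based Recommendation 7
2.3 Semantic-Expansion Recommendation 8
2.4 Comparison of the Main Information Recommendation Methods 12
2.5 Data Mining 14
2.5.1 Overview of Data Mining 14
2.5.2 Analytical Functions of Data Mining 17
2.5.3 Keyword Association Rule Mining 21
Chapter 3 Keyword Association Mining Method for Personalized Document Recommendation 23
3.1 Keyword Association Rule Mining 24
3.2 Personalized Document Recommendation 33
3.2.1 Document Recommendation 34
3.2.2 Recommendation Adjustment 37
Chapter 4 Planning and Construction of the Prototype System 39
4.1 System Architecture 39
4.2 Database Design 42
Chapter 5 Empirical Evaluation 45
5.1 Evaluation Metrics 45
5.2 Experiment Design and Procedure 46
5.3 Analysis of Experimental Results 49
Chapter 6 Conclusion 53
6.1 Research Conclusions 53
6.2 Research Contributions 54
6.3 Future Work 54
References 55
Appendix A The Apriori Algorithm 58
Appendix B Relevant documents 59

List of Figures
Figure 1.1 Research process 4
Figure 2.1 Collaborative recommendation: example table of movie preferences [HKo01] 5
Figure 2.2 Semantic association network [楊永芳02] 8
Figure 2.3 Construction process of the semantic association database [楊永芳02] 9
Figure 2.4 Types of semantic expansion [楊永芳02] 9
Figure 2.5 The knowledge discovery process [HKa01] 16
Figure 2.6 Architecture of a typical data mining system [HKa01] 17
Figure 2.7 Step 1 of classification: building the classification model (adapted from [HKa01]) 19
Figure 2.8 Step 2 of classification: testing and refining the model, then using it for prediction (adapted from [HKa01]) 20
Figure 2.9 Steps of the Chameleon algorithm [KHK99] 21
Figure 3.1 Steps of the keyword association mining method for personalized document recommendation 23
Figure 3.2 Process of the keyword association mining algorithm 25
Figure 3.3 Flowchart of the keyword association mining algorithm 28
Figure 3.4 Illustration of finding the frequent keyword sets in D, Min. Support Count = 2 31
Figure 3.5 Flowchart of personalized document recommendation 33
Figure 3.6 Flowchart of document recommendation 35
Figure 3.7 Flowchart of recommendation adjustment 37
Figure 4.1 Three-tier system architecture 40
Figure 4.2 Prototype system architecture 41
Figure 4.3 Relationship diagram of the document database 42
Figure 4.4 Relationship diagram of the personalization database 43
Figure 5.1 Set relationships behind Precision and Recall 45
Figure 5.2 Experiment procedure 46
Figure 5.3 Prototype system login screen 47
Figure 5.4 Query screen 47
Figure 5.5 Result selection screen 48
Figure 5.6 Preferred documents view page 48
Figure 5.7 Bar chart of the relative difference in Precision between the prototype system and eThesys 50
Figure 5.8 Bar chart of the relative difference in Recall between the prototype system and eThesys 51
Figure A.1 The Apriori algorithm [HKa01] 58

List of Tables
Table 2.1 Accumulated concept activation values [楊永芳02] 10
Table 2.2 Keyword interest levels [楊永芳02] 11
Table 2.3 Example of text interest calculation [楊永芳02] 11
Table 2.4 Comparison of recommendation methods 13
Table 2.5 Evolution of database-related technologies (compiled from [HKa01]) 14
Table 3.1 Example index table of the document database 25
Table 3.2 Results of the first-level candidate sets 25
Table 3.3 Index table D of the document database 29
Table 3.4 Effects of changing the association rule mining control parameters 34
Table 3.5 Adjustments to the association rule mining control parameters when too many or too few results are recommended 36
Table 3.6 Adjustments to the association rule mining control parameters when most recommended results do not meet the user's needs 38
Table 5.1 Precision & Recall experimental results 49
Table 5.2 Relative differences in the metrics between the prototype system and eThesys 50
Table 5.3 Summary of analysis results 52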

References
[楊永芳02] 楊永芳, 2002, “語意擴充式文件推薦方法之研究” [A Study of a Semantic-Expansion Document Recommendation Method], Master's thesis, Institute of Information Management, National Sun Yat-sen University.
[熊文江02] 熊文江, 2002, “文獻數位圖書館推薦技術之研究” [A Study of Recommendation Techniques for Literature Digital Libraries], Master's thesis, Institute of Information Management, National Sun Yat-sen University.
[戴文波特+02] Davenport and Beck (戴文波特、貝克), 注意力經濟: 抓準企業新焦距 [The Attention Economy, Chinese edition], 天下遠見出版股份有限公司, Feb. 2002.
[AIS93] Agrawal, R., Imielinski, T., Swami, A., “Mining Association Rules Between Sets of Items in Large Databases”, In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’93), pp. 207-216, Washington, DC, May 1993.
[AMS+96] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I., “Fast Discovery of Association Rules”, In Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. editors, Advances in Knowledge Discovery and Data Mining, pp. 307-328. AAAI/MIT Press, 1996.
[AS94] Agrawal, R., and Srikant, R., “Fast Algorithms for Mining Association Rules”, In Proc. 1994 Int. Conf. Very Large Databases (VLDB’94), pp. 487-499, Santiago, Chile, Sept. 1994.
[CCQ+02] Chau, M., Chen, H., Qin, J., Zhou, Y., Qin, Y., Sung, WK., McDonald, D., “Comparison of Two Approaches to Building a Vertical Search Tool: A Case Study in the Nanotechnology Domain”, In Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’02), Portland, OR, USA, July 2002.
[CHY96] Chen, M.S., Han, J., Yu, P.S., Dec. 1996, “Data Mining: An Overview from a Database Perspective”, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883.
[CMS99] Cooley, R., Mobasher, B., Srivastava, J., 1999, “Data Preparation for Mining World Wide Web Browsing Patterns”, Journal of Knowledge and Information Systems, 1(1), pp. 5-32.
[CWN+03] Cui, H., Wen, JR., Nie, JY., Ma, WY., 2003, “Query Expansion by Mining User Logs”, IEEE Transactions on Knowledge and Data Engineering, Vol. 15, Issue: 4, pp. 829-839.
[FH98] Feldman, R., and Hirsh, H., “Finding Associations in Collections of Text”, In Michalski, R.S., Bratko, I., Kubat, M. editors, Machine Learning and Data Mining: Methods and Applications, pp. 223-240. New York: John Wiley & Sons, 1998.
[Han98] Han, J., 1998, “Toward On-Line Analytical Mining in Large Databases”, SIGMOD Record, 27(1), pp. 97-107.
[Hir01] Hirji, K.K., Jul. 2001, “Exploring Data Mining Implementation”, Communications of the ACM, Vol. 44, No. 7, pp. 87-93.
[HH01] Hasan, H., and Hyland, P., Sep./Oct. 2001, “Using OLAP and Multidimensional Data for Decision Making”, IT Pro, pp. 44-50.
[HKa01] Han, J., and Kamber, M., Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann, 2001.
[HKo01] Herlocker, J.L., and Konstan, J.A., Nov./Dec. 2001, “Content-Independent Task-Focused Recommendation”, Internet Computing, IEEE, Vol. 5, Issue: 6, pp. 40-47.
[KHK99] Karypis, G., Han, EH., Kumar, V., Aug. 1999, “Chameleon: Hierarchical Clustering Using Dynamic Modeling”, Computer, IEEE, Vol. 32, Issue: 8, pp. 68-75.
[LSH+97] Lim, JH., Seung, HW., Hwang, J., Kim, YC., Kim, HN., “Query Expansion for Intelligent Information Retrieval on Internet”, In Proceedings of the 1997 International Conference on Parallel and Distributed Systems, pp. 656-662, 10-13 Dec. 1997.
[MR00] Mooney, R.J., and Roy, L., “Content-Based Book Recommending Using Learning for Text Categorization”, In Proceedings of the Fifth ACM Conference on Digital Libraries, San Antonio, TX, USA, June 2000.
[PGR+00] Paepcke, A., Garcia-Molina, H., Rodriguez-Mula, G., Cho, J., Mar. 2000, “Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies”, SIGMOD Record, 29(1), pp. 80-92.
[PJ01] Pedersen, T.B., and Jensen, C.S., Dec. 2001, “Multidimensional Database Technology”, Computer, IEEE, Vol. 34, Issue: 12, pp. 40-46.
[WAD+99] Weiss, S.M., Apte, C., Damerau, F.J., Johnson, D.E., Oles, F.J., Goetz, T., Hampp, T., July/Aug. 1999, “Maximizing Text-Mining Performance”, IEEE Intelligent Systems, pp. 63-69.
[YXE+01] Yu, K., Xu, X., Ester, M., Kriegel, HP., “Selecting Relevant Instances for Efficient and Accurate Collaborative Filtering”, In Proceedings of the Tenth International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, Nov. 2001.

Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: on-campus access released after one year; off-campus access never released.
Available: Campus: available. Off-campus: not available.
Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. For information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.
Available: available
