博碩士論文 etd-0213112-123624 詳細資訊
Title page for etd-0213112-123624
論文名稱
Title
以LDA和使用紀錄為基礎的線上電子書主題趨勢發掘方法
An Approach to eBook Topics Trend Discovery Based on LDA and Usage Log
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
61
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2012-01-12
繳交日期
Date of Submission
2012-02-13
關鍵字
Keywords
LCSH、LCC、LDA、主題、使用記錄、電子書、主題模型
LDA, Topic Model, Topic, Usage Log, Ebook, LCC, LCSH
統計
Statistics
本論文已被瀏覽 5958 次,被下載 5854
The thesis/dissertation has been browsed 5958 times, has been downloaded 5854 times.
中文摘要
網際網路的發展及科技的進步讓數位內容產業日漸蓬勃,出版業者紛紛開始提供線上電子書檢索、閱讀及下載服務,使用者不受地域或時間的限制,隨時隨地都能使用電腦來閱讀數位內容,另外一方面圖書館購買電子書做為館藏的比例亦逐年增加。使用電子資源的方式,可透過連線到電子書檢索平台或透過圖書館自動化系統檢索,由館藏目錄中直接鏈結至電子書平台進行使用。這一個方式相較於實體館藏來說沒有流通數量上的限制,同時提昇了圖書資源的利用率。
提供電子書檢索服務的出版社或系統整合業者眾多,圖書內容包羅萬象,考量到有限的預算條件下,圖書館採購電子書除了參考讀者的推薦之外亦需要評估電子資源的使用率,做最有效率的投資。目前最普遍的方式是使用統計報表,其通常由出版社所提供。
本研究使用Latent Dirichlet Allocation簡稱LDA的方法,基於圖書的內容來建置主題模型,然後結合電子書檢索平台的使用統計報表,運用主題模型的加權來發掘電子書讀者閱讀主題的變化,進而提供一個具參考價值的訊息。我們在實驗中並比較了其他兩種方式:美國國會分類法和主題標目法。實驗結果證實透過主題加權方法產生的主題模型與其他兩種方法顯著不同,可以提供另一方面的有用資訊。
Abstract
With the growth of the digital content industry, publishers have started to provide online services for ebook search, reading, and downloading. Users can access online resources from anywhere, at any time, with laptops or mobile devices. More and more libraries purchase ebooks as an important part of their collections. To access these resources, users can link directly to the publisher's ebook portal or go through the library's OPAC system. Compared with the circulation of physical holdings, ebooks are more convenient for patrons and improve the utilization of library resources.
Many kinds of ebooks are available on the market, so libraries have to focus their investment on the most valuable online resources. Usage statistics reports play an important role in providing this information to libraries. These reports are usually generated according to the COUNTER standard. Although they show when and where users access specific ebooks, they fail to show the general topics being read and how those topics change.
In this study, we introduce a post-processing method that weights the LDA topic model with the usage statistics report to emphasize changes in topics. We compare it with the classification method and the subject heading method used in bibliographic records, namely LCC and LCSH respectively. The results show that the weighted topic model significantly affects the ranking of topics, and that the topic model is independent of the classification method and the subject heading method in the bibliographic record.
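The abstract does not give the weighting formula, so the following is only a minimal sketch of one plausible reading: each book's LDA topic proportions are multiplied by its usage count, and the weighted proportions are summed per topic to obtain a ranking. The function name, the input layout, and the sample numbers are all assumptions made for illustration and are not taken from the thesis.

```python
def weighted_topic_ranking(doc_topic, usage):
    """Rank topics by usage-weighted topic mass (illustrative assumption).

    doc_topic: dict mapping book id -> list of topic proportions from LDA
    usage:     dict mapping book id -> access count from a usage report
    Returns (ranking, scores): topic indices from highest to lowest score,
    and the normalized score of each topic.
    """
    num_topics = len(next(iter(doc_topic.values())))
    scores = [0.0] * num_topics
    for book_id, proportions in doc_topic.items():
        count = usage.get(book_id, 0)  # books absent from the report count as unused
        for k, p in enumerate(proportions):
            scores[k] += count * p
    total = sum(scores)
    if total > 0:
        scores = [s / total for s in scores]  # normalize so scores sum to 1
    ranking = sorted(range(num_topics), key=lambda k: scores[k], reverse=True)
    return ranking, scores


# Hypothetical data: two books, two topics.
doc_topic = {"book_a": [0.9, 0.1], "book_b": [0.2, 0.8]}
uniform_usage = {"book_a": 1, "book_b": 1}
observed_usage = {"book_a": 1, "book_b": 10}

unweighted_ranking, _ = weighted_topic_ranking(doc_topic, uniform_usage)
weighted_ranking, weighted_scores = weighted_topic_ranking(doc_topic, observed_usage)
```

With uniform usage, topic 0 ranks first; once book_b's heavier usage is taken into account, topic 1 moves ahead. This is the kind of ranking change the abstract attributes to weighting.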
目次 Table of Contents
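Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 1
1.3 Research Objectives 2
1.4 Thesis Organization 3
Chapter 2 Literature Review 4
2.1 LDA Topic Model 4
2.2 Selection of LDA Parameters 6
2.3 Collapsed Gibbs Sampler 7
2.4 COUNTER Usage Reports 9
2.5 Library of Congress Classification 10
2.6 Library of Congress Subject Headings 11
Chapter 3 Method for Building the Topic Model 13
3.1 System Architecture 13
3.2 Text Data Preprocessing 15
3.2.1 Data Sources 15
3.2.2 Data Processing 18
3.3 Usage Log Preprocessing 19
3.3.1 Data Sources 19
3.3.2 Data Processing 21
3.4 LDA Parameter Selection 23
3.5 Building the Topic Model 24
3.5.1 Choice of Tools 24
3.5.2 Input Data Format 25
3.5.3 Output Data Format 26
3.5.4 Building the Topic Model 27
3.5.5 LDA Database Design 28
3.6 LDA Topic Weighting 30
Chapter 4 Experimental Results 34
4.1 Preface 34
4.2 Observations on Topic Weighting Results 34
4.3 Relationship Between LCC and the Topic Model 39
4.4 Relationship Between LCSH and the Topic Model 41
4.5 Observations on the Correlation Among LCC, LCSH, and Topics 43
Chapter 5 Conclusions and Suggestions for Future Research 47
5.1 Conclusions 47
5.2 Suggestions for Future Research 47
Chapter 6 References 49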
參考文獻 References
AlSumait, L., Barbara, D., & Domeniconi, C. (2008). On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking. Paper presented at the Eighth IEEE International Conference on Data Mining (ICDM '08).
Anthes, G. (2010). Topic models vs. unstructured data. Commun. ACM, 53(12), 16-18. doi: 10.1145/1859204.1859210
Blei, D. M., & Lafferty, J. D. (2006). Dynamic topic models. Paper presented at the Proceedings of the 23rd international conference on Machine learning, Pittsburgh, Pennsylvania.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res., 3, 993-1022. doi: 10.1162/jmlr.2003.3.4-5.993
Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., & Blei, D. M. (2010). Reading Tea Leaves: How Humans Interpret Topic Models. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.992
Columbia University Press. Retrieved from http://cup.columbia.edu/
COUNTER - Counting Online Usage of Networked Electronic Resources. Retrieved from http://www.projectcounter.org/
COUNTER - Counting Online Usage of Networked Electronic Resources Home. Retrieved from http://www.projectcounter.org/
Darling, W. M. (2011). A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling.
Gibbs sampling. Retrieved from http://en.wikipedia.org/wiki/Gibbs_sampling
Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl 1), 5228-5235. doi: 10.1073/pnas.0307752101
Hall, D., Jurafsky, D., & Manning, C. D. (2008). Studying the history of ideas using topic models. Paper presented at the Proceedings of the Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii.
Khosh-khui, S. A. (1987). Relationship Between LCSH and LCC Notations in Different Classes of LCC. Staff Publications-Library, Texas State University.
Library of Congress Classification. Retrieved from http://www.loc.gov/catdir/cpso/lcc.html
Magdy, W., & Darwish, K. (2008). Book search: indexing the valuable parts. Paper presented at the Proceeding of the 2008 ACM workshop on Research advances in large digital book repositories, Napa Valley, California, USA. http://dl.acm.org/citation.cfm?doid=1458412.1458429
Maskeri, G., Sarkar, S., & Heafield, K. (2008). Mining business topics in source code using latent dirichlet allocation. Paper presented at the Proceedings of the 1st India software engineering conference, Hyderabad, India.
Newman, D., Hagedorn, K., Chemudugunta, C., & Smyth, P. (2007). Subject metadata enrichment using statistical topic models. Paper presented at the Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, Vancouver, BC, Canada.
Noh, Y., Hagedorn, K., & Newman, D. (2011). Are learned topics more useful than subject headings. Paper presented at the Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries, Ottawa, Ontario, Canada.
Shepherd, P. T. COUNTER: towards reliable vendor usage statistics. [Conceptual Paper]. VINE, 34(4). doi: 10.1108/03055720410570975
Sun, Y., Han, J., Gao, J., & Yu, Y. (2009). itopicmodel: Information network-integrated topic modeling.
Wang, X., & McCallum, A. (2006). Topics over time: a non-Markov continuous-time model of topical trends. Paper presented at the Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, Philadelphia, PA, USA.
國家圖書館編目園地全球資訊網. Retrieved from http://catweb.ncl.edu.tw/
電子全文 Fulltext
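This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.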
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
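Availability information for printed theses is relatively complete from academic year 102 onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.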
開放時間 available 已公開 available
