Title page for etd-0810105-210000 (thesis/dissertation details)
Title
演進式文件類別管理技術
An Evolution-based Approach to Support Effective Document-Category Management
Department
Year, semester
Language
Degree
Number of pages
95
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2005-07-28
Date of Submission
2005-08-10
Keywords
Category Evolution, Ontology-based Category Evolution, Ontology Learning, Category Hierarchy Evolution, Document-Category Management
Statistics
The thesis/dissertation has been browsed 5683 times and downloaded 1277 times.
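Chinese Abstract (translated)
Sorting documents into categories by type has become the most widely used document management method for individuals and organizations alike. Over time, however, the content of documents within a category may drift apart, either because the ideas behind the classification change or because newly arriving documents differ from what the original classification envisioned, which creates a need to recategorize documents. In addition, extracting a needed subset of documents from existing categories may also create a need to recategorize or recluster them. Effectively reassigning existing documents to appropriate clusters is therefore both necessary and important. Accordingly, this dissertation first addresses the problems and limitations of the previously proposed Category Evolution (CE) technique and proposes the CE2 technique. Second, it proposes the Ontology-based Category Evolution technique (ONCE). ONCE uses an ontology to support category evolution, remedying the difficulty that term-based document comparison has in judging document categories correctly when wording differs or terms carry multiple meanings, and aims to raise document processing from the level of terms to the level of concepts. Finally, past research has mostly assumed that document categories are independent of one another, yet categories often form a hierarchy in which broader categories successively subsume narrower ones. This dissertation therefore proposes methods for Category Hierarchy Evolution, comprising a term-based technique (CHE) and an ontology-based technique (OCHE). The proposed techniques are evaluated on real-world data. The empirical results show that CE2 evolves document categories more effectively than the earlier CE and than traditional discovery-based document clustering; the ontology-based ONCE also outperforms the term-based CE2. In addition, both hierarchy evolution techniques, CHE and OCHE, achieve good results, and further comparison shows that the ontology-based OCHE evolves category hierarchies more effectively than the term-based CHE.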
Abstract
Observations of how individuals and organizations manage textual documents suggest that categories are a popular means of organizing, archiving, and accessing documents. The adequacy of an existing category may understandably diminish as it absorbs influxes of new documents over time or retains only part of its existing documents, either of which can significantly change its content. Existing document categories therefore have to evolve over time as new documents are acquired. Following an evolution-based approach to document-category management, this dissertation extends the Category Evolution (CE) technique by addressing its inherent limitations. The proposed technique, CE2, automatically reorganizes document categories while taking into account those previously established. Furthermore, we propose the Ontology-based Category Evolution technique (ONCE) to overcome the problems of word mismatch and ambiguity encountered by lexicon-based category evolution approaches (e.g., CE and CE2). Facilitated by a domain ontology, ONCE can evolve document categories on the conceptual rather than the lexical level. Finally, this dissertation considers the evolution of category hierarchies and proposes the Category Hierarchy Evolution technique (CHE) and the Ontology-based Category Hierarchy Evolution technique (OCHE) to evolve from an existing category hierarchy. We empirically evaluate the effectiveness of the proposed CE2, ONCE, CHE, and OCHE techniques in different category evolution scenarios. Our analysis shows CE2 to be more effective than CE and than the category discovery approach (specifically, HAC). The ontology-based approach, ONCE, outperforms CE2, which represents the lexicon-based approach. Finally, the effectiveness attained by CHE and OCHE is satisfactory, and the ontology-based OCHE likewise outperforms the lexicon-based CHE.
This dissertation contributes to research and practice in text mining, document management, and ontology learning.
Table of Contents
CHAPTER 1 INTRODUCTION 1
1.1 RESEARCH BACKGROUND 1
1.2 RESEARCH MOTIVATION 3
1.3 RESEARCH OBJECTIVES 5
1.4 ORGANIZATION OF THE DISSERTATION 7
CHAPTER 2 LITERATURE REVIEW 8
2.1 AUTOMATED DOCUMENT-CATEGORY MANAGEMENT 8
2.1.1 Document Clustering 8
2.1.2 Text Categorization 10
2.2 CATEGORY EVOLUTION TECHNIQUE 11
CHAPTER 3 DESIGN AND EVALUATION OF CE2 17
3.1 DESIGN OF CE2 17
3.1.1 Category Decomposition Phase 18
3.1.2 Category Amalgamation Phase 19
3.2 DESIGN OF EMPIRICAL EVALUATION 20
3.2.1 Evaluation Procedure 23
3.2.2 Evaluation Criteria 24
3.3 PARAMETER TUNING 25
3.3.1 Tuning Results for HAC 25
3.3.2 Tuning Results for CE 26
3.3.3 Tuning Results for CE2 28
3.4 COMPARATIVE EVALUATION RESULTS 31
3.5 SUMMARY 39
CHAPTER 4 DESIGN AND EVALUATION OF ONTOLOGY-BASED CATEGORY EVOLUTION (ONCE) TECHNIQUE 41
4.1 LEARNING OF CONCEPT DESCRIPTORS 42
4.1.1 Feature Extraction 42
4.1.2 Concept Descriptor Selection 43
4.1.3 Concept Refinement 45
4.2 DESIGN OF ONTOLOGY-BASED CATEGORY EVOLUTION (ONCE) 45
4.2.1 Document Transformation 46
4.2.2 Category Decomposition 48
4.2.3 Category Amalgamation 51
4.3 EMPIRICAL EVALUATION DESIGN 52
4.3.1 Documents and Concept Hierarchy for Concept Descriptor Learning 52
4.3.2 Evaluation Procedure, Criteria, and Benchmark 53
4.4 PARAMETER TUNING EXPERIMENTS FOR ONCE 54
4.5 COMPARATIVE EVALUATION RESULTS 58
CHAPTER 5 DESIGN AND EVALUATION OF CATEGORY HIERARCHY EVOLUTION TECHNIQUES 62
5.1 DESIGN OF CATEGORY HIERARCHY EVOLUTION (CHE) 62
5.1.1 Category Decomposition 63
5.1.2 Category Amalgamation 64
5.1.3 Category Hierarchy Refinement 65
5.2 DESIGN OF ONTOLOGY-BASED CATEGORY HIERARCHY EVOLUTION (OCHE) 65
5.3 EMPIRICAL EVALUATION DESIGN 67
5.3.1 Document Corpus 67
5.3.2 Evaluation Procedure 68
5.3.3 Evaluation Criteria 69
5.4 EVALUATION RESULTS AND DISCUSSIONS 71
CHAPTER 6 CONCLUSION 80
6.1 SUMMARY AND RESEARCH CONTRIBUTIONS 80
6.2 FUTURE RESEARCH DIRECTIONS 82
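Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: withheld (publicly available on and off campus after one year)
Available:
Campus: available
Off-campus: available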


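Printed copies
Public availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available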
