國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,知識分享環境中知識文件間語意關係辨識之研究,Semantic Relationship Annotation for Knowledge Documents in Knowledge Sharing Environments

論文名稱 Title	知識分享環境中知識文件間語意關係辨識之研究 Semantic Relationship Annotation for Knowledge Documents in Knowledge Sharing Environments
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	92 學年度第 2 學期 The spring semester of Academic Year 92	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	57
研究生 Author	白益忠 Yi-chung Pai
指導教授 Advisor	魏志平 Chih-ping Wei
召集委員 Convenor	胡仁華 Paul Jen-hwa Hu
口試委員 Advisory Committee	鄭興 Hsing Kenny Cheng
口試日期 Date of Exam	2004-07-27	繳交日期 Date of Submission	2004-07-29
關鍵字 Keywords	文件分類、知識分享、回應-語意關係、文類分類 Genre Classification, Text Categorization, Reply-semantic Relationship, Knowledge Sharing
統計 Statistics	本論文已被瀏覽 5775 次，被下載 3763 次 The thesis/dissertation has been browsed 5775 times, has been downloaded 3763 times.

中文摘要
傳統的線上知識分享環境存在著大量的知識文件或討論文件。因此，一項組織知識分享的重要議題是當面對大量的知識文件如何做有效率的管理。而我們認為在知識文件間可能存在著外顯或內隱的回應-語意關係（reply-semantic relationships）。然而，這樣的回應-語意關係一旦被發現或辨認出來，藉由這樣一個更先進且具有語意擷取能力的機制，將大大提升企業知識存取的能力。在本研究中，我們針對回應式的文件，提出一組回應-語意關係的初步類別架構，且發展一項知識文件間語意關係辨識技術（SEmantic Enrichment between Knowledge documents, SEEK），以自動化辨識回應式文件間的語意關係。　　本研究以內容式文件分類技術(content-based text categorization)以及文類分類技術(genre classification)為基礎，我們設計並評估不同特徵組合而成的模式、包括：關鍵字特徵（keyword features）、POS統計值特徵（POS statistics features）、及給定/新發現資訊特徵（given/new information features, GI/NI）的組合。根據實證評估的結果顯示，我們所提出的SEEK技術可以達到一個不錯的分類精準度。此外，以關鍵字與GI/NI的組合做為特徵值用於本技術，於Answer/Comment語意關係分類工作有最佳的分類準確率。另一方面，只使用關鍵字做為特徵值用於本技術，於Explanation/Instruction語意關係分類工作表現最好。
Abstract
A typical online knowledge-sharing environment generates a vast amount of formal knowledge elements or interactions, generally available as textual documents. Effective management of the ever-increasing volume of online knowledge documents is therefore essential to organizational knowledge sharing. Reply-semantic relationships between knowledge documents may exist either explicitly or implicitly. Once discovered or identified, such relationships would facilitate subsequent knowledge access by providing a novel and more semantic retrieval mechanism. In this study, we propose a preliminary taxonomy of reply-semantic relationships for documents organized in reply-replied structures and develop the SEmantic Enrichment between Knowledge documents (SEEK) technique for automatically annotating reply-semantic relationships between reply-pair documents. Building on content-based text categorization and genre classification techniques, we propose and evaluate different feature-set models that combine keyword features, part-of-speech (POS) statistics features, and/or given/new information (GI/NI) features. Our empirical evaluation results show that the proposed SEEK technique achieves satisfactory classification accuracy. Furthermore, combining keyword and GI/NI features yielded the best classification accuracy for the Answer/Comment classification task, whereas keyword features alone best differentiated the Explanation and Instruction relationships.
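As a rough illustration of what a feature-set model combining keyword and GI/NI features might look like, the Python sketch below turns one reply pair into a flat feature dictionary. The abstract does not define the features, so everything here is an assumption made for illustration: the function names `tokenize` and `reply_pair_features`, the regular-expression tokenizer, the binary keyword indicators, and the reading of "given" information as reply words that already occur in the replied message and "new" information as reply words that do not. POS statistics features are left out because they would require a part-of-speech tagger.

```python
import re


def tokenize(text):
    """Lowercase word tokenizer (an illustrative choice, not taken from the thesis)."""
    return re.findall(r"[a-z']+", text.lower())


def reply_pair_features(replied, reply, keywords):
    """Represent one (replied, reply) document pair as a flat feature dict.

    Assumed (hypothetical) feature definitions:
      kw:<w>    1 if keyword w occurs in the reply, else 0
      gi_ratio  share of distinct reply words also found in the replied message ("given")
      ni_ratio  share of distinct reply words not found in the replied message ("new")
    """
    replied_vocab = set(tokenize(replied))
    reply_vocab = set(tokenize(reply))

    # Keyword features: one binary indicator per selected keyword.
    features = {f"kw:{w}": int(w in reply_vocab) for w in keywords}

    # GI/NI features: overlap between the reply and the message it answers.
    given = reply_vocab & replied_vocab
    total = len(reply_vocab)
    features["gi_ratio"] = len(given) / total if total else 0.0
    features["ni_ratio"] = (total - len(given)) / total if total else 0.0
    return features


if __name__ == "__main__":
    replied = "How do I reset the router password?"
    reply = "Hold the reset button, then log in with the default password."
    print(reply_pair_features(replied, reply, ["hold", "think", "password"]))
```

A dictionary like this could be passed to any standard classifier to separate, for example, Answer from Comment replies. Which learning algorithm and which exact feature definitions the thesis uses cannot be determined from this record.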

目次 Table of Contents
CHAPTER 1 INTRODUCTION
  1.1 Background and Research Motivation
  1.2 Research Objective
  1.3 Organization of the Thesis
CHAPTER 2 LITERATURE REVIEW
  2.1 Schemes for Communicative Actions
    2.1.1 Speech Act Schemes
    2.1.2 Classification Scheme for Relationships between Messages in A Reply-Replied Structure
  2.2 Text Categorization
  2.3 Genre Classification
CHAPTER 3 DESIGN OF SEMANTIC ENRICHMENT FOR KNOWLEDGE DOCUMENTS (SEEK) TECHNIQUE
  3.1 A Taxonomy of Reply-semantic Relationships in SEEK
  3.2 Overall Process of SEmantic Enrichment for Knowledge documents (SEEK) Technique
    3.2.1 Cue Feature Extraction and Selection Phases
    3.2.2 Reply-pair Representation Phase
    3.2.3 Induction Phase
CHAPTER 4 EMPIRICAL EVALUATION FOR SEEK TECHNIQUE
  4.1 Evaluation Design
    4.1.1 Data Collection
    4.1.2 Evaluation Criteria
    4.1.3 Evaluation Procedure
  4.2 Evaluation Results
    4.2.1 Tuning Number of Keywords for Model 1 to Model 4
    4.2.2 Comparative Evaluation of SEEK
CHAPTER 5 CONCLUSION AND FUTURE RESEARCH DIRECTIONS
APPENDIX A: PENN TREEBANK TAGS
REFERENCES


電子全文 Fulltext
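This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: unrestricted (fully open on and off campus)
Available: Campus: available; Off-campus: available
etd-0729104-020226.pdf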
紙本論文 Printed copies
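Public information on printed copies is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available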
