National Sun Yat-sen University, thesis/dissertation, The study of Supporting Online FAQ Generation

Title	The study of Supporting Online FAQ Generation (線上問答集輔助建立之研究)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 91	Language	English
Degree	Master	Number of pages	58
Author	Fei Yeh (葉飛)
Advisor	Te-Min Chang (張德民)
Convenor	Bin-Yang Liu (劉賓陽)
Advisory Committee	Wen-Feng Hsiao (蕭文峰)
Date of Exam	2003-06-20	Date of Submission	2003-06-29
Keywords	FAQ, information sharing, newsgroups, cluster analysis
Statistics	This thesis has been browsed 5799 times and downloaded 10 times.

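Abstract (Chinese)
With the growth of the Internet, global electronic discussion boards have quickly become a popular medium for sharing information and knowledge. Online FAQs, which newsgroup administrators compile from discussions in a specific domain, have become an important reference for users who want to understand the background of a newsgroup's discussions, and a primary source for finding answers to questions in that domain. However, building and organizing an online FAQ is time-consuming and error-prone. This study therefore proposes a method that supports the construction of online FAQs. The method has four steps. First, each question/answer article is preprocessed. Next, keywords that carry important information are extracted, together with the synonym relations among them. Cluster analysis is then used to identify clusters of questions and answers. Finally, representative questions and answers are extracted from each cluster to help newsgroup administrators compile the FAQ. We validate the method on data from a real newsgroup on the topic of neural networks, and the evaluation results confirm its applicability. The proposed method can therefore effectively help newsgroup administrators build online FAQs, and it also offers a direction for subsequent research.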
Abstract
With the rapid growth of the Internet, worldwide online discussion forums have become a popular social mechanism for people to learn new information and knowledge. A list of frequently asked questions (FAQ), a collection of questions commonly asked in a newsgroup together with presumably definitive answers, has become an important reference that helps readers understand the background of the newsgroup's discussions and locate the answers they want, if any exist. Constructing an FAQ, however, is time-consuming and error-prone, so approaches that support administrators in generating FAQs are needed. In this thesis, we propose a four-step approach that supports FAQ list generation from question/answer pairs collected from newsgroup discussions, without labor-intensive processes. First, texts are processed. Second, keywords, along with their synonyms in context, are extracted from the answer part. Third, cluster analysis identifies the answer clusters, and the corresponding question clusters are formed accordingly. Finally, representative contents of the answer clusters and the question clusters are extracted to help administrators generate FAQs. We apply the approach to a real-world case in which data are collected from a Usenet newsgroup, and construct an FAQ in primitive form. Evaluations are then performed with satisfactory results, which supports the feasibility of the proposed approach.
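The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the table of contents indicates that the thesis uses the Link Grammar Parser, the Apriori algorithm, and a self-organizing map (SOM), whereas this sketch substitutes a tiny hand-written stop-list, a document-frequency filter, and simple leader clustering on cosine similarity, and it omits synonym extraction. All function names, parameters (`min_df`, `threshold`), and the sample Q/A pairs are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-list (an assumption, not Francis and Kucera's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "for",
              "how", "do", "i", "you", "it", "can", "what", "use", "with"}

def tokenize(text):
    """Step 1 (text processing): lowercase, split into words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def extract_keywords(token_lists, min_df=2):
    """Step 2 (keyword extraction): keep terms found in at least min_df answers."""
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return sorted(term for term, n in df.items() if n >= min_df)

def vectorize(tokens, keywords):
    """Represent one answer as keyword counts, in keyword order."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.5):
    """Step 3 (cluster analysis): leader clustering, a stand-in for SOM.
    Each vector joins the first cluster whose leader is similar enough."""
    clusters = []  # lists of indices; the first index in each list is the leader
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def representative(members, vectors):
    """Step 4 (content extraction): pick the member closest to the centroid."""
    dim = len(vectors[0])
    centroid = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
    return max(members, key=lambda i: cosine(vectors[i], centroid))

def build_faq_draft(qa_pairs, min_df=2, threshold=0.5):
    """Cluster on the answer part; question clusters follow from the pairing."""
    token_lists = [tokenize(answer) for _, answer in qa_pairs]
    keywords = extract_keywords(token_lists, min_df)
    vectors = [vectorize(t, keywords) for t in token_lists]
    draft = []
    for members in cluster(vectors, threshold):
        rep = representative(members, vectors)
        draft.append({"questions": [qa_pairs[i][0] for i in members],
                      "representative_answer": qa_pairs[rep][1]})
    return draft

# Hypothetical Q/A pairs, invented for illustration only.
qa_pairs = [
    ("How do I avoid overfitting?",
     "Use early stopping and weight decay to reduce overfitting."),
    ("My network overfits, help?",
     "Overfitting is reduced by weight decay or early stopping."),
    ("What learning rate should I pick?",
     "Start with a small learning rate and tune the learning rate."),
    ("Is my learning rate too high?",
     "A high learning rate makes training diverge; lower the learning rate."),
]
draft = build_faq_draft(qa_pairs)
for entry in draft:
    print(entry["questions"], "->", entry["representative_answer"])
```

Clustering runs on the answer texts only, and each question follows its answer into a cluster, mirroring the abstract's statement that question clusters are formed according to the answer clusters. The output is a draft for an administrator to edit, not a finished FAQ.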

Table of Contents
Chapter 1 Introduction
  1.1 Overview
  1.2 Objective of the Research
  1.3 Organization of the Thesis
Chapter 2 Literature Review
  2.1 Information Retrieval
    2.1.1 Measures of term significance
    2.1.2 Boolean retrieval method
    2.1.3 Vector space model
    2.1.4 Relevance Feedback
    2.1.5 Usage of Thesauri
  2.2 Association Analysis
  2.3 Cluster Analysis
    2.3.1 Hierarchical clustering
    2.3.2 Non-hierarchical Clustering
    2.3.3 Two-stage Clustering
  2.4 FAQ Generation
Chapter 3 Supporting FAQ Generation Approach
  3.1 Text processing
  3.2 Keyword Extraction
  3.3 Cluster Analysis
  3.4 Content Extraction
Chapter 4 Applications and Results
  4.1 Data Sources
  4.2 FAQ generation process
    4.2.1 Text processing
    4.2.2 Keyword extraction
    4.2.3 Cluster Analysis
    4.2.4 Content Extraction
  4.3 Evaluation
Chapter 5 Conclusions
  5.1 Concluding Remarks
  5.2 Future works
References
Appendix I. Francis and Kucera’s Stop-list
Appendix II. Representative Q/A contents

List of Figures
  Figure 2-1 Dendrogram
  Figure 2-2 Example of non-hierarchical clustering
  Figure 2-3 Two-Stage clustering method
  Figure 3-1 Framework of the proposed approach
  Figure 3-2 Typical Q/A pair in Usenet
  Figure 3-3 Framework of text processing
  Figure 3-4 Framework of keyword extraction
  Figure 3-5 Apriori algorithm
  Figure 3-6 Synonyms in context of keywords generated by Apriori algorithm
  Figure 3-8 Adaptation process of SOM
  Figure 3-9 Clustering results by SOM
  Figure 4-1 SOM results
  Figure 4-2 Representative Q/A for cluster 2
  Figure 4-3 An organized Q/A in the FAQ

List of Tables
  Table 3-1 Nouns and noun phrases extracted by Link Grammar Parser
  Table 3-2 Meaningful nouns and noun phrases
  Table 3-3 Representative nouns and noun phrases
  Table 3-4 Keywords extracted from texts on neural-networks
  Table 3-5 Example of input vectors for SOM
  Table 4-1 Statistics of the dataset
  Table 4-2 Statistics of number of terms in each process
  Table 4-3 Results of keyword extraction
  Table 4-4 Vector form of some answer texts
  Table 4-5 Parameters specified in SOM
  Table 4-6 Cluster results
  Table 4-7 Text vectors in cluster 2
  Table 4-8 Topic discussed in each cluster
  Table 4-9 Predefined categories
  Table 4-10 Experimental results on recall and precision

References
Anderberg, M. R., 1973, Cluster Analysis for Applications. Academic Press, Inc.
Agrawal, R. and Srikant, R., 1994, “Fast Algorithms for Mining Association Rules in Large Databases.” In Proceedings of the 1994 International Conference on Very Large Data Bases, pp. 487-499, Santiago, Chile, Sept.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. A., 1990, “Indexing by Latent Semantic Analysis.” JASIS, Vol. 41, No. 6, pp. 391-407.
Hammond, K., Burke, R., Martin, C., and Lytinen, S., 1995, “FAQ Finder: A Case-Based Approach to Knowledge Navigation.” In Proceedings of the 11th Conference on Artificial Intelligence for Applications, pp. 80-86. Los Alamitos, CA, USA: IEEE Comput. Soc. Press.
Hwang, C. W., 1999, “A Neural Network Document Classifier with Linguistic Feature Selection.” M.D. Dissertation, National Taiwan University of Science and Technology.
Jardine, N. and Sibson, R., 1971, Mathematical Taxonomy. London: Wiley.
Vesanto, J., Himberg, J., Alhoniemi, E., and Parhankangas, J., 2000, “SOM Toolbox for Matlab 5.” Helsinki University of Technology.
Kaufman, L. and Rousseeuw, P. J., 1990, Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley & Sons.
Kohonen, T., 1995, Self-Organizing Maps. Berlin: Springer.
Kucera, H. and Francis, W. N., 1967, Computational Analysis of Present-Day American English. Providence, Rhode Island: Brown University Press.
Lam, W. and Ho, C. Y., 1998, “Using A Generalized Instance Set for Automatic Text Categorization.” In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 81-89.
Lesk, M. E., 1964, “The SMART Automatic Text Processing and Document Retrieval System.” Report ISR-8, sec. II. Harvard Computation Laboratory, Cambridge, Massachusetts.
Lin, X., Soergel, D., and Marchionini, G., 1991, “A Self-organizing Semantic Map for Information Retrieval.” In Proceedings of the 14th Annual International ACM/SIGIR Conference on Research & Development in Information Retrieval, pp. 262-269.
MacQueen, J., 1967, “Some Methods for Classification and Analysis of Multivariate Observations.” Proc. 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, pp. 281-297.
Nasukawa, T., 2001, “Text Analysis and Knowledge Mining System.” IBM Systems Journal, Vol. 40, No. 4, Knowledge Management.
Ng, R. and Han, J., 1994, “Efficient and Effective Clustering Method for Spatial Data Mining.” In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB’94), pp. 144-155, Santiago, Chile, Sept.
Ng, H. T., Goh, W. B., and Low, K. L., 1997, “Feature Selection, Perceptron Learning, and A Usability Case Study for Text Categorization.” In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 67-73.
Porter, M. F., 1980, “An Algorithm for Suffix Stripping.” Program, Vol. 14, pp. 130-137.
Punj, G. and Stewart, D. W., 1983, “Cluster Analysis in Marketing Research: Review and Suggestions for Application.” Journal of Marketing Research, Vol. 20, pp. 137-148.
Quillian, M. R., 1968, “Semantic Memory.” In Semantic Information Processing, ed. Marvin Minsky, pp. 216-270. Cambridge, Mass.: MIT Press.
Ritter, H. and Kohonen, T., 1989, “Self-organizing Semantic Maps.” Biological Cybernetics, Vol. 61, pp. 241-254.
Robertson, S. and Jones, K. S., 1976, “Relevance Weighting of Search Terms.” Journal of the American Society for Information Science, Vol. 27, No. 3.
Rocchio, J. J., Jr., 1971, “Relevance Feedback in Information Retrieval.” Chap. 14 in The SMART Retrieval System: Experiments in Automatic Document Processing, ed. G. Salton, pp. 313-323. Englewood Cliffs, New Jersey: Prentice-Hall.
Rocchio, J. J., Jr., 1965, “Relevance Feedback in Information Retrieval.” Scientific Report ISR-9, sec. 23, Harvard Computation Laboratory, Cambridge, Massachusetts.
Salton, G., 1964, “A Flexible Automatic System for the Organization, Storage, and Retrieval of Language Data (SMART).” Report ISR-5, sec. I. Harvard Computation Laboratory, Cambridge, Massachusetts.
Salton, G., 1983, Introduction to Modern Information Retrieval. McGraw-Hill.
Salton, G., ed., 1971a, The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, New Jersey: Prentice-Hall.
Salton, G. and Yang, C. S., 1973, “On the Specification of Term Values in Automatic Indexing.” Journal of Documentation, Vol. 29, No. 4, pp. 351-372.
Schutze, H. and Pedersen, J., 1994, “A Cooccurrence-based Thesaurus and Two Applications to Information Retrieval.” In Proceedings of Intelligent Multimedia Information Retrieval Systems (RIAO ’94, New York, NY), pp. 266-274.
Sleator, D. and Temperley, D., 1993, “Parsing English with a Link Grammar.” Third International Workshop on Parsing Technologies.
Sneath, P., 1957, “The Application of Computers to Taxonomy.” Journal of General Microbiology, Vol. 17, pp. 201-226.
Sneiders, E., 1999, “Automated FAQ Answering on WWW Using Shallow Language Understanding.” Thesis in partial fulfillment of the requirements for the degree of Licentiate of Technology, Dept. of Computer and Systems Sciences, Stockholm University / Royal Institute of Technology, Sweden.
Soergel, D., 1974, “Automatic and Semi-Automatic Methods as an Aid in the Construction of Indexing Languages and Thesauri.” Intern. Classif., Vol. 1, No. 1, pp. 34-39.
Van Rijsbergen, C. J., 1979, Information Retrieval. 2nd ed. London: Butterworths.
Verhoeff, J., William, G. and Belzer, J., 1961, “Using the Cosine Measure in a Neural Network for Document Retrieval.” In Proceedings of the 14th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, Chicago, pp. 202-210.
Vesanto, J. and Alhoniemi, E., 2000, “Clustering of the Self-Organizing Map.” IEEE Transactions on Neural Networks, Vol. 11, pp. 586-600.
Voorhees, E. M., 1986a, “The Effectiveness and Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval.” Ph.D. thesis, Cornell University.
Voorhees, E. M. and Harman, D., 1997, “Overview of the Sixth Text Retrieval Conference (TREC-6).” In Proceedings of the 6th Text Retrieval Conference (TREC-6), NIST Special Publication 500-240.
Ward, J. H., Jr., 1963, “Hierarchical Grouping to Optimize an Objective Function.” Journal of the American Statistical Association, Vol. 58, pp. 236-244.
Wen, J. R., Nie, J. Y., and Zhang, H. J., 2002, “Query Clustering by Using User Logs.” ACM Transactions on Information Systems, Vol. 20, No. 1, pp. 59-81.
Whitehead, S. D., 1995, “Auto-FAQ: An Experiment in Cyberspace Leveraging.” Computer Networks and ISDN Systems, Vol. 28, No. 1-2, pp. 137-146.
Xu, J. and Croft, W. B., 2000, “Improving the Effectiveness of Information Retrieval with Local Context Analysis.” ACM Transactions on Information Systems, Vol. 18, No. 1, pp. 79-112.

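Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization. Thesis access permission: available on campus after one year; never available off campus. Availability: on campus, available; off campus, not available.
Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To look up availability information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available.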

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS