博碩士論文 etd-0629103-175637 詳細資訊
Title page for etd-0629103-175637
論文名稱
Title
線上問答集輔助建立之研究
The study of Supporting Online FAQ Generation
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
58
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2003-06-20
繳交日期
Date of Submission
2003-06-29
關鍵字
Keywords
線上問答集、資訊分享、新聞群組、群集分析
FAQ, newsgroups, information sharing
統計
Statistics
本論文已被瀏覽 5799 次,被下載 10
The thesis/dissertation has been browsed 5799 times, has been downloaded 10 times.
中文摘要
隨著網際網路的成長,全球性的電子討論版很快地成為一項受歡迎的資訊、知識分享媒介。由新聞群組(newsgroup)管理員將特定領域討論整理而成的線上問答集(FAQ),成為使用者了解該新聞群組討論背景的重要參考依據,或是搜尋該領域問題之答案的主要來源。然而,線上問答集建立整理的過程既耗時且容易出錯;因此,本研究的目的即針對上述的研究議題提出一個線上問答集輔助建立的方法。
我們提出一個四步驟的方法來輔助線上問答集的建立。首先,一篇篇問答文章經過前置處理,進一步擷取其具重要資訊的關鍵字以及關鍵字之間的同義關係;接著運用群集分析來辨識問答集中問題與答案的群集;最後每一群集中具代表性的問題與答案將被擷取出來提供幫助群組管理員整理線上問答集。
我們應用一實際新聞群組上的資料(類神經網路議題)來驗證我們所提出的方法,評估的結果驗證了所提方法的適用性。因此我們所提出的方法不但可以有效地幫助新聞群組管理員整理建立線上問答集,更提供後續研究者一個研究思考的方向。





Abstract
With the rapid growth of the Internet, worldwide online discussion forums have become a popular social mechanism for people to learn new information and knowledge. A list of frequently asked questions (FAQ), which is a collection of questions commonly asked in a newsgroup along with presumably definitive answers, has become an important reference for readers to understand the background of the newsgroup's discussions and to locate the answers they want, if any exist. Constructing an FAQ, however, is time-consuming and error-prone. Approaches that support administrators in generating FAQs are therefore desirable.
In this thesis, we propose a four-step approach that supports FAQ list generation from question/answer pairs collected from newsgroup discussions, without labor-intensive processes. First, the texts are processed. Second, keywords, along with their synonyms in context, are extracted from the answer part. Third, cluster analysis identifies the answer clusters, and the corresponding question clusters are formed accordingly. Finally, representative contents of the answer clusters and the question clusters are extracted to help administrators generate FAQs.
Our approach is applied to a real-world case in which data are collected from a newsgroup in Usenet. An FAQ in a primitive form is constructed using our approach. Evaluations are then performed, with satisfactory results. The feasibility of our proposed approach is thus demonstrated.
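The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the table of contents indicates that the thesis uses the Link Grammar Parser, the Apriori algorithm, and a self-organizing map (SOM), whereas this sketch swaps in much simpler stand-ins (a regular-expression tokenizer with a tiny stop list, a document-frequency filter, and single-pass leader clustering on cosine similarity). All function names, thresholds, and the sample question/answer pairs are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (assumption; the thesis uses a much longer one).
STOP_WORDS = {"a", "an", "and", "are", "do", "how", "i", "in", "is", "it",
              "of", "or", "the", "to", "what", "with"}

# Hypothetical question/answer pairs standing in for newsgroup postings.
QA_PAIRS = [
    ("How do I avoid overfitting?",
     "Use early stopping and weight decay to limit overfitting."),
    ("My network overfits, what now?",
     "Weight decay or early stopping reduces overfitting."),
    ("What learning rate should I use?",
     "Start with a small learning rate and adjust the learning rate schedule."),
    ("Why does training diverge?",
     "A large learning rate makes training diverge; lower the learning rate."),
]


def tokenize(text):
    """Step 1, text processing: lowercase, split into words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def extract_keywords(token_lists, min_df=2):
    """Step 2, keyword extraction: keep terms found in at least min_df answers."""
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return sorted(term for term, count in df.items() if count >= min_df)


def vectorize(tokens, keywords):
    """Represent one answer as keyword counts, in keyword order."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Step 3, cluster analysis: leader clustering (a simple stand-in for SOM)."""
    clusters = []  # each cluster is a list of indices; the first is its leader
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representatives(clusters, vectors):
    """Step 4, content extraction: pick the member closest to each centroid."""
    reps = []
    for members in clusters:
        rows = [vectors[i] for i in members]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        reps.append(max(members, key=lambda i: cosine(vectors[i], centroid)))
    return reps


def build_faq(qa_pairs, min_df=2, threshold=0.5):
    """Cluster the answers, then return one representative Q/A pair per cluster."""
    token_lists = [tokenize(answer) for _, answer in qa_pairs]
    keywords = extract_keywords(token_lists, min_df)
    vectors = [vectorize(tokens, keywords) for tokens in token_lists]
    reps = representatives(cluster(vectors, threshold), vectors)
    return [qa_pairs[i] for i in reps]


if __name__ == "__main__":
    for question, answer in build_faq(QA_PAIRS):
        print("Q:", question)
        print("A:", answer)
```

As in the abstract, clustering is driven by the answer texts, and each question simply follows its answer into the corresponding cluster. An administrator would still review and edit the representative pairs before publishing them.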





目次 Table of Contents
TABLE OF CONTENTS

CHAPTER 1 Introduction
1.1 Overview
1.2 Objective of the Research
1.3 Organization of the Thesis
CHAPTER 2 Literature Review
2.1 Information Retrieval
2.1.1 Measures of term significance
2.1.2 Boolean retrieval method
2.1.3 Vector space model
2.1.4 Relevance Feedback
2.1.5 Usage of Thesauri
2.2 Association Analysis
2.3 Cluster Analysis
2.3.1 Hierarchical clustering
2.3.2 Non-hierarchical Clustering
2.3.3 Two-stage Clustering
2.4 FAQ Generation
CHAPTER 3 Supporting FAQ Generation Approach
3.1 Text processing
3.2 Keyword Extraction
3.3 Cluster Analysis
3.4 Content Extraction
CHAPTER 4 Applications and Results
4.1 Data Sources
4.2 FAQ generation process
4.2.1 Text processing
4.2.2 Keyword extraction
4.2.3 Cluster Analysis
4.2.4 Content Extraction
4.3 Evaluation
CHAPTER 5 Conclusions
5.1 Concluding Remarks
5.2 Future works
REFERENCES
Appendix I. Francis and Kucera’s Stop-list
Appendix II. Representative Q/A contents

LIST OF FIGURES
Figure 2-1 Dendrogram
Figure 2-2 Example of non-hierarchical clustering
Figure 2-3 Two-Stage clustering method
Figure 3-1 Framework of the proposed approach
Figure 3-2 Typical Q/A pair in Usenet
Figure 3-3 Framework of text processing
Figure 3-4 Framework of keyword extraction
Figure 3-5 Apriori algorithm
Figure 3-6 Synonyms in context of keywords generated by Apriori algorithm
Figure 3-8 Adaptation process of SOM
Figure 3-9 Clustering results by SOM
Figure 4-1 SOM results
Figure 4-2 Representative Q/A for cluster 2
Figure 4-3 An organized Q/A in the FAQ

LIST OF TABLES

Table 3-1 Nouns and noun phrases extracted by Link Grammar Parser
Table 3-2 Meaningful nouns and noun phrases
Table 3-3 Representative nouns and noun phrases
Table 3-4 Keywords extracted from texts on neural-networks
Table 3-5 Example of input vectors for SOM
Table 4-1 Statistics of the dataset
Table 4-2 Statistics of number of terms in each process
Table 4-3 Results of keyword extraction
Table 4-4 Vector form of some answer texts
Table 4-5 Parameters specified in SOM
Table 4-6 Cluster results
Table 4-7 Text vectors in cluster 2
Table 4-8 Topic discussed in each cluster
Table 4-9 Predefined categories
Table 4-10 Experimental results on recall and precision


電子全文 Fulltext
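This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.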
論文使用權限 Thesis access permission:校內一年後公開,校外永不公開 campus withheld
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus:永不公開 not available

紙本論文 Printed copies
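Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up public-availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.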
開放時間 available 已公開 available
