博碩士論文 etd-0829106-130820 詳細資訊
Title page for etd-0829106-130820
論文名稱
Title
文件分類於電子化政府之應用:以政府機關市長信箱民眾陳情案件為例
Text Categorization for E-Government Applications: The Case of City Mayor’s Mailbox
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
55
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2006-07-03
繳交日期
Date of Submission
2006-08-29
關鍵字
Keywords
電子化政府、支援向量機、決策樹歸納法、文件分類
Decision Tree Induction, Support Vector Machines, E-government, Text Categorization
統計
Statistics
本論文已被瀏覽 5728 次,被下載 14 次
The thesis/dissertation has been browsed 5728 times, has been downloaded 14 times.
中文摘要
近年來國內各政府機關皆採用電子信箱,或民意信箱類似的服務,以供民眾利用網路來傳達對政府機關的意見或詢問,而當民眾透過電子信箱送入陳情案件時,需經過人工分類方可將民眾的意見送至業務所屬的承辦機關辦理,在案件日益增加的情形下,完全仰賴人工分案的處理,將顯得非常沒有效率,所以我們希望可以利用機器自動分類的方式,輔助人工作業,以提高案件處理效率。為了建立一個適合本研究的分類機制,我們利用各種不同方法的組合,希望歸納出一個最佳的分類模型以用於未來對未分類案件的分類處理。

本研究利用文件分類學習程序來建立一個案件分類學習模型,利用民眾送入的中文已知分類的案件來做為訓練案件資料。在文件分類學習的三個主要階段中的每個階段,本研究皆採用兩種方法來進行效能的評估,在特徵值萃取及挑選階段中,將案件經中研院中文斷詞系統斷詞後,利用χ²取最大值法及加權平均法來挑選特徵值集合,在案件表示階段分別以TF×IDF及Binary來表示,而在歸納處理階段則採用決策樹及SVM歸納法來分別進行分類分析並產出分類模型。實證結果顯示特徵值挑選階段使用χ²取最大值、案件表示階段以Binary為表示法,歸納處理階段採用SVM為一最佳的分類預測模型,可獲得77.28%的辨識正確率,而召回率及精確率也同樣有大於77%的成效,顯示此案件分類模型能有效地預測案件所屬類別,具有實務應用價值。
Abstract
The central government and most local governments in Taiwan have adopted e-mail services that let citizens request services or express opinions over the Internet. Traditionally, these requests and opinions must be manually classified and routed to the appropriate departments for handling. As the number of requests grows, manual classification becomes time-consuming and impractical. In this study, we therefore apply text categorization techniques to build an automatic classification mechanism that assists manual processing and supports an efficient e-government service portal.

The purpose of this thesis is to investigate the effectiveness of different text categorization methods in automatically classifying the service requests and opinions e-mailed to a city mayor’s mailbox. Specifically, in each of the three phases of text categorization learning, we adopt and evaluate two methods commonly employed in prior research. In the feature extraction and selection phase, cases are first segmented with the Academia Sinica Chinese word segmentation system, and both the maximum χ² statistic method and the weighted-average χ² statistic method are evaluated for feature selection. In the document representation phase, we consider the Binary and TF×IDF representation schemes. Finally, in the induction phase, we adopt decision tree induction and support vector machines (SVM) to induce a text categorization model for our target e-government application. Our empirical evaluation shows that the combination of the maximum χ² statistic method for feature selection, the Binary representation scheme, and SVM as the underlying induction algorithm achieves an accuracy of 77.28%, with recall and precision both above 77%. This level of classification effectiveness suggests that the text categorization approach can be used to establish an effective and intelligent e-government service portal.
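
As a rough illustration of the first two phases described in the abstract, the sketch below scores terms with the chi-square (χ²) statistic, aggregates the per-category scores by maximum and by category-weighted average, and then builds Binary and TF×IDF vectors. The formula follows the common definition in Yang and Pedersen (1997); whether the thesis uses exactly this variant is an assumption. The toy documents, category names, and function names are hypothetical, and the induction phase (decision tree or SVM) is not shown.

```python
import math
from collections import Counter


def chi2(a, b, c, d):
    """Chi-square statistic for one (term, category) 2x2 contingency table.

    a: docs in the category containing the term
    b: docs outside the category containing the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def chi2_scores(docs, labels):
    """Return {term: (max score, weighted-average score)} over all categories."""
    n = len(docs)
    cat_sizes = Counter(labels)
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        per_cat = {}
        for cat, size in cat_sizes.items():
            a = sum(1 for s, y in zip(doc_sets, labels) if y == cat and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != cat and term in s)
            per_cat[cat] = chi2(a, b, size - a, (n - size) - b)
        max_score = max(per_cat.values())
        # Weight each category's score by its share of the training documents.
        avg_score = sum(cat_sizes[cat] / n * s for cat, s in per_cat.items())
        scores[term] = (max_score, avg_score)
    return scores


def binary_vector(doc, features):
    """1 if the feature term occurs in the document, else 0."""
    present = set(doc)
    return [1 if f in present else 0 for f in features]


def tfidf_vector(doc, features, docs):
    """Raw term frequency times log(N / document frequency)."""
    n = len(docs)
    tf = Counter(doc)
    vec = []
    for f in features:
        df = sum(1 for d in docs if f in d)
        vec.append(tf[f] * math.log(n / df) if df else 0.0)
    return vec


# Hypothetical, already-segmented toy cases (not data from the thesis).
docs = [
    ["road", "pothole", "repair"],
    ["road", "streetlight", "broken"],
    ["tax", "refund", "delay"],
    ["tax", "bill", "error", "tax"],
]
labels = ["public_works", "public_works", "finance", "finance"]

scores = chi2_scores(docs, labels)
features = ["road", "tax"]
binary = binary_vector(docs[3], features)
tfidf = tfidf_vector(docs[3], features, docs)
```

On this toy data, "road" appears in every public_works case and in no finance case, so it receives the highest possible score under both aggregation rules, while a term such as "pothole" that occurs in only one case scores lower.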
目次 Table of Contents
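Chapter 1. Introduction 1
 Section 1. Research Background 1
 Section 2. Research Motivation and Objectives 4
 Section 3. Thesis Structure 6
Chapter 2. Literature Review 8
 Section 1. Text Categorization Techniques 8
 Section 2. Chinese Document Processing 18
Chapter 3. Constructing a Classification Model for Citizen Petition Cases 20
 Section 1. Building the Classification Learning Model for Citizen Petition Cases 20
 Section 2. Feature Extraction and Selection Phase 21
 Section 3. Case Representation Phase 25
 Section 4. Induction Phase 26
Chapter 4. Empirical Evaluation 27
 Section 1. Data Source 27
 Section 2. Evaluation Criteria and Procedure 28
 Section 3. Analysis of Empirical Results 29
Chapter 5. Conclusion 37
 Section 1. Conclusions and Contributions 37
 Section 2. Future Research Directions 38
References 40
Appendix 45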
參考文獻 References
(一) 中文部分
1.行政院研究發展考核委員會,行政院資訊發展推動小組,「政府業務電腦化報告書」,台北:行政院研究發展考核委員會,初版,1998年。
2.陳祥,林明童,「我國『電子化政府整合型入口網站』使用者行為分析」,圖書館學與資訊科學,第28卷第2期,2002年。
3.陳敦源,蕭乃沂,「台北市政府接受人民施政意見反應機制之研究」,臺北市政府研究發展考核委員會,2001年。
4.黃東益,蕭乃沂,陳敦源,「網路時代公民直接參與機制:台北市政府『市長信箱』的個案研究」,政治與資訊研討會,佛光人文社會學院, 宜蘭,2002年。
5.莊孟杰,「從民眾關係管理看市長電子信箱滿意度調查」,國立中山大學公共事務管理研究所碩士論文,2004年。
6.齊玉美,「不對稱性分類分析之研究」,國立中山大學資訊管理研究所碩士論文,2003年。
7.林家誼,「應用類神經網路文件自動分類技術建構電子化知識文件管理系統」,國立清華大學工業工程與工程管理研究所碩士論文,2004年。
8.韓歆儀,「應用兩階段分類法提昇SVM法之分類準確率」,國立成功大學工業與資訊管理研究所碩士論文,2004年。
9.黃純敏,吳郁瑩,「網路中文文件自動摘要」,TANET’99研討會,國立中山大學,高雄,1999年。
10.黃佳新,「關鍵字擷取與文件分類之因子分析」,國立清華大學工業工程與工程管理研究所碩士論文,2004年。
11.吳信德,「以相關性辭典建構為基礎實現複合關鍵字之概念查詢擴張」,私立元智大學資訊工程研究所碩士論文,2003年。

(二) 英文部分
1.Apte, C., Damerau, F., and Weiss, S., “Automated Learning of Decision Rules for Text Categorization,” ACM Transactions on Information Systems, Vol. 12, No. 3, 1994, pp. 233-251.
2.Bellamy, C. and Taylor, J., Governing in the Information Age, Open University Press, Buckingham, U.K., 1998.
3.Cover, T. M. and Hart, P. E., “Nearest Neighbor Pattern Classification,” IEEE Transactions on Information Theory, Vol. 13, No. 1, 1967, pp. 21-27.
4.Dumais, S., Platt, J., Heckerman, D., and Sahami, M., “Inductive Learning Algorithms and Representation for Text Categorization,” Proceedings of the 7th International Conference on Information and Knowledge Management (CIKM’98), Bethesda, MD, 1998, pp. 148-155.
5.Gunn, S. R., “Support Vector Machines for Classification and Regression,” Technical Report, Department of Electronics and Computer Science, University of Southampton, 1998.
6.Hastie, T. and Tibshirani, R., “Classification by Pairwise Coupling,” Technical Report, Department of Statistics, Stanford University, 1996.
7.Lam, W. and Ho, C. Y., “Using A Generalized Instance Set for Automatic Text Categorization,” Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98), Melbourne, Australia, 1998, pp. 81-89.
8.Larkey, L. and Croft, W., “Combining Classifiers in Text Categorization,” Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’96), Zurich, Switzerland, 1996, pp. 289-297.
9.Larsen, B. and Aone, C., “Fast and Effective Text Mining Using Linear-time Document Clustering,” Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 1999, pp. 16-22.
10.Lewis, D. and Ringuette, M., “A Comparison of Two Learning Algorithms for Text Categorization,” Proceedings of 3rd Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, 1994, pp. 81-93.
11.Ng, H. T., Goh, W. B., and Low, K. L., “Feature Selection, Perceptron Learning, and A Usability Case Study for Text Categorization,” ACM SIGIR Forum, Vol. 31, No. SI, 1997, pp. 67-73.
12.Platt, J. C., “Fast Training of Support Vector Machines Using Sequential Minimal Optimization,” Chapter 12 in Advances in Kernel Methods–Support Vector Learning, Schölkopf, B., Burges, C., and Smola, A. (Eds.), MIT Press, Cambridge, MA, 1998, pp. 185-208.
13.Platt, J. C., “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines,” Technical Report MSR-TR-98-14, Microsoft Research, 1998.
14.Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
15.Robertson, S. E. and Sparck-Jones, K., “Relevance Weighting of Search Terms,” Journal of the American Society for Information Science, Vol. 27, No. 3, 1976, pp. 129-146.
16.Roussinov, D. and Chen, H., “Document Clustering for Electronic Meetings: An Experimental Comparison of Two Techniques,” Decision Support Systems, Vol. 27, No. 1-2, 1999, pp. 67-79.
17.Sebastiani, F., “Machine Learning in Automated Text Categorization,” ACM Computing Surveys, Vol. 34, No. 1, 2002, pp. 1-47.
18.Tsay, J. and Wang, J., “Improving Automatic Chinese Text Categorization by Error Correction,” Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages, Hong Kong, China, 2000, pp. 1-8.
19.Tomek, I., “Two Modifications of CNN,” IEEE Transactions on Systems, Man and Cybernetics, Vol. 6, 1976, pp. 769-772.
20.Vapnik, V., “An Overview of Statistical Learning Theory,” IEEE Transactions on Neural Networks, Vol. 10, No.5, 1999, pp. 988-999.
21.Wei, C., Hu, P., and Dong, Y. X., “Managing Document Categories in E-commerce Environments: An Evolution-based Approach,” European Journal of Information Systems, Vol. 11, No. 3, 2002, pp. 208-222.
22.Wei, C., Lin, Y. T., and Yang, C. C., “Cross-Lingual Text Categorization: Conquering Language Boundaries in Globalized Environments,” Working Paper, Institute of Technology Management, National Tsing Hua University, Taiwan, ROC, 2005.
23.Weiss, S. M., Apte, C., Damerau, F. J., Johnson, D. E., Oles, F. J., Goetz, T., and Hampp, T., “Maximizing Text-mining Performance,” IEEE Intelligent Systems, Vol. 14, No. 4, 1999, pp. 63-69.
24.Wiener, E., Pedersen, J. O., and Weigend, A. S., “A Neural Network Approach to Topic Spotting,” Proceedings of the 4th Annual Symposium on Document Analysis and Information Retrieval (SDAIR’95), Las Vegas, NV, 1995, pp. 317-332.
25.Witten, I. H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco, CA, 2000.
26.Yang, Y., “Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval,” Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’94), Dublin, Ireland, 1994, pp. 13-22.
27.Yang, Y. and Chute, C. G., “An Example-based Mapping Method for Text Categorization and Retrieval,” ACM Transactions on Information Systems, Vol. 12, No. 3, 1994, pp. 252-277.
28.Yang, Y. and Liu, X., “A Re-examination of Text Categorization Methods,” Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’99), Berkeley, CA, 1999, pp. 42-49.
29.Yang, Y., Carbonell, J. G., Brown, R. D., Pierce, T., Archibald, B. T., and Liu, X., “Learning Approaches for Detecting and Tracking News Events,” IEEE Intelligent Systems, Vol. 14, No. 4, 1999, pp. 32-43.
30.Yang, Y. and Pedersen, J. O., “A Comparative Study on Feature Selection in Text Categorization,” Proceedings of the 14th International Conference on Machine Learning, 1997, pp. 412-420.

(三) 網頁部分
1.行政院研究發展考核委員會,電子化政府推動方案(90 至93 年度),2001年,http://www.rdec.gov.tw/home/egov.htm。
2.中文詞知識庫小組,「中文句結構樹建立原則」,中文斷詞系統,中央研究院,http://ckipsvr.iis.sinica.edu.tw/。
電子全文 Fulltext
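This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.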
論文使用權限 Thesis access permission:校內一年後公開,校外永不公開 campus withheld
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus:永不公開 not available

紙本論文 Printed copies
開放時間 available 已公開 available
