博碩士論文 etd-0722108-155145 詳細資訊
Title page for etd-0722108-155145
論文名稱
Title
使用資料探勘技術挖掘線上論壇討論活動型態
Discovering Discussion Activity Flows in an On-line Forum Using Data Mining Techniques
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
136
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2008-06-13
繳交日期
Date of Submission
2008-07-22
關鍵字
Keywords
決策樹、文本分類、隱馬可夫模型、文本探勘、資料探勘、內容管理系統、學習管理系統、支持向量機
Support Vector Machine (SVM), Content Management System (CMS), Text classification, Learning Management System (LMS), Decision tree, Data mining, Text mining, Hidden Markov Model (HMM)
統計
Statistics
本論文已被瀏覽 5736 次,被下載 1231
The thesis/dissertation has been browsed 5736 times, has been downloaded 1231 times.
中文摘要
隨著網際網路(Internet)時代來臨,愈來愈多學校課程使用課程管理系統(CMS, course management system)或學習管理系統(LMS, learning management system)來教學或輔助教學。為了幫助學生在網路上有效的學習,教師必須知道學生在線上論壇從事那些討論的活動,並且在必要的時候,提供學生所需協助。現今網路教學系統普遍化的結果,更增加老師們參與線上論壇的工作負擔;為減輕教師工作負荷,設計出可協助教師了解討論活動的自動化工具,成為一項重要的工作。本研究呼應這項需求,提出一個可以在課程管理系統或學習管理系統中,協助教師追蹤線上論壇討論活動流程的自動化工具,我們稱此工具為FAFT (Forum Activity Flow Tracer)。
本研究採用資料探勘(data mining)及本文探勘(text mining)技術來發展FAFT 系統。FAFT 系統依其功能可分為,討論活動分類子系統(AC, activity classification)及活動流程探勘子系統(AFD, activity flow discovery)。一般而言,論壇上的一篇文章可以把它歸類為聲明、提問、澄清、解釋(演繹)、詰問、辯護和其它,這六類活動中的一類。討論活動分類子系統採用資料(本文)探勘技術以自動化方式完成每一篇文章活動的分類工作。本文以高中地球科學課程的論壇資料為例,進行實證研究;研究結果顯示,討論活動分類子系統,能有效完成討論活動分類工作。而活動流程探勘子系統採用隱馬爾可夫模型(hidden Markov model)來發覺討論活動流程。由於隱馬爾可夫模型可以方便地以圖形化的方式呈現,故能幫助教師更容易了解學生討論活動。同時也可應用隱馬爾可夫模型為預測模型的特性,來分辨學生的討論活動流程是屬於認知性(cognitive presence)的活動流程,亦或是社交性(social presence)的活動流程。這樣的預測有益於教師採取相對應的措施,來引導學生學習活動。實證結果顯示活動流程探勘子系統,可以有效完成分辨學生活動流程的工作。
因此,我們認為本研究所提的 FAFT 系統,可以協助教師追蹤線上論壇的討論活動流程。
Abstract
In the Internet era, more and more courses are taught through a course management system (CMS) or learning management system (LMS). In an asynchronous virtual learning environment, an instructor needs to be aware of the progress of forum discussions and may intervene when necessary to facilitate students’ learning. This research proposes a discussion forum activity flow tracking system, called FAFT (Forum Activity Flow Tracer), to automatically monitor the discussion activity flow of threaded forum postings in a CMS/LMS. As CMS/LMS platforms become more popular for facilitating learning activities, the proposed FAFT can help instructors identify students’ interaction types in discussion forums.
FAFT adopts data and text mining techniques to discover the patterns of forum discussion activity flows, which instructors can use to facilitate online learning activities. FAFT consists of two subsystems: activity classification (AC) and activity flow discovery (AFD). A posting can be perceived as one of several activity types: announcement, questioning, clarification, interpretation, conflict, or assertion. AC adopts a cascade model to classify the activity types of posts in a discussion thread. An empirical evaluation on a repository of postings from senior high school earth science online courses shows that AC can effectively facilitate the coding process and that the cascade model can deal with the imbalanced distribution of discussion postings.
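The cascade idea in the paragraph above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say how the stages are ordered or trained, so the stage order, the keyword rules, and the names `STAGES` and `cascade_classify` are all hypothetical stand-ins for trained binary classifiers (for example decision trees or SVMs). The only point shown is the control flow: each stage either claims a post for its class or passes it on, so later stages see a smaller and less skewed pool of posts.

```python
from typing import Callable, List, Tuple

# One cascade stage: (activity label, binary decision function).
Stage = Tuple[str, Callable[[str], bool]]

# HYPOTHETICAL keyword rules standing in for trained binary classifiers.
# The order of the stages is also an assumption made for this sketch.
STAGES: List[Stage] = [
    ("questioning", lambda post: "?" in post),
    ("announcement", lambda post: "announce" in post.lower()),
    ("clarification", lambda post: "to clarify" in post.lower()),
]


def cascade_classify(post: str, stages: List[Stage] = STAGES,
                     default: str = "other") -> str:
    """Return the label of the first stage that accepts the post.

    Posts rejected by every stage fall through to the default label.
    """
    for label, accepts in stages:
        if accepts(post):
            return label
    return default


if __name__ == "__main__":
    for text in ["Why does the tide change?",
                 "Announcement: the quiz is on Friday.",
                 "To clarify, I meant the lunar cycle.",
                 "Nice weather today."]:
        print(cascade_classify(text), "<-", text)
```

In a real system each lambda would be replaced by a classifier trained on the posts that reach that stage.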
AFD adopts a hidden Markov model (HMM) to discover the activity flows. A discussion activity flow can be presented as an HMM diagram that an instructor can use to predict which type of discussion activity flow a discussion thread is likely to follow. Empirical results of the HMM on an online forum for a senior high school earth science course show that FAFT can effectively predict the type of a discussion activity flow. Thus, the proposed FAFT can be embedded in a course management system to automatically predict the activity flow type of a discussion thread and, in turn, reduce teachers’ workload in managing online discussion forums.
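As a rough illustration of how an HMM can be used to predict the flow type of a thread, the following Python sketch scores a sequence of activity types under two small HMMs with the forward algorithm and picks the more likely one. The two flow types (cognitive presence and social presence) come from the Chinese abstract; everything else is an assumption made for this sketch: the number of hidden states, every probability value, and the names `HMM`, `COGNITIVE`, `SOCIAL`, and `classify_flow`. In the thesis the model parameters would be learned from coded forum threads, not written by hand.

```python
from typing import List, Sequence

# Activity types named in the abstract, used here as observation symbols.
ACTIVITIES = ["announcement", "questioning", "clarification",
              "interpretation", "conflict", "assertion"]


class HMM:
    """Discrete HMM holding start, transition, and emission probabilities."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]) -> None:
        self.start = start  # start[i]   = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state j | state i)
        self.emit = emit    # emit[i][k]  = P(symbol k | state i)

    def likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: P(obs | model), summed over all state paths."""
        if not obs:
            raise ValueError("observation sequence must not be empty")
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        for o in obs[1:]:
            alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n))
                     * self.emit[j][o] for j in range(n)]
        return sum(alpha)


# HYPOTHETICAL parameters, invented for illustration only.
COGNITIVE = HMM(
    start=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.05, 0.40, 0.30, 0.15, 0.05, 0.05],
          [0.05, 0.10, 0.20, 0.40, 0.15, 0.10]],
)
SOCIAL = HMM(
    start=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.3, 0.7]],
    emit=[[0.50, 0.10, 0.05, 0.05, 0.05, 0.25],
          [0.20, 0.05, 0.05, 0.05, 0.25, 0.40]],
)


def classify_flow(thread: Sequence[str]) -> str:
    """Label a thread by whichever model gives it the higher likelihood."""
    obs = [ACTIVITIES.index(activity) for activity in thread]
    scores = {"cognitive": COGNITIVE.likelihood(obs),
              "social": SOCIAL.likelihood(obs)}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(classify_flow(["questioning", "clarification", "interpretation"]))
    print(classify_flow(["announcement", "assertion", "assertion"]))
```

For long threads the raw probabilities underflow, so a practical version would work with scaled or log probabilities.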
目次 Table of Contents
Abstract
Keywords
中文摘要
關鍵詞
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 The Proposed Approach Meeting the Need
1.3 Organization of the Thesis
Chapter 2 Background
2.1 Learning Activity Flow
2.1.1 An activity flow example
2.1.2 Learning in a computer media communication (CMC) environment
2.2 Learning Management System (LMS)
2.3 Text Mining Process
2.4 Classification in Text Mining
2.4.1 Decision tree classifiers
2.4.2 Support vector machines (SVM) classifiers
2.4.3 Imbalanced data distribution issue
2.5 Mining Forum Activity Flows
Chapter 3 The Architecture of Forum Activity Flow Tracer (FAFT)
3.1 FAFT Architecture
3.2 Activity Classification (AC) subsystem
3.2.1 AC implementation
3.3 Activity Flow Discovery (AFD) subsystem
3.3.1 AFD implementation
Chapter 4 Evaluation Design
4.1 Data Set and Activity Type Coding
4.2 Evaluation Criteria
4.2.1 Evaluation criteria for AC
4.2.2 Evaluation criteria for AFD
Chapter 5 Evaluation Results and Discussion
5.1 Evaluation Results of AC Subsystem
5.1.1 Results of decision tree classifiers
5.1.2 Results of SVM classifiers
5.1.3 Results of the cascade model classifier
5.1.4 Discussion of AC subsystem
5.2 Evaluation Results of AFD Subsystem
5.2.1 Results of the discovery and prediction of activity flow types
5.2.2 Discussion of the Results of AFD subsystem
Chapter 6 Conclusion and Research Limitations
6.1 Conclusion
6.2 Research Limitations
References
Appendix A. Examples of Forum Discussion Posts and Corresponding Activity Types
Appendix B. Evaluation Results of AC
電子全文 Fulltext
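本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.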
論文使用權限 Thesis access permission:校內外都一年後公開 withheld
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
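紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
Availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the library and information office (圖資處). We apologize for any inconvenience.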
開放時間 available 已公開 available
