National Sun Yat-sen University, thesis/dissertation: The Research on the Detection of Noteworthy Symptom Descriptions

Title	The Research on the Detection of Noteworthy Symptom Descriptions
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 104	Language	English
Degree	Master	Number of pages	62
Author	陳鈺翎 (Yu-ling Chen)
Advisor	黃三益
Convenor	薛幼苓
Advisory Committee	謝凱生
Date of Exam	2014-07-23	Date of Submission	2016-04-07
Keywords	Text mining, CKIP, LIBSVM, Classification (Chinese keywords: machine learning, Chinese word segmentation system, text mining)
Statistics	The thesis/dissertation has been browsed 5948 times and downloaded 39 times.

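Abstract (translated from the Chinese)
With the rapid development of mobile phone technology, doctors can follow a patient's condition through the patient's self-report messages. Even so, heavy workloads leave doctors too busy to review these messages at all times. It is therefore necessary to pick out, from this large volume of messages, the ones that truly need attention, both to lighten the doctors' burden and to ensure that patients' conditions are tracked thoroughly. In this study, we propose a text-mining method that identifies the symptoms described in a message and the associated sentiment. We find that noteworthy text messages can be characterized by sentiment features, comparison features, and administration features. We build a prediction model that classifies messages and finds the noteworthy ones. Our experiments show how the different features affect the prediction model, and we propose an effective method for identifying valuable message content.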
Abstract
Advances in mobile phone technology have created a convenient way to connect doctors and patients. Doctors can keep track of patients' conditions through their self-report messages. Nevertheless, doctors are usually busy, and these incoming messages may cause information overload. It is therefore imperative to find the messages that need more of the doctors' attention. In this thesis, we propose an approach that applies text-mining techniques to identify the symptoms conveyed in the messages and their associated sentiment orientation, as well as other factors. We find that noteworthy messages can be characterized by sentiment features, comparison features, and administration features. We then construct a prediction model to identify messages that are noteworthy to the doctors. Our experiments show that the different features have different impacts on the performance of the prediction model, and that our proposed approach can identify noteworthy messages effectively.
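The three feature groups named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the word lists, the function names `extract_features` and `to_libsvm_line`, and the English tokens are all assumptions made for illustration (the thesis works on Chinese messages segmented with CKIP and uses its own rules and dictionaries). The sketch counts sentiment, comparison, and administration cues in an already-segmented message and writes the result in the sparse text format that LIBSVM reads.

```python
# Hypothetical lexicons for illustration only; the thesis's actual
# dictionaries and extraction rules are not given in this record.
NEGATIVE_WORDS = {"pain", "worse", "dizzy"}
POSITIVE_WORDS = {"better", "fine"}
COMPARISON_WORDS = {"more", "less", "than", "worse", "better"}
ADMIN_WORDS = {"appointment", "prescription", "refill"}


def extract_features(tokens):
    """Map a segmented message to [sentiment, comparison, administration].

    sentiment is (positive hits - negative hits); the other two are
    simple counts of matching tokens.
    """
    positive = sum(t in POSITIVE_WORDS for t in tokens)
    negative = sum(t in NEGATIVE_WORDS for t in tokens)
    comparison = sum(t in COMPARISON_WORDS for t in tokens)
    admin = sum(t in ADMIN_WORDS for t in tokens)
    return [positive - negative, comparison, admin]


def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    Indices start at 1 and zero-valued features are omitted, which is
    the sparse convention used by LIBSVM input files.
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


if __name__ == "__main__":
    message = ["headache", "pain", "worse", "than", "yesterday"]
    print(to_libsvm_line(1, extract_features(message)))
```

Lines produced this way could be written to a file and passed to LIBSVM's training tool; the label values (here 1 for noteworthy, 0 otherwise) are likewise an assumption.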

Table of Contents
CHAPTER 1 - Introduction (p. 1)
  1.1 Background (p. 1)
  1.2 Motivation (p. 2)
CHAPTER 2 - Literature Review (p. 4)
  2.1 Mining of medical data (p. 4)
  2.2 Different Levels of Analysis (p. 5)
  2.3 Aspect-based Sentiment Analysis (p. 6)
  2.4 Dictionary-based Approach (p. 9)
  2.5 Word-Segmentation and Tools (p. 9)
  2.6 Sentiment Shifter (p. 10)
  2.7 LIBSVM (p. 13)
CHAPTER 3 - Problem Definitions (p. 15)
  3.1 Messages Description (p. 15)
  3.2 Features Classification (p. 18)
CHAPTER 4 - Our Approach (p. 22)
  4.1 The Process of the Research (p. 22)
  4.2 Data Pre-Processing (p. 23)
  4.3 Word-Segmentation and Part-of-Speech tagging (p. 24)
  4.4 Rules of Getting Aspect Words and Sentiment Words (p. 26)
  4.5 Vectorization and Classification (p. 33)
CHAPTER 5 - Performance Evaluation (p. 38)
  5.1 Experiment Design (p. 38)
  5.2 Reliability Analysis and Pearson Correlation (p. 39)
  5.3 Logistic Regression (p. 41)
  5.4 Performance of Classification (p. 45)
CHAPTER 6 - Conclusion (p. 50)
References (p. 52)


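Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined release time
Available: Campus: available; Off-campus: available
etd-0306116-140216.pdf
Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. To look up public-access information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available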

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS