博碩士論文 etd-0306116-140216 詳細資訊
Title page for etd-0306116-140216
論文名稱
Title
偵測值得注意的病症描述之研究
The Research on the Detection of Noteworthy Symptom Descriptions
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
62
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2014-07-23
繳交日期
Date of Submission
2016-04-07
關鍵字
Keywords
機器學習、中文斷詞系統、文字探勘
Text mining, CKIP, LIBSVM, Classification
統計
Statistics
本論文已被瀏覽 5948 次,被下載 39 次
The thesis/dissertation has been browsed 5948 times and downloaded 39 times.
中文摘要
由於手機科技的蓬勃發展,醫生可以藉由病人的自我診斷訊息來追蹤病人的病情狀況。即便如此,過多的工作量造成醫生過於忙碌,而無法隨時檢視病人的自我診斷訊息。因此,必須從這些大量的訊息中,找出真正需要被注意的訊息內容,減輕醫生的負擔,以及確保可以完善的追蹤病人的病情。
此篇研究中,我們提出一個文字探勘的方法,用來辨識訊息中所描述的症狀,以及相關的情緒分析。我們發現,值得注意的簡訊內容可以利用情緒屬性、比較屬性、行政內容屬性來特徵化。我們建構一個預測模型來辨識訊息,找出值得關注的訊息內容。從實驗中發現,不同的屬性如何影響預測模型,以及提出有效的方法來辨別有價值的訊息內容。
Abstract
Advances in mobile phone technology have created a convenient way to connect doctors and patients. Doctors can keep track of patients' conditions through their self-report messages. Nevertheless, doctors are usually busy, and these incoming messages may cause information overload. It is therefore imperative to find the messages that most need a doctor's attention.
In this thesis, we propose an approach that applies text-mining techniques to identify the symptoms conveyed in the messages and their associated sentiment orientation, as well as other factors. We find that noteworthy messages can be characterized by sentiment features, comparison features, and administration features. We then construct a prediction model to identify messages that are noteworthy to doctors. Our experiments show that the different features affect the performance of the prediction model differently, and that the proposed approach identifies noteworthy messages effectively.
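The three feature groups named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the keyword lists, weights, threshold, and function names below are all hypothetical, the tokenizer is a plain regular expression over English text rather than CKIP word segmentation, and a hand-weighted linear score stands in for a trained LIBSVM classifier.

```python
import re

# Hypothetical keyword lists; a real system would use curated lexicons.
SENTIMENT_WORDS = {"pain", "painful", "dizzy", "bleeding", "swollen", "worried"}
COMPARISON_WORDS = {"worse", "more", "less", "than", "better"}
ADMIN_WORDS = {"appointment", "reschedule", "prescription", "refill"}

# Hypothetical, arbitrary weights standing in for a trained classifier.
WEIGHTS = {"sentiment": 1.0, "comparison": 1.5, "administration": 0.5}
THRESHOLD = 1.5


def extract_features(message):
    """Count keyword hits per feature group in a regex-tokenized message."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return {
        "sentiment": sum(t in SENTIMENT_WORDS for t in tokens),
        "comparison": sum(t in COMPARISON_WORDS for t in tokens),
        "administration": sum(t in ADMIN_WORDS for t in tokens),
    }


def is_noteworthy(message):
    """Flag a message when its weighted feature score reaches the threshold."""
    features = extract_features(message)
    score = sum(WEIGHTS[name] * count for name, count in features.items())
    return score >= THRESHOLD
```

Under these assumed weights, a message such as "The pain is worse than yesterday" is flagged, while "Can I reschedule my appointment?" is not. In practice the weights would be learned from labeled messages rather than set by hand.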
目次 Table of Contents
CHAPTER 1 - Introduction
1.1 Background
1.2 Motivation
CHAPTER 2 - Literature Review
2.1 Mining of medical data
2.2 Different Levels of Analysis
2.3 Aspect-based Sentiment Analysis
2.4 Dictionary-based Approach
2.5 Word-Segmentation and Tools
2.6 Sentiment Shifter
2.7 LIBSVM
CHAPTER 3 - Problem Definitions
3.1 Messages Description
3.2 Features Classification
CHAPTER 4 - Our Approach
4.1 The Process of the Research
4.2 Data Pre-Processing
4.3 Word-Segmentation and Part-of-Speech tagging
4.4 Rules of Getting Aspect Words and Sentiment Words
4.5 Vectorization and Classification
CHAPTER 5 - Performance Evaluation
5.1 Experiment Design
5.2 Reliability Analysis and Pearson Correlation
5.3 Logistic Regression
5.4 Performance of Classification
CHAPTER 6 - Conclusion
References
參考文獻 References
Blair-Goldensohn, S., Hannan, K., McDonald, R., Neylon, T., Reis, G. A., & Reynar, J. (2008). Building a sentiment summarizer for local service reviews. Paper presented at the WWW Workshop on NLP in the Information Explosion Era.

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

Chang, P.-C., Tseng, H., Jurafsky, D., & Manning, C. D. (2009). Discriminative reordering with Chinese grammatical relations features. Paper presented at the Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation.

Councill, I. G., McDonald, R., & Velikovich, L. (2010). What's great and what's not: learning to classify the scope of negation for improved sentiment analysis. Paper presented at the Proceedings of the workshop on negation and speculation in natural language processing.

de Albornoz, J. C., Plaza, L., & Gervás, P. (2012). SentiSense: An easily scalable concept-based affective lexicon for sentiment analysis. Paper presented at the LREC.

Guo, H., Zhu, H., Guo, Z., Zhang, X., & Su, Z. (2009). Product feature categorization with multilevel latent semantic association. Paper presented at the Proceedings of the 18th ACM conference on Information and knowledge management.

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. Paper presented at the Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.

Huang, J., Hsieh, S.-K., Hong, J.-F., Chen, Y.-Z., Su, I.-L., Chen, Y.-X., & Huang, S.-W. (2010). Chinese Wordnet: design, implementation, and application of an infrastructure for cross-lingual knowledge processing. Journal of Chinese Information Processing, 24(2), 14-23.

Klein, D., & Manning, C. D. (2003). Accurate unlexicalized parsing. Paper presented at the Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1.

Kobayashi, N., Iida, R., Inui, K., & Matsumoto, Y. (2006). Opinion Mining on the Web by Extracting Subject-Aspect-Evaluation Relations. Paper presented at the AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

Lan, G.-C., Lee, C.-H., Lee, Y.-Y., Tseng, V. S., Chin, C.-Y., Day, M.-L., . . . Wu, J.-S. (2012). Disease Risk Prediction by Mining Personalized Health Trend Patterns: A Case Study on Diabetes. Paper presented at the Technologies and Applications of Artificial Intelligence (TAAI), 2012 Conference on.

Levi, A., Mokryn, O., Diot, C., & Taft, N. (2012). Finding a needle in a haystack of reviews: cold start context-based hotel recommender system. Paper presented at the Proceedings of the sixth ACM conference on Recommender systems.

Levy, R., & Manning, C. (2003). Is it harder to parse Chinese, or the Chinese Treebank? Paper presented at the Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1.

Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.

Ma, W.-Y., & Chen, K.-J. (2003). Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff. Paper presented at the Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17.

Meystre, S. M., Savova, G. K., Kipper-Schuler, K. C., & Hurdle, J. F. (2008). Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform, 35, 128-144.

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1-2), 1-135.

Sleator, D. D., & Temperley, D. (1995). Parsing English with a link grammar. arXiv preprint cmp-lg/9508004.

Thomas, L., & Steyvers, M. Prediction and semantic association.

W3Schools. PHP levenshtein() Function. Retrieved from http://www.w3schools.com/php/func_string_levenshtein.asp

Zhou, X., Han, H., Chankai, I., Prestrud, A., & Brooks, A. (2006). Approaches to text mining for clinical medical records. Paper presented at the Proceedings of the 2006 ACM symposium on Applied computing.
電子全文 Fulltext
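This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.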
論文使用權限 Thesis access permission:自定論文開放時間 user-defined availability date
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
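Availability information for printed theses is relatively complete from ROC academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.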
開放時間 available 已公開 available
