National Sun Yat-sen University, Thesis/Dissertation: Identifying Potential Adverse Drug Events from Tweets

Title	Identifying Potential Adverse Drug Events from Tweets (從推特內容辨識潛在藥物不良反應)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 105 (ROC calendar, i.e. 2016–2017)	Language	Chinese
Degree	Master	Number of pages	62
Author	Hsiu-ying Lin (林秀穎)
Advisor	Wei-Po Lee (李偉柏)
Convenor	Yuh-Jiuan Tsay (蔡玉娟)
Advisory Committee	Hsin-Chang Yang (楊新章)
Date of Exam	2017-06-22	Date of Submission	2017-08-21
Keywords	Adverse drug reactions, Social media, Text mining, Supervised learning, Pharmacovigilance, Side effect

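Chinese Abstract (English translation)
Clinical drug trials cannot detect every side effect of a drug, and side effects can seriously harm the public, so finding them is a problem many researchers are eager to solve. With the rapid growth of Web 2.0, more and more people share their medical experiences on social media. Given the clinical and scientific value of patient reports on social media, many researchers now collect relevant data from social media and extract valid adverse drug events from it. This study proposes an adverse drug event classification model that performs feature selection based on machine learning algorithms, so that adverse drug events can be identified automatically. The study derives a set of feature dimensions and their features from the text of Twitter tweets, classifies the resulting dataset with different machine learning algorithms, improves the performance of the classification model through feature selection, and finally examines the role of each dimension. The results show that: (1) the N-gram feature dimension is an important dimension of the model; (2) the syn-set (synonym) feature dimension and the topic-modeling feature dimension interfere with the model; (3) the syn-set feature dimension is redundant with the N-gram dimension; and (4) the syn-set dimension is correlated with both the cluster dimension and the topic-modeling dimension. In addition, feature selection improved both the efficiency and the effectiveness of the adverse drug event classification model.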
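The abstract describes deriving several feature dimensions from tweet text; the table of contents lists, among others, N-grams, a sentiment lexicon, and an ADR lexicon. Below is a minimal Python sketch of how such dimension-tagged features might be organized. It assumes plain regex tokenization and two tiny hypothetical lexicons (`ADR_LEXICON`, `SENTIMENT_LEXICON`). All function names are illustrative and do not represent the thesis's implementation. Each feature name carries its dimension as a prefix, so a whole dimension can be removed at once when examining its role.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration only; they are NOT the
# ADR lexicon or sentiment resources used in the thesis.
ADR_LEXICON = {"nausea", "dizzy", "headache", "insomnia", "rash"}
SENTIMENT_LEXICON = {"awful": -1.0, "terrible": -1.0, "great": 1.0, "better": 0.5}


def tokenize(text):
    """Lowercase the tweet and keep runs of letters, digits and apostrophes.
    Stemming, POS tagging and stopword removal are deliberately omitted."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Map one tweet to {"dimension:feature": value} so that every
    feature can be traced back to its dimension."""
    tokens = tokenize(text)
    features = Counter()
    for n in (1, 2):  # N-gram dimension: unigram and bigram counts
        for gram in ngrams(tokens, n):
            features["ngram:" + gram] += 1
    # ADR-lexicon dimension: number of tokens found in the lexicon
    features["adr:count"] = sum(1 for t in tokens if t in ADR_LEXICON)
    # Sentiment dimension: summed lexicon polarity of the tokens
    features["sent:score"] = sum(SENTIMENT_LEXICON.get(t, 0.0) for t in tokens)
    return dict(features)


def drop_dimension(features, dimension):
    """Remove all features of one dimension, e.g. to compare a model
    trained with and without that dimension."""
    prefix = dimension + ":"
    return {k: v for k, v in features.items() if not k.startswith(prefix)}


feats = extract_features("This drug made me dizzy, awful headache")
print(feats["adr:count"])                       # prints 2
print(feats["sent:score"])                      # prints -1.0
print(sorted(drop_dimension(feats, "ngram")))   # prints ['adr:count', 'sent:score']
```

One simple way to gauge a dimension's contribution is an ablation: compare classifier scores with and without that dimension. The thesis may have used a different procedure.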
Abstract
Adverse drug reactions (ADRs) can cause or prolong hospital admission and can result in disability or death. Because clinical trials have various limitations, usually only the most common acute ADRs are detected in the pre-marketing phase. There is therefore a pressing need for methods that detect the ADRs that clinical trials miss. With the rapid growth of Web 2.0, an increasing number of patients share their healthcare experiences on the Internet. Given the clinical and scientific value of patient reports on social media, many researchers are working to collect relevant data from social media and extract valid adverse drug events from it. This study proposes a classification model based on machine learning algorithms that uses various feature selection methods to identify adverse drug events automatically. We generate a large set of features from a dataset of annotated tweets collected from Twitter and classify the dataset with several machine learning algorithms. We then improve the effectiveness of the classification model through feature selection. Finally, we investigate the contribution of each feature dimension to classification. The results indicate that: (1) N-gram is the most important feature dimension for classification; (2) the syn-set and topic vector dimensions degrade the model's performance; (3) syn-set is a redundant feature of N-gram; and (4) both the topic vector and cluster feature dimensions are correlated with syn-set. In addition, feature selection improves both the efficiency and the effectiveness of the adverse drug event classification model.
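Feature selection, as described in the abstract, keeps only the features that carry the most information about the class label. This page does not say which scoring metric the thesis finally chose. The sketch below therefore assumes the chi-square statistic, one common choice for text classification. It is computed from a 2×2 contingency table of feature presence against a binary ADR label. The function names and toy tweets are hypothetical.

```python
def chi_square(docs, labels, feature):
    """Chi-square statistic between a binary feature and a binary label.

    docs: list of sets of feature names (one set per tweet)
    labels: list of 0/1 labels (1 = tweet reports an adverse drug event)
    """
    a = b = c = d = 0  # 2x2 contingency counts
    for feats, y in zip(docs, labels):
        present = feature in feats
        if present and y:
            a += 1  # feature present, ADR tweet
        elif present:
            b += 1  # feature present, non-ADR tweet
        elif y:
            c += 1  # feature absent, ADR tweet
        else:
            d += 1  # feature absent, non-ADR tweet
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # feature or label is constant: no evidence of dependence
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k):
    """Keep the k features with the highest chi-square score (ties broken by name)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda f: (-chi_square(docs, labels, f), f))
    return ranked[:k]


# Toy data: two ADR tweets followed by two non-ADR tweets.
docs = [{"drug", "dizzy"}, {"pill", "dizzy", "nausea"}, {"drug", "works"}, {"pill", "great"}]
labels = [1, 1, 0, 0]
print(chi_square(docs, labels, "dizzy"))  # prints 4.0
print(chi_square(docs, labels, "drug"))   # prints 0.0
print(select_top_k(docs, labels, 1))      # prints ['dizzy']
```

Chi-square measures dependence in either direction, so a feature that is strongly associated with non-ADR tweets can also rank highly. In practice, `k` would be tuned on held-out data rather than fixed in advance.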

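Table of Contents
Thesis Approval Form
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.1.1 Adverse Drug Reactions and Pharmacovigilance
1.1.2 Social Media
1.2 Research Motivation and Objectives
Chapter 2 Literature Review
2.1 Adverse Drug Reaction Detection
2.2 Pharmacovigilance on Social Media
2.3 Natural Language Processing
2.4 Performance Evaluation
Chapter 3 Research Methodology
3.1 Research Framework
3.2 Data Preprocessing
3.2.1 Stemming
3.2.2 Part-of-Speech Tagging (POST)
3.2.3 Stopwords
3.3 Dataset Features
3.3.1 N-gram
3.3.2 Syn-set Expansion
3.3.3 Sentiment Lexicon
3.3.4 Clusters
3.3.5 ADR Lexicon
3.3.6 Topic Modeling
3.3.7 Polarity
3.3.8 Other Features
3.4 Classifiers
3.5 Feature Selection
3.6 Dimension Analysis
3.7 Evaluation Methods
Chapter 4 Results
4.1 Experimental Environment
4.2 Twitter Dataset
4.3 Classifier Selection
4.3.1 Sample Fitting and Penalty Coefficient
4.4 Feature Selection
4.4.1 Choice of Feature Evaluation Metric
4.4.2 Feature Selection Importance Test
4.5 Dimension Analysis
4.5.1 Dimension Importance
4.5.2 Dimension Correlation
4.6 Discussion of Experimental Results
Chapter 5 Conclusion
5.1 Conclusions and Contributions
5.2 Research Limitations
5.3 Future Research Directions
References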


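Full Text
This electronic full text is licensed to users solely for personal, non-commercial searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission	User-defined release date
Availability	On campus: available	Off campus: available
File	etd-0721117-172944.pdf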
Printed Copies
Availability	Available

