Title page for etd-0612103-091248
Title
流程萃取以偵測醫療詐欺及濫用之研究
A Process Pattern Mining Framework for the Detection of Health Care Fraud and Abuse
Department
Year, semester
Language
Degree
Number of pages
116
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2003-06-05
Date of Submission
2003-06-12
Keywords
Clinical pathways, Data mining, Health care fraud and abuse
Statistics
The thesis/dissertation has been browsed 5952 times and downloaded 4725 times.
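Chinese Abstract
As quality of life improves and medical information becomes widely available, people pay increasing attention to their health and use medical resources more frequently, so the demand for health insurance keeps rising. Under the different systems adopted by different countries, people share the risk of high medical costs and obtain medical services either by purchasing private insurance or by participating in a national health insurance program. Among health insurance systems, fee for service is a common payment scheme. Under fee for service, the patient first receives medical services from a health care institution, and the institution then files claims with the insurer, item by item, for the diagnostic and treatment services it provided. An institution that claims more services may therefore receive more payment, which often makes fee for service an incentive for institutions to waste or falsely report medical services. Facing possible waste and fraud, insurers often hire experts to review medical cases. Expert review, however, consumes a great deal of time and labor and often cannot cope with a large volume of insurance cases (for example, under a national health insurance program). This research addresses the problem by introducing the concept of process analysis and proposing an overall analysis framework and methods that detect possible medical waste and fraud in a systematic and automated manner.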
Abstract
With the growing demand for health insurance, fraud and abuse by health care service providers have become a serious problem. Practices such as billing for services that were never rendered, performing medically unnecessary services, and misrepresenting non-covered treatments as medically necessary covered treatments not only contribute to rising health care expenditure but also affect patients’ health. We are therefore motivated to investigate the detection of service providers’ fraudulent and abusive behavior.

In this research, we introduce the concept of clinical pathways and propose a framework that facilitates the automatic and systematic construction of adaptable and extensible detection systems. To build such detection systems, we study three problems: mining frequent patterns from clinical instances, selecting features with greater discriminating power, and revising the detection model to achieve higher accuracy with fewer labeled instances.

The performance of the proposed approaches has been evaluated on a synthetic data set and a real-world data set. Experiments on the real-world data set, gathered from the National Health Insurance (NHI) program in Taiwan, show that our detection model has fairly good predictive power. More importantly, compared with the traditional expense-driven approach, our detection model tends to capture different fraudulent scenarios.
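
The pattern-mining step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each clinical instance is an ordered list of activity names and that a "pattern" is simply an ordered pair of activities (one performed before the other). This is a simplified stand-in and not the thesis's TP-Graph, TP-Itemset, or TP-Sequence algorithms; the function name, activity names, and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(instances, min_support):
    """Return ordered activity pairs (a, b), with a occurring before b,
    that appear in at least a `min_support` fraction of the instances."""
    counts = Counter()
    for activities in instances:
        # Collect each pair once per instance, so support counts
        # instances rather than repeated occurrences.
        seen = set()
        for i, j in combinations(range(len(activities)), 2):
            if activities[i] != activities[j]:
                seen.add((activities[i], activities[j]))
        counts.update(seen)
    threshold = min_support * len(instances)
    return {
        pair: n / len(instances)
        for pair, n in counts.items()
        if n >= threshold
    }


# Hypothetical toy instances; the activity names are illustrative only.
instances = [
    ["consult", "x_ray", "surgery", "discharge"],
    ["consult", "surgery", "discharge"],
    ["consult", "x_ray", "x_ray", "discharge"],
]
patterns = frequent_ordered_pairs(instances, min_support=0.6)
```

In a detection setting, the presence or absence of each frequent pattern in a claim instance could then serve as a candidate feature for a classifier, which is where feature selection would come in.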

Table of Contents

ABSTRACT
LIST OF FIGURES
LIST OF TABLES

1 Introduction
1.1 Motivation
1.2 Problem statement and the proposed approach
1.3 Overview of the research

2 The Problem and the related work
2.1 Health care fraud and abuse
2.2 Current status
2.3 Clinical pathways
2.4 Research framework

3 Structure pattern discovery
3.1 Related works
3.2 Formalization of structure pattern discovery problem
3.3 Structure pattern discovery algorithms
3.3.1 TP-Graph algorithm
3.3.2 TP-Itemset algorithm
3.3.3 TP-Sequence algorithm
3.4 Performance evaluation
3.4.1 Generation of synthetic data
3.4.2 Effects of minimum support thresholds
3.4.3 Effects of instance characteristics
3.4.4 Scale-up experiments
3.5 Summary

4 Feature selection
4.1 Related works
4.2 Formalization of feature selection problem
4.3 Feature selection algorithms
4.4 Performance evaluation
4.4.1 Data collection and preprocessing
4.4.2 Induction method
4.4.3 Evaluation criteria
4.4.4 Evaluation results
4.5 Summary

5 Model Revision
5.1 Related works
5.2 Formalization of model revision problem
5.3 Model revision algorithms
5.3.1 Selecting unlabeled examples
5.3.2 Combining resulting classifiers
5.4 Performance evaluation
5.4.1 Data collection and induction algorithms
5.4.2 Evaluation results
5.5 Summary

6 Conclusion
6.1 Summary
6.2 Contributions
6.3 Limitations
6.4 Future works

APPENDIX A
LIST OF REFERENCES
LIST OF PUBLICATIONS
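Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: released on and off campus after one year (withheld)
Available:
Campus: available
Off-campus: available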


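Printed copies
Availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available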
