Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/77885
|
Title: | 運用記憶體內運算於智慧型健保院所異常查核之研究 A Research into In-Memory Computing Techniques for Intelligent Check of Health-Insurance Fraud |
Authors: | 湯家哲 Tang, Jia Jhe |
Contributors: | 姜國輝 Chiang, Johannes K. 湯家哲 Tang, Jia Jhe |
Keywords: | Illegal Medical Institutions; In-Memory Computing; Apache Spark; Benford's Law; Machine Learning Algorithms |
Date: | 2015 |
Issue Date: | 2015-08-24 10:10:27 (UTC+8) |
Abstract: | The finances of Taiwan's National Health Insurance (NHI) have been poor in recent years; in 2009 the shortfall between revenue and expenditure reached NTD 58.2 billion. According to data from the National Health Insurance Administration (NHIA), contracted medical institutions have accumulated 13,722 recorded violations to date. Most of the major violations involve fraud.
To find institutions that violate the rules, the NHIA selects cases by computerized random sampling, and investigators then review the sampled cases manually. However, random sampling rarely captures the institutions that actually commit violations, so the review is ineffective.
Benford's Law, also called the First-Digit Law, states that smaller leading digits occur more frequently than larger ones. It has been applied in accounting, finance, auditing, and economics. Yang (2012) applied Benford's Law indicators to NHI data and combined them with machine learning algorithms to detect anomalies.
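The first-digit rule described above has a closed form: a leading digit d (1 to 9) is expected with probability log10(1 + 1/d). The sketch below is illustrative only and is not taken from the thesis: the `claims` values are hypothetical, and the mean absolute deviation is just one common way to measure departure from Benford's Law, assumed here because the record does not say which indicators the thesis computes.

```python
import math
from collections import Counter

def benford_expected():
    """Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading non-zero digit of a non-zero number, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])

def first_digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9 (zeros are skipped)."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def mean_absolute_deviation(observed, expected):
    """Average absolute gap between observed and expected digit frequencies."""
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9

# Hypothetical claim amounts for one institution (not real NHI data).
claims = [120, 1350, 18, 2400, 310, 95, 1100, 47, 1900, 150]

expected = benford_expected()
observed = first_digit_frequencies(claims)
deviation = mean_absolute_deviation(observed, expected)
```

A larger `deviation` means the institution's claim amounts follow the expected digit distribution less closely; in a pipeline like the one the abstract describes, such a value could serve as one input feature for a classifier.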
Zaharia et al. (2012) proposed a fault-tolerant model for in-memory cluster computing, Apache Spark. With the same computing nodes and resources, Spark is reported to run more than 20 times faster than Hadoop MapReduce.
To address the ineffective review of medical claims, this study applies Benford's Law to NHI data published by the National Health Research Institutes. From these data we compute Benford's Law variables and technical variables, and then use support vector machines and logistic regression to build a model that flags anomalous institutions. Because the NHI data are large, we use Apache Spark as the computing environment to reduce computing time, and adopt Hadoop MapReduce as the benchmark for comparing performance.
The results show that the Spark program written for this study runs twice as fast as its Hadoop MapReduce counterpart. For the classification models, both the support vector machine and logistic regression achieve sensitivity above 80% on inpatient data. On outpatient data both models are less accurate than on inpatient data, but logistic regression still reaches 75% sensitivity and 73% overall accuracy.
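For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how sensitivity and overall accuracy are derived from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the thesis's results.

```python
def sensitivity(tp, fn):
    """Share of truly violating institutions that the model flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Share of all institutions classified correctly: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts (not taken from the thesis).
tp, fn = 12, 8    # violating institutions: flagged / missed
tn, fp = 50, 30   # compliant institutions: cleared / wrongly flagged

model_sensitivity = sensitivity(tp, fn)
model_accuracy = accuracy(tp, tn, fp, fn)
```

Sensitivity matters most in a screening setting like this one, because a missed violator never reaches manual investigation, whereas a wrongly flagged institution can still be cleared at that stage.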
This study uses Apache Spark to reduce the time needed to process large volumes of NHI data. In addition, the intelligent anomaly-check model can identify medical institutions that have violated their contracts. Institutions flagged for possible fraud or abuse can then be passed on for manual investigation, which should improve the effectiveness of medical claims review. |
Reference: | Apache Spark, https://spark.apache.org, 2015
Bhattacharya, Sukanto, Dongming Xu, and Kuldeep Kumar. "An ANN-based auditor decision support system using Benford's law." Decision Support Systems 50.3 (2011): 576-584.
Busta, B., & Weinberg, R. "Using Benford’s law and neural networks as a review procedure," Managerial Auditing Journal (13:6) 1998, pp 356-366.
Carlini, Emanuele, et al. "Balanced Graph Partitioning with Apache Spark." Euro-Par 2014: Parallel Processing Workshops. Springer International Publishing, 2014.
Carslaw, Charles APN. "Anomalies in income numbers: Evidence of goal oriented behavior." Accounting Review (1988): 321-327.
Christian, C., and Gupta, S. “New evidence on secondary evasion,” The Journal of the American Taxation Association, 1993, pp 72-92
Coulouris, George F., Jean Dollimore, and Tim Kindberg. Distributed systems: concepts and design. Pearson Education, 2005.
Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
Dimiduk, Nick, et al. HBase in action. Shelter Island: Manning, 2013.
Glaser, W. A. Paying the doctor: systems of remuneration and their effects Johns Hopkins Press, Baltimore, 1970.
Harnie, D., Vapirev, A., Wegner, J. K., Gedich, A., Steijaert, M., & Wuyts, R. (2015). Scaling Machine Learning for Target Prediction in Drug Discovery using Apache Spark. In Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing: Workshop on Clusters, Clouds and Grids for Life Sciences.
Hill, Theodore P. "A statistical derivation of the significant-digit law." Statistical Science (1995): 354-363.
Hill, Theodore P. "The First Digit Phenomenon: A century-old observation about an unexpected pattern in many numerical tables applies to the stock market, census statistics and accounting data." American Scientist 86.4 (1998): 358-363.
Kvam, Paul H., and Brani Vidakovic. Nonparametric statistics with applications to science and engineering. Vol. 653. John Wiley & Sons, 2007.
Lin, Chieh-Yen, et al. "Large-scale logistic regression and linear support vector machines using Spark." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.
Lu, Fletcher, and J. Efrim Boritz. "Detecting fraud in health insurance data: Learning to model incomplete Benford’s law distributions." Machine Learning: ECML 2005. Springer Berlin Heidelberg, 2005. 633-640.
Lu, Fletcher, J. Efrim Boritz, and Dominic Covvey. "Adaptive fraud detection using Benford’s law." Advances in Artificial Intelligence. Springer Berlin Heidelberg, 2006. 347-358.
Mell, Peter, and Tim Grance. "The NIST definition of cloud computing." (2011).
Nigrini, M. J. “A taxpayer compliance application of Benford's Law,” The Journal of the American Taxation Association, 1996, pp 72-91.
Nigrini, M. J. & W. Wood. 1996. Assessing the integrity of tabulated demographic data. Working paper, Saint Mary's University, Halifax, N.S.
Nigrini, M. J. “Using digital frequencies to detect fraud.” The White Paper (April): 3-6. 1996.
Nigrini, Mark J., and Linda J. Mittermaier. "The use of Benford's law as an aid in analytical procedures." Auditing: A Journal of Practice & Theory 16.2 (1997): 52.
Nigrini, M. J. “Digital Analysis Using Benford’s Law.” Global Audit Publications, Vancouver, B.C., Canada, 2000.
Shvachko, Konstantin, et al. "The hadoop distributed file system." Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010.
Solaimani, Mohiuddin, et al. "Statistical technique for online anomaly detection using Spark over heterogeneous data from multi-source VMware performance data." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.
Sparrow, Malcolm K. Fraud Control in the Health Care Industry: Assessing the State of the Art. US Department of Justice, Office of Justice Programs, National Institute of Justice, 1998.
Thomas, Kurt, et al. "Design and evaluation of a real-time url spam filtering service." Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 2011.
White, Tom. Hadoop: The definitive guide. O'Reilly Media, Inc., 2012.
Wikipedia "Support vector machine," 2015, http://en.wikipedia.org/wiki/Support_vector_machine
Wikipedia "Distributed computing," 2015, http://en.wikipedia.org/wiki/Distributed_computing
Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 2012.
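中央健康保險署 [National Health Insurance Administration] "衛生福利部中央健康保險署業務執行報告" [Operations Report of the National Health Insurance Administration, Ministry of Health and Welfare], 2015, http://www.nhi.gov.tw/webdata/webdata.aspx?menu=17&menu_id=1023&WD_ID=1023&webdata_id=4719
中央健康保險署 [National Health Insurance Administration] "全民健康保險特約醫事服務機構查處統計表" [Statistics on Investigations and Penalties of NHI-Contracted Medical Institutions], 2015, http://www.nhi.gov.tw/webdata/webdata.aspx?menu=17&menu_id=1023&WD_ID=1023&webdata_id=2401
中央健康保險署 [National Health Insurance Administration] "醫療費用執行報告" [Medical Expenditure Implementation Report], 2015, http://www.nhi.gov.tw/webdata/webdata.aspx?menu=17&menu_id=1023&WD_ID=1023&webdata_id=3601
中央健康保險署 [National Health Insurance Administration] "重要統計資料" [Key Statistics], 2015, http://www.nhi.gov.tw/webdata/webdata.aspx?menu=17&menu_id=1023&WD_ID=1043&webdata_id=805
中央健康保險署 [National Health Insurance Administration] "全民健康保險統計" [National Health Insurance Statistics], 2015, http://www.nhi.gov.tw/webdata/webdata.aspx?menu=17&menu_id=1023&WD_ID=1043&webdata_id=3351
郭芷余, 黃馨儀 & 洪振生 "馬光中醫詐健保數百萬" [Ma Kuang Chinese Medicine Clinic defrauds NHI of millions], 2014, http://www.appledaily.com.tw/appledaily/article/headline/20140719/35968249/
全民健康保險醫療費用協定委員會 [National Health Insurance Medical Expenditure Negotiation Committee] "全民健康保險醫療費用總額支付制度" [The NHI Global Budget Payment System for Medical Expenditure], 2005, http://www.nhi.gov.tw/Resource/webdata/Attach_13636_2_8.2:總額QA手冊第六版含94年.pdf
章殷超 "全民健康保險醫療服務審查問題之探討" [A discussion of problems in the review of NHI medical services], 臺灣醫學 [Formosan Journal of Medicine] (7:1) 2003, pp 104-114.
湯玲郎, & 林信忠 "資料萃取法在健保費用稽核之研究" [A study of data-mining methods for auditing NHI expenses], 醫療資訊雜誌 [Journal of Medical Informatics] (11) 2000, pp 85-104.
黃煌雄, 沈美真, & 劉興善 "我國全民健康保險總體檢" [A comprehensive review of Taiwan's National Health Insurance], 監察院 [Control Yuan], 2011.
中央健康保險署 [National Health Insurance Administration] "2014-2015 全民健康保險年報" [2014-2015 National Health Insurance Annual Report], 2015.
楊喻翔 "運用Benford定律的智慧型健保費用異常偵測模型之研究" [A study of an intelligent model for detecting anomalous health-insurance expenses using Benford's Law], 國立政治大學資訊管理系博士論文 [doctoral dissertation, Department of Management Information Systems, National Chengchi University], 2012.
趙孟捷 "健保五大花招及最新違規名單" [Five major NHI scams and the latest list of violators], 2014, http://www.thrf.org.tw/Page_Show.asp?Page_ID=1937
蔡明樺 "太扯自診做大腸鏡醫亂掰詐健保" [Outrageous: doctor claims a colonoscopy on himself and fabricates records to defraud NHI], 2014, http://www.appledaily.com.tw/appledaily/article/headline/20140603/35868423/ |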
Description: | Master's thesis; National Chengchi University; Graduate Institute of Management Information Systems; 102356041 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0102356041 |
Data Type: | thesis |
Appears in Collections: | [Department of Management Information Systems] Theses
|
All items in 政大典藏 (the NCCU Institutional Repository) are protected by copyright, with all rights reserved.
|