Loading...
|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/54649
|
Title: | 以規則為基礎的分類演算法:應用粗糙集 A Rule-Based classification algorithm: a rough set approach |
Authors: | 廖家奇 Liao, Chia Chi |
Contributors: | 徐國偉 Hsu, Kuo Wei 廖家奇 Liao, Chia Chi |
Keywords: | 資料探勘 分類 粗糙集 規則學習 規則歸納 data mining classification rough set rule learning separate-and-conquer rule induction |
Date: | 2011 |
Issue Date: | 2012-10-30 11:28:18 (UTC+8) |
Abstract: | 在本論文中,我們提出了一個以規則為基礎的分類演算法,名為ROUSER(ROUgh SEt Rule),它利用粗糙集理論作為搜尋啟發的基礎,進而建立規則。我們使用一個已經被廣泛利用的工具實作ROUSER,也使用數個公開資料集對它進行實驗,並將它應用於真實世界的案例。 本論文的初衷可被追溯到一個真實世界的案例,而此案例的目標是從感應器所蒐集的資料中找出與機械故障之間的關聯。為了能支援機械故障的根本原因分析,我們設計並實作了一個以規則為基礎的分類演算法,它所產生的模型是由人類可理解的決策規則所組成,而故障的徵兆與原因則被決策規則所連結。此外,資料中存在著矛盾。舉例而言,不同時間點所蒐集的兩筆紀錄極為相似、甚至相同(除了時間戳記),但其中一筆紀錄與機械故障相關,另一筆則否。本案例的挑戰在於分析矛盾的資料。我們使用粗糙集理論克服這個難題,因為它可以處理不完美知識。 研究者們已經提出了各種不同的分類演算法,而實踐者們則已經將它們應用於各種領域,然而多數分類演算法的設計並不強調演算法所產生模型的可解釋性與可理解性。ROUSER的設計是專門從名目資料中萃取人類可理解的決策規則。而ROUSER與其它多數規則分類演算法不同的地方是利用粗糙集方法選取特徵。ROUSER也提供了數種方式來選擇合宜的屬性與值配對,作為規則的前項。此外,ROUSER的規則產生方法是基於separate-and-conquer策略,因此比其它基於粗糙集的分類演算法所廣泛採用的不可分辨矩陣方法還有效率。 我們進行延伸實驗來驗證ROUSER的能力。對於名目資料的實驗裡,ROUSER在半數的結果中的準確率可匹敵、甚至勝過其他以規則為基礎的分類演算法以及決策樹分類演算法。ROUSER也可以在一些離散化的資料集之中達到可匹敵甚至超越的準確率。我們也提供了內建的特徵萃取方法與其它方法的比較的實驗結果,以及數種用來決定規則前項的方法的實驗結果。 In this thesis, we propose a rule-based classification algorithm named ROUSER (ROUgh SEt Rule), which uses the rough set theory as the basis of the search heuristics in the process of rule generation. We implement ROUSER using a well developed and widely used toolkit, evaluate it using several public data sets, and examine its applicability using a real-world case study. The origin of the problem addressed in this thesis can be traced back to a real-world problem where the goal is to determine whether a data record collected from a sensor corresponds to a machine fault. In order to assist in the root cause analysis of the machine faults, we design and implement a rule-based classification algorithm that can generate models consisting of human understandable decision rules to connect symptoms to the cause. Moreover, there are contradictions in data. For example, two data records collected at different time points are similar, or the same (except their timestamps), while one is corresponding to a machine fault but not the other. The challenge is to analyze data with contradictions. We use the rough set theory to overcome the challenge, since it is able to process imperfect knowledge. Researchers have proposed various classification algorithms and practitioners have applied them to various application domains, while most of the classification algorithms are designed without a focus on interpretability or understandability of the models built using the algorithms. ROUSER is specifically designed to extract human understandable decision rules from nominal data. What distinguishes ROUSER from most, if not all, other rule-based classification algorithms is that it utilizes a rough set approach to select features. ROUSER also provides several ways to decide an appropriate attribute-value pair for the antecedents of a rule. Moreover, the rule generation method of ROUSER is based on the separate-and-conquer strategy, and hence it is more efficient than the indiscernibility matrix method that is widely adopted in the classification algorithms based on the rough set theory. We conduct extensive experiments to evaluate the capability of ROUSER. On about half of the nominal data sets considered in experiments, ROUSER can achieve comparable or better accuracy than do classification algorithms that are able to generate decision rules or trees. On some of the discretized data sets, ROUSER can achieve comparable or better accuracy. We also present the results of the experiments on the embedded feature selection method and several ways to decide an appropriate attribute-value pair for the antecedents of a rule. |
Reference: | [1] J. G. Bazan, H. S. Nguyen, S. H. Nguyen, P. Synak, J. Wróblewski, and blewski, "Rough set algorithms in classification problem," in Rough set methods and applications, ed: Physica-Verlag GmbH, 2000, pp. 49-88. [2] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming: Athena Scientific, 1996. [3] W.W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int`l Conf. Machine Learning (ICML), pp. 115-123, 1995. [4] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995. [5] J. Dai, Q. Xu, and W. Wang, "A comparative study on strategies of rule induction for incomplete data based on rough set approach," International Journal of Advancements in Computing Technology, vol. 3, p. 176–183, 2011. [6] U. M. Fayyad, K. B. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning”, presented at the Proceedings of 13th international joint conference on Artificial intelligence, 1022-1027, 1993. [7] J. Fürnkranz, "Separate-and-Conquer Rule Learning," Artif. Intell. Rev., vol. 13, pp. 3-54, 1999. [8] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," SIGKDD Explor. Newsl., vol. 11, pp. 10-18, 2009. [9] L. Huan and R. Setiono, "Chi2: feature selection and discretization of numeric attributes," in Tools with Artificial Intelligence, 1995. Proceedings of IEEE Seventh International Conference on, 1995, pp. 388-391. [10] J. C. Huhn and E. Hullermeier, "FR3: A Fuzzy Rule Learner for Inducing Reliable Classifiers," Fuzzy Systems, IEEE Transactions on, vol. 17, pp. 138-149, 2009. [11] W. Jiabing, Z. Pei, W. Guihua, and W. Jia, "Classifying Categorical Data by Rule-Based Neighbors," in Data Mining (ICDM), 2011 IEEE 11th International Conference on, 2011, pp. 1248-1253. [12] X. Jin, A. Xu, R. Bie, and P. Guo, "Machine Learning Techniques and Chi-Square Feature Selection for Cancer Classification Using SAGE Gene Expression Profiles Data Mining for Biomedical Applications." vol. 3916, J. Li, Q. Yang, and A.-H. Tan, Eds., ed: Springer Berlin / Heidelberg, 2006, pp. 106-115. [13] M. T. Mitchell, Machine Learning, 1997 :McGraw-Hill [14] G. Pagallo and D. Haussler, "Boolean Feature Discovery in Empirical Learning," Machine Learning, vol. 5, pp. 71-99, 1990. [15] Z. Pawlak, "Some Issues on Rough Sets,” Transactions on Rough Sets I, vol. 3100, J. Peters, A. Skowron, J. Grzymala-Busse, B. Kostek, R. Swiniarski, and M. Szczuka, Eds., ed: Springer Berlin / Heidelberg, 2004, pp. 1-58. [16] Z. Pawlak, A. Skowron, "Rudiments of rough sets", Information Sciences, vol.177, no.1, pp.3-27, 2007. [17] J. R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, pp. 81-106, 1986. [18] J. R. Quinlan, C4.5: programs for machine learning: Morgan Kaufmann Publishers Inc., 1993. [19] J. Stefanowski and K. Slowiński, "Rough Set Theory and Rule Induction Techniques For Discovery of Attribute Dependencies in Medical Information Systems,” Principles of Data Mining and Knowledge Discovery. vol. 1263, J. Komorowski and J. Zytkow, Eds., ed: Springer Berlin / Heidelberg, 1997, pp. 36-46. [20] S. M. Weiss and N. Indurkhya, "Reduced complexity rule induction," presented at the Proceedings of the 12th international joint conference on Artificial intelligence - Volume 2, Sydney, New South Wales, Australia, 1991. [21] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. McLachlan, A. Ng, B. Liu, P. Yu, Z.-H. Zhou, M. Steinbach, D. Hand, and D. Steinberg, "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, pp. 1-37, 2008. [22] L. A. Zadeh, "Fuzzy Sets," Information and Control, vol. 8, pp. 338–353, 1965. [23] Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. [24] "Data Mining Curriculum". ACM SIGKDD. 2006-04-30. Retrieved 2011-10-28. |
Description: | 碩士 國立政治大學 資訊科學學系 99753005 100 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0099753005 |
Data Type: | thesis |
DOI 連結: | http://dx.doi.org/10.1109/CyberneticsCom.2012.6381605 |
DOI: | 10.1109/CyberneticsCom.2012.6381605 |
Appears in Collections: | [資訊科學系] 學位論文
|
Files in This Item:
File |
Size | Format | |
300501.pdf | 1877Kb | Adobe PDF2 | 1046 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|