政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/54649
English  |  正體中文  |  简体中文  |  Post-Print筆數 : 27 |  全文笔数/总笔数 : 113392/144379 (79%)
造访人次 : 51192557      在线人数 : 909
RC Version 6.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
搜寻范围 查询小技巧:
  • 您可在西文检索词汇前后加上"双引号",以获取较精准的检索结果
  • 若欲以作者姓名搜寻,建议至进阶搜寻限定作者字段,可获得较完整数据
  • 进阶搜寻
    政大機構典藏 > 資訊學院 > 資訊科學系 > 學位論文 >  Item 140.119/54649


    请使用永久网址来引用或连结此文件: https://nccur.lib.nccu.edu.tw/handle/140.119/54649


    题名: 以規則為基礎的分類演算法:應用粗糙集
    A Rule-Based classification algorithm: a rough set approach
    作者: 廖家奇
    Liao, Chia Chi
    贡献者: 徐國偉
    Hsu, Kuo Wei
    廖家奇
    Liao, Chia Chi
    关键词: 資料探勘
    分類
    粗糙集
    規則學習
    規則歸納
    data mining
    classification
    rough set
    rule learning
    separate-and-conquer
    rule induction
    日期: 2011
    上传时间: 2012-10-30 11:28:18 (UTC+8)
    摘要: 在本論文中,我們提出了一個以規則為基礎的分類演算法,名為ROUSER(ROUgh SEt Rule),它利用粗糙集理論作為搜尋啟發的基礎,進而建立規則。我們使用一個已經被廣泛利用的工具實作ROUSER,也使用數個公開資料集對它進行實驗,並將它應用於真實世界的案例。
    本論文的初衷可被追溯到一個真實世界的案例,而此案例的目標是從感應器所蒐集的資料中找出與機械故障之間的關聯。為了能支援機械故障的根本原因分析,我們設計並實作了一個以規則為基礎的分類演算法,它所產生的模型是由人類可理解的決策規則所組成,而故障的徵兆與原因則被決策規則所連結。此外,資料中存在著矛盾。舉例而言,不同時間點所蒐集的兩筆紀錄極為相似、甚至相同(除了時間戳記),但其中一筆紀錄與機械故障相關,另一筆則否。本案例的挑戰在於分析矛盾的資料。我們使用粗糙集理論克服這個難題,因為它可以處理不完美知識。
    研究者們已經提出了各種不同的分類演算法,而實踐者們則已經將它們應用於各種領域,然而多數分類演算法的設計並不強調演算法所產生模型的可解釋性與可理解性。ROUSER的設計是專門從名目資料中萃取人類可理解的決策規則。而ROUSER與其它多數規則分類演算法不同的地方是利用粗糙集方法選取特徵。ROUSER也提供了數種方式來選擇合宜的屬性與值配對,作為規則的前項。此外,ROUSER的規則產生方法是基於separate-and-conquer策略,因此比其它基於粗糙集的分類演算法所廣泛採用的不可分辨矩陣方法還有效率。
    我們進行延伸實驗來驗證ROUSER的能力。對於名目資料的實驗裡,ROUSER在半數的結果中的準確率可匹敵、甚至勝過其他以規則為基礎的分類演算法以及決策樹分類演算法。ROUSER也可以在一些離散化的資料集之中達到可匹敵甚至超越的準確率。我們也提供了內建的特徵萃取方法與其它方法的比較的實驗結果,以及數種用來決定規則前項的方法的實驗結果。
    In this thesis, we propose a rule-based classification algorithm named ROUSER (ROUgh SEt Rule), which uses the rough set theory as the basis of the search heuristics in the process of rule generation. We implement ROUSER using a well developed and widely used toolkit, evaluate it using several public data sets, and examine its applicability using a real-world case study.
    The origin of the problem addressed in this thesis can be traced back to a real-world problem where the goal is to determine whether a data record collected from a sensor corresponds to a machine fault. In order to assist in the root cause analysis of the machine faults, we design and implement a rule-based classification algorithm that can generate models consisting of human understandable decision rules to connect symptoms to the cause. Moreover, there are contradictions in data. For example, two data records collected at different time points are similar, or the same (except their timestamps), while one is corresponding to a machine fault but not the other. The challenge is to analyze data with contradictions. We use the rough set theory to overcome the challenge, since it is able to process imperfect knowledge.
    Researchers have proposed various classification algorithms and practitioners have applied them to various application domains, while most of the classification algorithms are designed without a focus on interpretability or understandability of the models built using the algorithms. ROUSER is specifically designed to extract human understandable decision rules from nominal data. What distinguishes ROUSER from most, if not all, other rule-based classification algorithms is that it utilizes a rough set approach to select features. ROUSER also provides several ways to decide an appropriate attribute-value pair for the antecedents of a rule. Moreover, the rule generation method of ROUSER is based on the separate-and-conquer strategy, and hence it is more efficient than the indiscernibility matrix method that is widely adopted in the classification algorithms based on the rough set theory.
    We conduct extensive experiments to evaluate the capability of ROUSER. On about half of the nominal data sets considered in experiments, ROUSER can achieve comparable or better accuracy than do classification algorithms that are able to generate decision rules or trees. On some of the discretized data sets, ROUSER can achieve comparable or better accuracy. We also present the results of the experiments on the embedded feature selection method and several ways to decide an appropriate attribute-value pair for the antecedents of a rule.
    參考文獻: [1] J. G. Bazan, H. S. Nguyen, S. H. Nguyen, P. Synak, J. Wróblewski, and blewski, "Rough set algorithms in classification problem," in Rough set methods and applications, ed: Physica-Verlag GmbH, 2000, pp. 49-88.
    [2] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming: Athena Scientific, 1996.
    [3] W.W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int`l Conf. Machine Learning (ICML), pp. 115-123, 1995.
    [4] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
    [5] J. Dai, Q. Xu, and W. Wang, "A comparative study on strategies of rule induction for incomplete data based on rough set approach," International Journal of Advancements in Computing Technology, vol. 3, p. 176–183, 2011.
    [6] U. M. Fayyad, K. B. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning”, presented at the Proceedings of 13th international joint conference on Artificial intelligence, 1022-1027, 1993.
    [7] J. Fürnkranz, "Separate-and-Conquer Rule Learning," Artif. Intell. Rev., vol. 13, pp. 3-54, 1999.
    [8] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," SIGKDD Explor. Newsl., vol. 11, pp. 10-18, 2009.
    [9] L. Huan and R. Setiono, "Chi2: feature selection and discretization of numeric attributes," in Tools with Artificial Intelligence, 1995. Proceedings of IEEE Seventh International Conference on, 1995, pp. 388-391.
    [10] J. C. Huhn and E. Hullermeier, "FR3: A Fuzzy Rule Learner for Inducing Reliable Classifiers," Fuzzy Systems, IEEE Transactions on, vol. 17, pp. 138-149, 2009.
    [11] W. Jiabing, Z. Pei, W. Guihua, and W. Jia, "Classifying Categorical Data by Rule-Based Neighbors," in Data Mining (ICDM), 2011 IEEE 11th International Conference on, 2011, pp. 1248-1253.
    [12] X. Jin, A. Xu, R. Bie, and P. Guo, "Machine Learning Techniques and Chi-Square Feature Selection for Cancer Classification Using SAGE Gene Expression Profiles Data Mining for Biomedical Applications." vol. 3916, J. Li, Q. Yang, and A.-H. Tan, Eds., ed: Springer Berlin / Heidelberg, 2006, pp. 106-115.
    [13] M. T. Mitchell, Machine Learning, 1997 :McGraw-Hill
    [14] G. Pagallo and D. Haussler, "Boolean Feature Discovery in Empirical Learning," Machine Learning, vol. 5, pp. 71-99, 1990.
    [15] Z. Pawlak, "Some Issues on Rough Sets,” Transactions on Rough Sets I, vol. 3100, J. Peters, A. Skowron, J. Grzymala-Busse, B. Kostek, R. Swiniarski, and M. Szczuka, Eds., ed: Springer Berlin / Heidelberg, 2004, pp. 1-58.
    [16] Z. Pawlak, A. Skowron, "Rudiments of rough sets", Information Sciences, vol.177, no.1, pp.3-27, 2007.
    [17] J. R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, pp. 81-106, 1986.
    [18] J. R. Quinlan, C4.5: programs for machine learning: Morgan Kaufmann Publishers Inc., 1993.
    [19] J. Stefanowski and K. Slowiński, "Rough Set Theory and Rule Induction Techniques For Discovery of Attribute Dependencies in Medical Information Systems,” Principles of Data Mining and Knowledge Discovery. vol. 1263, J. Komorowski and J. Zytkow, Eds., ed: Springer Berlin / Heidelberg, 1997, pp. 36-46.
    [20] S. M. Weiss and N. Indurkhya, "Reduced complexity rule induction," presented at the Proceedings of the 12th international joint conference on Artificial intelligence - Volume 2, Sydney, New South Wales, Australia, 1991.
    [21] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. McLachlan, A. Ng, B. Liu, P. Yu, Z.-H. Zhou, M. Steinbach, D. Hand, and D. Steinberg, "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, pp. 1-37, 2008.
    [22] L. A. Zadeh, "Fuzzy Sets," Information and Control, vol. 8, pp. 338–353, 1965.
    [23] Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
    [24] "Data Mining Curriculum". ACM SIGKDD. 2006-04-30. Retrieved 2011-10-28.
    描述: 碩士
    國立政治大學
    資訊科學學系
    99753005
    100
    資料來源: http://thesis.lib.nccu.edu.tw/record/#G0099753005
    数据类型: thesis
    DOI 連結: http://dx.doi.org/10.1109/CyberneticsCom.2012.6381605
    DOI: 10.1109/CyberneticsCom.2012.6381605
    显示于类别:[資訊科學系] 學位論文

    文件中的档案:

    档案 大小格式浏览次数
    300501.pdf1877KbAdobe PDF21027检视/开启


    在政大典藏中所有的数据项都受到原著作权保护.


    社群 sharing

    著作權政策宣告 Copyright Announcement
    1.本網站之數位內容為國立政治大學所收錄之機構典藏,無償提供學術研究與公眾教育等公益性使用,惟仍請適度,合理使用本網站之內容,以尊重著作權人之權益。商業上之利用,則請先取得著作權人之授權。
    The digital content of this website is part of National Chengchi University Institutional Repository. It provides free access to academic research and public education for non-commercial use. Please utilize it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2.本網站之製作,已盡力防止侵害著作權人之權益,如仍發現本網站之數位內容有侵害著作權人權益情事者,請權利人通知本網站維護人員(nccur@nccu.edu.tw),維護人員將立即採取移除該數位著作等補救措施。
    NCCU Institutional Repository is made to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff(nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.
    DSpace Software Copyright © 2002-2004  MIT &  Hewlett-Packard  /   Enhanced by   NTU Library IR team Copyright ©   - 回馈