National Chengchi University Institutional Repository (NCCUR): Item 140.119/135931


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/135931


    Title: Homogeneity and Stability of Hierarchical Clustering (階層式分群方法的同質性與穩固性)
    Author: Lin, Wei-Chih (林韋志)
    Contributors: Chou, Pei-Ting (周珮婷)
    Lin, Wei-Chih (林韋志)
    Keywords: Unsupervised Machine Learning
    Hierarchical Clustering
    Cluster Validation
    Date: 2021
    Upload time: 2021-07-01 17:34:21 (UTC+8)
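    Abstract: At present, the mainstream way to validate clustering results is to compute various cluster validation indexes, but these indexes do not necessarily give reasonable answers for data with many categorical variables. This study therefore uses hierarchical clustering to build a clustering tree on the target variable and a second clustering tree, based on Euclidean distance, on the other variable, then draws a heatmap from the two trees and uses its color blocks to identify groups that are more closely related in the geometry of the data. Next, borrowing the idea of ANOVA, the original data are simulated, and a reliability histogram is drawn from the cluster codes of the simulated data to show group similarity and to further validate the correctness and stability of the hierarchical clustering results. If the trend shown by the reliability histogram agrees with the original clustering result, the clustering result can be judged correct. The difference between this method and cluster validation indexes is that, guided by the geometric structure of the data shown in the heatmap, we can cut the clustering tree at different heights to find highly related groups, and we propose a reliability indicator for checking hierarchical clustering results.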
    Currently, the most popular way to validate clustering results is to compute various cluster validation indexes. However, these indexes may not give reasonable answers when the data contain many categorical variables. This study aims to provide a stable method for assessing the homogeneity and stability of Hierarchical Clustering (HC). Multiple HC trees are built from simulated data, and the path to each category in each tree is recorded. A histogram based on the coded paths of the simulated data is then built to validate the reliability and stability of the HC results. The difference between the proposed method and the common cluster validation indexes is that we can rely on the clustering results presented by the heatmap and cut the dendrogram at different heights to find reasonable and highly relevant groups, which increases the flexibility of the clustering.
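    The workflow in the abstract (build an HC tree, cut it at different heights, then re-cluster simulated data to see how stable the groups are) can be illustrated with a small Python sketch. This is not the thesis's implementation: the function names, the toy data, the average-linkage choice, and the simulation model (each category redrawn as its own mean plus Gaussian noise) are all assumptions made for illustration, and the co-clustering fraction stands in only loosely for the reliability histogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def category_means(data, groups):
    """Return the sorted category labels and one mean vector per category."""
    cats = np.unique(groups)
    means = np.vstack([data[groups == c].mean(axis=0) for c in cats])
    return cats, means


def cut_tree(points, height, method="average"):
    """Build an HC tree on Euclidean distances and cut it at `height`."""
    tree = linkage(points, method=method, metric="euclidean")
    return fcluster(tree, t=height, criterion="distance")


def simulated_co_clustering(data, groups, height, n_sim=100, seed=0):
    """Fraction of simulated data sets in which two categories share a cluster.

    Assumption (not taken from the thesis): each category is redrawn as its
    own mean plus Gaussian noise with its own per-variable standard deviation.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    cats = np.unique(groups)
    together = np.zeros((len(cats), len(cats)))
    for _ in range(n_sim):
        sim = np.empty_like(data)
        for c in cats:
            rows = groups == c
            mean = data[rows].mean(axis=0)
            sd = data[rows].std(axis=0, ddof=1)
            sim[rows] = rng.normal(mean, sd, size=data[rows].shape)
        _, sim_means = category_means(sim, groups)
        labels = cut_tree(sim_means, height)
        together += labels[:, None] == labels[None, :]
    return together / n_sim


# Toy data (hypothetical): four categories, two near 0 and two near 10.
rng = np.random.default_rng(1)
centers = {"a": 0.0, "b": 0.5, "c": 10.0, "d": 10.5}
groups = np.repeat(list(centers), 30)
data = np.vstack([rng.normal(m, 1.0, size=(30, 2)) for m in centers.values()])

cats, means = category_means(data, groups)
for height in (0.01, 3.0, 20.0):  # cutting higher merges more categories
    print(height, dict(zip(cats, cut_tree(means, height))))
print(np.round(simulated_co_clustering(data, groups, height=3.0), 2))
```

    Entries of the returned matrix near 1 mark pairs of categories that stay together across simulations, and entries near 0 mark pairs that stay apart; a pair with an intermediate value would suggest that the cut height separates groups the data do not clearly distinguish.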
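    References: I. Chinese-language references
    [1] 張順全 (1999). 類別資料結構的訊息視覺化 [Information visualization of categorical data structures].

    II. English-language references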
    [1] Balcan, M. F., Liang, Y., & Gupta, P. (2014). Robust hierarchical clustering. The Journal of Machine Learning Research, 15(1), 3831-3871.
    [2] Ben-Hur, A., Elisseeff, A., & Guyon, I. (2001). A stability based method for discovering structure in clustered data. In Biocomputing 2002 (pp. 6-17).
    [3] Brock, G., Pihur, V., Datta, S., & Datta, S. (2011). clValid, an R package for cluster validation. Journal of Statistical Software (Brock et al., March 2008).
    [4] Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics-theory and Methods, 3(1), 1-27.
    [5] Carlsson, G. E., & Mémoli, F. (2010). Characterization, stability and convergence of hierarchical clustering methods. J. Mach. Learn. Res., 11(Apr), 1425-1470.
    [6] Chou, E., McVey, C., Hsieh, Y. C., Enriquez, S., & Hsieh, F. (2020). Extreme-K categorical samples problem. arXiv preprint arXiv:2007.15039.
    [7] Dunn, J. C. (1974). A graph theoretic analysis of pattern classification via Tamura's fuzzy relation. IEEE Transactions on Systems, Man, and Cybernetics, (3), 310-313.
    [8] Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions. Journal of cybernetics, 4(1), 95-104.
    [9] Fushing, H., & Roy, T. (2018). Complexity of possibly gapped histogram and analysis of histogram. Royal Society open science, 5(2), 171026.
    [10] Goodman, L. A., & Kruskal, W. H. (1979). Measures of association for cross classifications. Measures of association for cross classifications, 2-34.
    [11] Rendón, E., Abundez, I., Arizmendi, A., & Quiroz, E. M. (2011). Internal versus external cluster validation indexes. International Journal of computers and communications, 5(1), 27-34.
    [12] Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379-423.
    [13] Smith, S. P., & Dubes, R. (1980). Stability of a hierarchical clustering. Pattern Recognition, 12(3), 177-187.
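    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    108354027
    Source: http://thesis.lib.nccu.edu.tw/record/#G0108354027
    Type: thesis
    DOI: 10.6814/NCCU202100611
    Appears in collections: [Department of Statistics] Theses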

    Files in this item:

    File          Size      Format       Views
    402701.pdf    1902Kb    Adobe PDF    20


    All items in the NCCU Institutional Repository are protected by the original copyright.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It provides free access for academic research, public education, and other non-commercial use. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in building the NCCU Institutional Repository to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.