National Chengchi University Institutional Repository (NCCUR): Item 140.119/130961
    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/130961


    Title: Label Heterogeneity in Binary Classification (二元分類的同類別異質性)
    Author: Ko, Pai-Yi (柯百翼)
    Contributors: Chou, Pei-Ting (周珮婷)
    Ko, Pai-Yi (柯百翼)
    Keywords: Binary Classification
    Multiclass Classification
    Label Tree
    Pseudo Likelihood Classifier
    Label Heterogeneity
    Date: 2020
    Upload time: 2020-08-03 17:32:22 (UTC+8)
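    Abstract: In machine learning, binary classification is the most common form of data. Such data may suffer from within-class heterogeneity, which leads classifiers to misclassify. To let the model discern differences among observations more finely and to improve predictive accuracy, this study applies hierarchical clustering based on Ward's minimum-variance agglomeration to each of the two classes separately and redefines the resulting clusters as new sub-classes. Once the original binary dataset has been converted into a multiclass dataset, the study uses a Label Embedding Tree together with the Pseudo Likelihood classifier to obtain multiclass predictions, and then maps the predicted sub-classes back to the original binary classes. The results show that predictions obtained under this framework are not inferior to those of other well-known binary classifiers. Moreover, its predictive performance stays within a narrow range, whereas that of the other binary classifiers changes drastically when the feature set changes. The proposed method therefore not only addresses within-class heterogeneity to some extent and improves predictive accuracy, but also yields stable predictive accuracy.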
    Binary classification is one of the most common problems in machine learning research. However, noisy labels are a potential difficulty in binary classification. This study addresses the problem by deriving sub-label information from the original labels. Hierarchical clustering is used first to build a hierarchy of sub-label clusters. Identifying the heterogeneity that exists within the original labels helps improve classification accuracy. A label tree and the Pseudo Likelihood classifier are then used for classification. The findings show that the performance of the label tree with the Pseudo Likelihood classifier is not inferior to that of other well-known binary classification models. Its classification results also remain stable across different feature subsets, unlike those of the other classifiers. We believe the proposed method addresses the heterogeneity problem that exists in the original labels in classification.
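The pipeline described in the abstract (cluster each class separately, treat the clusters as sub-classes, classify among the sub-classes, then map back to the binary label) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: a plain nearest-centroid rule stands in for the Label Embedding Tree and the Pseudo Likelihood classifier, the number of sub-classes per class is fixed by hand, and the function names (`ward_clusters`, `fit_sublabel_centroids`, `predict`) and the toy data are hypothetical.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerate points into k clusters using Ward's minimum-variance criterion."""
    # Each cluster is tracked as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ia, ca), (ib, cb) = clusters[a], clusters[b]
            na, nb = len(ia), len(ib)
            dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if a and b are merged.
            cost = na * nb / (na + nb) * dist2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        merged = (ia + ib, [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters


def fit_sublabel_centroids(X, y, k_per_class):
    """Split each binary class into sub-classes; return (centroid, original label) pairs."""
    model = []
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        for _, centroid in ward_clusters(pts, min(k_per_class, len(pts))):
            model.append((centroid, label))
    return model


def predict(model, x):
    """Pick the nearest sub-class centroid, then map it back to its binary label."""
    # Stand-in classifier (assumption): nearest centroid, not Pseudo Likelihood.
    _, label = min(
        (sum((a - b) ** 2 for a, b in zip(c, x)), lab) for c, lab in model
    )
    return label


# Hypothetical toy data: each class consists of two well-separated groups.
X_train = [(0, 0), (0.2, 0.1), (4, 4), (4.1, 3.9),
           (0, 4), (0.1, 4.2), (4, 0), (3.9, 0.2)]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_sublabel_centroids(X_train, y_train, k_per_class=2)
print([predict(model, p) for p in [(0.1, 0.1), (3.8, 0.1), (0.2, 3.9), (4.0, 4.1)]])
```

In this toy layout the overall centroids of the two classes nearly coincide around (2, 2), so one centroid per class separates them poorly, while two sub-classes per class recover the structure. How the number of sub-classes is actually chosen in the thesis is not stated in the abstract.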
    Description: Master's
    National Chengchi University
    Department of Statistics
    107354020
    Source: http://thesis.lib.nccu.edu.tw/record/#G0107354020
    Type: thesis
    DOI: 10.6814/NCCU202000962
    Appears in collections: [Department of Statistics] Theses

    Files in this item:

    File         Description   Size      Format      Views
    402001.pdf                 2468 Kb   Adobe PDF   20


    All items in NCCUR are protected by their original copyright.


