Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/140753
Title: | Data-driven Hybrid Approach for Imbalanced Data in Supervised Learning |
Authors: | Liu, Te-Hsin (劉得心) |
Contributors: | Chou, Pei-Ting (周珮婷); Liu, Te-Hsin (劉得心) |
Keywords: | Imbalanced data; Supervised learning; PLR; Binary classification |
Date: | 2022 |
Issue Date: | 2022-07-01 16:58:02 (UTC+8) |
Abstract: | Imbalanced data refers to datasets in which certain classes contain very few samples, producing a large disparity between class proportions. This characteristic makes it difficult for a supervised classification model to learn the features of the minority class during training, which leads to prediction errors. To address this problem, this study makes two different adjustments to the supervised learning method Pseudo-Likelihood Ratio (PLR) and proposes a classification model based on each adjustment. To examine how the two adjusted models perform under different imbalance ratios, they are compared with the original PLR, KNN, and SVM on five datasets, each resampled to several imbalance ratios. The results show that the proposed improvements to PLR perform differently across datasets, but on the whole they are effective in improving the classification performance of the original PLR. |
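The evaluation protocol described in the abstract — measuring how a classifier's performance on the minority class degrades as the majority:minority ratio grows — can be illustrated with a minimal sketch. The thesis's PLR variants are not reproduced here; instead the sketch uses a plain k-nearest-neighbours classifier (one of the baselines named in the abstract) on hypothetical one-dimensional synthetic data. All function names (`make_imbalanced`, `knn_predict`, `minority_recall`) and the data distributions are illustrative assumptions, not the study's actual setup.

```python
import random
from collections import Counter

def make_imbalanced(n_major, n_minor, seed=0):
    """Two overlapping one-dimensional Gaussian classes; label 1 is the minority."""
    rng = random.Random(seed)
    X = [rng.gauss(0.0, 1.0) for _ in range(n_major)] + \
        [rng.gauss(1.5, 1.0) for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points."""
    neighbours = sorted(zip((abs(xi - x) for xi in X_train), y_train))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def minority_recall(ratio, n_minor=40, k=5, seed=0):
    """Recall on the minority class for a given majority:minority ratio."""
    X_tr, y_tr = make_imbalanced(n_minor * ratio, n_minor, seed)
    X_te, y_te = make_imbalanced(n_minor * ratio, n_minor, seed + 1)
    hits = sum(1 for x, y in zip(X_te, y_te)
               if y == 1 and knn_predict(X_tr, y_tr, x, k) == 1)
    return hits / n_minor

# Sweep the imbalance ratio, as the study does across its five datasets.
for ratio in (1, 5, 20):
    print(f"imbalance {ratio}:1 -> minority recall {minority_recall(ratio):.2f}")
```

Because the majority class dominates the neighbourhood votes at high ratios, minority recall tends to fall as the ratio grows — the motivation for the PLR adjustments studied in the thesis.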
Reference: | Akbani, R., Kwek, S., & Japkowicz, N. (2004). Applying support vector machines to imbalanced datasets. Paper presented at the European Conference on Machine Learning.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, pp. 321-357.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), pp. 273-297.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), pp. 21-27.
Chou, E. P., & Yang, S.-P. (2022). A virtual multi-label approach to imbalanced data classification. Communications in Statistics - Simulation and Computation. DOI: 10.1080/03610918.2022.2049820
Hsieh, F., & Chou, E. (2020). Categorical exploratory data analysis: From multiclass classification and response manifold analytics perspectives of baseball pitching dynamics. Entropy, 23(7), p. 792.
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), pp. 1263-1284.
Hong, X., Chen, S., & Harris, C. J. (2007). A kernel-based two-class classifier for imbalanced data sets. IEEE Transactions on Neural Networks, 18(1), pp. 28-41.
Seliya, N., Khoshgoftaar, T. M., & Hulse, J. V. (2009). A study on the relationships of classifier performance metrics. IEEE International Conference on Tools with Artificial Intelligence, pp. 59-66.
Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), pp. 429-449.
Brown, J. B. (2018). Classifiers and their metrics quantified. Molecular Informatics, 37(1-2), p. 1700127.
Kang, P., & Cho, S. (2006). EUS SVMs: Ensemble of under-sampled SVMs for data imbalance problems. Paper presented at the International Conference on Neural Information Processing.
Raskutti, B., & Kowalczyk, A. (2004). Extreme re-balancing for SVMs: A case study. ACM SIGKDD Explorations Newsletter, 6(1), pp. 60-69.
Lee, H.-J., & Cho, S. (2006). The novelty detection approach for different degrees of class imbalance. Paper presented at the International Conference on Neural Information Processing.
Liu, Y., An, A., & Huang, X. (2006). Boosting prediction accuracy on imbalanced datasets with SVM ensembles. Paper presented at the Pacific-Asia Conference on Knowledge Discovery and Data Mining.
Chawla, N. V., Japkowicz, N., & Kolcz, A. (2004). Editorial: Special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 6(1), pp. 1-6.
Tang, Y., & Zhang, Y.-Q. (2006). Granular SVM with repetitive undersampling for highly imbalanced protein homology prediction. Paper presented at the 2006 IEEE International Conference on Granular Computing.
Wang, B. X., & Japkowicz, N. (2008). Boosting support vector machines for imbalanced data sets. Paper presented at the International Symposium on Methodologies for Intelligent Systems.
Fan, W., Stolfo, S. J., Zhang, J., & Chan, P. K. (1999). AdaCost: Misclassification cost-sensitive boosting. Proceedings of the International Conference on Machine Learning, pp. 97-105.
Zou, K. H., O'Malley, A. J., & Mauri, L. (2007). Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation, 115(5), pp. 654-657. |
Description: | Master's thesis, Department of Statistics, National Chengchi University. Student ID: 109354020 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109354020 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202200484 |
Appears in Collections: | [Department of Statistics] Theses
Files in This Item:
File | Description | Size | Format
402001.pdf | | 1250Kb | Adobe PDF
All items in 政大典藏 are protected by copyright, with all rights reserved.