Please use this permanent URL to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/139140
|
Title: | 基於生成對抗網路的異質圖神經網路之不平衡節點分類架構 (A Framework of Imbalanced Node Classification On Heterogeneous Graph Neural Network With GAN) |
Author: | 林庭樂 (Lin, Ting-Le) |
Contributors: | 王志宇 (Wang, Chih-Yu); 周珮婷 (Chou, Pei-Ting); 林庭樂 (Lin, Ting-Le) |
Keywords: | Class Imbalance; Generative Adversarial Network; Graph Neural Network; Heterogeneous Graph |
Date: | 2022 |
Upload time: | 2022-03-01 16:38:46 (UTC+8) |
Abstract: | Graph neural networks (GNNs) are deep learning models that have attracted considerable attention in recent years. Because they can exploit the information in graph-structured data, they are widely used across many tasks and achieve strong results. Existing GNNs, however, assume that every class has roughly the same number of samples, whereas many real-world scenarios are class-imbalanced, so applying GNNs to them directly may not yield good performance. Handling class imbalance is therefore an important problem for GNNs. Oversampling is a common remedy: it creates additional minority-class samples by duplication or synthesis to even out the class sizes. Yet oversampling can cause overfitting, and within a GNN framework the newly generated samples cannot be correctly connected to the original data. The heterogeneous graph setting, which is common in real-world applications, makes establishing these connections even harder. To address these problems, this work proposes a framework built on the idea of oversampling: a generative adversarial network (GAN) generates samples that resemble real data, and deep learning models are trained to link the generated samples to the original data. The framework is evaluated on Amazon Reviews datasets, where it clearly outperforms the baseline methods on multiple metrics. |
Description: | Master's thesis; National Chengchi University; Department of Statistics; 108354018 |
Source: | http://thesis.lib.nccu.edu.tw/record/#G0108354018 |
Data type: | thesis |
DOI: | 10.6814/NCCU202200296 |
Appears in collections: | [Department of Statistics] Theses and Dissertations
|
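The abstract describes oversampling by duplication as the common baseline that the thesis sets out to improve on. A minimal sketch of that baseline is shown here for orientation only. It is not the thesis's GAN-based framework, and the function name `random_oversample` and the toy data are hypothetical. Note that it only copies feature vectors: the copies carry no edges, which is the integration problem the abstract points out for graph data.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest class.

    Hypothetical helper for illustration; it ignores graph structure entirely.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_features, out_labels = list(features), list(labels)
    for cls, n in counts.items():
        # All original samples of this class are candidates for duplication.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_features.append(rng.choice(pool))
            out_labels.append(cls)
    return out_features, out_labels


# Toy, invented data: three majority-class nodes and one minority-class node.
toy_features = [[0.1], [0.2], [0.3], [0.9]]
toy_labels = ["benign", "benign", "benign", "fraud"]

balanced_features, balanced_labels = random_oversample(toy_features, toy_labels)
print(Counter(balanced_labels))
```

Because every added sample is an exact copy, a classifier can memorize the few minority examples, which is one way the overfitting mentioned in the abstract arises.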
Files in this item:
File | Description | Size | Format | Views |
401801.pdf | | 1032Kb | Adobe PDF2 | 0 | View/Open |
|
All items in the NCCU Institutional Repository are protected by the original copyright.
|