|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/152775
|
Title: | 機器學習方法於分類或預測問題之比較與應用 Machine Learning Methods in Classification or Prediction: Some Comparison and Applications |
Authors: | 古政弘 Gu, Cheng-Hung |
Contributors: | 張育瑋 Chang, Yu-Wei 古政弘 Gu, Cheng-Hung |
Keywords: | Classification and Regression Tree; Bayesian Additive Regression Trees; Random Forest |
Date: | 2024 |
Issue Date: | 2024-08-05 13:59:17 (UTC+8) |
Abstract: | In recent years, many new machine learning methods have been proposed. Depending on whether the response variable is continuous or categorical, these methods can be applied to prediction or classification problems. This study examines the predictive or classification accuracy of several machine learning methods, with a particular focus on interpretable machine learning, because in practical data analysis users often also want to interpret the relationships between the independent and dependent variables. We consider seven machine learning or statistical methods: Classification and Regression Tree, Bayesian Additive Regression Trees, Random Forest, Multivariate Adaptive Regression Splines, Generalized Additive Model, Linear Discriminant Analysis, and Quadratic Discriminant Analysis. We apply each method to two real datasets, fit the models on the training set, and compare the prediction or classification performance of the methods on the test set. |
Description: | Master's thesis; National Chengchi University; Department of Statistics; 111354005 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0111354005 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses
|
Files in This Item:
File | Description | Size | Format | |
400501.pdf | | 3135Kb | Adobe PDF | 0 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|