Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/131472
Title: | Variable Selection of High Dimension Varying Coefficient Model Under B-Spline Approximation |
Authors: | Yang, Po-An |
Contributors: | Huang, Tzee-Ming; Yang, Po-An |
Keywords: | Varying coefficient model; B-spline; Forward selection; Group lasso |
Date: | 2020 |
Issue Date: | 2020-09-02 11:42:02 (UTC+8) |
Abstract: | The varying coefficient model is a class of nonlinear regression models with applications in many fields. Compared with the linear model, its defining feature is that the coefficients are allowed to vary smoothly with effect-modifying variables, while the model retains good interpretability. In the era of big data, when the number of candidate variables is very large but only a few of them contribute significantly, selecting the relevant variables is both important and challenging. Existing work falls mainly into two approaches: forward selection methods and regularization methods. In this thesis, we use simulation experiments to compare groupwise forward selection and group lasso under different settings, and we make two recommendations. For forward selection, we suggest continuing for several additional steps after the BIC stopping criterion is met, to avoid stopping too early. We also find that under some conditions group lasso selects too many irrelevant variables or too few true variables; we therefore suggest applying groupwise backward selection after fitting group lasso with several penalty levels, in order to choose the final model. |
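The groupwise forward selection procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: each candidate predictor is expanded into a group of basis-weighted columns (a low-degree polynomial basis in the index variable `u` stands in for the B-spline basis), groups are added greedily by BIC, and — as the abstract recommends — selection continues for a few extra steps after BIC stops improving before the best model along the path is chosen. All variable names, the simulated data, and the choice of three extra steps are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20                      # sample size, number of candidate predictors
u = rng.uniform(0, 1, n)            # index variable the coefficients vary with
X = rng.normal(size=(n, p))
# true model: only predictors 0 and 1 matter, with coefficients varying in u
y = (1 + u) * X[:, 0] + np.sin(2 * np.pi * u) * X[:, 1] + 0.5 * rng.normal(size=n)

# basis expansion in u (cubic polynomial as a stand-in for a B-spline basis):
# each predictor j becomes a *group* of basis-weighted columns
B = np.column_stack([np.ones(n), u, u ** 2, u ** 3])
groups = [X[:, [j]] * B for j in range(p)]

def bic(y, design):
    """BIC of a least-squares fit of y on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + design.shape[1] * np.log(len(y))

def design_of(S):
    """Intercept plus the basis-expanded columns of the selected groups."""
    return np.hstack([np.ones((n, 1))] + [groups[j] for j in S])

# groupwise forward selection, continuing a few steps past the first BIC increase
selected, remaining = [], list(range(p))
path = [(list(selected), bic(y, np.ones((n, 1))))]
extra_steps, since_best = 3, 0
while remaining and since_best <= extra_steps:
    best_j = min(remaining, key=lambda j: bic(y, design_of(selected + [j])))
    selected.append(best_j)
    remaining.remove(best_j)
    path.append((list(selected), bic(y, design_of(selected))))
    argmin = min(range(len(path)), key=lambda i: path[i][1])
    since_best = len(path) - 1 - argmin

best_model = min(path, key=lambda s: s[1])[0]
print(sorted(best_model))
```

With the stopping rule applied immediately, a single unlucky step could end the search before a useful group is reached; tracking the whole path and only picking the BIC minimizer afterward is what makes the "run a few more steps" recommendation cheap to implement.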
Reference: | Bertsekas, D. P. (2016). Nonlinear Programming. 3rd edition. Athena Scientific.
Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37(4):373–384.
Cai, J., Fan, J., Zhou, H., Zhou, Y., et al. (2007). Hazard models with varying coefficients for multivariate failure time data. The Annals of Statistics, 35(1):324–354.
Cheng, M.-Y., Honda, T., and Zhang, J.T. (2016). Forward variable selection for sparse ultrahigh dimensional varying coefficient models. Journal of the American Statistical Association, 111(515):1209–1221.
De Boor, C. (1978). A Practical Guide to Splines, volume 27 of Applied Mathematical Sciences. Springer-Verlag, New York.
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al. (2004). Least angle regression. The Annals of Statistics, 32(2):407–499.
Fan, J., Feng, Y., and Song, R. (2011). Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association, 106(494):544–557.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.
Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5):849–911.
Fan, J., Ma, Y., and Dai, W. (2014). Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. Journal of the American Statistical Association, 109(507):1270–1284.
Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4):757–796.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models, volume 43. CRC press.
Luo, S. and Chen, Z. (2014). Sequential lasso cum EBIC for feature selection with ultrahigh dimensional feature space. Journal of the American Statistical Association, 109(507):1229–1240.
Meier, L., Van De Geer, S., and Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):53–71.
Nicholls, D. and Quinn, B. (1982). Random coefficient autoregressive models: an introduction. Lecture notes in statistics. Springer, Springer Nature, United States.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288.
Wang, H. (2009). Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association, 104(488):1512–1524.
Wei, F., Huang, J., and Li, H. (2011). Variable selection and estimation in high-dimensional varying coefficient models. Statistica Sinica, 21(4):1515–1540.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68:49–67.
Zhang, W., Lee, S.Y., and Song, X. (2002). Local polynomial fitting in semivarying coefficient model. Journal of Multivariate Analysis, 82(1):166–188.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320. |
Description: | Master's thesis, Department of Statistics, National Chengchi University. 107354003 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0107354003 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202001217 |
Appears in Collections: | [Department of Statistics] Theses
Files in This Item:
File | Description | Size | Format
400301.pdf | | 433 KB | Adobe PDF
All items in 政大典藏 are protected by copyright, with all rights reserved.