|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/141002
|
Title: | 懲罰R平方對線性混合模型中重要變數選取之研究 Variables selection for linear mixed models with Penalized R-squared statistics |
Authors: | 吳書恆 Wu, Shu-Heng |
Contributors: | 黃佳慧 Huang, Chia-Hui 吳書恆 Wu, Shu-Heng |
Keywords: | Automatic selection methods; Goodness-of-fit; Linear mixed model; Parsimonious model; Penalty coefficient |
Date: | 2022 |
Issue Date: | 2022-08-01 17:14:31 (UTC+8) |
Abstract: | For linear mixed models (LMMs) fitted after decomposing time-dependent covariates, several types of R2 statistics have been proposed to evaluate the goodness-of-fit (GOF) of the random effects and the fixed effects. Because of overfitting, however, automatic selection methods cannot obtain a parsimonious model by simply choosing the model with the maximum value of these statistics. This study proposes a penalized R2 statistic whose penalty term accounts for the number of parameters, which curbs the inflation of R2 as explanatory variables are added. The statistic can therefore be combined with automatic selection methods to choose parsimonious models for both the random effects and the fixed effects. The penalty term also contains a penalty coefficient, so the strength of the penalty can be adjusted flexibly to the researcher's demand for model simplification. When researchers have no specific degree of simplification in mind, the study provides two methods, grid search and a given tolerance value, to obtain the optimal range of the penalty coefficient and the corresponding parsimonious model. Simulation results show that penalized R2 performs better at selecting the parsimonious model than the AIC-type statistics cAIC and mAIC, and that using penalized R2 for random effects and fixed effects simultaneously does not affect the result of either selection. In an empirical analysis of North Carolina crime data, the proposed R2 statistic was able to identify important variables in automatic selection, and these variables were also found in the original study. |
Reference: |
Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer.
Arnau, J., Bono, R., and Vallejo, G. (2009). Analyzing small samples of repeated measures data with the mixed-model adjusted F test. Communications in Statistics-Simulation and Computation, 38(5):1083–1103.
Baltagi, B. H. (2006). Estimating an economic model of crime using panel data from North Carolina. Journal of Applied Econometrics, 21(4):543–547.
Brame, R., Bushway, S., and Paternoster, R. (1999). On the use of panel research designs and random effects models to investigate static and dynamic theories of criminal offending. Criminology, 37(3):599–642.
Cornwell, C. and Trumbull, W. N. (1994). Estimating the economic model of crime with panel data. The Review of Economics and Statistics, pages 360–366.
Edwards, L. J., Muller, K. E., Wolfinger, R. D., Qaqish, B. F., and Schabenberger, O. (2008). An R2 statistic for fixed effects in the linear mixed model. Statistics in Medicine, 27(29):6137–6157.
Ezekiel, M. (1930). Methods of correlation analysis. Wiley.
Ghidey, W., Lesaffre, E., and Eilers, P. (2004). Smooth random effects distribution in a linear mixed model. Biometrics, 60(4):945–953.
Greven, S. and Kneib, T. (2010). On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika, 97(4):773–789.
Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72(358):320–338.
Hawkins, D. M. (2004). The problem of overfitting. Journal of Chemical Information and Computer Sciences, 44(1):1–12.
Helland, I. S. (2000). Model reduction for prediction in regression models. Scandinavian Journal of Statistics, 27(1):1–20.
Hurvich, C. M. and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2):297–307.
Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, pages 963–974.
Lalonde, T. L. (2015). Modeling time-dependent covariates in longitudinal data analyses. In Innovative Statistical Methods for Public Health Data, pages 57–79. Springer.
Lalonde, T. L., Nguyen, A. Q., Yin, J., Irimata, K., and Wilson, J. R. (2013). Modeling correlated binary outcomes with time-dependent covariates. Journal of Data Science, 11(4).
Luke, S. G. (2017). Evaluating significance in linear mixed-effects models in R. Behavior Research Methods, 49(4):1494–1502.
McNeish, D. (2017). Small sample methods for multilevel modeling: A colloquial elucidation of REML and the Kenward-Roger correction. Multivariate Behavioral Research, 52(5):661–670.
Molenberghs, G. and Verbeke, G. (2000). A model for longitudinal data. Linear Mixed Models for Longitudinal Data, pages 19–29.
Nakagawa, S. and Schielzeth, H. (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2):133–142.
Neuhaus, J. M. and Kalbfleisch, J. D. (1998). Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics, pages 638–645.
Olkin, I. and Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. The Annals of Mathematical Statistics, pages 201–211.
Orelien, J. G. and Edwards, L. J. (2008). Fixed-effect variable selection in linear mixed models using R2 statistics. Computational Statistics & Data Analysis, 52(4):1896–1907.
Rights, J. D. and Sterba, S. K. (2019). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods, 24(3):309.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, pages 461–464.
Shen, W. and Louis, T. A. (1999). Empirical Bayes estimation via the smoothing by roughening approach. Journal of Computational and Graphical Statistics, 8(4):800–823.
Snijders, T. A. and Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Sage.
Sundberg, R. (1999). Multivariate calibration—direct and indirect regression methodology. Scandinavian Journal of Statistics, 26(2):161–207.
Vaida, F. and Blanchard, S. (2005). Conditional Akaike information for mixed-effects models. Biometrika, 92(2):351–370.
Verbeke, G. and Lesaffre, E. (1996). A linear mixed-effects model with heterogeneity in the random-effects population. Journal of the American Statistical Association, 91(433):217–221.
Vonesh, E. and Chinchilli, V. M. (1996). Linear and nonlinear models for the analysis of repeated measurements. CRC Press.
Welham, S. and Thompson, R. (1997). Likelihood ratio tests for fixed model terms using residual maximum likelihood. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(3):701–714.
Xu, R. (2003). Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22(22):3527–3541.
Zhang, D. and Davidian, M. (2001). Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics, 57(3):795–802.
Zheng, B. (2000). Summarizing the goodness of fit of generalized linear models for longitudinal data. Statistics in Medicine, 19(10):1265–1275. |
Description: | Master's thesis; National Chengchi University; Department of Statistics; 109354003 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109354003 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202200673 |
Appears in Collections: | [Department of Statistics] Theses
|
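The abstract describes an R2 statistic with a parameter-count penalty and an adjustable penalty coefficient, used inside automatic model selection. This record does not give the thesis's formula, so the following is only a minimal sketch under stated assumptions: it uses ordinary least squares instead of a linear mixed model, and the penalty form `R2 - lam * p / n` as well as the names `r_squared`, `penalized_r2`, `forward_select`, and `lam` are hypothetical illustrations, not the author's definitions.

```python
import numpy as np

def r_squared(X, y):
    """Ordinary R^2 of a least-squares fit with intercept (not a mixed-model R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def penalized_r2(X, y, lam):
    """ASSUMED penalty form: R^2 minus lam * (number of regressors) / n."""
    n, p = X.shape
    return r_squared(X, y) - lam * p / n

def forward_select(X, y, lam):
    """Greedy forward selection that maximizes the assumed penalized R^2."""
    selected, best = [], 0.0  # intercept-only model: R^2 = 0, no penalty
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: penalized_r2(X[:, selected + [j]], y, lam) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:  # no candidate improves the criterion
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return sorted(selected)

# Simulated data: only columns 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

selected = forward_select(X, y, lam=5.0)     # penalized selection
unpenalized = forward_select(X, y, lam=0.0)  # plain R^2 never penalizes extra terms

# A small grid over the penalty coefficient, in the spirit of a grid search.
for lam in (0.0, 1.0, 5.0, 20.0):
    print(lam, forward_select(X, y, lam))
```

With `lam=0.0` the criterion reduces to plain R2, which cannot decrease when a regressor is added, so selection tends to keep irrelevant columns. A positive `lam` makes each extra regressor pay a fixed cost, which is the general behaviour the abstract attributes to the penalty coefficient.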
Files in This Item:
File | Description | Size | Format | |
400301.pdf | | 715Kb | Adobe PDF2 | 253 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|