政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/141002
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/141002


    Title: 懲罰R平方對線性混合模型中重要變數選取之研究
    Variables selection for linear mixed models with Penalized R-squared statistics
    Authors: 吳書恆
    Wu, Shu-Heng
    Contributors: 黃佳慧
    Huang, Chia-Hui
    吳書恆
    Wu, Shu-Heng
    Keywords: 自動化模型選擇
    適合度
    線性混合模型
    精簡模型
    懲罰係數
    Automatic selection methods
    Goodness-of-fit
    Linear mixed model
    Parsimonious model
    Penalty coefficient
    Date: 2022
    Issue Date: 2022-08-01 17:14:31 (UTC+8)
    Abstract: 在時間相依共變量拆解後的線性混合模型 (linear mixed model) 中,已提出不同型式的R²統計量來評估隨機效果與固定效果的適合度 (goodness-of-fit),然而在自動化選模方法中,這些統計量因過度擬合 (overfitting) 的問題而無法直接根據最大值來選擇較精簡模型 (parsimonious model)。本研究提出一個具有懲罰性質的R²統計量,此統計量懲罰項納入參數個數的考量,可抑制R²隨解釋變數增加而不斷膨脹的問題,並且可搭配自動化選模方式選擇隨機與固定效果的精簡模型。此外,因應使用者對模型精簡程度的需求,此統計量懲罰項含有懲罰係數,可彈性地調整懲罰的強度。當使用者並無特定的精簡程度,本研究亦提供網格搜索與給定容忍值的方式,以得出最佳的懲罰範圍與對應的精簡模型。透過資料模擬結果可發現懲罰R²選到精簡模型的效果較其他AIC統計量 (cAIC與mAIC) 佳,同時使用隨機效果與固定效果的懲罰R²亦不會影響各自選模的結果。在北卡羅來納州 (North Carolina) 犯罪資料的實證分析中,亦發現本研究所提出R²統計量在自動化選模中具有辨別重要變數的能力。
    In the linear mixed model (LMM) obtained after decomposing time-dependent covariates, several R² statistics have been proposed to evaluate the goodness-of-fit (GOF) of the random effects and the fixed effects. However, because of overfitting, these statistics cannot be used directly in automatic selection methods: choosing the model with the maximum value does not yield a parsimonious model. In this study, we propose a penalized R² statistic whose penalty term accounts for the number of parameters and thereby discourages the inflation of R² as extra regressors are added to the model. Combined with automatic model selection, the statistic can therefore select parsimonious models for both the random effects and the fixed effects. In addition, the penalty term includes a penalty coefficient that can be flexibly adjusted to match the researcher's desired degree of model simplification. When researchers have no specific degree of simplification in mind, we also provide two methods, a grid search and a given tolerance value, to obtain the optimal range of the penalty coefficient and the corresponding parsimonious model. The simulation results show that the penalized R² finds the parsimonious model more effectively than the AIC statistics (cAIC and mAIC), and that using the penalized R² for the random effects and the fixed effects simultaneously does not affect their respective selection results. In an empirical analysis of North Carolina crime data, the proposed R² statistic was able to identify significant variables that were also found in the original study.
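    The idea of a penalized R² driving automatic selection can be illustrated with a minimal sketch. This record does not give the thesis's formula, so everything specific here is an assumption: the penalty `lam * k / n` is hypothetical, ordinary least-squares R² stands in for the mixed-model R² statistics, and simple forward selection stands in for the automatic selection method. The function names `r_squared`, `penalized_r2`, and `forward_select` are illustrative only.

```python
import numpy as np


def r_squared(X, y, cols):
    """OLS R^2 for an intercept plus the columns listed in `cols`.

    Assumption: plain OLS is used as a stand-in for an LMM R^2 statistic.
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss


def penalized_r2(r2, k, n, lam):
    """Hypothetical penalty: subtract lam * k / n for k selected regressors."""
    return r2 - lam * k / n


def forward_select(X, y, lam):
    """Add one regressor at a time while the penalized R^2 keeps improving."""
    n, p = X.shape
    selected = []
    best_score = penalized_r2(r_squared(X, y, selected), 0, n, lam)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        r2s = [r_squared(X, y, selected + [j]) for j in candidates]
        i = int(np.argmax(r2s))  # best candidate by raw R^2
        score = penalized_r2(r2s[i], len(selected) + 1, n, lam)
        if score <= best_score:  # gain does not cover the penalty: stop
            break
        selected.append(candidates[i])
        best_score = score
    return selected


# Simulated data: only columns 0 and 1 truly affect the response.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# A small grid over the penalty coefficient, loosely in the spirit of the
# grid search mentioned in the abstract.
for lam in [0.0, 0.5, 2.0, 8.0]:
    print(lam, sorted(forward_select(X, y, lam)))
```

    With `lam = 0` the criterion reduces to the unpenalized R², which tends to keep admitting regressors. A positive `lam` stops the search once the gain in R² no longer covers the per-regressor penalty, so the penalized selection is always a prefix of the unpenalized one.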
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    109354003
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0109354003
    Data Type: thesis
    DOI: 10.6814/NCCU202200673
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File: 400301.pdf
    Size: 715Kb
    Format: Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

