Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/95126
Title: | 自變數有誤差的邏輯式迴歸模型:估計、實驗設計及序貫分析 Logistic regression models when covariates are measured with errors: Estimation, design and sequential method |
Authors: | 簡至毅 Chien, Chih Yi |
Contributors: | 薛慧敏 張源俊 Hsueh, Huey Miin Chang, Yuan Chin 簡至毅 Chien, Chih Yi |
Keywords: | 邏輯式迴歸; 測量誤差; 樣本數計算; 序貫分析; 二階段抽樣; logistic regression model; measurement error; sample size calculation; sequential sampling; two-stage case-control sampling; case-control study |
Date: | 2009 |
Issue Date: | 2016-05-09 15:14:17 (UTC+8) |
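Abstract: | This thesis studies estimation in logistic regression models when the covariates are measured with error. It designs an experiment under which the measurement errors satisfy a decaying-error assumption, and then applies sequential analysis to construct a confidence region at a given level.
When covariates are measured with error, the resulting estimators are usually biased, so decisions based on them can differ from those that would be made without measurement error. We propose a decaying measurement error assumption. Under it, we prove strong convergence of the estimators and show that they have the same asymptotic distribution as the estimators obtained without measurement error. Compared with earlier assumptions, particularly those used to prove large-sample properties, it is more reasonable to assume that newly added observations carry smaller measurement error. We also design an experiment that satisfies the proposed decaying-error condition, and use a sequential design to obtain a procedure that saves both time and cost.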
Covariates in ordinary case-control studies are also subject to measurement error. For that setting we likewise prove strong convergence and the asymptotic distribution of the slope estimator, and propose a two-stage sampling method that computes the required sample size and constructs a confidence interval.
In this thesis, we focus on the estimation of unknown parameters, experimental designs and sequential methods in both prospective and retrospective logistic regression models when some covariates are measured with error. Imprecise measurement of exposure happens very often in practice, for example in retrospective epidemiological studies, and may be due to either the difficulty or the cost of measuring. It is known that imprecisely measured variables can bias the coefficient estimates in a regression model and may therefore lead to incorrect inference. This is an important issue when the effects of those variables are of primary interest.
For the prospective logistic regression model, we derive asymptotic results for the estimators of the regression parameters when some covariates are mismeasured. If the measurement error satisfies certain assumptions, we show that the estimators are strongly consistent, asymptotically unbiased and asymptotically normally distributed. In contrast to the traditional assumption on measurement error, which is mainly used for proving large-sample properties, we assume that the measurement error decays gradually at a certain rate as each new observation is added to the model. This kind of assumption can be fulfilled when the usual replicate-observation method is used to dilute the magnitude of the measurement errors, and it is therefore also more useful from a practical viewpoint. Moreover, our theorems do not require the measurement error to be independent of the covariate. An experimental design whose measurement error satisfies the required decay rate is introduced. In addition, this assumption allows us to apply sequential sampling, which is popular in clinical trials, to such a measurement error logistic regression model. The sequential method cannot be applied under the assumption, made in most of the literature, that the measurement errors decay uniformly as the sample size increases. Therefore, a sequential estimation procedure based on MLEs and such moment conditions is proposed, and it can be shown to be asymptotically consistent and efficient.
Case-control studies are widely used in clinical trials and epidemiological studies. It can be shown that the odds ratio can be consistently estimated with some exposure variables based on logistic models (see Prentice and Pyke (1979)). A two-stage case-control sampling scheme is employed to construct a confidence region for the slope coefficient beta, and the necessary sample size is calculated for a given pre-determined level. Furthermore, we consider measurement error in the covariates of a case-control (retrospective) logistic regression model. We derive asymptotic results for the maximum likelihood estimators (MLEs) of the regression coefficients under some moment conditions on the measurement errors. Under these moment conditions, the MLEs can be shown to be strongly consistent, asymptotically unbiased and asymptotically normally distributed. Simulation results for the proposed two-stage procedures are reported. We also give numerical studies and a real-data analysis to verify the theoretical results under different measurement error scenarios. |
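The bias described in the abstract, and the way replicate observations dilute it, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the estimator or design from the thesis: the function names `fit_logistic` and `simulate`, the parameter values, and the choice of a fixed number `m` of replicates per subject (rather than an error that decays with each new observation) are all hypothetical and chosen only for illustration.

```python
import math
import random


def fit_logistic(xs, ys, iters=15):
    """Fit P(y=1|x) = 1/(1+exp(-(a+b*x))) by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate(n=5000, beta0=-0.5, beta1=1.0, sigma_u=1.0, m=1, seed=0):
    """Return (slope from true X, slope from error-prone W).

    W is the average of m replicate measurements X + U, U ~ N(0, sigma_u^2),
    so its error variance is sigma_u^2 / m. All values are illustrative.
    """
    rng_data = random.Random(seed)      # drives X and Y (same for every m)
    rng_err = random.Random(seed + 1)   # drives the measurement errors
    xs_true, xs_obs, ys = [], [], []
    for _ in range(n):
        x = rng_data.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        y = 1 if rng_data.random() < p else 0
        w = sum(x + rng_err.gauss(0.0, sigma_u) for _ in range(m)) / m
        xs_true.append(x)
        xs_obs.append(w)
        ys.append(y)
    return fit_logistic(xs_true, ys)[1], fit_logistic(xs_obs, ys)[1]


if __name__ == "__main__":
    for m in (1, 4, 16):
        b_true, b_obs = simulate(m=m)
        print(f"m={m:2d}  slope with true X: {b_true:.3f}  slope with averaged W: {b_obs:.3f}")
```

In this sketch the slope fitted to a single error-prone measurement is pulled toward zero, and averaging more replicates moves it back toward the slope fitted to the true covariate. The exact figures depend on the assumed error variance and seed.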
Reference: | [1] Anderson, J. A. (1972). Separate sample logistic discrimination. Biometrika, 59, 19-35.
[2] Begg, M. D. and Lagakos S. W. (1992). Effects of mismodeling on tests of association based on logistic regression models. The Annals of Statistics, 20, 1929-1952.
[3] Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu C. M. (2006). Measurement Error in Nonlinear Models (2nd ed.). London: Chapman & Hall/CRC.
[4] Chang, Y-c. I. and Martinsek, A. T. (1992). Fixed size confidence regions for parameters of a logistic regression model. The Annals of Statistics, 20, 1953-1969.
[5] Chang, Y-c. I. (2001). Sequential confidence regions of generalized linear models with adaptive designs. Journal of Statistical Planning and Inference, 93, 277-293.
[6] Chen, K. (2000). Optimal Sequential Designs of Case-Control Studies. The Annals of Statistics, 28, 1452-1471.
[7] Cheng, C-L. and Van Ness, J. W. (1999). Statistical Regression with Measurement Error. London: Oxford University Press.
[8] Chow, Y. S. and Robbins, H. (1965). On the Asymptotic Theory of Fixed-Width Sequential Confidence Intervals for the Mean. The Annals of Mathematical Statistics, 36, 457-462.
[9] Chow, Y. S. and Teicher, H. (1978). Probability Theory: Independence Interchangeability Martingales. New York: Springer-Verlag.
[10] Demark-Wahnefried, W., Clipp, E. C., Lipkus, I. M., Lobach, D., Snyder, D. C., Sloane, R., Peterson, B., Macri, J. M., Rock, C. L., McBride, C. M. and Kraus, W. E. (2007). Main Outcomes of the FRESH START Trial: A Sequentially Tailored, Diet and Exercise Mailed Print Intervention Among Breast and Prostate Cancer Survivors. Journal of Clinical Oncology, 25, 2709-2718.
[11] Etzioni, R., Pepe, M., Longton, G., Hu, C. and Goodman, G. (1999). Incorporating the Time Dimension in Receiver Operating Characteristic Curves: A Case Study of Prostate Cancer. Medical Decision Making, 19, 242-251.
[12] Farewell, V. T. (1979). Some Results on the Estimation of Logistic Models Based on Retrospective Data. Biometrika, 66, 27-32.
[13] Fuller, W. A. (1980). Properties of Some Estimators for the Errors-in-Variables Model. The Annals of Statistics, 8, 407-422.
[14] Fuller, W. A. (1987). Measurement Error Models. New York: John Wiley & Sons, Inc.
[15] Gleser, L. J. (1981). Estimation in a Multivariate "Errors in Variables" Regression Model: Large Sample Results. The Annals of Statistics, 9, 24-44.
[16] Janes, H., Pepe, M., Kooperberg, C. and Newcomb, P. (2005). Identifying Target Populations for Screening or Not Screening Using Logic Regression. Statistics in Medicine, 24, 1321-1338.
[17] Kalohn, J. C. and Spray, J. A. (1999). The Effect of Model Misspecification on Classification Decisions Made Using a Computerized Test. Journal of Educational Measurement, 36, 47-59.
[18] Merle, Y., Aouimer, A. and Tod, M. (2004). Impact of Model Misspecification at Design (and/or) Estimation Step in Population Pharmacokinetic Studies. Journal of Biopharmaceutical Statistics, 14, 213-227.
[19] O'Neill, R. T. and Anello, C. (1978). Case-control Studies: A Sequential Approach. American Journal of Epidemiology, 120, 145-153.
[20] Owen, J. D. and James, M. S. (1998). Estimating Sample Size for Epidemiologic Studies: The Impact of Ignoring Exposure Measurement Uncertainty. Statistics in Medicine, 17, 1375-1389.
[21] Pagano, M. and Gauvreau, K. (2000). Principles of Biostatistics (2nd ed.). Pacific Grove, California: Duxbury.
[22] Paul, G. and Nhu, D. L. (2002). Comparing the Effects of Continuous and Discrete Covariate Mismeasurement, with Emphasis on the Dichotomization of Mismeasured Predictors. Biometrics, 58, 878-887.
[23] Pierce, J. P., Stefanick, M. L., Flatt, S. W., Natarajan, L., Sternfeld, B., Madlensky, L., Al-Delaimy, W. K., Thomson, C. A., Kealey, S., Hajek, R., Parker, B. A., Newman, V. A., Caan, B. and Rock, C. L. (2007). Greater Survival After Breast Cancer in Physically Active Women With High Vegetable-Fruit Intake Regardless of Obesity. Journal of Clinical Oncology, 25, 2345-2351.
[24] Prentice, R. L. and Pyke, R. (1979). Logistic Disease Incidence Models and Case-Control Studies. Biometrika, 66, 403-411.
[25] Smith, P. (1997). Model Misspecification in Data Envelopment Analysis. Annals of Operations Research, 73, 233-252.
[26] Tosteson, T. D., Buzas, J. S., Demidenko, E. and Karagas, M. (2003). Power and Sample Size Calculations for Generalized Regression Models with Covariate Measurement Error. Statistics in Medicine, 22, 1069-1082.
[27] Urmanov, A. M., Gribok, A. V., Hines, J. W. and Uhrig, R. E. (2002). An Information Approach to Regularization Parameter Selection under Model Misspecification. Inverse Problems, 18, 1207-1228.
[28] Wang, C. Y. and Wang, S. (1995). On Information Matrices in Case-control Studies. Statistics and Probability Letters, 22, 269-274.
[29] Woodroofe, M. (1982). Nonlinear renewal theory in sequential analysis. Philadelphia, Pa: Society for Industrial and Applied Mathematics. |
Description: | Doctoral thesis, National Chengchi University (國立政治大學), Department of Statistics, 92354503 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0923545033 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses
|
All items in 政大典藏 are protected by copyright, with all rights reserved.