Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/54170
|
Title: | 分類蛋白質質譜資料變數選取的探討 On Variable Selection of Classifying Proteomic Spectra Data |
Authors: | 林婷婷 |
Contributors: | 郭訓志 林婷婷 |
Keywords: | LARS; Forward Stagewise; LASSO; Group LASSO; Elastic Net; SVM (support vector machine) |
Date: | 2011 |
Issue Date: | 2012-10-30 10:13:33 (UTC+8) |
Abstract: | This study uses prostate cancer proteomic mass spectrometry data provided by Eastern Virginia Medical School. The data are available in raw form and in a preprocessed form; the empirical analysis here uses the preprocessed data. Data of this kind are typically high-dimensional, and strong correlation among variables is common, so selecting the important features from a large pool of candidates in order to classify the degree of prostate pathology accurately is a common and important problem. The aim of this study is to examine the variable selection results of several (penalized) regression models for classifying proteomic spectra data. For LARS, Stagewise, LASSO, Group LASSO, and Elastic Net, the order in which each model enters the variables is taken as a variable ranking, and the classification results based on that ranking are compared with those obtained from ranking by test statistics (t-test, ANOVA F-test, and Kruskal-Wallis test) and from ranking by SVM misclassification rate. The results show that, across the six pairwise classification problems, the misclassification rate of Group LASSO follows a more stable trend than that of the other methods, without large swings, and its minimum misclassification rate is better than that of the other methods in almost every case. In the four-class problem, Group LASSO also achieves the lowest misclassification rate among the methods compared, which indicates that when all four classes must be distinguished at once, Group LASSO gives the best classification accuracy. |
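The ranking idea in the abstract (treat the order in which a model enters variables as a feature ranking) can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses incremental forward stagewise regression only, written with NumPy, on synthetic data. The function name `stagewise_entry_order`, the step size `eps`, the step count, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def stagewise_entry_order(X, y, eps=0.01, n_steps=5000):
    """Rank features by the step at which incremental forward stagewise
    regression first moves their coefficient away from zero."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    r = y - y.mean()                           # residual starts as the centered response
    beta = np.zeros(X.shape[1])
    order = []                                 # feature indices in order of entry
    for _ in range(n_steps):
        corr = X.T @ r                         # inner product of each feature with the residual
        j = int(np.argmax(np.abs(corr)))       # feature most correlated with the residual
        if abs(corr[j]) < 1e-8:
            break
        step = eps * np.sign(corr[j])
        beta[j] += step                        # small move in the direction of the correlation
        r = r - step * X[:, j]
        if j not in order:
            order.append(j)
    return order

# Synthetic stand-in for a two-class problem (hypothetical data, not the
# prostate spectra): the class label depends mainly on features 3 and 7.
rng = np.random.default_rng(0)
n, p = 80, 30
X = rng.normal(size=(n, p))
signal = 2.0 * X[:, 3] + 1.0 * X[:, 7] + rng.normal(scale=0.5, size=n)
y = np.where(signal > 0, 1.0, -1.0)

ranking = stagewise_entry_order(X, y)
print(ranking[:5])
```

A ranking like this could then be consumed the way the abstract describes: fit a classifier on the top-k ranked features for increasing k and track the misclassification rate, then compare that curve with the one from a t-statistic or SVM-based ranking.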
Description: | Master's; National Chengchi University; Graduate Institute of Statistics; 98354021; 100 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0098354021 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses
|
Files in This Item:
File | Size | Format |
402101.pdf | 1453Kb | Adobe PDF2 | 1025 |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|