|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/30944
|
Title: | 維度縮減應用於蛋白質質譜儀資料 Dimension Reduction on Protein Mass Spectrometry Data |
Authors: | 黃靜文 Huang, Ching-Wen |
Contributors: | 余清祥 Yue, Ching-Syang; 黃靜文 Huang, Ching-Wen |
Keywords: | Classification; Dimension reduction; Disease diagnosis; Computer simulation |
Date: | 2004 |
Issue Date: | 2009-09-14 |
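Abstract: | This study uses a prostate cancer protein database of serum protein intensity data obtained by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry, and uses these data to determine whether a subject has cancer. Subjects fall into four classes: normal, benign tumor, early-stage cancer, and late-stage cancer. The database contains two data sets: raw data with about 48,000 interval measurements (variables), and manually processed data in which manual variable selection leaves only 779 interval measurements (variables). Both are high-dimensional, and each has about 650 observations. With so many variables, high-dimensional data are hard to analyze and take longer to compute. The aim of this study is therefore to find, under effective dimension reduction, the approach that minimizes the misclassification rate.
The study first compares three classification methods (support vector machine, artificial neural network, and classification and regression tree), then applies the two better ones, support vector machine and artificial neural network, to classify the dimension-reduced data. The dimension reduction methods used are discrete wavelet analysis, principal component analysis, and principal component analysis networks. According to the results, discrete wavelet analysis and principal component analysis perform better, while principal component analysis networks are only barely satisfactory.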
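Besides examining how well these dimension reduction methods work for classifying this case database, the study also combines linear dimension reduction (principal component analysis) with nonlinear dimension reduction (principal component analysis networks), hoping that this overlap method can further lower the screening misclassification rate obtained with a single dimension reduction method. According to the results, the overlap method brings no obvious improvement on the raw data but a clear improvement on the manually processed data.
In this paper, we study a serum protein data set for prostate cancer, which was acquired by the Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) technique. The data set, which covers four subject populations, includes both raw data and preprocessed data. There are around 48,000 variables in the raw data and 779 variables in the preprocessed data. The sample size of each is around 650. Because of the high dimensionality, this data set is harder to analyze and takes longer to compute. Therefore, the goal of this study is to search for efficient dimension reduction methods.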
We first compare three classification methods: support vector machine, artificial neural network, and classification and regression tree. We then use the discrete wavelet transform, principal component analysis, and principal component analysis networks to reduce the data dimension.
Finally, we discuss the dimension reduction methods and propose an overlap method that combines a linear dimension reduction method (principal component analysis) with a nonlinear one (principal component analysis networks) to improve the classification result. We find that the improvement from the overlap method is significant on the preprocessed data, but not on the raw data. |
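As a rough illustration of how a discrete wavelet transform can shrink the number of variables, the sketch below applies a Haar transform and keeps only the approximation coefficients. This record does not say which wavelet or decomposition level the thesis used, so the Haar wavelet, the function names `haar_step` and `haar_reduce`, and the toy spectrum are all assumptions made for illustration, not the author's procedure.

```python
import math


def haar_step(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums capture the smooth trend of the signal.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences capture the local fluctuation.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reduce(signal, levels):
    """Keep only the approximation coefficients after `levels` Haar steps.

    Each level halves the number of variables, so an input of length n
    is reduced to n / 2**levels coefficients.
    """
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


# Hypothetical toy "spectrum" with 8 intensity values (not thesis data).
spectrum = [4.0, 4.0, 2.0, 2.0, 0.0, 0.0, 6.0, 6.0]
reduced = haar_reduce(spectrum, levels=2)  # 8 variables -> 2 variables
```

The reduced vector could then be passed to any classifier in place of the full spectrum; discarding the detail coefficients trades fine-scale information for a smaller input.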
Reference: | [Chinese-language sources]
[01] 行政院衛生署 (Department of Health, Executive Yuan), "Summary of Cause-of-Death Statistics for the Taiwan Area, ROC Year 93 (2004)". URL: http://www.doh.gov.tw/statistic/data/死因摘要/93年/93.htm
[02] 彭文正 (trans.), Michael J.A. Berry and Gordon S. Linoff, Data Mining: Applications in Customer Relationship Management and E-Marketing (資料採礦-顧客關係管理暨電子行銷之應用), 數博網資訊股份有限公司, 2001.
[03] 葉怡成, Applied Neural Networks (應用類神經網路), 儒林圖書公司, 1997.
[04] 潘荔錞, 蔡志彥 and 簡志青, "Applications of Proteomics in Clinical Medicine", 化工資訊與商情月刊, no.3, September 2003.
[05] 賴基銘, "The Future of Cancer Screening: Applications of SELDI Serum Protein Fingerprinting", 國家衛生研究院電子報 (National Health Research Institutes e-newsletter), no.52, June 25, 2004. URL: http://sars.nhri.org.tw/enews/enews_list_new3.php?volume_indx=52&enews_dt=2004-06-25
[English-language sources]
[06] Alpaydin, E. (2004), Introduction to Machine Learning, MIT Press.
[07] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984), Classification and Regression Trees, Wadsworth.
[08] Cottrell, G. W., Munro, P. and Zipser, D. (1987), "Learning Internal Representations from Gray-Scale Images: An Example of Extensional Programming", in Ninth Annual Conference of the Cognitive Science Society, 462-473, Hillsdale, NJ: Erlbaum.
[09] Cybenko, G. (1989), "Approximation by Superpositions of a Sigmoidal Function", Mathematics of Control, Signals, and Systems, vol.2, 303-314.
[10] Donoho, D. L. and Johnstone, I. M. (1994), "Ideal Spatial Adaptation by Wavelet Shrinkage", Biometrika, vol.81, 425-455.
[11] Donoho, D. L. and Johnstone, I. M. (1995), "Adapting to Unknown Smoothness via Wavelet Shrinkage", Journal of the American Statistical Association, vol.90, 1200-1224.
[12] Donoho, D. L. and Johnstone, I. M. (1998), "Minimax Estimation via Wavelet Shrinkage", Annals of Statistics, vol.26, 879-921.
[13] Daubechies, I. (1992), Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM: Philadelphia.
[14] Hornik, K., Stinchcombe, M. and White, H. (1989), "Multilayer Feedforward Networks Are Universal Approximators", Neural Networks, vol.2, 359-366.
[15] Hsu, C-W., Chang, C-C. and Lin, C-J. (2003), "A Practical Guide to Support Vector Classification". Paper available at http://www.csie.ntu.edu.tw/~cjlin/papers.html.
[16] Huang, T-K., Weng, R. C. and Lin, C-J. (July 2004), "A Generalized Bradley-Terry Model: From Group Competition to Individual Skill". A short version appears in NIPS.
[17] Johnson, D. E. (1998), Applied Multivariate Methods for Data Analysts, Pacific Grove, Calif.: Duxbury Press.
[18] Mallat, S. G. (1989), "A Theory for Multiresolution Signal Decomposition: the Wavelet Representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.11, no.7, 674-693.
[19] Qu, Y., Adam, B-L., Thornquist, M., Potter, J. D., Thompson, M. L., Yasui, Y., Davis, J., Schellhammer, P. F., Cazares, L., Clements, M. A., Wright, G. L., Jr. and Feng, Z. (March 2003), "Data Reduction Using a Discrete Wavelet Transform in Discriminant Analysis of Very High Dimensionality Data", Biometrics, vol.59, 143-151.
[20] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986), "Learning Internal Representations by Error Propagation", in Parallel Distributed Processing, vol.1, 318-362, MIT Press, Cambridge, MA.
[21] Vapnik, V. N. (1995), The Nature of Statistical Learning Theory, Springer, New York. |
Description: | Master's; National Chengchi University; Graduate Institute of Statistics; 92354012; 93 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0923540121 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses and Dissertations
|
Files in This Item:
File | Size | Format | |
index.html | 0Kb | HTML2 | 401 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|