政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/30944
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/30944


    Title: 維度縮減應用於蛋白質質譜儀資料
    Dimension Reduction on Protein Mass Spectrometry Data
    Authors: 黃靜文
    Huang, Ching-Wen
    Contributors: 余清祥
    Yue, Ching-Syang
    黃靜文
    Huang, Ching-Wen
    Keywords: Classification
    Dimension reduction
    Disease diagnosis
    Computer simulation
    Date: 2004
    Issue Date: 2009-09-14
    Abstract: This study analyzes a prostate cancer protein data set of serum protein intensities acquired by Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS), with the aim of determining whether a subject has cancer. Subjects fall into four classes: normal, benign tumor, early-stage cancer, and late-stage cancer. The data come in two versions: raw data with about 48,000 intervals (variables), and manually preprocessed data in which variable screening leaves only 779 intervals (variables). Both versions are high-dimensional, and each contains about 650 observations. With so many variables, the data are hard to analyze and slow to compute on. The goal of this study is therefore to find the approach that minimizes the misclassification rate under effective dimension reduction.
    We first compare three classification methods: support vector machine, artificial neural network, and classification and regression tree. The two better performers, support vector machine and artificial neural network, are then applied to the dimension-reduced data. The dimension reduction methods considered are discrete wavelet transform, principal component analysis, and principal component analysis networks. In our results, discrete wavelet transform and principal component analysis perform better, while principal component analysis networks perform only passably.
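As a rough illustration of how a discrete wavelet transform can shrink a spectrum, the sketch below keeps only the approximation coefficients after a few Haar steps, so each level halves the number of variables. The Haar wavelet, the number of levels, the function names `haar_step` and `haar_reduce`, and the toy intensity values are all assumptions made for this example; the abstract does not state which wavelet family or decomposition depth the thesis used.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input by repeating the last value
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail


def haar_reduce(signal, levels):
    """Keep only the approximation coefficients after `levels` Haar steps."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)  # detail coefficients are discarded
    return approx


# Toy intensities (hypothetical values, not taken from the thesis data).
spectrum = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
reduced = haar_reduce(spectrum, levels=2)
print(len(spectrum), "->", len(reduced))  # prints: 8 -> 2
```

The reduced vector would then be passed to a classifier in place of the full spectrum. Discarding the detail coefficients is the simplest possible choice; a wavelet-shrinkage scheme would instead threshold them.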
    We then examine these dimension reduction methods and propose an overlap method that combines a linear dimension reduction method (principal component analysis) with a nonlinear one (principal component analysis networks), in the hope of lowering the misclassification rate obtained with a single dimension reduction method. The overlap method clearly improves results on the preprocessed data, but its effect on the raw data is not evident.
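The abstract does not say how the overlap method merges the two reducers. One plausible reading, offered here purely as an assumption, is to concatenate the features produced by a linear reducer and a nonlinear reducer before classification. In the sketch below, `block_means` and `squashed_block_means` are hypothetical stand-ins for principal component analysis and principal component analysis networks, chosen only to keep the example short and self-contained.

```python
import math


def block_means(x, n_blocks):
    """Linear stand-in reducer: average x over n_blocks contiguous blocks.

    Assumes len(x) is a multiple of n_blocks; trailing values are ignored otherwise.
    """
    size = len(x) // n_blocks
    return [sum(x[i * size:(i + 1) * size]) / size for i in range(n_blocks)]


def squashed_block_means(x, n_blocks):
    """Nonlinear stand-in reducer: tanh applied to the block means."""
    return [math.tanh(m) for m in block_means(x, n_blocks)]


def combine_features(x, reducers):
    """Concatenate the outputs of several reducers into one feature vector."""
    features = []
    for reduce_fn in reducers:
        features.extend(reduce_fn(x))
    return features


# Toy observation (hypothetical values, not taken from the thesis data).
x = [0.0, 2.0, 4.0, 6.0]
combined = combine_features(
    x,
    [lambda v: block_means(v, 2), lambda v: squashed_block_means(v, 2)],
)
print(len(combined))  # prints: 4
```

Under this reading, the classifier sees both views of each observation at once, which is one way a combination could help when a single reducer loses discriminating information.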
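    Reference: [Chinese-language sources]
    [01] Department of Health, Executive Yuan, "Summary of Cause-of-Death Statistics for the Taiwan Area, ROC Year 93 (2004)".
    URL: http://www.doh.gov.tw/statistic/data/死因摘要/93年/93.htm
    [02] 彭文正 (trans.), Michael J.A. Berry and Gordon S. Linoff, Data Mining: Applications in Customer Relationship Management and E-Marketing (資料採礦-顧客關係管理暨電子行銷之應用), 數博網資訊股份有限公司, 2001.
    [03] 葉怡成, Applied Neural Networks (應用類神經網路), 儒林圖書公司, 1997.
    [04] 潘荔錞, 蔡志彥 and 簡志青, "Applications of Proteomics in Clinical Medicine", 化工資訊與商情月刊, no. 3, September 2003.
    [05] 賴基銘, "The Future of Cancer Screening: Applications of SELDI Serum Protein Fingerprinting", National Health Research Institutes e-newsletter, no. 52, June 25, 2004.
    URL: http://sars.nhri.org.tw/enews/enews_list_new3.php?volume_indx=52&enews_dt=2004-06-25
    [English-language sources]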
    [06] Alpaydin, E. (2004), Introduction to Machine Learning. MIT Press.
    [07] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984), Classification and Regression Trees, Wadsworth.
    [08] Cottrell, G. W., Munro, P. and Zipser, D. (1987), “Learning Internal Representations from Gray-Scale Images: An Example of Extensional Programming”, In Ninth Annual Conference of the Cognitive Science Society, 462-473. Hillsdale, NJ: Erlbaum.
    [09] Cybenko, G. (1989), “Approximation by Superpositions of a Sigmoidal Function”, Mathematics of Control, Signals, and Systems, vol.2, 303-314.
    [10] Donoho, D. L. and Johnstone, I. M. (1994), “Ideal Spatial Adaptation by Wavelet Shrinkage”, Biometrika, vol.81, 425-455.
    [11] Donoho, D. L. and Johnstone, I. M. (1995), “Adapting to Unknown Smoothness via Wavelet Shrinkage”, Journal of the American Statistical Association, vol.90, 1200-1224.
    [12] Donoho, D. L. and Johnstone, I. M. (1998), “Minimax Estimation via Wavelet Shrinkage,” Annals of Statistics, vol.26, 879-921.
    [13] Daubechies, I. (1992), Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM:Philadelphia.
    [14] Hornik, K., Stinchcombe, M. and White, H. (1989), “Multilayer Feedforward Networks Are Universal Approximators”, Neural Networks, vol.2, 359-366.
    [15] Hsu, C-W., Chang, C-C. and Lin, C-J. (2003), “A Practical Guide to Support Vector Classification”.
    Paper available at http://www.csie.ntu.edu.tw/~cjlin/papers.html.
    [16] Huang, T-K., Weng, R. C. and Lin, C-J. (July 2004), “A Generalized Bradley-Terry Model: From Group Competition to Individual Skill”. A short version appears in NIPS.
    [17] Johnson, D. E. (1998), Applied Multivariate Methods for Data Analysts, Pacific Grove, Calif.: Duxbury Press.
    [18] Mallat, S. G. (1989), “A Theory for Multiresolution Signal Decomposition: the Wavelet Representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.11, No.7, 674-693.
    [19] Qu, Y., Adam, B-L., Thornquist, M., Potter, J. D., Thompson, M. L., Yasui, Y., Davis, J., Schellhammer, P. F., Cazares, L., Clements, M. A., Wright, G. L., Jr. and Feng, Z. (March 2003), “Data Reduction Using a Discrete Wavelet Transform in Discriminant Analysis of Very High Dimensionality Data”, BIOMETRICS, vol.59, 143-151.
    [20] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986), “Learning Internal Representations by Error Propagation”, in Parallel Distributed Processing, vol.1, 318-362, MIT Press, Cambridge, MA.
    [21] Vapnik, V. N. (1995), The Nature of Statistical Learning Theory, Springer, New York.
    Description: Master's
    National Chengchi University
    Graduate Institute of Statistics
    92354012
    93
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0923540121
    Data Type: thesis
    Appears in Collections:[Department of Statistics] Theses
