National Chengchi University Institutional Repository (NCCUR): Item 140.119/110650


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/110650


    Title: 以逐步SVM縮減p大n小資料型態之維度
    Dimension reduction of large p small n data set based on stepwise SVM
    Author: 柯子惟 (Ko, Tzu Wei)
    Contributors: 周珮婷
    柯子惟 (Ko, Tzu Wei)
    Keywords: Dimension reduction (維度縮減)
    Feature selection (特徵選取)
    Large p small n data set (p大n小資料型態)
    Stepwise SVM (逐步SVM)
    Date: 2017
    Upload time: 2017-07-03 14:35:01 (UTC+8)
    Abstract: The purpose of this study is dimension reduction for large p small n data. A stepwise SVM method is proposed and compared with the full data set (no variables removed) and with three dimension-reduction methods: principal component analysis (PCA), Pearson product-moment correlation coefficients (PCCs), and random-forest-based recursive feature elimination (RF-RFE), to examine whether stepwise SVM can select a feature set that better separates the sample classes. The study data are six disease-related gene expression and biological spectrum data sets.
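    For reference, the three comparison methods named above can be sketched in a few lines of scikit-learn. This is a minimal illustrative sketch, not code from the thesis; the retained dimension k, the function names, and the random-forest settings are assumptions.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import RFE

        def reduce_pca(X, k):
            # Project the samples onto the first k principal components (unsupervised).
            return PCA(n_components=k).fit_transform(X)

        def reduce_pcc(X, y, k):
            # Filter method: keep the k features whose Pearson correlation with the
            # (numerically coded) class label has the largest absolute value.
            r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
            return X[:, np.argsort(-np.abs(r))[:k]]

        def reduce_rf_rfe(X, y, k):
            # Recursive feature elimination driven by random-forest importances,
            # dropping 10% of the remaining features per iteration.
            rfe = RFE(RandomForestClassifier(n_estimators=100),
                      n_features_to_select=k, step=0.1)
            return rfe.fit_transform(X, y)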
    First, stepwise SVM is used for feature selection in a supervised setting; the selection results show that stepwise SVM can indeed extract, from all variables, the features that are most important for classifying the samples. The data are then split into training and test sets, the dimension of each data set is reduced in a semi-supervised setting with stepwise SVM, PCA, PCCs, and RF-RFE, and an SVM model is fitted to compute the prediction accuracy; this procedure is repeated 100 times and the average is taken as the final prediction accuracy of each dimension-reduction method. The results show that the prediction accuracy obtained with stepwise SVM is consistently better than that of the unprocessed raw data. Compared with the other methods, stepwise SVM is more stable than PCA and RF-RFE, while the difference from PCCs is harder to discern. The study concludes that dimension reduction is necessary for large p small n data, because it effectively removes noise from the data and thereby improves the overall prediction accuracy of the model.
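    The selection-and-evaluation loop described above can be sketched roughly as follows. The greedy forward search stands in for "stepwise SVM" (the abstract does not spell out the exact stepping rule), and max_features, the 70/30 split, and the default SVM settings are illustrative assumptions rather than the thesis's actual choices.

        import numpy as np
        from sklearn.model_selection import cross_val_score, train_test_split
        from sklearn.svm import SVC

        def stepwise_svm_select(X, y, max_features=20):
            # Greedy forward selection: repeatedly add the single feature that most
            # improves cross-validated SVM accuracy, and stop when nothing improves.
            selected, remaining, best = [], list(range(X.shape[1])), 0.0
            while remaining and len(selected) < max_features:
                score, j = max((cross_val_score(SVC(), X[:, selected + [j]], y, cv=5).mean(), j)
                               for j in remaining)
                if score <= best:
                    break
                best = score
                selected.append(j)
                remaining.remove(j)
            return selected

        def repeated_accuracy(X, y, select_fn, n_repeats=100, test_size=0.3):
            # Average SVM test accuracy over repeated random splits, selecting
            # features on the training portion only, as the abstract describes.
            accs = []
            for seed in range(n_repeats):
                X_tr, X_te, y_tr, y_te = train_test_split(
                    X, y, test_size=test_size, stratify=y, random_state=seed)
                cols = select_fn(X_tr, y_tr)
                model = SVC().fit(X_tr[:, cols], y_tr)
                accs.append(model.score(X_te[:, cols], y_te))
            return float(np.mean(accs))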
    References: Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96(12), 6745-6750.
    Bellman, R. E. (2015). Adaptive Control Processes: A Guided Tour: Princeton University Press.
    Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. Paper presented at the Proceedings of the fifth annual workshop on Computational learning theory, Pittsburgh, Pennsylvania, USA.
    Boulesteix, A.-L. (2004). PLS dimension reduction for classification with microarray data. Statistical Applications in Genetics and Molecular Biology, 3(1).
    Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32. doi:10.1023/a:1010933404324
    Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297. doi:10.1007/bf00994018
    Cunningham, P. (2008). Dimension Reduction. In M. Cord & P. Cunningham (Eds.), Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval (pp. 91-112). Berlin, Heidelberg: Springer Berlin Heidelberg.
    Dai, J. J., Lieu, L., & Rocke, D. (2006). Dimension reduction for classification with gene expression microarray data. Statistical applications in genetics and molecular biology, 5(1), 1147.
    Gordon, G. J., Jensen, R. V., Hsiao, L.-L., Gullans, S. R., Blumenstock, J. E., Ramaswamy, S., . . . Bueno, R. (2002). Translation of Microarray Data into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios in Lung Cancer and Mesothelioma. Cancer Research, 62(17), 4963-4967.
    Granitto, P. M., Furlanello, C., Biasioli, F., & Gasperi, F. (2006). Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems, 83(2), 83-90.
    Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar), 1157-1182.
    Guyon, I., Gunn, S. R., Ben-Hur, A., & Dror, G. (2004). Result Analysis of the NIPS 2003 Feature Selection Challenge. Paper presented at the NIPS.
    Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., . . . Trent, J. (2001). Gene-expression profiles in hereditary breast cancer. New England Journal of Medicine, 344(8), 539-548. doi:10.1056/nejm200102223440801
    Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., . . . Peterson, C. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature medicine, 7(6), 673-679.
    Lee Rodgers, J., & Nicewander, W. A. (1988). Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42(1), 59-66. doi:10.1080/00031305.1988.10475524
    Pal, M., & Foody, G. M. (2010). Feature selection for classification of hyperspectral data by SVM. IEEE Transactions on Geoscience and Remote Sensing, 48(5), 2297-2307.
    Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Series 6, 2(11), 559-572. doi:10.1080/14786440109462720
    Shipp, M. A., Ross, K. N., Tamayo, P., Weng, A. P., Kutok, J. L., Aguiar, R. C., . . . Pinkus, G. S. (2002). Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature medicine, 8(1), 68-74.
    Tang, J., Alelyani, S., & Liu, H. (2014). Feature selection for classification: A review. Data Classification: Algorithms and Applications, 37.
    Ho, T. K. (1995). Random decision forests. Paper presented at the Proceedings of the 3rd International Conference on Document Analysis and Recognition, 14-16 Aug 1995.
    Xu, X., & Wang, X. (2005). An Adaptive Network Intrusion Detection Method Based on PCA and Support Vector Machines. In X. Li, S. Wang, & Z. Y. Dong (Eds.), Advanced Data Mining and Applications: First International Conference, ADMA 2005, Wuhan, China, July 22-24, 2005. Proceedings (pp. 696-703). Berlin, Heidelberg: Springer Berlin Heidelberg.
    Yeung, K. Y., & Ruzzo, W. L. (2001). Principal component analysis for clustering gene expression data. Bioinformatics, 17(9), 763-774. doi:10.1093/bioinformatics/17.9.763
    林宗勳. Support Vector Machine 簡介 [Introduction to Support Vector Machines].
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    104354021
    Source: http://thesis.lib.nccu.edu.tw/record/#G0104354021
    Data type: thesis
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File          Size       Format      Views
    402101.pdf    186784 KB  Adobe PDF   2163    View/Open

