    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/67457


    Title: 預測模型中遺失值之選填順序研究
    Research of acquisition order of missing values in predictive model
    Authors: 施雲天
    Contributors: 唐揆
    施雲天
    Keywords: missing values
    decision tree classification
    uncertainty score
    decision tree
    missing value acquisition
    Date: 2013
    Issue Date: 2014-07-14 11:26:37 (UTC+8)
    Abstract: Predictive models are widely used in everyday applications such as bank credit scoring, consumer behavior analysis, and disease prediction. However, whether we are building or applying a predictive model, missing values in the training or test data degrade predictive performance. Missing values can be handled in many ways, including deletion, imputation, model-based approaches, and machine learning methods; in addition, acquiring the missing values at some cost is another option.
    This study focuses on acquiring missing values at a cost and uses decision trees (which can accommodate missing values during construction) as the predictive model, aiming to reach higher accuracy at a lower acquisition cost. Extending the concept and logic of the Uncertainty Score used in earlier Error Sampling work, we propose U-Sampling to rank the importance of individual feature values: whereas Error Sampling ranks instances (row-based) by importance, U-Sampling ranks feature values (column-based) by importance (an illustrative sketch of this column-based ordering follows the abstract).
    We run two sets of experiments on eight datasets from the UCI Machine Learning Repository, placing a fixed proportion of missing values in the training data and in the test data respectively, and compare U-Sampling, Random Sampling, and the Error Sampling described in prior literature in terms of accuracy and error reduction rate. The results show that when the training data contain missing values, U-Sampling performs better on more than 70% of the datasets; when the test data contain missing values, it performs better on 87.5% of the datasets.
    We also examine how different missing-value proportions affect these methods, which helps determine which acquisition strategy suits which situation. With U-Sampling, the most important feature values can be acquired first, so that higher accuracy is reached with fewer acquisitions, thereby reducing the cost of handling missing values.
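    The thesis does not publish code; the following is a minimal, hypothetical Python sketch of the column-based acquisition-ordering idea described above. The scoring rule used here (decision-tree feature importance weighted by the number of missing cells per column) is an assumption for illustration only, not the thesis's actual Uncertainty Score / U-Sampling formula, and the function name rank_columns_for_acquisition and the toy data are likewise invented for the example.

    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    def rank_columns_for_acquisition(X: pd.DataFrame, y: pd.Series) -> list:
        """Order feature columns by an assumed column-level acquisition score."""
        complete = X.dropna()                               # rows without missing values (simplification)
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(complete, y.loc[complete.index])           # train only on complete rows
        importance = pd.Series(tree.feature_importances_, index=X.columns)
        missing_counts = X.isna().sum()                     # cells that would have to be acquired
        score = importance * missing_counts                 # assumed proxy for column importance
        return list(score.sort_values(ascending=False).index)

    # Toy usage with synthetic data (two features, 10 missing cells each).
    rng = np.random.default_rng(0)
    X = pd.DataFrame({"f1": rng.normal(size=100), "f2": rng.normal(size=100)})
    y = pd.Series((X["f1"] + 0.1 * X["f2"] > 0).astype(int))
    X.loc[rng.choice(100, size=10, replace=False), "f1"] = np.nan
    X.loc[rng.choice(100, size=10, replace=False), "f2"] = np.nan
    print(rank_columns_for_acquisition(X, y))               # acquire values for the top-ranked column first

    The sketch only illustrates the row-based versus column-based distinction; the thesis's experiments additionally compare U-Sampling against Random Sampling and Error Sampling on the eight UCI datasets.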
    Reference: 1. Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.

    2. Alpaydın, E. (2010). Introduction to machine learning. London, England: The MIT Press.

    3. Bennett, D. A. (2001). How can I deal with missing data in my study? Australian and New Zealand Journal of Public Health, 25(5), 464–469.

    4. Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Introducing Markov chain Monte Carlo. In Markov chain Monte Carlo in practice (pp. 1-19). London: Chapman & Hall/CRC.

    5. Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80-100.

    6. Jackson, J. (2002). Data mining: A conceptual overview. Communications of the Association for Information Systems.

    7. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).

    8. Levin, N., & Zahavi, J. (2001). Predictive modeling using segmentation. Journal of Interactive Marketing, 15(2), 2-22.

    9. Melville, P., Saar-Tsechansky, M., Provost, F., & Mooney, R. (2004). Active Feature-Value Acquisition for Classifier Induction. Proceedings of the 4th IEEE International Conference on Data Mining, (pp. 483-486). Brighton, UK.

    10. Pallant, J. (2007). SPSS survival manual (3rd ed.). New York, NY: Open University Press.

    11. García-Laencina, P. J., Sancho-Gómez, J.-L., & Figueiras-Vidal, A. R. (2010). Pattern classification with missing data: A review. Neural Computing and Applications.

    12. Peng, C. Y. J., Harwell, M., Liou, S. M., & Ehman, L. H. (2006). Advances in missing data methods and implications for educational research. In Real data analysis (pp. 31-78). North Carolina, US: Information Age Publishing.

    13. Quinlan, J. R. (1989). Unknown attribute values in induction. In Proceedings of the Sixth International Workshop on Machine Learning (pp. 164-168).

    14. Rubin, D. B. (1987). Multiple imputation for non-response in surveys. New York: John Wiley & Sons.

    15. Saar-Tsechansky, M., Melville, P., & Provost, F. (2009). Active feature-value acquisition. Management Science, 55(4), 664-684.

    16. Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8(1), 3-15.

    17. Schlomer, G. L., Bauman, S., & Card, N. A. (2010). Best Practices for Missing Data Management in Counseling Psychology. Journal of Counseling Psychology, 57(1), 1-10.

    18. Simon, H. A., & Lea, G. (1974). Problem solving and rule induction: A unified view. In Knowledge and cognition. Oxford, England: Lawrence Erlbaum.

    19. Turney, P. (2000). Types of Cost in Inductive Concept Learning. Proceedings of the Cost-Sensitive Learning Workshop at the 17th ICML-2000 Conference. Stanford, CA.

    20. Vinod, N. C., & Punithavalli, D. M. (2011). Classification of Incomplete Data Handling Techniques - An Overview. International Journal on Computer Science and Engineering, 3(1), 340-344.

    21. Zheng, Z., & Padmanabhan, B. (2002). On Active Learning for Data Acquisition. Proceedings of the IEEE International Conference on Data Mining (pp. 562-569).

    Online resources

    1. UCI Machine Learning Repository. (n.d.). Retrieved from http://archive.ics.uci.edu/ml/
    Description: Master's thesis
    National Chengchi University
    Graduate Institute of Business Administration
    101355055
    102
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0101355055
    Data Type: thesis
    Appears in Collections: [Department of Business Administration] Theses

    Files in This Item: 505501.pdf (1107 KB, Adobe PDF)

