    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/142646


    Title: Supervised learning under concept drift: time series features and training strategies (概念飄移下的監督式學習:時序特徵與訓練策略)
    Authors: 黃羽婕
    Huang, Yu-Chieh
    Contributors: 莊皓鈞
    Chuang, Hao-Chun
    黃羽婕
    Huang, Yu-Chieh
    Keywords: Machine learning; Training strategies; Concept drift; Time series features
    Date: 2022
    Issue Date: 2022-12-02 15:21:40 (UTC+8)
    Abstract: In recent years, a growing share of companies has come to rely on machine learning models. As data volumes grow sharply and the environment in which a model is built and operated changes over time, concept drift is likely to occur, and when it does, the model's predictive performance degrades. This study focuses on how to help analysts quickly analyze time series data under concept drift and use data features to identify better model training strategies, thereby improving predictive performance. To that end, we include the Purged K-fold and Augmentation methods used in prior literature in our experiments to observe how well they suit concept drift data. In the first stage of the experiment, we simulate nine kinds of concept drift data to represent the various forms concept drift can take, pair them with four model training strategies, and observe model performance. In the second stage, we combine the extracted time series features with model performance under the four training strategies to identify relationships between specific time series features and training strategies. The results show that the training strategies adopted in this thesis can effectively improve predictive performance when specific time series features are present.
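The abstract does not spell out how Purged K-fold is implemented. As a rough illustration only, one common formulation for time-ordered data holds out contiguous blocks and drops training observations within a fixed gap of each held-out block, so that close neighbours of the test period do not leak into training. The sketch below assumes that formulation; the function name `purged_kfold_splits` and the `purge` parameter are hypothetical and are not taken from the thesis.

```python
from typing import Iterator, List, Tuple


def purged_kfold_splits(
    n_samples: int, n_splits: int, purge: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for time-ordered data.

    Each test fold is a contiguous block; training indices within
    `purge` steps of that block are dropped to limit leakage.
    (Hypothetical helper, not the thesis's implementation.)
    """
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    if purge < 0:
        raise ValueError("purge must be non-negative")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        stop = start + base + (1 if k < extra else 0)
        test = list(range(start, stop))
        # Keep only training points outside the purged zone around the test block.
        train = [
            i for i in range(n_samples)
            if i < start - purge or i >= stop + purge
        ]
        yield train, test
        start = stop


if __name__ == "__main__":
    for train, test in purged_kfold_splits(n_samples=12, n_splits=3, purge=1):
        print("test:", test, "train:", train)
```

A larger `purge` trades training data for a cleaner separation between folds. Whether such a gap helps under a given drift pattern is the kind of question the thesis's experiments address, not something this sketch settles.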
    Reference: Barton, D., and Court, D. 2012. "Making Advanced Analytics Work for You," Harvard Business Review (90:10), pp. 78-83.

    Cai, J., Luo, J., Wang, S., and Yang, S. 2018. "Feature Selection in Machine Learning: A New Perspective," Neurocomputing (300), pp. 70-79.

    Fokkema, M., and Strobl, C. 2020. "Fitting Prediction Rule Ensembles to Psychological Research Data: An Introduction and Tutorial," Psychological Methods (25:5), pp. 636-652.

    Gama, J., Medas, P., Castillo, G., and Rodrigues, P. 2004. "Learning with Drift Detection," Brazilian Symposium on Artificial Intelligence: Springer, pp. 286-295.

    Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. 2014. "A Survey on Concept Drift Adaptation," ACM Computing Surveys (CSUR) (46:4), pp. 1-37.

    Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., Fawzy, M., Jia, B., Jia, Y., and Kalro, A. 2018. "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective," 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA): IEEE, pp. 620-629.

    Lainder, A. D., and Wolfinger, R. D. 2022. "Forecasting with Gradient Boosted Trees: Augmentation, Tuning, and Cross-Validation Strategies: Winning Solution to the M5 Uncertainty Competition," International Journal of Forecasting, forthcoming.

    Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., and Zhang, G. 2018. "Learning under Concept Drift: A Review," IEEE Transactions on Knowledge and Data Engineering (31:12), pp. 2346-2363.

    Ma, S., and Fildes, R. 2021. "Retail Sales Forecasting with Meta-Learning," European Journal of Operational Research (288:1), pp. 111-128.

    Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., and Talagala, T. S. 2020. "FFORMA: Feature-Based Forecast Model Averaging," International Journal of Forecasting (36:1), pp. 86-92.

    Probst, P., Wright, M. N., and Boulesteix, A. L. 2019. "Hyperparameters and Tuning Strategies for Random Forest," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (9:3), p. e1301.

    Schüritz, R., and Satzger, G. 2016. "Patterns of Data-Infused Business Model Innovation," 2016 IEEE 18th Conference on Business Informatics (CBI): IEEE, pp. 133-142.

    Schwartz, E. M., Bradlow, E. T., and Fader, P. S. 2014. "Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data," Marketing Science (33:2), pp. 188-205.

    Simester, D., Timoshenko, A., and Zoumpoulis, S. I. 2020. "Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges," Management Science (66:6), pp. 2495-2522.

    Talagala, T. S., Hyndman, R. J., and Athanasopoulos, G. 2018. "Meta-Learning How to Forecast Time Series," Monash Econometrics and Business Statistics Working Papers (6:18), p. 16.

    Tukey, J. W. 1962. "The Future of Data Analysis," The Annals of Mathematical Statistics (33:1), pp. 1-67.

    Rauschmayr, N., Bhattacharjee, S., and Kumar, V. 2020. "Detecting and Analyzing Incorrect Model Predictions with Amazon SageMaker Model Monitor and Debugger," AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/
    Description: Master's thesis
    National Chengchi University
    Department of Management Information Systems
    110356010
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0110356010
    Data Type: thesis
    DOI: 10.6814/NCCU202201663
    Appears in Collections: [Department of Management Information Systems] Theses

    Files in This Item:

    File        Size     Format
    601001.pdf  2163Kb   Adobe PDF

