Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/142646
Title: | Supervised learning under concept drift: time series features and training strategies |
Authors: | 黃羽婕 Huang, Yu-Chieh |
Contributors: | 莊皓鈞 Chuang, Hao-Chun 黃羽婕 Huang, Yu-Chieh |
Keywords: | Machine learning; Training strategies; Concept drift; Time series features |
Date: | 2022 |
Issue Date: | 2022-12-02 15:21:40 (UTC+8) |
Abstract: | In recent years, a growing share of companies has come to rely on machine learning models. As data volumes grow sharply and the environment in which a model is built changes over time, concept drift is likely to occur, and when it does the model's predictive performance degrades. This study focuses on how to help analysts quickly analyze time series data under concept drift and use data features to identify a better model training strategy, thereby improving predictive performance. We adopt the Purged K-Fold and Augmentation methods used in previous literature and include them in the experiments to observe how well they suit data with concept drift. In the first stage of the experiment, nine types of concept drift data are simulated to represent the various forms concept drift can take, and each is paired with four model training strategies to observe model performance. In the second stage, the extracted time series features are combined with model performance under the four training strategies to identify relationships between specific time series features and training strategies. The results show that the training strategies adopted in this thesis can effectively improve predictive performance when specific time series features are present. |
Reference: | Barton, D., and Court, D. 2012. "Making Advanced Analytics Work for You," Harvard Business Review (90:10), pp. 78-83.
Cai, J., Luo, J., Wang, S., and Yang, S. 2018. "Feature Selection in Machine Learning: A New Perspective," Neurocomputing (300), pp. 70-79.
Fokkema, M., and Strobl, C. 2020. "Fitting Prediction Rule Ensembles to Psychological Research Data: An Introduction and Tutorial," Psychological Methods (25:5), pp. 636-652.
Gama, J., Medas, P., Castillo, G., and Rodrigues, P. 2004. "Learning with Drift Detection," Brazilian Symposium on Artificial Intelligence: Springer, pp. 286-295.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. 2014. "A Survey on Concept Drift Adaptation," ACM Computing Surveys (CSUR) (46:4), pp. 1-37.
Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., Fawzy, M., Jia, B., Jia, Y., and Kalro, A. 2018. "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective," 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA): IEEE, pp. 620-629.
Lainder, A. D., and Wolfinger, R. D. 2022. "Forecasting with Gradient Boosted Trees: Augmentation, Tuning, and Cross-Validation Strategies: Winning Solution to the M5 Uncertainty Competition," International Journal of Forecasting, forthcoming.
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., and Zhang, G. 2018. "Learning under Concept Drift: A Review," IEEE Transactions on Knowledge and Data Engineering (31:12), pp. 2346-2363.
Ma, S., and Fildes, R. 2021. "Retail Sales Forecasting with Meta-Learning," European Journal of Operational Research (288:1), pp. 111-128.
Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., and Talagala, T. S. 2020. "FFORMA: Feature-Based Forecast Model Averaging," International Journal of Forecasting (36:1), pp. 86-92.
Probst, P., Wright, M. N., and Boulesteix, A. L. 2019. "Hyperparameters and Tuning Strategies for Random Forest," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (9:3), p. e1301.
Schüritz, R., and Satzger, G. 2016. "Patterns of Data-Infused Business Model Innovation," 2016 IEEE 18th Conference on Business Informatics (CBI): IEEE, pp. 133-142.
Schwartz, E. M., Bradlow, E. T., and Fader, P. S. 2014. "Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data," Marketing Science (33:2), pp. 188-205.
Simester, D., Timoshenko, A., and Zoumpoulis, S. I. 2020. "Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges," Management Science (66:6), pp. 2495-2522.
Talagala, T. S., Hyndman, R. J., and Athanasopoulos, G. 2018. "Meta-Learning How to Forecast Time Series," Monash Econometrics and Business Statistics Working Papers (6:18), p. 16.
Tukey, J. W. 1962. "The Future of Data Analysis," The Annals of Mathematical Statistics (33:1), pp. 1-67.
Rauschmayr, N., Bhattacharjee, S., and Kumar, V. 2020. "Detecting and Analyzing Incorrect Model Predictions with Amazon SageMaker Model Monitor and Debugger," https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/ |
Description: | Master's thesis, National Chengchi University, Department of Management Information Systems, 110356010 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0110356010 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202201663 |
Appears in Collections: | [Department of Management Information Systems] Theses |
Files in This Item:
File | Description | Size | Format |
601001.pdf | | 2163Kb | Adobe PDF |
All items in 政大典藏 are protected by copyright, with all rights reserved.