National Chengchi University Institutional Repository (NCCUR): Item 140.119/146310


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/146310


    Title: 異常偵測方法比較分析
    Comparative Analysis of Anomaly Detection Methods
    Author: 林映孝 (Lin, Ying-Hsiao)
    Contributors: 周珮婷 (Chou, Pei-Ting)
    陳怡如 (Chen, Yi-Ju)
    林映孝 (Lin, Ying-Hsiao)
    Keywords: Anomaly Detection
    Empirical Experiment
    Performance Evaluation
    Model Comparison
    Ensemble Voting
    Date: 2023
    Uploaded: 2023-08-02 13:05:05 (UTC+8)
    Abstract: Anomaly detection is a major challenge in machine learning and data analysis, with practical applications in fraud detection, cybersecurity, and fault diagnosis.
    First, this study examines the operating principles, advantages, and disadvantages of several anomaly detection methods. For instance, One-Class SVM is suitable for high-dimensional data but requires careful selection of the kernel function and parameters. The Gaussian Mixture Model can fit complex data distributions but requires estimating a large number of parameters.
    Next, the study compares six anomaly detection techniques: One-Class SVM, Gaussian Mixture Model, Autoencoder, Isolation Forest, Local Outlier Factor, and an Ensemble Voting model that combines the first five. The six models are tested empirically on five datasets, and their performance on each dataset is evaluated using F1-score and Balanced Accuracy.
    Finally, the results indicate that Isolation Forest performs well on certain datasets, whereas the Ensemble Voting model achieves excellent performance on every dataset.
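    The ensemble-and-evaluate idea in the abstract can be sketched in a few lines. The block below is a minimal illustration only: it assumes hard majority voting over binary labels (1 = anomaly, 0 = normal), which the abstract does not specify, and the detector outputs are made-up toy values, not results from the thesis. The function names are hypothetical.

```python
def majority_vote(predictions):
    """Combine binary labels from several detectors by hard majority vote.

    predictions: list of equal-length lists, one per detector (1 = anomaly).
    Assumption: a sample is flagged when more than half the detectors flag it.
    """
    n_detectors = len(predictions)
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = sum(p[i] for p in predictions)
        combined.append(1 if 2 * votes > n_detectors else 0)
    return combined


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (anomaly) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


# Toy ground truth and toy outputs of five hypothetical detectors.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
detector_outputs = [
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 1],
]

ensemble = majority_vote(detector_outputs)
for name, pred in [("detector 1", detector_outputs[0]), ("ensemble", ensemble)]:
    print(name, f1_score(y_true, pred), balanced_accuracy(y_true, pred))
```

    Balanced Accuracy is useful alongside F1-score here because anomalies are typically rare, so plain accuracy would reward a model that labels everything as normal.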
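    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    110354025
    Source: http://thesis.lib.nccu.edu.tw/record/#G0110354025
    Type: thesis
    Appears in collections: [Department of Statistics] Theses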

    Files in this item:

    File         Description   Size     Format      Views
    402501.pdf                 2293Kb   Adobe PDF   20

