    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/141035


    Title: Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network (基於圖神經網路提取惡意程式家族序列特徵)
    Authors: Chu, Po-Yu (朱柏瑜)
    Contributors: Hsiao, Shun-Wen (蕭舜文)
    Chu, Po-Yu (朱柏瑜)
    Keywords: Graph neural network
    Attention
    Sequential data
    Markov model
    Date: 2022
    Issue Date: 2022-08-01 17:22:11 (UTC+8)
    Abstract: Malicious software (malware) harms both our devices and our daily lives, so there is an urgent need to understand how malware behaves and what threats it poses. Most records produced by malware, such as event logs and dynamic analysis profiles, are variable-length, text-based data with timestamps. Sorting the records by timestamp turns them into sequences for subsequent analysis. However, variable-length text sequences are difficult to process. In addition, unlike natural-language text, most sequential data in information security have distinctive properties and structures, such as loops, repeated calls, and noise. To analyze API call sequences together with their structure, this study represents the sequences as graphs (such as a Markov model), which expose the information and structure hidden in the sequences. We design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze API call sequences. Training AWGCN yields sequence embeddings that can be used to analyze malware behavior. In family classification experiments on call-like datasets, AWGCN achieves higher accuracy than the other classifiers compared, and the sequence embeddings also improve the performance of classic models.
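The preprocessing idea in the abstract (sort timestamped records into a sequence, then represent the sequence as a Markov-style graph) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's AWGCN or its actual pipeline: the record layout `(timestamp, api_name)`, the helper names `to_sequence` and `transition_graph`, and the sample API names are all hypothetical.

```python
from collections import Counter, defaultdict


def to_sequence(records):
    """Sort (timestamp, api_name) records by timestamp and keep the API names."""
    return [name for _, name in sorted(records, key=lambda r: r[0])]


def transition_graph(sequence):
    """Build a first-order Markov graph from a call sequence.

    Nodes are API names; the weight of edge src -> dst is the empirical
    probability that dst immediately follows src.
    """
    counts = defaultdict(Counter)
    for src, dst in zip(sequence, sequence[1:]):
        counts[src][dst] += 1

    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Hypothetical, unsorted log records: (timestamp, API name).
records = [
    (3, "ReadFile"), (1, "CreateFile"), (2, "ReadFile"),
    (4, "WriteFile"), (5, "ReadFile"), (6, "CloseHandle"),
]

sequence = to_sequence(records)
graph = transition_graph(sequence)

for src, dsts in graph.items():
    print(src, dsts)
```

Repeated calls show up as self-loops (here `ReadFile -> ReadFile`), and sequences of any length collapse into a graph whose size depends only on the number of distinct APIs, which is one reason a graph view suits variable-length call sequences.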
    Description: Master's thesis
    National Chengchi University
    Department of Management Information Systems
    109356020
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0109356020
    Data Type: thesis
    DOI: 10.6814/NCCU202200886
    Appears in Collections: [Department of Management Information Systems] Theses

    Files in This Item:

    File          Size      Format
    602001.pdf    2546Kb    Adobe PDF

