Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/125535
|
Title: | filterNN: NN-based Feature Selection from Sequential Data |
Authors: | 吳皓銘 Wu, Hao-Ming |
Contributors: | 蕭舜文 Hsiao, Shun-Wen; 吳皓銘 Wu, Hao-Ming |
Keywords: | Neural network; Feature extraction; Feature selection; Sequential data |
Date: | 2019 |
Issue Date: | 2019-09-05 15:45:56 (UTC+8) |
Abstract: | We design a new neural network architecture that consists of two parts: a filter and a classifier. We implement three kinds of filters, each of which can filter out unnecessary input data. The filtered data are fed into the subsequent classifier to achieve the highest training accuracy. Because the filter and the classifier are trained together, the filter keeps the inputs that help the classifier perform the classification. The remaining input data can therefore be viewed as the characteristic features of the class. We also design three cost functions to serve different purposes: 1) the filtered inputs should be as few as possible, 2) the filtered inputs should be as consecutive as possible, and 3) the classifier should achieve the highest possible training accuracy. These three learning goals conflict with each other, so we demonstrate the tuning process used to achieve the best performance. We test the usefulness of the proposed architecture on text-based sequential data, namely malware execution API calls collected from the real world. The research shows that the proposed architecture is helpful for dealing with text-based sequential data and selects characteristic features for domain experts to analyze further. |
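The three competing learning goals named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulas, so everything here is an assumption made for illustration: the filter is represented as a per-position mask with values in [0, 1], sparsity is the mean mask value, continuity is the mean absolute difference between neighbouring mask values, and the weights `alpha` and `beta` are hypothetical tuning knobs. The API call names are placeholders, not data from the thesis.

```python
import math


def sparsity_cost(mask):
    """Mean mask value: lower means fewer inputs are kept (goal 1)."""
    return sum(mask) / len(mask)


def continuity_cost(mask):
    """Mean absolute change between neighbouring mask values.

    Lower means the kept inputs form longer consecutive runs (goal 2).
    """
    if len(mask) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(mask, mask[1:])) / (len(mask) - 1)


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class (goal 3)."""
    return -math.log(max(probs[label], 1e-12))


def total_cost(mask, probs, label, alpha=0.1, beta=0.1):
    """Weighted sum of the three costs; alpha and beta are assumed weights."""
    return (cross_entropy(probs, label)
            + alpha * sparsity_cost(mask)
            + beta * continuity_cost(mask))


def apply_filter(tokens, mask, threshold=0.5):
    """Keep only the sequence items whose mask value reaches the threshold."""
    return [t for t, m in zip(tokens, mask) if m >= threshold]


if __name__ == "__main__":
    # Placeholder API call sequence and two hand-written masks.
    calls = ["CreateFile", "ReadFile", "WriteFile", "CloseHandle", "Sleep"]
    contiguous = [0, 0, 1, 1, 0]
    scattered = [1, 0, 1, 0, 0]
    print(apply_filter(calls, contiguous))
    print(continuity_cost(contiguous), continuity_cost(scattered))
```

Both masks keep the same number of items, so their sparsity cost is identical, but the contiguous mask has the lower continuity cost. Raising `alpha` or `beta` pushes a trained filter toward fewer or more consecutive inputs at the expense of classification accuracy, which is the trade-off the abstract says must be tuned.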
Description: | Master's thesis; National Chengchi University; Department of Management Information Systems; 106356005 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G1063560051 |
Data Type: | thesis |
DOI: | 10.6814/NCCU201900745 |
Appears in Collections: | [Department of Management Information Systems] Theses
|
Files in This Item:
File | Size | Format |
005101.pdf | 1957Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|