|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/124874
|
Title: | 以兩層式機器學習進行連網設備識別 Two-Level Machine Learning for Network Enabled Devices Identification |
Authors: | 吳明倫 Wu, Ming-Lun |
Contributors: | 胡毓忠 Hu, Yuh-Jong 吳明倫 Wu, Ming-Lun |
Keywords: | IoT; Network Enabled Devices; Cyber Security; Two-level Machine Learning; Semi-supervised Learning; Censys Network Scan Data; Support Vector Machine; Random Forest; Binary Classifier |
Date: | 2019 |
Issue Date: | 2019-08-07 16:36:36 (UTC+8) |
Abstract: | With the rapid development of Internet of Things (IoT) technology, the number of network enabled devices on the Internet has grown explosively and the services they provide have become more diverse, making people's lives more convenient. However, poor product design and weak security protection have led to a steady stream of incidents in which attackers exploit device vulnerabilities, so home and corporate networks full of network enabled devices face major security threats. To learn how many potentially risky network enabled devices are connected to a target network, identifying those devices is the first step of security protection. This study explores two-level machine learning for processing network enabled device data that is large in volume and hierarchically structured, and compares it with the commonly used single-level machine learning. It also incorporates the concept of semi-supervised learning to explore the possibility of automatically handling devices that are classified as unknown.
This study uses the Censys network scan dataset to train binary classifiers with two classification algorithms, Support Vector Machine and Random Forest, and then uses them to classify network enabled device data. Following the semi-supervised learning concept, it also tries to find the best parameters for a density-based clustering algorithm that handles devices classified as unknown. Finally, a number of simulation experiments verify and compare the differences between the two classification algorithms and between single-level and two-level machine learning on this problem, and the study reports quantitative and qualitative observations on the experimental results. |
Description: | Master's thesis, National Chengchi University, Department of Computer Science, 106753015 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0106753015 |
Data Type: | thesis |
DOI: | 10.6814/NCCU201900635 |
Appears in Collections: | [Department of Computer Science] Theses and Dissertations
|
Files in This Item:
File |
Size | Format | |
301501.pdf | 2085Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|