Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/130078
Title: | 使用深度學習於RGB-D影像之無人飛行載具避障模型 Collision Avoidance Based on RGB-D Images in Unmanned Aerial Vehicles Using Deep Learning Techniques |
Authors: | 林宗賢 Lin, Tsung-Hsien |
Contributors: | 廖文宏 Liao, Wen-Hung 林宗賢 Lin, Tsung-Hsien |
Keywords: | UAV; Obstacle avoidance; Deep learning; RGB-D image |
Date: | 2020 |
Issue Date: | 2020-06-02 11:12:29 (UTC+8) |
Abstract: | Applications of unmanned aerial vehicles (UAVs) have expanded in recent years from the defense sector to commercial, agricultural, and disaster-relief uses. Obstacle avoidance is an essential function in these applications, but manual piloting is costly in training and human resources and therefore does not scale. In this thesis, we propose automatic obstacle avoidance methods based on RGB-D images and deep learning, one for UAVs without a depth camera and one for UAVs equipped with a depth camera.
For UAVs without a depth camera, we start from an open collision dataset and use a depth estimation model to predict depth information for its color images. The depth information is used to partition each color image into regions such as dangerous and safe zones, and these partitions are used to train a real-time semantic segmentation model. Given the zone distribution predicted from a color image, our proposed avoidance mechanism lets the UAV find a suitable obstacle avoidance direction.
For UAVs with a depth camera, we combine a real-time semantic segmentation model with a clustering algorithm to obtain the class and location of obstacles, and then apply a path planning algorithm to find the optimal obstacle avoidance path.
The deep learning models trained in this work can perform inference on embedded devices, so the proposed obstacle avoidance methods can be applied to UAVs with limited computing resources. |
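The abstract does not give the thresholds or the decision rule used in the thesis, so the following is only a minimal sketch of the general idea of turning a depth map into danger and safe zones and then picking an avoidance direction. It swaps the thesis's learned semantic segmentation model for a plain distance threshold, and the names `DANGER_DISTANCE_M`, `danger_mask`, and `choose_direction`, the 2-metre threshold, and the three-strip layout are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical threshold (not from the thesis): pixels closer than this
# distance, in metres, are treated as belonging to a "danger" zone.
DANGER_DISTANCE_M = 2.0


def danger_mask(depth_m: Sequence[Sequence[float]],
                threshold_m: float = DANGER_DISTANCE_M) -> List[List[bool]]:
    """Mark each pixel True (danger) when its depth is below the threshold."""
    return [[d < threshold_m for d in row] for row in depth_m]


def choose_direction(mask: Sequence[Sequence[bool]],
                     labels: Sequence[str] = ("left", "center", "right")
                     ) -> Tuple[str, List[float]]:
    """Split the mask into vertical strips and pick the least dangerous one.

    Returns the chosen label and the danger fraction of every strip.
    Ties are resolved in favour of the earliest label.
    """
    width = len(mask[0])
    n = len(labels)
    if width < n:
        raise ValueError("mask is narrower than the number of strips")
    fractions = []
    for i in range(n):
        start = i * width // n
        stop = (i + 1) * width // n
        cells = [row[c] for row in mask for c in range(start, stop)]
        fractions.append(sum(cells) / len(cells))
    best = min(range(n), key=fractions.__getitem__)
    return labels[best], fractions


if __name__ == "__main__":
    # Toy 4x6 depth map: a near obstacle covers the left four columns.
    depth = [[1.0, 1.0, 1.0, 1.0, 10.0, 10.0] for _ in range(4)]
    direction, fractions = choose_direction(danger_mask(depth))
    print(direction, fractions)
```

In a setting like the one described in the abstract, the mask would come from a segmentation model's predicted zones rather than from a fixed threshold, and the direction rule could weigh more than the fraction of dangerous pixels.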
Description: | Master's thesis, National Chengchi University, Department of Computer Science, 106753008 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0106753008 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202000432 |
Appears in Collections: | [Department of Computer Science] Theses |
Files in This Item:
File | Description | Size | Format |
300801.pdf | | 6573Kb | Adobe PDF |
All items in 政大典藏 are protected by copyright, with all rights reserved.