Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/142123
Title: | 基於深度學習之衛星圖像變遷偵測優化 Optimization of Deep Learning-based Change Detection in Satellite Images |
Authors: | 陳湘淇 Chen, Hsiang-Chi |
Contributors: | 廖文宏 Liao, Wen-Hung 陳湘淇 Chen, Hsiang-Chi |
Keywords: | Deep learning; Convolutional neural networks; Transformer; Satellite images; Change detection |
Date: | 2022 |
Issue Date: | 2022-10-05 09:14:53 (UTC+8) |
Abstract: | Change detection (CD), one of the fundamental applications of remote sensing (RS) image analysis, aims to identify the changed regions in two satellite images of the same area taken at different times. It is widely used in environmental monitoring, disaster assessment, and land resource planning. Applying deep learning to change detection can help geographic data annotators speed up their workflow. In recent years, alongside convolutional neural networks (CNNs), which have matured in computer vision, transformer-based architectures have achieved remarkable performance on vision tasks. This research investigates three modern change detection models, SNUNet, ChangeFormer, and BIT, whose encoders are CNN-based, pure transformer-based, and CNN-transformer hybrid, respectively. We evaluate how different conditions affect each model and use the findings to improve detection performance. To maintain the adaptability of a CD model when the input image pairs contain different types of change or come from a different dataset, we adjust the training data in two ways: doubling the number of training pairs by adding the same bitemporal images in reverse temporal order, or merging CD datasets to build a larger training set. On the objective function side, we propose a bidirectional loss that lets the model learn both "appearing" (newly built) and "disappearing" (demolished) changes without modifying the dataset. Experimental results show that these training strategies effectively improve the model's generalization capability. On the LEVIR-CD test set, IoU-1 rises from below 0.1 to above 0.7, close to the baseline (0.7631). On the S2Looking test set, IoU-1 rises from below 0.1 to 0.4422, exceeding the baseline (0.4184). |
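The reverse-order augmentation and the IoU-1 metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `add_reversed_pairs` and `iou_changed` are hypothetical, each sample is assumed to be a `(t1_image, t2_image, change_mask)` tuple, the change mask is assumed to stay the same when the temporal order is swapped, and IoU-1 is assumed to mean the intersection over union of the "changed" class (label 1). Returning 1.0 when neither mask contains a changed pixel is a convention chosen here, not one stated in the source.

```python
from typing import Any, List, Sequence, Tuple

Mask = List[List[int]]            # binary change mask: 1 = changed, 0 = unchanged
Sample = Tuple[Any, Any, Mask]    # (image at time 1, image at time 2, change mask)


def add_reversed_pairs(samples: Sequence[Sample]) -> List[Sample]:
    """Return the original samples followed by a time-reversed copy of each.

    Assumption: the change mask is symmetric in time, so the same mask
    is reused for the reversed pair.
    """
    reversed_samples = [(t2, t1, mask) for (t1, t2, mask) in samples]
    return list(samples) + reversed_samples


def iou_changed(pred: Mask, target: Mask) -> float:
    """Intersection over union for the 'changed' class (label 1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                intersection += 1
            if p == 1 or t == 1:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # Placeholder strings stand in for image arrays.
    train = [("area_a_2019", "area_a_2021", [[0, 1], [0, 0]])]
    doubled = add_reversed_pairs(train)
    print(len(doubled))  # twice the number of input pairs

    prediction = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 1]]
    print(iou_changed(prediction, ground_truth))
```

Doubling the set this way exposes the model to both the "appearing" and the "disappearing" direction of each change; the bidirectional loss described in the abstract targets the same goal without enlarging the dataset, and its exact form is not given in this record.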
Description: | Master's thesis; National Chengchi University; Department of Computer Science; 109753114 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109753114 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202201612 |
Appears in Collections: | [Department of Computer Science] Theses |
Files in This Item:
File | Description | Size | Format |
311401.pdf | | 3964Kb | Adobe PDF2 | 58 | View/Open |
All items in 政大典藏 are protected by copyright, with all rights reserved.