Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/159413


Title: 協同多層次描述方法之無人機定位機制
    Cooperative Localization Mechanism for Drones Using Multi-level Representations
Author: 黃曉柔
    Huang, Hsiao-Jou
Contributors: 廖文宏
    Liao, Wen-Hung
    黃曉柔
    Huang, Hsiao-Jou
Keywords: GNSS受限環境
    無人機視覺定位
    特徵擷取與比對
    語意分割
    旋轉邊界框
    GNSS-denied Environment
    UAV Visual Localization
    Feature Extraction and Matching
    Semantic Segmentation
Oriented Bounding Box
Date: 2025
Upload time: 2025-09-01 16:57:13 (UTC+8)
Abstract: 無人機技術的快速發展為軍事和民用領域帶來了革命性的變革,隨著無人機應用於地面觀測、災防勘查與智慧城市巡檢等任務日益普及,對其在無 GNSS 環境下之自主定位能力提出更高需求。傳統依賴全球衛星導航系統(Global Navigation Satellite System, GNSS)者,在都會遮蔽區、室內或極端氣候下常面臨訊號遮斷與誤差放大問題,亟需可靠之替代方案。為解決此挑戰,本研究提出一套多層次 UAV 視覺定位系統,建構於語意遮罩、幾何物件與影像特徵等不同層級之比對策略,提升定位準確性與穩健性。

    本系統架構具高度模組化,依據是否保留原始影像資源,分為「有實景參照」與「無實景參照」兩種模式。語意層利用經微調之 Segment Anything Model(SAM)進行語意遮罩生成,搭配結構加權 S-IoU 與九宮格語意佈局過濾影像;物件層則以 OBB 幾何對齊與 Vector IoU 分析物件配置一致性;特徵層則結合 GIM 與 LightGlue 模型進行局部特徵匹配。實驗結果顯示,本系統在無 GNSS 輔助條件下仍能準確辨識 UAV 所在位置,其中語意與物件層已涵蓋大部分判斷任務,特徵層作為進階補償機制展現潛在延伸應用價值。
    The rapid advancement of unmanned aerial vehicle (UAV) technology has brought transformative changes to both military and civilian sectors. As UAVs are increasingly deployed for tasks such as ground observation, disaster response, and smart city inspection, there is a growing demand for reliable autonomous localization systems that function in GNSS-denied environments. Traditional systems that rely on Global Navigation Satellite Systems (GNSS) often suffer from signal loss or amplified errors in urban canyons, indoor settings, or adverse weather conditions, highlighting the need for robust alternatives. To address this challenge, this study proposes a multi-level UAV visual localization framework that integrates semantic, geometric, and feature-based matching strategies to enhance both accuracy and resilience.

The proposed system is highly modular and adapts its pipeline based on the availability of reference imagery, operating in either a reference-based or reference-free mode. At the semantic level, segmentation masks are generated using a fine-tuned Segment Anything Model (SAM), followed by a structure-weighted Semantic IoU (S-IoU) and a grid-based semantic layout check to filter candidate images. The geometric level utilizes Oriented Bounding Boxes (OBB) and Vector IoU to assess object configuration consistency, while the feature level employs the Generalizable Image Matcher (GIM) and LightGlue for detailed local feature matching. Experimental results demonstrate that the system can accurately estimate UAV positions even without GNSS support, with the semantic and geometric layers handling the majority of cases and the feature layer serving as a robust fallback mechanism for challenging scenarios.
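To make the semantic-level screening above more concrete, the following minimal Python/NumPy sketch shows how a structure-weighted mask IoU and a 3×3 (nine-grid) layout check could jointly filter candidate reference images. The class set, weights, thresholds, and helper names are illustrative assumptions for exposition only; the actual S-IoU weighting and layout rule are defined in the thesis itself.

```python
import numpy as np

# Hypothetical class IDs and structural weights (illustrative only; the
# thesis's actual S-IoU weighting and class set are not reproduced here).
CLASS_WEIGHTS = {1: 1.0,   # building
                 2: 0.8,   # road
                 3: 0.4,   # vegetation
                 0: 0.1}   # background / other

def structure_weighted_iou(query, ref, weights=CLASS_WEIGHTS):
    """Weighted average of per-class IoU between two label masks (H x W)."""
    num, den = 0.0, 0.0
    for cls, w in weights.items():
        q, r = (query == cls), (ref == cls)
        union = np.logical_or(q, r).sum()
        if union == 0:
            continue
        num += w * (np.logical_and(q, r).sum() / union)
        den += w
    return num / den if den > 0 else 0.0

def grid_layout(mask, grid=3):
    """Dominant class in each cell of a grid x grid partition of the mask."""
    h, w = mask.shape
    layout = np.zeros((grid, grid), dtype=mask.dtype)
    for i in range(grid):
        for j in range(grid):
            cell = mask[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            layout[i, j] = np.bincount(cell.ravel()).argmax()
    return layout

def semantic_filter(query, ref, iou_thresh=0.5, layout_thresh=6):
    """Keep a candidate only if the weighted IoU and the 3x3 layout both agree."""
    s_iou = structure_weighted_iou(query, ref)
    layout_hits = int((grid_layout(query) == grid_layout(ref)).sum())
    return s_iou >= iou_thresh and layout_hits >= layout_thresh

if __name__ == "__main__":
    # Toy example: a query mask compared against one perturbed candidate mask.
    rng = np.random.default_rng(0)
    query = rng.integers(0, 4, size=(256, 256))
    ref = query.copy()
    ref[:64, :64] = 0  # corrupt one corner of the candidate
    print(semantic_filter(query, ref))
```

Under this sketch, only candidates that pass the coarse semantic screen would proceed to the object-level OBB and Vector IoU alignment and, when that remains inconclusive, to the GIM + LightGlue feature-matching stage described above.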
References: [1] Meta AI (2023). Segment anything: The first foundation model for image segmentation. https://ai.meta.com/blog/segment-anything-foundation-model-imagesegmentation.
    [2] Aqel, M. O., Marhaban, M. H., Saripan, M. I., and Ismail, N. B. (2016). Review of visual odometry: types, approaches, challenges, and applications. SpringerPlus, 5:1–26.
[3] Salau, A. O. and Jain, S. (2019). Feature extraction: A survey of the types, techniques, applications. In 2019 International Conference on Signal Processing and Integrated Networks (SPIN). IEEE.
    [4] Bay, H., Tuytelaars, T., and Van Gool, L. (2006). Surf: Speeded up robust features. In Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7-13, 2006. Proceedings, Part I 9, pages 404–417. Springer.
[5] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer.
    [6] Chan, C. and Tan, S. (2001). Determination of the minimum bounding box of an arbitrary solid: an iterative approach. Computers & Structures, 79(15):1433–1449.
    [7] Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848.
    [8] Cheng, J., Deng, C., Su, Y., An, Z., and Wang, Q. (2024). Methods and datasets on semantic segmentation for unmanned aerial vehicle remote sensing images: A review. ISPRS Journal of Photogrammetry and Remote Sensing, 211:1–34.
    [9] Cui, L. and Ma, C. (2020). Sdf-slam: Semantic depth filter slam for dynamic environments. IEEE Access, 8:95301–95311.
[10] DeTone, D., Malisiewicz, T., and Rabinovich, A. (2017). Superpoint: Self-supervised interest point detection and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[11] NVIDIA Developer Blog (2020). Detecting rotated objects using the ODTK. https://developer.nvidia.com/blog/detecting-rotated-objects-using-the-odtk.
    [12] Durrant-Whyte, H. and Bailey, T. (2006). Simultaneous localization and mapping: part i. IEEE robotics & automation magazine, 13(2):99–110.
    [13] Gleize, P., Wang, W., and Feiszli, M. (2023). Silk: Simple learned keypoints. In Proceedings of the IEEE/CVF international conference on computer vision, pages 22499–22508.
    [14] Guo, Y., Liu, Y., Georgiou, T., and Lew, M. S. (2018). A review of semantic segmentation using deep neural networks. International journal of multimedia information retrieval, 7:87–93.
    [15] Hou, J.-B., Zhu, X., and Yin, X.-C. (2021). Self-adaptive aspect ratio anchor for oriented object detection in remote sensing images. Remote Sensing, 13(7):1318.
[16] Kudan Inc. (2025). Our technology: Visual SLAM. https://www.kudan.io/ourtechnology.
    [17] Jasim, W. N. and Mohammed, R. J. (2021). A survey on segmentation techniques for image processing. Iraqi Journal for Electrical & Electronic Engineering, 17(2).
    [18] Jeong, J., Yoon, T. S., and Park, J. B. (2018). Towards a meaningful 3d map using a 3d lidar and a camera. Sensors, 18(8):2571.
    [19] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., et al. (2023). Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026.
    [20] Li, H., Zhao, R., and Wang, X. (2014). Highly efficient forward and backward propagation of convolutional neural networks for pixelwise classification. arXiv preprint arXiv:1412.4526.
    [21] Lindenberger, P., Sarlin, P.-E., and Pollefeys, M. (2021). Lightglue: Local feature matching at light speed. Conference on Neural Information Processing Systems.
    [22] Liu, L., Pan, Z., and Lei, B. (2017). Learning a rotation invariant detector with rotatable bounding box. arXiv preprint arXiv:1711.09405.
    [23] Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440.
    [24] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60:91–110.
    [25] Lyu, Y., Vosselman, G., Xia, G.-S., Yilmaz, A., and Yang, M. Y. (2020). Uavid: A semantic segmentation dataset for uav imagery. ISPRS journal of photogrammetry and remote sensing, 165:108–119.
    [26] Mount, D. M., Netanyahu, N. S., and Le Moigne, J. (1999). Efficient algorithms for robust feature matching. Pattern recognition, 32(1):17–38.
    [27] Mur-Artal, R., Montiel, J. M. M., and Tardos, J. D. (2015). Orb-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics, 31(5):1147–1163.
    [28] Nigam, I., Huang, C., and Ramanan, D. (2018). Ensemble knowledge transfer for semantic segmentation. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1499–1508. IEEE.
    [29] Nistér, D., Naroditsky, O., and Bergen, J. (2004). Visual odometry.
    [30] NVIDIA (2024). Jetson orin developer kit.
    [31] Oktay, O., Schlemper, J., Folgoc, L. L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N. Y., Kainz, B., et al. (2018). Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999.
    [32] Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer.
    [33] Sarlin, P.-E., DeTone, D., Malisiewicz, T., and Rabinovich, A. (2020). Superglue: Learning feature matching with graph neural networks. Conference on Computer Vision and Pattern Recognition.
    [34] Shen, X., Cai, Z., Yin, W., Müller, M., Li, Z., Wang, K., Chen, X., and Wang, C. (2023). Gim: Learning generalizable image matcher from internet videos. IEEE Transactions on Pattern Analysis and Machine Intelligence.
    [35] Wang, J., Ding, J., Guo, H., Cheng, W., Pan, T., and Yang, W. (2019). Mask obb: A semantic attention-based mask oriented bounding box representation for multicategory object detection in aerial images. Remote Sensing, 11(24):2930.
    [36] Wen, L., Cheng, Y., Fang, Y., and Li, X. (2023). A comprehensive survey of oriented object detection in remote sensing images. Expert Systems with Applications, 224:119960.
    [37] Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., and Luo, P. (2021a). Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 34:12077–12090.
    [38] Xie, X., Cheng, G., Wang, J., Yao, X., and Han, J. (2021b). Oriented r-cnn for object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3520–3529.
    [39] Xin, G.-x., Zhang, X.-t., Wang, X., and Song, J. (2015). A rgbd slam algorithm combining orb with prosac for indoor mobile robot. In 2015 4th International Conference on Computer Science and Network Technology (ICCSNT), volume 1, pages 71–74. IEEE.
    [40] Yang, X., Yan, J., Feng, Z., and He, T. (2021). R3det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 3163–3171.
    [41] Yi, K. M., Trulls, E., Lepetit, V., and Fua, P. (2016). Lift: Learned invariant feature transform. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI 14, pages 467–483. Springer.
    [42] Zhang, S., Long, J., Xu, Y., and Mei, S. (2024). Pmho: Point-supervised oriented object detection based on segmentation-driven proposal generation. IEEE Transactions on Geoscience and Remote Sensing.
    [43] Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2881–2890.
    [44] Zhao, X., Wu, X., Chen, W., Chen, P. C. Y., Xu, Q., and Li, Z. (2023). Aliked: A lighter keypoint and descriptor extraction network via deformable transformation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P. H., et al. (2021). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6881–6890.
    [46] Zhou, Z., Wu, Q. J., Wan, S., Sun, W., and Sun, X. (2020). Integrating sift and cnn feature matching for partial-duplicate image detection. IEEE Transactions on Emerging Topics in Computational Intelligence, 4(5):593–604.
    [47] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159.
    [48] Zou, Z., Chen, K., Shi, Z., Guo, Y., and Ye, J. (2023). Object detection in 20 years: A survey. Proceedings of the IEEE, 111(3):257–276.
Description: Master's thesis
National Chengchi University
Department of Computer Science
112753117
Source: http://thesis.lib.nccu.edu.tw/record/#G0112753117
Data type: thesis
Appears in Collections: [Department of Computer Science] Theses

Files in This Item:

311701.pdf (44396 KB, Adobe PDF)

