Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/153375
|
Title: | Learned Index for Subset Query of NoSQL Databases (NoSQL 資料庫子集查詢的學習索引) |
Authors: | 許軒祥 Hsu, Hsuan-Hsiang |
Contributors: | 沈錳坤 Shan, Man-Kwan; 許軒祥 Hsu, Hsuan-Hsiang |
Keywords: | Learned Index; NoSQL Database; Subset Query |
Date: | 2024 |
Issue Date: | 2024-09-04 14:59:08 (UTC+8) |
Abstract: | NoSQL databases handle semi-structured or unstructured data, and subset queries are common in them. In recent years, learned index techniques based on machine learning have opened new avenues for database indexing. Compared with traditional B-Trees, learned indexes offer significant advantages in query time: the query time of a traditional index is dominated by memory access, whereas that of a learned index is dominated by CPU computation. Existing research on learned indexes mainly targets queries over traditional relational databases. For subset queries, the only existing approach is the recent DGM, which is based on Deep Sets. DGM mainly saves memory space but leaves room for improvement in query speed. This thesis proposes two novel learned index techniques, LI4Subset-D and LI4Subset-P, to improve the performance of subset queries in NoSQL databases. LI4Subset-D and LI4Subset-P build on Deep Sets and the PGM-index, a learned index, respectively. Experimental results show that LI4Subset-D is nearly 149 times faster than DGM in query speed, at the cost of about a 7-fold increase in memory space. LI4Subset-P is approximately 3235 times faster than DGM in query speed, at the cost of about a 4-fold increase in memory space. |
Reference: |
[1] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, The Case for Learned Index Structures, in Proceedings of the ACM 2018 International Conference on Management of Data (SIGMOD), pp. 489-504, 2018.
[2] A. Davitkova, D. Gjurovski, and S. Michel, Learning over Sets for Databases, in Proceedings of the 27th International Conference on Extending Database Technology (EDBT), pp. 68-80, 2024.
[3] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Deep Sets, in Proceedings of Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017.
[4] P. Ferragina and G. Vinciguerra, The PGM-index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds, in Proceedings of the VLDB Endowment, vol. 13, no. 8, pp. 1162-1175, 2020.
[5] U. Deppisch, S-tree: A Dynamic Balanced Signature Index for Office Retrieval, in Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 77-87, 1986.
[6] M. Morzy, T. Morzy, A. Nanopoulos, and Y. Manolopoulos, Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes, in Proceedings of 7th East European Conference on Advances in Databases and Information Systems: Springer, pp. 236-252, 2003.
[7] S. Helmer, R. Aly, T. Neumann, and G. Moerkotte, Indexing Set-Valued Attributes with a Multi-Level Extendible Hashing Scheme, in Proceedings of 18th International Conference on Database and Expert Systems Applications: Springer, pp. 98-108, 2007.
[8] S. Bevc and I. Savnik, Using Tries for Subset and Superset Queries, in Proceedings of the ITI 2009 31st International Conference on Information Technology Interfaces: IEEE, pp. 147-152, 2009.
[9] I. Savnik, Efficient Subset and Superset Queries, in DB&Local Proceedings: Citeseer, pp. 45-57, 2012.
[10] I. Savnik, Index Data Structure for Fast Subset and Superset Queries, in Proceedings of International Conference on Availability, Reliability, and Security: Springer, pp. 134-148, 2013.
[11] A. Galakatos, M. Markovitch, C. Binnig, R. Fonseca, and T. Kraska, FITing-Tree: A Data-Aware Index Structure, in Proceedings of the 2019 ACM International Conference on Management of Data (SIGMOD), pp. 1189-1206, 2019.
[12] J. Rao and K. A. Ross, Cache Conscious Indexing for Decision-Support in Main Memory, in Proceedings of the 25th VLDB Conference, 1999.
[13] A. Kipf et al., RadixSpline: A Single-Pass Learned Index, in Proceedings of the 3rd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, pp. 1-5, 2020.
[14] R. Marcus et al., Benchmarking Learned Indexes, in Proceedings of the VLDB Endowment, vol. 14, no. 1, 2020. |
Description: | Master's thesis; National Chengchi University; Department of Computer Science; 111753122 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0111753122 |
Data Type: | thesis |
Appears in Collections: | [Department of Computer Science] Theses
|
Files in This Item:
File | Size | Format |
312201.pdf | 1232Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|