Loading...
|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/32621
|
Title: | 子樹查詢的索引結構設計 PCS-trie: An Index Structure for Sub-Tree Query |
Authors: | 張詩宜 Shih-i Chang |
Contributors: | 沈錳坤 Shan,Man-Kwan 張詩宜 Shih-i Chang |
Keywords: | 索引結構 樹 子樹查詢 資訊檢索 可延伸式標記語言 Index tructure Tree Sub-tree query Information Retrieval XML |
Date: | 2003 |
Issue Date: | 2009-09-17 13:52:32 (UTC+8) |
Abstract: | 隨著電腦以及網際網路的普及,越來越多各領域的資料被數位化,利用電腦幫助儲存及管理資料。有許多資料在數位化的過程中,採用tree的資料結構來表達以及儲存。也因此,如何查詢這些龐大的資料,就成為重要的課題。在本論文中,我們針對具有rooted、labeled以及ordered或unordered等特性的tree結構資料的索引問題,提出稱之為PCS-trie之全新的主記憶體資料庫(Main Memory Database)索引結構,並提出相關的增刪資料及搜尋演算法,以達成加速處理sub-tree query之目標。此索引方法的基礎,在於將tree database中的tree編碼為可完整代表其結構的PREOD code字串,之後再以我們所提出的PCS-trie加以索引。PCS-trie索引結構支援資料庫的動態增刪,且我們也提供了有效率的新增及刪除資料的演算法。本論文中也提出了多種搜尋演算法,使能夠在PCS-trie中進行exact matching、處理query tree含有don’t care部分之查詢、以及fault tolerant等不同類型的查詢。最後,我們以實驗的方法,配合人工產生的實驗資料,來對PCS-trie索引方法的時間及空間等各方面的效率加以檢驗。 With the popularization of computer and the Internet, more and more data in various domains are digitized in order to take advantage of the power of computing and storing, and use the Internet to spread these informationz. Many data use tree structure to store them in the process of digitization. For this reason, it is a challenge to deal with these enormous data. In this paper, we present an approach to the search problem for these rooted labeled trees. We show a novel index structure, PREOD code Search trie (PCS-trie), and related algorithms for construction and search of PCS-trie, to speed up the sub-tree query to a tree database. The fundamental of this reaseach is to encode trees in tree database by PREOD code, in which the structure information of a tree can be reserved completely. Then we index these PREOD codes by PCS-trie. PCS-trie supports dynamic insertion and deleteion of PREOD codes. PCS-trie can handle three different types of query requirements: exact sub-tree query, query with don’t cares, and fault-tolerant query. Finally, we have conducted a series of experiments to evaluate the performance of PCS-trie and related algorithms. Experimental results obtained by running our techniques on synthetic data demonstrate the good performance of the proposed approach. |
Reference: | [1] E. N. Adams. Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21:390-397, 1972. [2] S. Amer-Yahia, S. Cho, L. V. S. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 497-508, 2001. [3] A. Apostolico. The myriad virtues of sub-word trees. In A. Apostolico and Z. Galil, editors, Combinatorics on Words, pages 85-96, Springer-Verlag, Nato ASI series vol. 112, 1985. [4] A. Berglund, S. Boag, D. Chamberlin, M. F. Fernandez, M. Kay, J. Robie, and J. Simon. XML path language (XPath) 2.0 W3C working draft 16. Technical Report WD-wpath20-20020816, World Wide Web Consortium, 2002. [5] V. Berry and D. Bryant. Faster reliable phylogenetic analysis. In Proceedings of the 3rd Annual International Conference on Computational Molecular Biology, pages 59-68, 1999. [6] S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, and J. Simon. XQuery 1.0: An XML query language W3C working draft 16. Technical Report WD-xquery-20020816, World Wide Web Consortium, 2002. [7] P. Buneman, S. B. Davidson, M. F. Fernandez, and D. Suciu. Adding structure to unstructured data. In Proceedings of the 6th International Conference on Database Theory, pages 336-350, 1997. [8] J. H. Camin and R. R. Sokal. A method for deducing branching sequences in phylogeny. Evolution 19:311-326, 1965. [9] G. Chang, M. J. Healey, J. A. M. McHugh, and J. T. L. Wang. Mining the World Wide Web: An Information Search Approach. Kluwer Academic Publishers, Norwell, Massachusetts, 2001. [10] S. S. Chawathe, S. Abiteboul, and J. Widom. Representing and querying changes in semistructured data. In Proceedings of the IEEE International Conference on Data Engineering, pages 4-13, 1998. [11] S. S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 26-37, 1997. [12] S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 493-504, 1996. [13] Z. Chen, H. V. Jagadish, G. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. In Proceedings of the IEEE International Conference on Data Engineering, pages 595-604, 2001. [14] B. F. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proceedings of the 27th International Conference on Very Large Data Bases, pages 341-350, 2001. [15] M. H. Eich. Main memory database research directions. In Proceedings of the 6th International Workshop on Database Machines, pages 251-268, 1989. [16] A. Ferro, G. Gallo, R. Giugno, and A. Pulvirenti. Best-match retrieval for structured images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(7):707-718, 2001. [17] R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of the 23rd International Conference on Very Large Data Bases, pages 436-455, 1997. [18] L. Gruenwald and S. Liu. Main memory database system. ACM SIGMOD Record, 22(4):38-44, 1993. [19] L. C. K. Hui. Color set size problem with application to string matching. In Proceedings of Combinatorial Pattern Matching, Third Annual Symposium, Lecture Notes in Computer Science, vol 644, Springer, Berlin, 1992, pages 230-243. [20] H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, and K. Thompson. TAX: A tree algebra for XML. In Proceedings of the 8th Workshop on Data Bases and Programming Languages, pages 149-164, 2001. [21] S. Kannan, E. Lawler, and T. Warnow. Determining the evolutionary tree. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 475-484, 1990. [22] S. R. Kosaraju and A. L. Delcher. Large-scale assembly of DNA strings and space-efficient construction of suffix trees. In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing, pages 169-177, 1995. [23] E. Kotsakis. XSD: A hierarchical access method for indexing XML schemata. Knowledge and Information Systems, 4(2): 168-201, 2002. [24] T. W. Leung, G. Mitchell, B. Subramanian, B. Vance, S. L. Vandenberg, and S. B. Zdonik. The AQUA data model and algebra. In Proceedings of the 4th Workshop on Data Bases and Programming Languages, pages 157-175, 1993. [25] H. Lu, Y. Y. Ng, and Z. Tian. T-Tree or B-Tree: Main memory database index structure revisited. In Proceedings of the 11th Australasian Database Conference, pages 65-73, 2000. [26] U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 319-327, 1990. [27] E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262-272, 1976. [28] T. Milo and D. Suciu. Index structures for path expressions. In Proceedings of the 7th International Conference on Database Theory, pages 277-295, 1999. [29] F. Neven and T. Schwentick. Expressive and efficient pattern languages for tree-structured data. In Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 145-156, 2000. [30] A. S. Noetzel and S. M. Selkow. An analysis of the general tree-editing problem. In D. Sankoff and J. B. Kruskal, editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, pages 237-252. Addison-Wesley, Reading, MA, 1983. [31] D. Shasha, J. T. L. Wang, H. Shan, and K. Zhang. ATreeGrep: Approzimate searching in unordered trees. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 89-98, 2002. [32] B. Subramanian, T. W. Leung, S. L. Vandenberg, and S. B. Zdonik. Ordered types in the AQUA data model. In Proceedings of the 4th Workshop on Data Bases and Programming Languages, pages 115-135, 1993. [33] B. Subramanian, T. W. Leung, S. L. Vandenberg, and S. B. Zdonik. The AQUA approach to querying lists and trees in object-oriented databases. In Proceedings of the IEEE International Conference on Data Engineering, pages 80-89, 1995. [34] W. Szpankowski. Asymptotic properties of data compression and suffix trees. IEEE Transactions on Information Theory, 39(5): 1647-1659, 1993. [35] V. Tseng and W. Lin. A new method for indexing XML documents. In Proceedings of the 12th Workshop on Object-Oriented Technology and Applications, pages 39-46, 2001. [36] E. Ukkonen. On-line construction of suffix-trees. Algorithmica, 14(3):249-260, 1995. [37] H. Wang, S. Park, W. Fan, and P. S. Yu. ViST: A dynamic index method for querying XML data by tree structures. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 110-121, 2003. [38] P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th IEEE Symposium and Automata Theory, pages 1-11, 1973. [39] K. Zhang, R. Statman, and D. Shasha. On the editing distance between unordered labeled trees. Information Processing Letters, 42(3):133-139, 1992. |
Description: | 碩士 國立政治大學 資訊科學學系 90753006 92 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0090753006 |
Data Type: | thesis |
Appears in Collections: | [資訊科學系] 學位論文
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|