English  |  正體中文  |  简体中文  |  Post-Print筆數 : 27 |  Items with full text/Total items : 112880/143846 (78%)
Visitors : 50182044      Online Users : 554
RC Version 6.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
Scope Tips:
  • please add "double quotation mark" for query phrases to get precise results
  • please goto advance search for comprehansive author search
  • Adv. Search
    HomeLoginUploadHelpAboutAdminister Goto mobile version
    政大機構典藏 > 資訊學院 > 資訊科學系 > 學位論文 >  Item 140.119/32621
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/32621


    Title: 子樹查詢的索引結構設計
    PCS-trie: An Index Structure for Sub-Tree Query
    Authors: 張詩宜
    Shih-i Chang
    Contributors: 沈錳坤
    Shan,Man-Kwan
    張詩宜
    Shih-i Chang
    Keywords: 索引結構

    子樹查詢
    資訊檢索
    可延伸式標記語言
    Index tructure
    Tree
    Sub-tree query
    Information Retrieval
    XML
    Date: 2003
    Issue Date: 2009-09-17 13:52:32 (UTC+8)
    Abstract: 隨著電腦以及網際網路的普及,越來越多各領域的資料被數位化,利用電腦幫助儲存及管理資料。有許多資料在數位化的過程中,採用tree的資料結構來表達以及儲存。也因此,如何查詢這些龐大的資料,就成為重要的課題。在本論文中,我們針對具有rooted、labeled以及ordered或unordered等特性的tree結構資料的索引問題,提出稱之為PCS-trie之全新的主記憶體資料庫(Main Memory Database)索引結構,並提出相關的增刪資料及搜尋演算法,以達成加速處理sub-tree query之目標。此索引方法的基礎,在於將tree database中的tree編碼為可完整代表其結構的PREOD code字串,之後再以我們所提出的PCS-trie加以索引。PCS-trie索引結構支援資料庫的動態增刪,且我們也提供了有效率的新增及刪除資料的演算法。本論文中也提出了多種搜尋演算法,使能夠在PCS-trie中進行exact matching、處理query tree含有don’t care部分之查詢、以及fault tolerant等不同類型的查詢。最後,我們以實驗的方法,配合人工產生的實驗資料,來對PCS-trie索引方法的時間及空間等各方面的效率加以檢驗。
    With the popularization of computer and the Internet, more and more data in various domains are digitized in order to take advantage of the power of computing and storing, and use the Internet to spread these informationz. Many data use tree structure to store them in the process of digitization. For this reason, it is a challenge to deal with these enormous data. In this paper, we present an approach to the search problem for these rooted labeled trees. We show a novel index structure, PREOD code Search trie (PCS-trie), and related algorithms for construction and search of PCS-trie, to speed up the sub-tree query to a tree database. The fundamental of this reaseach is to encode trees in tree database by PREOD code, in which the structure information of a tree can be reserved completely. Then we index these PREOD codes by PCS-trie. PCS-trie supports dynamic insertion and deleteion of PREOD codes. PCS-trie can handle three different types of query requirements: exact sub-tree query, query with don’t cares, and fault-tolerant query. Finally, we have conducted a series of experiments to evaluate the performance of PCS-trie and related algorithms. Experimental results obtained by running our techniques on synthetic data demonstrate the good performance of the proposed approach.
    Reference: [1] E. N. Adams. Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21:390-397, 1972.
    [2] S. Amer-Yahia, S. Cho, L. V. S. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 497-508, 2001.
    [3] A. Apostolico. The myriad virtues of sub-word trees. In A. Apostolico and Z. Galil, editors, Combinatorics on Words, pages 85-96, Springer-Verlag, Nato ASI series vol. 112, 1985.
    [4] A. Berglund, S. Boag, D. Chamberlin, M. F. Fernandez, M. Kay, J. Robie, and J. Simon. XML path language (XPath) 2.0 W3C working draft 16. Technical Report WD-wpath20-20020816, World Wide Web Consortium, 2002.
    [5] V. Berry and D. Bryant. Faster reliable phylogenetic analysis. In Proceedings of the 3rd Annual International Conference on Computational Molecular Biology, pages 59-68, 1999.
    [6] S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, and J. Simon. XQuery 1.0: An XML query language W3C working draft 16. Technical Report WD-xquery-20020816, World Wide Web Consortium, 2002.
    [7] P. Buneman, S. B. Davidson, M. F. Fernandez, and D. Suciu. Adding structure to unstructured data. In Proceedings of the 6th International Conference on Database Theory, pages 336-350, 1997.
    [8] J. H. Camin and R. R. Sokal. A method for deducing branching sequences in phylogeny. Evolution 19:311-326, 1965.
    [9] G. Chang, M. J. Healey, J. A. M. McHugh, and J. T. L. Wang. Mining the World Wide Web: An Information Search Approach. Kluwer Academic Publishers, Norwell, Massachusetts, 2001.
    [10] S. S. Chawathe, S. Abiteboul, and J. Widom. Representing and querying changes in semistructured data. In Proceedings of the IEEE International Conference on Data Engineering, pages 4-13, 1998.
    [11] S. S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 26-37, 1997.
    [12] S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 493-504, 1996.
    [13] Z. Chen, H. V. Jagadish, G. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. In Proceedings of the IEEE International Conference on Data Engineering, pages 595-604, 2001.
    [14] B. F. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proceedings of the 27th International Conference on Very Large Data Bases, pages 341-350, 2001.
    [15] M. H. Eich. Main memory database research directions. In Proceedings of the 6th International Workshop on Database Machines, pages 251-268, 1989.
    [16] A. Ferro, G. Gallo, R. Giugno, and A. Pulvirenti. Best-match retrieval for structured images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(7):707-718, 2001.
    [17] R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of the 23rd International Conference on Very Large Data Bases, pages 436-455, 1997.
    [18] L. Gruenwald and S. Liu. Main memory database system. ACM SIGMOD Record, 22(4):38-44, 1993.
    [19] L. C. K. Hui. Color set size problem with application to string matching. In Proceedings of Combinatorial Pattern Matching, Third Annual Symposium, Lecture Notes in Computer Science, vol 644, Springer, Berlin, 1992, pages 230-243.
    [20] H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, and K. Thompson. TAX: A tree algebra for XML. In Proceedings of the 8th Workshop on Data Bases and Programming Languages, pages 149-164, 2001.
    [21] S. Kannan, E. Lawler, and T. Warnow. Determining the evolutionary tree. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 475-484, 1990.
    [22] S. R. Kosaraju and A. L. Delcher. Large-scale assembly of DNA strings and space-efficient construction of suffix trees. In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing, pages 169-177, 1995.
    [23] E. Kotsakis. XSD: A hierarchical access method for indexing XML schemata. Knowledge and Information Systems, 4(2): 168-201, 2002.
    [24] T. W. Leung, G. Mitchell, B. Subramanian, B. Vance, S. L. Vandenberg, and S. B. Zdonik. The AQUA data model and algebra. In Proceedings of the 4th Workshop on Data Bases and Programming Languages, pages 157-175, 1993.
    [25] H. Lu, Y. Y. Ng, and Z. Tian. T-Tree or B-Tree: Main memory database index structure revisited. In Proceedings of the 11th Australasian Database Conference, pages 65-73, 2000.
    [26] U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 319-327, 1990.
    [27] E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262-272, 1976.
    [28] T. Milo and D. Suciu. Index structures for path expressions. In Proceedings of the 7th International Conference on Database Theory, pages 277-295, 1999.
    [29] F. Neven and T. Schwentick. Expressive and efficient pattern languages for tree-structured data. In Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 145-156, 2000.
    [30] A. S. Noetzel and S. M. Selkow. An analysis of the general tree-editing problem. In D. Sankoff and J. B. Kruskal, editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, pages 237-252. Addison-Wesley, Reading, MA, 1983.
    [31] D. Shasha, J. T. L. Wang, H. Shan, and K. Zhang. ATreeGrep: Approzimate searching in unordered trees. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 89-98, 2002.
    [32] B. Subramanian, T. W. Leung, S. L. Vandenberg, and S. B. Zdonik. Ordered types in the AQUA data model. In Proceedings of the 4th Workshop on Data Bases and Programming Languages, pages 115-135, 1993.
    [33] B. Subramanian, T. W. Leung, S. L. Vandenberg, and S. B. Zdonik. The AQUA approach to querying lists and trees in object-oriented databases. In Proceedings of the IEEE International Conference on Data Engineering, pages 80-89, 1995.
    [34] W. Szpankowski. Asymptotic properties of data compression and suffix trees. IEEE Transactions on Information Theory, 39(5): 1647-1659, 1993.
    [35] V. Tseng and W. Lin. A new method for indexing XML documents. In Proceedings of the 12th Workshop on Object-Oriented Technology and Applications, pages 39-46, 2001.
    [36] E. Ukkonen. On-line construction of suffix-trees. Algorithmica, 14(3):249-260, 1995.
    [37] H. Wang, S. Park, W. Fan, and P. S. Yu. ViST: A dynamic index method for querying XML data by tree structures. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 110-121, 2003.
    [38] P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th IEEE Symposium and Automata Theory, pages 1-11, 1973.
    [39] K. Zhang, R. Statman, and D. Shasha. On the editing distance between unordered labeled trees. Information Processing Letters, 42(3):133-139, 1992.
    Description: 碩士
    國立政治大學
    資訊科學學系
    90753006
    92
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0090753006
    Data Type: thesis
    Appears in Collections:[資訊科學系] 學位論文

    Files in This Item:

    File Description SizeFormat
    75300601.pdf44KbAdobe PDF2829View/Open
    75300602.pdf86KbAdobe PDF27977View/Open
    75300603.pdf63KbAdobe PDF2939View/Open
    75300604.pdf103KbAdobe PDF2937View/Open
    75300605.pdf126KbAdobe PDF21067View/Open
    75300606.pdf232KbAdobe PDF21302View/Open
    75300607.pdf103KbAdobe PDF22493View/Open
    75300608.pdf124KbAdobe PDF21194View/Open
    75300609.pdf810KbAdobe PDF21397View/Open
    75300610.pdf359KbAdobe PDF2876View/Open
    75300611.pdf204KbAdobe PDF21139View/Open
    75300612.pdf75KbAdobe PDF2840View/Open
    75300613.pdf42KbAdobe PDF21167View/Open


    All items in 政大典藏 are protected by copyright, with all rights reserved.


    社群 sharing

    著作權政策宣告 Copyright Announcement
    1.本網站之數位內容為國立政治大學所收錄之機構典藏,無償提供學術研究與公眾教育等公益性使用,惟仍請適度,合理使用本網站之內容,以尊重著作權人之權益。商業上之利用,則請先取得著作權人之授權。
    The digital content of this website is part of National Chengchi University Institutional Repository. It provides free access to academic research and public education for non-commercial use. Please utilize it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2.本網站之製作,已盡力防止侵害著作權人之權益,如仍發現本網站之數位內容有侵害著作權人權益情事者,請權利人通知本網站維護人員(nccur@nccu.edu.tw),維護人員將立即採取移除該數位著作等補救措施。
    NCCU Institutional Repository is made to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff(nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.
    DSpace Software Copyright © 2002-2004  MIT &  Hewlett-Packard  /   Enhanced by   NTU Library IR team Copyright ©   - Feedback