National Chengchi University Institutional Repository (NCCUR): Item 140.119/153373
    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153373


    Title: BigBigTree2: 基於Nextflow中DSL2語法改進BigBigTree的大規模基因樹建構方法
    BigBigTree2: Advanced Large-Scale Gene Tree Construction, Improving BigBigTree with Nextflow DSL2 Integration
    Author: 邱顯安
    Chiu, Hsien-An
    Contributors: 張家銘
    邱顯安
    Chiu, Hsien-An
    Keywords: large-scale gene phylogenies
    multigene families
    divide and concatenate
    phylogenetic placement
    Nextflow
    Date: 2024
    Uploaded: 2024-09-04 14:58:44 (UTC+8)
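    Abstract: BigBigTree is the method presented in 蔡漢龍's 2020 master's thesis. Built on the Nextflow framework, it constructs phylogenetic trees by dividing the input sequences into clusters. It targets multigene-family data (e.g., Drosophila olfactory receptors), for which the traditional and most accurate approach, maximum likelihood (ML), appears unable to build accurate trees when computational resources are limited.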
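The "divide" idea, grouping sequences before building per-cluster trees, can be illustrated with a small sketch. It is not BigBigTree's actual clustering (the reference list cites hcluster, whose exact settings are not given here); it swaps in plain single-linkage clustering with a union-find structure. The input format (a dict of pairwise identities, e.g. parsed from an all-vs-all BLAST run) and the 0.5 threshold are hypothetical. Sequences linked to no other sequence are returned separately, which is one simple way to flag low-identity inputs.

```python
from collections import defaultdict


def single_linkage_clusters(ids, pairwise_identity, threshold=0.5):
    """Group sequence IDs by single-linkage clustering on pairwise identity.

    ids: iterable of sequence IDs.
    pairwise_identity: dict mapping (id_a, id_b) -> identity in [0, 1]
        (hypothetical format, e.g. derived from all-vs-all BLAST hits).
    threshold: hypothetical minimum identity for linking two sequences.
    Returns (clusters, singletons): sorted clusters with >= 2 members,
    and the sorted IDs that were linked to nothing.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every pair whose identity reaches the threshold.
    for (a, b), identity in pairwise_identity.items():
        if a != b and identity >= threshold:
            union(a, b)

    # Collect members under their union-find root.
    groups = defaultdict(list)
    for i in parent:
        groups[find(i)].append(i)

    clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons


if __name__ == "__main__":
    identities = {("A", "B"): 0.9, ("B", "C"): 0.7, ("D", "E"): 0.8,
                  ("A", "D"): 0.2, ("A", "F"): 0.15}
    clusters, singletons = single_linkage_clusters("ABCDEF", identities)
    print(clusters, singletons)  # [['A', 'B', 'C'], ['D', 'E']] ['F']
```

Single linkage is used only because it is short and deterministic; a real pipeline would tune the linkage rule and threshold to the data.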
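    In this study we present BigBigTree2. Because Nextflow has dropped support for DSL1 syntax entirely since 2022, BigBigTree2 had to be rewritten in DSL2, which also makes it more extensible: new features can be added and maintained quickly and easily. In addition, we added a step to the original tree-building workflow, phylogenetic placement, which addresses BigBigTree's inability to handle input sequences with low identity scores. Although this step adds run time, it places these sequences computationally at the positions that give the tree the highest likelihood score. BigBigTree2 is thus a large-scale gene-tree construction pipeline that follows a divide-and-concatenate strategy and runs its steps in parallel on the Nextflow framework.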
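Maximum-likelihood placement, as described above, means trying a sequence at each candidate position on a fixed tree and keeping the position with the highest likelihood. The sketch below illustrates that idea only; it is not BigBigTree2's code, and the abstract does not say which placement tool the pipeline uses (the reference list includes pplacer, EPA-ng, APPLES and pplacerDC). Assumptions: a rooted tree stored as a dict of children with branch lengths, the JC69 model with uniform base frequencies, Felsenstein pruning for the likelihood, and a query attached only at each edge midpoint with a fixed, hypothetical pendant length of 0.1. Real tools also optimize the attachment point and pendant length.

```python
import math

BASES = "ACGT"


def jc69(t):
    """JC69 4x4 substitution-probability matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partials(node, children, seqs, site):
    """Felsenstein pruning: conditional likelihood of each base at `node`."""
    if node in seqs:  # leaf: indicator vector; gaps/ambiguity -> all ones
        ch = seqs[node][site].upper()
        if ch not in BASES:
            return [1.0] * 4
        return [1.0 if b == ch else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, length in children[node]:
        p = jc69(length)
        below = partials(child, children, seqs, site)
        for x in range(4):
            result[x] *= sum(p[x][y] * below[y] for y in range(4))
    return result


def log_likelihood(root, children, seqs):
    """Total log-likelihood over all sites, uniform root frequencies."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        math.log(sum(0.25 * v for v in partials(root, children, seqs, s)))
        for s in range(n_sites)
    )


def place_query(root, children, seqs, name, query, pendant=0.1):
    """Attach `query` at the midpoint of every edge; keep the best edge."""
    best_edge, best_ll = None, -math.inf
    for parent, kids in children.items():
        for idx, (child, length) in enumerate(kids):
            trial = {k: list(v) for k, v in children.items()}  # copy topology
            mid = f"mid:{parent}->{child}"  # hypothetical new internal node
            trial[parent][idx] = (mid, length / 2)
            trial[mid] = [(child, length / 2), (name, pendant)]
            ll = log_likelihood(root, trial, {**seqs, name: query})
            if ll > best_ll:
                best_edge, best_ll = (parent, child), ll
    return best_edge, best_ll


if __name__ == "__main__":
    tree = {"r": [("A", 0.1), ("B", 0.1), ("C", 0.1)]}
    seqs = {"A": "AAAA", "B": "CCCC", "C": "GGGG"}
    edge, ll = place_query("r", tree, seqs, "Q", "AAAA")
    print(edge)  # ('r', 'A')
```

Each candidate edge costs a full likelihood evaluation here, which is why placement adds run time; production placers reuse precomputed partial likelihoods instead of recomputing the whole tree.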
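    Starting from unaligned input sequences, BigBigTree2 runs BLAST, sequence alignment, and phylogenetic placement, and finally outputs a tree. Compared with current mainstream tree-building methods, it is about twice as fast on roughly 6,000 sequences and nearly three times as fast on 10,000 sequences while reaching similar likelihood scores, and the advantage grows as the number of sequences increases.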
    References: Park, M.; Zaharias, P.; Warnow, T. Disjoint Tree Mergers for Large-Scale Maximum Likelihood Tree Estimation. Algorithms 2021, 14.
    Smirnov V, Warnow T. Unblended disjoint tree merging using GTM improves species tree estimation. BMC Genomics. 2020 Apr.
    蔡漢龍. BigBigTree: a divide and concatenate strategy for the phylogenetic reconstruction of large orthologous datasets using Nextflow framework. Unpublished master's thesis, Department of Computer Science, National Chengchi University (2020).
    iTOL - https://itol.embl.de/
    Kannan, L., Wheeler, W.C. Maximum Parsimony on Phylogenetic networks. Algorithms Mol Biol 7, 9 (2012).
    Roychoudhury, Arindam. “Consistency of the Maximum Likelihood Estimator of Evolutionary Tree.” arXiv: Populations and Evolution (2014)
    Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981
    Notredame C, Higgins DG, Heringa J (2000) T-coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217.
    Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002 Jul
    Sievers F, Higgins DG. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 2018 Jan;27(1)
    Zaharias, P.; Warnow, T. Recent progress on methods for estimating and updating large phylogenies. Phil. Trans. R. Soc. B 377: 20210244 (2022).
    Difference of Orthology and Paralogy - http://petang.cgu.edu.tw/Bioinfomatics/MANUALS/NCBIblast/Orthology.html
    BLAST - https://blast.ncbi.nlm.nih.gov/Blast.cgi
    hcluster - https://pypi.python.org/pypi/hcluster
    TreeBeST - http://treesoft.sourceforge.net/treebest.shtml
    Guindon S., Dufayard J.F., Lefort V., Anisimova M., Hordijk W., Gascuel O. New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 59(3):307-21, 2010.
    Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319.
    Ewels, P.A., Peltzer, A., Fillinger, S. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38, 276–278 (2020).
    Björn E. Langer, Andreia Amaral, Marie-Odile Baudement, et al., the nf-core community. Empowering bioinformatics communities with Nextflow and nf-core. bioRxiv 2024.05.10.59291
    Elizabeth Koning, Malachi Phillips, and Tandy Warnow. pplacerDC: a new scalable phylogenetic placement method. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB '21).
    Matsen, F.A., Kodner, R.B. & Armbrust, E. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
    Alexandros Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies, Bioinformatics, Volume 30, Issue 9, May 2014, Pages 1312–1313
    Berger, S. A., Krompass, D., & Stamatakis, A. (2011). Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology, 60(3), 291-302.
    Pierre Barbera, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, Alexandros Stamatakis (2019). EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences. Systematic Biology, syy054.
    Metin Balaban and others. APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments. Systematic Biology, Volume 69, Issue 3, May 2020, Pages 566–578.
    Treedist - https://pypi.org/project/treedist/
    Robinson DF, Foulds LR. Comparison of phylogenetic trees. Math Biosci. 1981.
    Bui Quang Minh, Heiko A Schmidt, Olga Chernomor, Dominik Schrempf, Michael D Woodhams, Arndt von Haeseler, Robert Lanfear, IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era, Molecular Biology and Evolution, Volume 37, Issue 5, May 2020
    Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009
    Tata Consultancy Services (2024). TCS ADD™ - Advanced Drug Development Suite. Retrieved from the TCS official website, https://www.tcs.com
    Emms, D.M. and Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology. 2019.
    Altenhoff, A.M., Train, C.M., Seluanov, A., & Dessimoz, C. OMA standalone: orthology inference among public and custom genomes and transcriptomes. Genome Research. 2021.
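    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    110753110
    Source: http://thesis.lib.nccu.edu.tw/record/#G0110753110
    Type: thesis
    Appears in collections: [Department of Computer Science] Theses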

    Files in this item:

    311001.pdf (10757 Kb, Adobe PDF)


    All items in the NCCU Institutional Repository are protected by the original copyright.



    Copyright Announcement
    1. The digital content of this website is part of National Chengchi University Institutional Repository. It provides free access to academic research and public education for non-commercial use. Please utilize it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. NCCU Institutional Repository is made to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.