National Chengchi University Institutional Repository (NCCUR): Item 140.119/153373


Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153373

Title:	BigBigTree2: Advanced Large-Scale Gene Tree Construction, Improving BigBigTree with Nextflow DSL2 Integration (original title: BigBigTree2：基於Nextflow中DSL2語法改進BigBigTree的大規模基因樹建構方法)
Authors:	Chiu, Hsien-An (邱顯安)
Contributors:	張家銘; Chiu, Hsien-An (邱顯安)
Keywords:	Large-scale gene phylogenetic tree; Multi-gene family; Divide and concatenate; Phylogenetic placement; Nextflow
Date:	2024
Issue Date:	2024-09-04 14:58:44 (UTC+8)
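Abstract:	BigBigTree was presented in the 2020 master's thesis of 蔡漢龍. Built on the Nextflow framework, it constructs phylogenetic trees by clustering the input, and it mainly targets multi-gene families (for example, Drosophila olfactory receptors). For data of this kind, the traditional and most accurate approach, maximum likelihood, appears unable to build trees accurately when computing resources are limited. In this study we propose BigBigTree2. Because Nextflow dropped support for DSL1 syntax entirely after 2022, the whole pipeline had to be rewritten in DSL2; DSL2 also makes BigBigTree2 more extensible, so new features can be added and maintained quickly and easily. We also add a step to the original tree-building workflow, phylogenetic placement, which addresses BigBigTree's inability to handle input sequences with low identity scores. Although this step increases run time, it computes where each such sequence should be placed so that the tree's likelihood score is highest. BigBigTree2 is thus a large-scale gene tree construction workflow that uses a divide-and-concatenate approach and runs in parallel on the Nextflow framework. Starting from unaligned sequences, BigBigTree2 runs BLAST, sequence alignment, and phylogenetic placement, and outputs the final tree. Compared with current mainstream tree-building methods, it is about twice as fast on roughly 6,000 sequences and nearly three times as fast on 10,000 sequences while reaching similar likelihood scores, and the advantage grows as the number of sequences increases.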
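The divide step described in the abstract can be illustrated with a small Python sketch. This is not the thesis code: the position-wise `identity` function stands in for BLAST-based similarity, the single-linkage grouping stands in for the clustering tool, and the function name `split_for_placement`, the threshold value, and the toy sequences are all hypothetical. It only shows the idea of clustering similar sequences and setting aside low-identity ones for a later phylogenetic placement step.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length toy sequences.

    Assumption: a stand-in for a real similarity score such as one from BLAST.
    """
    if len(a) != len(b):
        raise ValueError("toy identity expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def split_for_placement(seqs: dict, threshold: float = 0.5):
    """Group sequences by single-linkage clustering and set aside outliers.

    Returns (clusters, to_place). Each cluster is a sorted list of names
    linked by pairs with identity >= threshold. to_place holds the names
    whose identity to every other sequence is below the threshold.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of sequences that is similar enough.
    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)

    clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    to_place = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, to_place


# Hypothetical toy input: two similar pairs and one low-identity sequence.
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "TTGCAAGC",
    "s4": "TTGCAAGG",
    "s5": "GAATCCTA",
}
clusters, to_place = split_for_placement(toy, threshold=0.75)
print(clusters)  # [['s1', 's2'], ['s3', 's4']]
print(to_place)  # ['s5']
```

In a workflow of this shape, each cluster could be aligned and built into a subtree in parallel, while the sequences in `to_place` would be added afterwards by a placement tool. How BigBigTree2 actually selects and places low-identity sequences is described in the thesis itself.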
Reference:
Park, M.; Zaharias, P.; Warnow, T. Disjoint Tree Mergers for Large-Scale Maximum Likelihood Tree Estimation. Algorithms 2021, 14.
Smirnov V, Warnow T. Unblended disjoint tree merging using GTM improves species tree estimation. BMC Genomics. 2020 Apr.
蔡漢龍. BigBigTree: a divide and concatenate strategy for the phylogenetic reconstruction of large orthologous datasets using Nextflow framework. Unpublished master's thesis, Department of Computer Science, National Chengchi University (2020).
iTOL - https://itol.embl.de/
Kannan, L., Wheeler, W.C. Maximum Parsimony on Phylogenetic networks. Algorithms Mol Biol 7, 9 (2012).
Roychoudhury, Arindam. "Consistency of the Maximum Likelihood Estimator of Evolutionary Tree." arXiv: Populations and Evolution (2014).
Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981.
Notredame C, Higgins DG, Heringa J (2000) T-coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217.
Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002 Jul.
Sievers F, Higgins DG. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 2018 Jan;27(1).
Zaharias Paul and Warnow Tandy. Recent progress on methods for estimating and updating large phylogenies. 2022 Phil. Trans. R. Soc. B 377: 20210244.
Difference of Orthology and Paralogy - http://petang.cgu.edu.tw/Bioinfomatics/MANUALS/NCBIblast/Orthology.html
BLAST - https://blast.ncbi.nlm.nih.gov/Blast.cgi
hcluster - https://pypi.python.org/pypi/hcluster
TreeBeST - http://treesoft.sourceforge.net/treebest.shtml
Guindon S., Dufayard J.F., Lefort V., Anisimova M., Hordijk W., Gascuel O. New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 59(3):307-21, 2010.
Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319.
Ewels, P.A., Peltzer, A., Fillinger, S. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38, 276–278 (2020).
Björn E. Langer, Andreia Amaral, Marie-Odile Baudement, et al., the nf-core community. Empowering bioinformatics communities with Nextflow and nf-core, bioRxiv 2024.05.10.59291
Elizabeth Koning, Malachi Phillips, and Tandy Warnow. pplacerDC: a new scalable phylogenetic placement method. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB '21).
Matsen, F.A., Kodner, R.B. & Armbrust, E. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
Alexandros Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies, Bioinformatics, Volume 30, Issue 9, May 2014, Pages 1312–1313.
Berger, S. A., Krompass, D., & Stamatakis, A. (2011). Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology, 60(3), 291-302.
Pierre Barbera, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, Alexandros Stamatakis 2019; EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences, Systematic Biology, syy054.
Metin Balaban and others, APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments, Systematic Biology, Volume 69, Issue 3, May 2020, Pages 566–578.
Treedist - https://pypi.org/project/treedist/
Robinson DF, Foulds LR. Comparison of phylogenetic trees. Math Biosci. 1981.
Bui Quang Minh, Heiko A Schmidt, Olga Chernomor, Dominik Schrempf, Michael D Woodhams, Arndt von Haeseler, Robert Lanfear, IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era, Molecular Biology and Evolution, Volume 37, Issue 5, May 2020.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009.
Tata Consultancy Services (2024). TCS ADD™ - Advanced Drug Development Suite. Retrieved from TCS Official Website, https://www.tcs.com
Emms, D.M. and Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology. 2019.
Altenhoff, A.M., Train, C.M., Seluanov, A., & Dessimoz, C. OMA standalone: orthology inference among public and custom genomes and transcriptomes. Genome Research. 2021.
Description:	Master's thesis, National Chengchi University, Department of Computer Science, 110753110
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0110753110
Data Type:	thesis
Appears in Collections:	[Department of Computer Science] Theses

Files in This Item:

File	Description	Size	Format
311001.pdf		10757Kb	Adobe PDF

All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.
