政大機構典藏 - National Chengchi University Institutional Repository (NCCUR): Item 140.119/142120
RC Version 6.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/142120


    Title: 總體基因Hi-C交互作用圖之網路分析與其組裝
    The network analysis of the metagenomic Hi-C contact map and its downstream metagenome assemble binning
    Authors: 許育庭
    Hsu, Yu-Ting
    Contributors: 張家銘
    Chang, Jia-Ming
    許育庭
    Hsu, Yu-Ting
    Keywords: Hi-C
    Metagenome-assembled genomes
    Network models
    Community detection
    Date: 2022
    Issue Date: 2022-10-05 09:14:07 (UTC+8)
    Abstract: Background: Metagenomics is the genomic analysis of microbial communities. Recent approaches to recovering metagenome-assembled genomes (MAGs) draw on chromosome conformation capture techniques and have been shown to outperform traditional genome binning methods. In her earlier master's thesis, "HiCBin: Deconvoluting metagenomic assemblies via Hi-C connect networks", Cheng built on the bin3C pipeline and described a Hi-C-based metagenomic deconvolution method that uses the smart local moving (SLM) algorithm for genome binning.
    Results: In addition to using Hi-C data for genome binning, we analyze the metagenomic Hi-C contact networks and find that higher-quality networks exhibit more small-world characteristics. We therefore use these properties to predict the quality of the clusters. We also follow the bin3C pipeline but replace its clustering step with different community detection algorithms to test whether they improve the binning results. SLM with an adjusted resolution parameter performs better on two of the datasets.
    Conclusion: This work mainly follows Cheng's thesis but analyzes the metagenomic Hi-C networks in more depth and tests three additional datasets. Although the results do not single out one clustering algorithm as best, the observed network properties may offer a new perspective for future work. The source code for the experiments is publicly available at https://github.com/changlabtw/Bin3C_SLM.
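
    The small-world observation in the Results can be illustrated with a minimal sketch. It assumes the small-world coefficient sigma = (C / C_rand) / (L / L_rand), which compares the clustering coefficient C and mean path length L of a network against random graphs with the same number of nodes and edges. The abstract does not state which measure the thesis uses, so this choice, the toy graph, and every function name below are illustrative assumptions rather than the thesis's implementation.

```python
import random
from collections import deque
from itertools import combinations


def clustering(adj):
    """Mean local clustering coefficient of an undirected graph (dict of sets)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    dist_sum = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs


def add_random_edges(adj, count, rng):
    """Add `count` new distinct edges between uniformly chosen node pairs."""
    n, added = len(adj), 0
    while added < count:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def random_graph(n, m, rng):
    """Random baseline with n nodes (labelled 0..n-1) and m edges."""
    return add_random_edges({i: set() for i in range(n)}, m, rng)


def ring_with_shortcuts(n, k, n_shortcuts, rng):
    """Toy stand-in for a contact network: ring lattice plus random shortcuts."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return add_random_edges(adj, n_shortcuts, rng)


def small_world_sigma(adj, n_random=5, seed=0):
    """sigma = (C / C_rand) / (L / L_rand); sigma > 1 suggests small-world structure."""
    rng = random.Random(seed)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    baselines = [random_graph(n, m, rng) for _ in range(n_random)]
    c_rand = sum(clustering(b) for b in baselines) / n_random
    l_rand = sum(mean_path_length(b) for b in baselines) / n_random
    return (clustering(adj) / c_rand) / (mean_path_length(adj) / l_rand)


if __name__ == "__main__":
    toy = ring_with_shortcuts(100, 6, 10, random.Random(1))
    print("sigma of toy network:", round(small_world_sigma(toy), 2))
```

    In a real analysis the adjacency dictionary would be built from the Hi-C contact map (contigs as nodes, contacts as edges), and the resulting value could serve as one input feature when predicting cluster quality.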
    Reference: 1. Howe AC, Jansson JK, Malfatti SA, Tringe SG, Tiedje JM, Brown CT. Tackling soil diversity with the assembly of large, complex metagenomes. Proc National Acad Sci. 2014;111:4904–9.
    2. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, et al. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science. 2004;304:66–74.
    3. Oh J, Byrd AL, Deming C, Conlan S, Barnabas B, Blakesley R, et al. Biogeography and individuality shape function in the human skin metagenome. Nature. 2014;514:59–64.
    4. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464:59–65.
    5. Handelsman J. Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiol Mol Biol R. 2004;68:669–85.
    6. Rappé MS, Giovannoni SJ. The Uncultured Microbial Majority. Annu Rev Microbiol. 2003;57:369–94.
    7. Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ. 2014;2:e415.
    8. Hug LA, Baker BJ, Anantharaman K, Brown CT, Probst AJ, Castelle CJ, et al. A new view of the tree of life. Nat Microbiol. 2016;1:16048.
    9. Parks DH, Rinke C, Chuvochina M, Chaumeil P-A, Woodcroft BJ, Evans PN, et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol. 2017;2:1533–42.
    10. Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, et al. A genomic catalog of Earth’s microbiomes. Nat Biotechnol. 2020;1–11.
    11. Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol. 2021;39:105–14.
    12. Thomas T, Gilbert J, Meyer F. Metagenomics - a guide from sampling to data analysis. Microb Informatics Exp. 2012;2:3.
    13. Hugerth LW, Larsson J, Alneberg J, Lindh MV, Legrand C, Pinhassi J, et al. Metagenome-assembled genomes uncover a global brackish microbiome. Genome Biol. 2015;16:279.
    14. Burton JN, Liachko I, Dunham MJ, Shendure J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 (Bethesda, Md). 2014;4:1339–46.
    15. Iverson V, Morris RM, Frazar CD, Berthiaume CT, Morales RL, Armbrust EV. Untangling Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota. Science. 2012;335:587–90.
    16. Mitra S, Förster-Fromme K, Damms-Machado A, Scheurenbrand T, Biskup S, Huson DH, et al. Analysis of the intestinal microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing. BMC Genomics. 2013;14:S16.
    17. Narasingarao P, Podell S, Ugalde JA, Brochier-Armanet C, Emerson JB, Brocks JJ, et al. De novo metagenomic assembly reveals abundant novel major lineage of Archaea in hypersaline microbial communities. ISME J. 2012;6:81–93.
    18. Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng J-F, et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature. 2013;499:431–7.
    19. Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP, et al. Community-wide analysis of microbial genome sequence signatures. Genome Biol. 2009;10:R85.
    20. Hug LA, Castelle CJ, Wrighton KC, Thomas BC, Sharon I, Frischkorn KR, et al. Community genomic analyses constrain the distribution of metabolic traits across the Chloroflexi phylum and indicate roles in sediment carbon cycling. Microbiome. 2013;1:22.
    21. Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 2013;23:111–20.
    22. Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol. 2013;31:533–8.
    23. Mallawaarachchi V, Wickramarachchi A, Lin Y. GraphBin: Refined binning of metagenomic contigs using assembly graphs. Bioinformatics. 2020;36:3307–13.
    24. Alneberg J, Bjarnason BS, Bruijn I de, Schirmer M, Quick J, Ijaz UZ, et al. Binning metagenomic contigs by coverage and composition. Nat Methods. 2014;11:1144–6.
    25. Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2:26.
    26. Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32:605–7.
    27. Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3:e1165.
    28. Lin H-H, Liao Y-C. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes. Sci Rep. 2016;6:24175.
    29. DeMaere MZ, Darling AE. bin3C: exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes. Genome Biol. 2019;20:46.
    30. Press MO, Wiser AH, Kronenberg ZN, Langford KW, Shakya M, Lo C-C, et al. Hi-C deconvolution of a human gut microbiome yields high-quality draft genomes and reveals plasmid-genome interactions. bioRxiv. 2017;198713.
    31. Lieberman-Aiden E, Berkum NL van, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science. 2009;326:289–93.
    32. Dekker J, Rippe K, Dekker M, Kleckner N. Capturing Chromosome Conformation. Science. 2002;295:1306–11.
    33. Marbouty M, Cournac A, Flot J-F, Marie-Nelly H, Mozziconacci J, Koszul R. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. eLife. 2014;3:e03318.
    34. Rosvall M, Axelsson D, Bergstrom CT. The map equation. European Phys J Special Top. 2009;178:13–23.
    35. Domenico MD, Lancichinetti A, Arenas A, Rosvall M. Identifying Modular Flows on Multilayer Networks Reveals Highly Overlapping Organization in Interconnected Systems. Phys Rev X. 2015;5:011027.
    36. Baudry L, Foutel-Rodier T, Thierry A, Koszul R, Marbouty M. MetaTOR: A Computational Pipeline to Recover High-Quality Metagenomic Bins From Mammalian Gut Proximity-Ligation (meta3C) Libraries. Frontiers Genetics. 2019;10:753.
    37. Du Y, Sun F. HiCBin: binning metagenomic contigs and recovering metagenome-assembled genomes using Hi-C contact maps. Genome Biol. 2022;23:63.
    38. Du Y, Laperriere SM, Fuhrman J, Sun F. Normalizing Metagenomic Hi-C Data and Detecting Spurious Contacts Using Zero-Inflated Negative Binomial Regression. J Comput Biol. 2022;29:106–20.
    39. C IUq, C Q. TAXAassign v0.4 [Internet]. 2013. Available from: https://github.com/umerijaz/TAXAassign
    40. Marbouty M, Thierry A, Millot GA, Koszul R. MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut. eLife. 2021;10:e60608.
    41. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27:824–34.
    42. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013.
    43. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
    44. Knight PA, Ruiz D. A fast algorithm for matrix balancing. IMA J Numer Anal. 2012;33:1029–47.
    45. Erdős P, Rényi A. On the Evolution of Random Graphs. Publication of the Mathematical Institute of the Hungarian Academy of Sciences. 1960. p. 17–61.
    46. Tëmkin I, Eldredge N. Macroevolution, Explanation, Interpretation and Evidence. Interdisc Evol Res. 2015;183–226.
    47. Barabási A-L, Ravasz E, Oltvai Z. Statistical Mechanics of Complex Networks. Lect Notes Phys. 2003;46–65.
    48. Pombo A, Nicodemi M. Physical mechanisms behind the large scale features of chromatin organization. Biochem Soc Symp. 2014;5:e28447.
    49. Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 2014;24:999–1011.
    50. Liu T, Wang Z. Reconstructing high-resolution chromosome three-dimensional structures by Hi-C complex networks. BMC Bioinformatics. 2018;19:496.
    51. Pigolotti S, Jensen MH, Zhan Y, Tiana G. Bifractal nature of chromosome contact maps. bioRxiv. 2020;686279.
    52. Kan T-C. Apply graph theory to visualizing and analyzing Hi-C contact network. 2018.
    53. Clauset A, Shalizi CR, Newman MEJ. Power-law distributions in empirical data. SIAM Review. 2009;4:661–703.
    54. Broido AD, Clauset A. Scale-free networks are rare. Nat Commun. 2019;10:1017.
    55. Gillespie CS. Fitting Heavy Tailed Distributions: The poweRlaw Package. J Stat Softw. 2015;64.
    56. Alstott J, Bullmore E, Plenz D. powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions. PLoS One. 2014;9:e85777.
    57. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–2.
    58. Humphries MD, Gurney K, Prescott TJ. The brainstem reticular formation is a small-world, not scale-free, network. Proc Royal Soc B Biological Sci. 2006;273:503–11.
    59. Telesford QK, Joyce KE, Hayasaka S, Burdette JH, Laurienti PJ. The Ubiquity of Small-World Networks. Brain Connectivity. 2011;1:367–75.
    60. Emmons S, Kobourov S, Gallant M, Börner K. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale. PLoS One. 2016;11:e0159161.
    61. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Statistical Mech Theory Exp. 2008;2008:P10008.
    62. Waltman L, Eck NJ van. A smart local moving algorithm for large-scale modularity-based community detection. European Phys J B. 2013;86:471.
    63. Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. Proc National Acad Sci. 2008;105:1118–23.
    64. Raghavan UN, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E. 2007;76:036106.
    65. Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms. Phys Rev E. 2008;78:046110.
    66. Rotta R, Noack A. Multilevel local search algorithms for modularity clustering. J Exp Algorithmics. 2011;16:2.3.
    67. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36:411–20.
    68. Reichardt J, Bornholdt S. Statistical mechanics of community detection. Phys Rev E. 2006;74:016110.
    69. Traag VA, Waltman L, Eck NJ van. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9:5233.
    70. Lancichinetti A. Louvain [Internet]. Available from: https://sites.google.com/site/andrealancichinetti/
    71. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–55.
    72. Ernest YB, Daniel AA. A Review of the Logistic Regression Model with Emphasis on Medical Research. J Data Analysis Information Process. 2019;07:190–207.
    73. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5.
    74. Hunter JD. Matplotlib: A 2D Graphics Environment. Comput Sci Eng. 2007;9:90–5.
    75. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. arXiv. 2012.
    76. Stevens J-L, Rudiger P, Bednar J. HoloViews: Building Complex Visualizations Easily for Reproducible Science. Proceedings of the 14th Python in Science Conference. 2015. p. 59–66.
    77. Cheng W-W. HiCBin: Deconvoluting metagenomic assemblies by Hi-C connect network. 2020.
    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    108753127
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0108753127
    Data Type: thesis
    DOI: 10.6814/NCCU202201535
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File          Description    Size       Format
    312701.pdf                   2178 Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial use such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in building the NCCU Institutional Repository to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.