National Chengchi University Institutional Repository (NCCUR): Item 140.119/153385
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153385


    Title: Decoding the Power of PC1: A Fast and Accurate Covariance-Based Method for A/B Compartment Identification in Hi-C Data
    Authors: Cheng, Zhi-Rong (程至榮)
    Contributors: Chang, Jia-Ming (張家銘)
    Cheng, Zhi-Rong (程至榮)
    Keywords: Hi-C (high-throughput chromosome conformation capture)
    Chromatin compartment analysis
    Principal Component Analysis (PCA)
    Date: 2024
    Issue Date: 2024-09-04 15:00:57 (UTC+8)
    Abstract: Principal component analysis (PCA) of the Hi-C Pearson correlation matrix is the standard way to identify A and B compartments, yet the reason it works is rarely discussed. We argue that, for the Hi-C Pearson matrix, the explained variance ratio of the first principal component (PC1) is usually high, and that this ratio reflects how well PC1 matches the compartment pattern in the Pearson matrix. We also propose a heuristic algorithm that estimates the pattern of PC1 from the covariance matrix of the Hi-C Pearson matrix without explicitly performing PCA. The heuristic can be implemented efficiently with random sampling to speed up the computation. To address the memory bottleneck at finer matrix resolutions, we adapt the algorithm using principles from POSSUMM, a recently published compartment identification tool that takes the sparse Hi-C O/E (observed/expected) matrix as input. In our experiments, the algorithm outperforms power iteration methods, such as those implemented in Scikit-learn and POSSUMM, in time or memory usage, while remaining highly similar to the ground-truth PC1. The code is freely available at https://github.com/ZhiRongDev/HiCPEP.
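The two ideas in the abstract can be illustrated on a toy example. The sketch below is not the thesis's implementation and does not use the HiCPEP API. It builds a small synthetic Pearson-like matrix with a hypothetical two-compartment labelling, obtains PC1 by power iteration on the covariance matrix, computes the explained variance ratio as the leading eigenvalue divided by the trace, and compares PC1 with a covariance-only estimate. The selection rule used in `estimate_pc1_pattern` (take the covariance column with the largest absolute sum) is an assumption made for illustration, as are all function names, the labels, and the noise level. It is written in plain Python to stay dependency-free; real Hi-C matrices would call for NumPy or SciPy.

```python
import random


def covariance(rows):
    """Sample covariance between the columns of a matrix given as a list of rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def power_iteration(mat, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.random() for _ in mat]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in mat]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue.
    eigval = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, mat))
    return v, eigval


def estimate_pc1_pattern(cov):
    """ASSUMPTION (illustrative rule, not taken from the thesis): use the
    covariance column with the largest absolute sum as a stand-in for PC1."""
    best = max(range(len(cov)), key=lambda j: sum(abs(x) for x in cov[j]))
    return cov[best]


def sign_agreement(a, b):
    """Fraction of positions with matching sign, ignoring a global sign flip."""
    same = sum((x > 0) == (y > 0) for x, y in zip(a, b))
    return max(same, len(a) - same) / len(a)


# Hypothetical compartment labels (+1 = A, -1 = B) for 12 genomic bins.
labels = [1, 1, 1, -1, -1, 1, 1, -1, -1, -1, 1, -1]
n = len(labels)

# Synthetic Pearson-like matrix: a checkerboard of +/-0.8 plus small noise.
rng = random.Random(1)
pearson = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        value = 1.0 if i == j else 0.8 * labels[i] * labels[j] + rng.uniform(-0.1, 0.1)
        pearson[i][j] = pearson[j][i] = value

cov = covariance(pearson)
pc1, eigval = power_iteration(cov)

# The eigenvalues of cov sum to its trace, so this is the PC1 explained variance ratio.
ratio = eigval / sum(cov[i][i] for i in range(n))
estimate = estimate_pc1_pattern(cov)

print(f"explained variance ratio of PC1: {ratio:.3f}")
print(f"sign agreement, PC1 vs labels:   {sign_agreement(pc1, labels):.2f}")
print(f"sign agreement, estimate vs PC1: {sign_agreement(estimate, pc1):.2f}")
```

When the matrix is dominated by a single checkerboard pattern, every covariance column is close to a scaled copy of the leading eigenvector, which is why a covariance-only estimate can recover the sign pattern of PC1 without an eigendecomposition. On noisier or multi-pattern data the agreement would be expected to drop together with the explained variance ratio.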
    Description: Master's degree
    National Chengchi University
    Department of Computer Science
    111753151
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0111753151
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File          Description    Size      Format
    315101.pdf                   5331Kb    Adobe PDF


    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in building this website to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.