Please use this permanent URL to cite or link to this document:
https://nccur.lib.nccu.edu.tw/handle/140.119/159414
|
Title: | Hi-CMFR: A Non-Deep-Learning Enhancement Method for Sparse Single-Cell Hi-C Data (Hi-CMFR: 一種非基於深度學習的稀疏單細胞染色體三維交聯資料增強技術) |
Author: | Hsieh, Hao-Yun (謝皓雲) |
Contributors: | Chang, Jia-Ming (張家銘); Hsieh, Hao-Yun (謝皓雲) |
Keywords: | Hi-C; Chromatin Analysis; Single-Cell Hi-C; Hi-C Resolution Enhancement |
Date: | 2025 |
Upload time: | 2025-09-01 16:57:28 (UTC+8) |
Abstract: | Hi-C (high-throughput chromosome conformation capture) is an essential tool for understanding genome structure. High-resolution Hi-C experiments, however, are costly, so lower-resolution and more cost-effective single-cell Hi-C (scHi-C) has become widely used. scHi-C data are typically highly sparse, which makes enhancing and imputing sparse scHi-C data an important problem. Several studies have addressed this problem, but most rely on deep learning and therefore depend heavily on the data used to train the pre-trained models. This dependence may introduce hallucinations or biases when a model is applied to cell lines absent from the training data. Deep-learning-based methods also typically require substantial computational resources and run time. To address these challenges, this study proposes Hi-CMFR, a non-deep-learning computational method designed for sparse low-resolution bulk Hi-C and scHi-C data. Hi-CMFR combines convolution kernels, random matrix theory, matrix factorization, and stochastic reinsertion to realize a conceptual framework similar to the "add noise, then denoise" idea of diffusion models. It runs efficiently and is applicable to low-resolution bulk Hi-C and sparse scHi-C data from any cell line. Experimental results show that Hi-CMFR outperforms traditional methods, such as SVD and mean filters, on multiple evaluation metrics, including structural similarity (SSIM) and adjusted mutual information (AMI). Its performance is also competitive with deep-learning-based methods such as DeepHiC and Higashi, and it is particularly strong on extremely sparse datasets. For topologically associated domain (TAD) identification with the Louvain algorithm, Hi-CMFR achieves an average SSIM of approximately 0.7 and an AMI of approximately 0.9. Its efficiency, flexibility, and low computational cost highlight the potential of non-deep-learning approaches to genomic data enhancement. As a robust and efficient solution, Hi-CMFR complements existing deep learning tools and opens new directions for handling low-resolution genomic data. |
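As a rough illustration of how random matrix theory can guide a matrix factorization step of the kind the abstract mentions, the sketch below truncates an SVD at the Marchenko-Pastur bulk edge. It is not the Hi-CMFR implementation from the thesis: the function names `mp_edge` and `svd_denoise`, the assumption that the noise standard deviation `sigma` is known, and the hard-threshold rule are all hypothetical choices made here for illustration. The convolution-kernel and stochastic reinsertion steps are not shown.

```python
import numpy as np

def mp_edge(n_rows, n_cols, sigma):
    """Approximate largest singular value of an n_rows x n_cols matrix of
    i.i.d. zero-mean noise with standard deviation sigma (Marchenko-Pastur
    bulk edge). Hypothetical helper, not taken from the thesis."""
    return sigma * (np.sqrt(n_rows) + np.sqrt(n_cols))

def svd_denoise(mat, sigma):
    """Keep only singular components above the noise edge.

    Assumes sigma is a known (or upper-bound) estimate of the noise level.
    Returns the low-rank reconstruction and the number of components kept.
    """
    mat = np.asarray(mat, dtype=float)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    # Singular values at or below the edge are treated as noise.
    rank = int(np.sum(s > mp_edge(mat.shape[0], mat.shape[1], sigma)))
    # Rebuild the matrix from the retained components only.
    denoised = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return denoised, rank
```

Calling `svd_denoise(contact_matrix, sigma)` on a dense contact matrix returns a low-rank approximation and the retained rank. In practice the noise level would have to be estimated from the data, which this sketch leaves open.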
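Description: | Master's thesis, National Chengchi University, Department of Computer Science, 112753120 |
Source: | http://thesis.lib.nccu.edu.tw/record/#G0112753120 |
Type: | thesis |
Appears in collections: | [Department of Computer Science] Theses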
|
Files in this item:
File | Size | Format | Views |
312001.pdf | 18641Kb | Adobe PDF | 0 |
|
All items in the NCCU Institutional Repository are protected by their original copyright.
|