National Chengchi University Institutional Repository (NCCUR): Item 140.119/159414

Please use this persistent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/159414

Title:	Hi-CMFR: A Non-Deep-Learning Enhancement Method for Sparse Single-Cell Hi-C Data
Author:	Hsieh, Hao-Yun (謝皓雲)
Contributors:	Chang, Jia-Ming (張家銘); Hsieh, Hao-Yun (謝皓雲)
Keywords:	Hi-C; Chromatin Analysis; Single-Cell Hi-C; Hi-C Resolution Enhancement
Date:	2025
Uploaded:	2025-09-01 16:57:28 (UTC+8)
Abstract:	Hi-C (high-throughput chromosome conformation capture) is an essential tool for understanding genome structure. However, high-resolution Hi-C experiments typically require considerable cost, so lower-resolution, more cost-effective single-cell Hi-C (scHi-C) has become widely used. scHi-C data are highly sparse, which makes enhancing and imputing sparse scHi-C data an important problem. Several studies have addressed it, but most rely on deep learning and therefore depend heavily on the training data behind their pre-trained models; when applied to cell lines not included in that training data, they may introduce hallucinations or biases. Deep-learning-based methods also require substantial computational resources and time. To address these challenges, this thesis proposes Hi-CMFR, a non-deep-learning computational method designed for sparse, low-resolution bulk Hi-C and scHi-C data.
By combining convolution kernels, random matrix theory, matrix factorization, and stochastic reinsertion, Hi-CMFR realizes a conceptual framework similar to the "add noise, then denoise" idea behind diffusion models. It runs efficiently and applies to low-resolution bulk Hi-C and sparse scHi-C data from any cell line. Experimental results show that Hi-CMFR outperforms traditional methods, such as SVD and mean filters, on multiple evaluation metrics, including structural similarity (SSIM) and adjusted mutual information (AMI). It also performs comparably to deep-learning-based methods such as DeepHiC and Higashi, with a particular advantage on extremely sparse datasets. For topologically associated domain (TAD) identification with the Louvain algorithm, Hi-CMFR achieves an average SSIM of approximately 0.7 and an AMI of approximately 0.9. Its efficiency, flexibility, and low computational cost demonstrate the potential of non-deep-learning approaches for genomic data enhancement. As a robust and efficient solution, Hi-CMFR complements existing deep-learning tools for scHi-C analysis and opens new directions for processing low-resolution genomic data.
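The abstract names Hi-CMFR's building blocks (convolution kernels, random matrix theory, matrix factorization, stochastic reinsertion) but not the algorithm itself. The Python sketch below is a hypothetical illustration of how such pieces can be chained into a "smooth, denoise, reinsert" loop; it is not Hi-CMFR. Every choice here is an assumption for illustration: the function names `box_smooth`, `mp_truncate`, `enhance_sketch`, and `global_ssim`; the 3×3 mean kernel; the noise estimate (standard deviation of what the kernel removes); truncated SVD as the matrix factorization, with rank chosen by the Marchenko-Pastur singular-value edge σ(√n + √m); and the reinsertion probability. `global_ssim` is a single-window simplification of SSIM, not the windowed version usually used for evaluation.

```python
import numpy as np


def box_smooth(mat: np.ndarray, k: int = 3) -> np.ndarray:
    """Convolve `mat` with a k x k mean kernel (k odd, zero padding at edges)."""
    pad = k // 2
    padded = np.pad(mat.astype(float), pad, mode="constant")
    n, m = mat.shape
    out = np.zeros((n, m), dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + n, dj:dj + m]
    return out / (k * k)


def mp_truncate(mat: np.ndarray, sigma: float) -> np.ndarray:
    """Keep only singular components above the Marchenko-Pastur noise edge.

    For an n x m matrix of i.i.d. noise with standard deviation `sigma`, the
    largest singular value concentrates near sigma * (sqrt(n) + sqrt(m)).
    """
    n, m = mat.shape
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    edge = sigma * (np.sqrt(n) + np.sqrt(m))
    r = max(1, int(np.sum(s > edge)))  # always keep at least one component
    return (u[:, :r] * s[:r]) @ vt[:r]


def enhance_sketch(raw, k=3, n_iter=3, reinsert_p=0.5, seed=0):
    """Hypothetical smooth -> low-rank denoise -> random reinsertion loop."""
    rng = np.random.default_rng(seed)
    raw = np.asarray(raw, dtype=float)
    observed = raw > 0
    current = raw.copy()
    for _ in range(n_iter):
        smoothed = box_smooth(current, k)
        # Hypothetical noise estimate: spread of what the kernel removed.
        sigma = float(np.std(current - smoothed))
        denoised = mp_truncate(smoothed, sigma)
        # Contact maps are symmetric and non-negative.
        denoised = np.clip((denoised + denoised.T) / 2, 0.0, None)
        # Stochastic reinsertion: put a random subset of observed contacts back.
        mask = observed & (rng.random(raw.shape) < reinsert_p)
        mask = mask | mask.T  # keep the mask symmetric
        current = np.where(mask, raw, denoised)
    return current


def global_ssim(x, y, data_range=None) -> float:
    """Single-window SSIM over the whole matrix (simplified, not windowed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if data_range is None:
        data_range = float(max(x.max(), y.max()) - min(x.min(), y.min())) or 1.0
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return float(num / den)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 40
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    truth = 100.0 / (1.0 + dist)          # synthetic distance-decay background
    truth[:20, :20] *= 2                  # two synthetic TAD-like blocks
    truth[20:, 20:] *= 2
    sparse = rng.poisson(truth * 0.02)    # synthetic ~2% downsampling
    sparse = np.triu(sparse) + np.triu(sparse, 1).T  # symmetrize
    enhanced = enhance_sketch(sparse)
    print(global_ssim(sparse, truth), global_ssim(enhanced, truth))
```

The low-rank step keeps only components that rise above the edge predicted for a pure-noise matrix, which is one standard way random matrix theory is used to pick a rank. The reinsertion step restores part of the observed contacts on each pass, so the output does not collapse entirely onto the smoothed, low-rank approximation.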
References:
Lieberman-Aiden, E., van Berkum, N. L., et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326.
Rao, S. S., Huntley, M. H., Durand, N. C., Stamenova, E. K., Bochkov, I. D., Robinson, J. T., Sanborn, A. L., Machol, I., Omer, A. D., Lander, E. S., Aiden, E. L. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159(7), 1665–1680. doi:10.1016/j.cell.2014.11.021. Erratum in: Cell, 162(3), 687–688 (2015). PMID: 25497547; PMCID: PMC5635824.
Gavrilov, A., Eivazova, E., Pirozhkova, I., Lipinski, M., Razin, S., Vassetzky, Y. (2009). Chromosome Conformation Capture (from 3C to 5C) and Its ChIP-Based Modification. In: Collas, P. (ed.), Chromatin Immunoprecipitation Assays. Methods in Molecular Biology, vol. 567. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-414-2_12
Dixon, J., Selvaraj, S., Yue, F., et al. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485, 376–380. https://doi.org/10.1038/nature11082
Nora, E., Lajoie, B., Schulz, E., et al. (2012). Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature, 485, 381–385. https://doi.org/10.1038/nature11049
Harris, H. L., Gu, H., Olshansky, M., et al. (2023). Chromatin alternates between A and B compartments at kilobase scale for subgenic organization. Nat Commun, 14, 3303. https://doi.org/10.1038/s41467-023-38429-1
Nagano, T., Lubling, Y., Stevens, T., et al. (2013). Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature, 502, 59–64. https://doi.org/10.1038/nature12593
Liu, E., Lyu, H., Liu, Y., Fu, L., Cheng, X., Yin, X. (2024). Identifying TAD-like domains on single-cell Hi-C data by graph embedding and changepoint detection. Bioinformatics, 40(3). https://doi.org/10.1093/bioinformatics/btae138
Zhou, J., Ma, J., Chen, Y., Cheng, C., Bao, B., Peng, J., Sejnowski, T. J., Dixon, J. R., Ecker, J. R. (2019). Robust single-cell Hi-C clustering by convolution- and random-walk–based imputation. Proc Natl Acad Sci USA, 116(28), 14011–14018. https://doi.org/10.1073/pnas.1901423116
Zhang, R., Zhou, T., Ma, J. (2022). Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat Biotechnol, 40, 254–261. https://doi.org/10.1038/s41587-021-01034-y
Lin, M.-H., Hou, Z.-X., Cheng, K.-H., Wu, C.-H., Peng, Y.-T. (2021). Image Denoising Using Adaptive and Overlapped Average Filtering and Mixed-Pooling Attention Refinement Networks. Mathematics, 9, 1130. https://doi.org/10.3390/math9101130
Yang, C., Liang, L., Su, Z. (2023). Real-World Denoising via Diffusion Model. arXiv:2305.04457. https://doi.org/10.48550/arXiv.2305.04457
Zhang, Y., An, L., Xu, J., et al. (2018). Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun, 9, 750. https://doi.org/10.1038/s41467-018-03113-2
Liu, T., Wang, Z. (2019). HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data. Bioinformatics, 35(21), 4222–4228. https://doi.org/10.1093/bioinformatics/btz251
Hong, H., Jiang, S., Li, H., Du, G., Sun, Y., Tao, H., Quan, C., Zhao, C., Li, R., Li, W., Yin, X., Huang, Y., Li, C., Chen, H., Bo, X. (2020). DeepHiC: A generative adversarial network for enhancing Hi-C data resolution. PLoS Comput Biol, 16(2), e1007287. doi:10.1371/journal.pcbi.1007287. PMID: 32084131; PMCID: PMC7055922.
Norton, H. K., Emerson, D. J., Huang, H., Kim, J., Titus, K. R., Gu, S., Bassett, D. S., Phillips-Cremins, J. E. (2018). Detecting hierarchical genome folding with network modularity. Nat Methods, 15(2), 119–122. doi:10.1038/nmeth.4560. PMID: 29334377; PMCID: PMC6029251.
Elyanow, R., Dumitrascu, B., Engelhardt, B. E., Raphael, B. J. (2020). netNMF-sc: leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis. Genome Res, 30(2), 195–204. doi:10.1101/gr.251603.119. PMID: 31992614; PMCID: PMC7050525.
Guo, Q., Zhang, C., Zhang, Y., Liu, H. (2016). An Efficient SVD-Based Method for Image Denoising. IEEE Transactions on Circuits and Systems for Video Technology, 26(5), 868–880. doi:10.1109/TCSVT.2015.2416631.
Marchenko, V. A., Pastur, L. A. (1967). Distribution of Eigenvalues for Some Sets of Random Matrices. Mathematics of the USSR-Sbornik, 1(4), 457–483.
Dimitrova, D. S., Kaishev, V. K., Tan, S. (2020). Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed, or Continuous. Journal of Statistical Software, 95(10). https://doi.org/10.18637/jss.v095.i10
Sanborn, A. L., Rao, S. S., Huang, S. C., Durand, N. C., et al. (2015). Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci USA, 112(47), E6456–E6465. PMID: 26499245.
HiC-straw. https://github.com/igvteam/hic-straw
Knight, P. A., Ruiz, D. (2013). A fast algorithm for matrix balancing. IMA Journal of Numerical Analysis, 33(3), 1029–1047.
Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. doi:10.1088/1742-5468/2008/10/P10008
Crane, E., Bian, Q., McCord, R., et al. (2015). Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature, 523, 240–244. https://doi.org/10.1038/nature14450
Erdmann-Pham, D. D., Batra, S. S., Turkalo, T. K., et al. (2023). Tracing cancer evolution and heterogeneity using Hi-C. Nat Commun, 14, 7111. https://doi.org/10.1038/s41467-023-42651-2
Wang, S., Lee, S., Chu, C., et al. (2020). HiNT: a computational method for detecting copy number variations and translocations from Hi-C data. Genome Biol, 21, 73. https://doi.org/10.1186/s13059-020-01986-5
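Description:	Master's thesis, Department of Computer Science, National Chengchi University, 112753120
Source:	http://thesis.lib.nccu.edu.tw/record/#G0112753120
Type:	thesis
Appears in Collections:	[Department of Computer Science] Theses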

Files in This Item:

File	Size	Format
312001.pdf	18641 Kb	Adobe PDF

All items in NCCUR are protected by original copyright.


Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository and is provided free of charge for non-commercial purposes such as academic research and public education. Please use it appropriately and reasonably, respecting the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made to prevent this website from infringing the rights of copyright owners. If you find that any digital content on this website infringes copyright, please notify the website staff (nccur@nccu.edu.tw), who will promptly take remedial measures such as removing the work.
