Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/141641
Title: | 利用平滑化處理與參照控制HiC資料來優化找尋基因體拷貝數變異 Improve the identification of Copy Number Variation using Smoothing Strategy and Incorporating Control HiC Data |
Authors: | 陳韋翰 Chen, Wei-Han |
Contributors: | 張家銘 Chang, Jia-Ming 陳韋翰 Chen, Wei-Han |
Keywords: | 基因體拷貝數變異 高通量染色體結構捕獲技術 全基因組定序 Copy Number Variation HiC Whole Genome Sequencing |
Date: | 2022 |
Issue Date: | 2022-09-02 15:05:31 (UTC+8) |
Abstract: | Copy number variation (CNV) is common in abnormal cells such as tumor cells. Detecting CNV in these cells matters for sequencing data analysis, because removing the associated bias makes downstream analysis more accurate. CNV also leaves a signal in HiC data, so HiC can be used to identify CNV, and HiNT is the state-of-the-art method for doing so. However, the normalization step of HiNT exhibits a fluctuation phenomenon. In this work, we reduce that fluctuation and improve the performance of HiNT by adding a smoothing procedure (a mean filter) and by using HiC data from a control cell line in the normalization step. As a result, we achieve a higher Spearman correlation coefficient (0.868 vs. 0.837), more consistent CNV segments, higher precision (0.800 vs. 0.750), and higher recall (0.324 vs. 0.243). In addition, using only intra-chromosomal HiC data makes the run about ten times faster (6 minutes vs. 1 hour) at the cost of a slight loss in accuracy. |
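The mean filter named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `mean_filter`, the window size, the edge handling (a truncated window), and the toy coverage values are all assumptions made for illustration.

```python
from typing import List, Sequence


def mean_filter(signal: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a 1D signal with a centered moving average.

    Near the edges the window is truncated to the available bins, so the
    output has the same length as the input. (Edge handling is an
    assumption; the thesis may treat boundaries differently.)
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        neighborhood = signal[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


# Hypothetical per-bin values with one fluctuating bin in the middle.
coverage = [1.0, 1.0, 4.0, 1.0, 1.0]
smoothed_coverage = mean_filter(coverage, window=3)
```

A wider window suppresses more fluctuation but also blurs the boundaries of short segments, so the window size is a trade-off rather than a fixed choice.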
Reference: | 1. Rui Yin, Chee Keong Kwoh, Jie Zheng, Whole Genome Sequencing Analysis, Editor(s): Shoba Ranganathan, Michael Gribskov, Kenta Nakai, Christian Schönbach, Encyclopedia of Bioinformatics and Computational Biology, Academic Press, 2019, Pages 176-183.
2. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009 Oct 9;326(5950):289-93.
3. Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell. 2012 Feb 3;148(3):458-72.
4. Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun 8;316(5830):1497-502.
5. Ashoor H, Louis-Brennetot C, Janoueix-Lerosey I, Bajic VB, Boeva V. HMCan-diff: a method to detect changes in histone modifications in cells with different genetic characteristics. Nucleic Acids Res. 2017 May 5;45(8):e58.
6. Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, Barillot E. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics. 2011 Jan 15;27(2):268-9.
7. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012 Feb 1;28(3):423-5.
8. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011 Jun;21(6):974-84.
9. Suvakov M, Panda A, Diesh C, Holmes I, Abyzov A. CNVpytor: a tool for copy number variation detection and analysis from read depth and allele imbalance in whole-genome sequencing. GigaScience. 2021 Nov;10(11):giab074.
10. Xi R, Lee S, Xia Y, Kim TM, Park PJ. Copy number analysis of whole-genome data using BIC-seq2 and its application to detection of cancer susceptibility variants. Nucleic Acids Res. 2016 Jul 27;44(13):6274-86.
11. Harewood L, Kishore K, Eldridge MD, Wingett S, Pearson D, Schoenfelder S, Collins VP, Fraser P. Hi-C as a tool for precise detection and characterisation of chromosomal rearrangements and copy number variation in human tumors. Genome Biol. 2017 Jun 27;18(1):125.
12. Chakraborty A, Ay F. Identification of copy number variations and translocations in cancer cells from Hi-C data. Bioinformatics. 2018 Jan 15;34(2):338-345.
13. Vidal E, le Dily F, Quilez J, Stadhouders R, Cuartero Y, Graf T, Marti-Renom MA, Beato M, Filion GJ. OneD: increasing reproducibility of HiC samples with abnormal karyotypes. Nucleic Acids Res. 2018 May 4;46(8):e49.
14. Khalil AIS, Muzaki SRBM, Chattopadhyay A, Sanyal A. Identification and utilization of copy number information for correcting Hi-C contact map of cancer cell lines. BMC Bioinformatics. 2020 Nov 7;21(1):506.
15. Wang S, Lee S, Chu C, et al. HiNT: a computational method for detecting copy number variations and translocations from Hi-C data. Genome Biol. 2020;21:73.
16. Rao SS, Huntley MH, Durand NC, Stamenova EK, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014 Dec 18;159(7):1665-80.
17. Razin SV, Gavrilov AA. Structural-Functional Domains of the Eukaryotic Genome. Biochemistry (Mosc). 2018 Apr;83(4):302-312.
18. Hunter JD. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. 2007 May;9(3):90–95.
19. Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, Aiden EL. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016 Jul;3(1):99-101.
20. Wood SN. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B). 2011;73(1):3–36.
21. Arce GR. Nonlinear Signal Processing: A Statistical Approach. New Jersey, USA: Wiley. 2004 Nov. ISBN 0-471-67624-1. |
Description: | Master's thesis, National Chengchi University, Department of Computer Science, 109753144 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109753144 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202201407 |
Appears in Collections: | [Department of Computer Science] Theses |
Files in This Item:
File | Size | Format | |
314401.pdf | 6081Kb | Adobe PDF2 | 133 | View/Open |
All items in 政大典藏 are protected by copyright, with all rights reserved.