National Chengchi University Institutional Repository (NCCUR): Item 140.119/141641


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/141641


    Title: 利用平滑化處理與參照控制HiC資料來優化找尋基因體拷貝數變異
    Improve the identification of Copy Number Variation using Smoothing Strategy and Incorporating Control HiC Data
    Author: 陳韋翰
    Chen, Wei-Han
    Contributors: 張家銘
    Chang, Jia-Ming
    陳韋翰
    Chen, Wei-Han
    Keywords: Copy Number Variation (基因體拷貝數變異)
    HiC (高通量染色體結構捕獲技術)
    Whole Genome Sequencing (全基因組定序)
    Date: 2022
    Uploaded: 2022-09-02 15:05:31 (UTC+8)
    Abstract: Copy number variation (CNV) is common in abnormal cells such as tumor cells. Detecting CNV in these cells is crucial when working with sequencing data, because removing the CNV-related bias makes downstream analysis more accurate. CNV also appears in HiC data, so HiC can be used to identify CNV; HiNT is the state-of-the-art method for doing so. However, the normalization step of HiNT exhibits a fluctuation phenomenon. In this work, we reduce the fluctuation and improve the performance of HiNT by adding a smoothing procedure (a mean filter) and by using HiC data from a control cell line in the normalization step. As a result, we achieve a higher Spearman correlation coefficient (0.868 vs. 0.837), more consistent CNV segments, higher precision (0.800 vs. 0.750), and higher recall (0.324 vs. 0.243). In addition, using only intra-chromosomal HiC data makes the run about ten times faster (about 6 minutes instead of 1 hour) at the cost of a slight loss in accuracy.
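The abstract names a mean filter as the smoothing procedure. As a rough illustration only (not the thesis code), a mean filter over a 1-D signal of per-bin values could be sketched as below. The function name `mean_filter`, the window size, and the edge-padding choice are assumptions made for this example; this record does not state the window or boundary handling used in the thesis.

```python
import numpy as np


def mean_filter(signal, window=5):
    """Smooth a 1-D signal with a centered moving average (mean filter).

    `window` is a hypothetical parameter: the number of neighboring bins
    averaged around each position. It must be a positive odd integer.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = np.asarray(signal, dtype=float)
    half = window // 2
    # Repeat the edge values so the output has the same length as the input
    # (an assumed boundary rule, chosen only for this sketch).
    padded = np.pad(values, half, mode="edge")
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the kernel fully overlaps the data.
    return np.convolve(padded, kernel, mode="valid")


if __name__ == "__main__":
    # A single spike is spread over its neighbors, damping the fluctuation.
    noisy = [1.0, 1.0, 4.0, 1.0, 1.0]
    print(mean_filter(noisy, window=3))
```

A wider window damps fluctuations more strongly but also blurs sharp copy-number boundaries, so the window size is a trade-off that would have to be tuned on real data.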
    References: 1. Rui Yin, Chee Keong Kwoh, Jie Zheng, Whole Genome Sequencing Analysis, Editor(s): Shoba Ranganathan, Michael Gribskov, Kenta Nakai, Christian Schönbach, Encyclopedia of Bioinformatics and Computational Biology, Academic Press, 2019, Pages 176-183.
    2. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009 Oct 9;326(5950):289-93.
    3. Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell. 2012 Feb 3;148(3):458-72.
    4. Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun 8;316(5830):1497-502.
    5. Ashoor H, Louis-Brennetot C, Janoueix-Lerosey I, Bajic VB, Boeva V. HMCan-diff: a method to detect changes in histone modifications in cells with different genetic characteristics. Nucleic Acids Res. 2017 May 5;45(8):e58.
    6. Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, Barillot E. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics. 2011 Jan 15;27(2):268-9.
    7. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012 Feb 1;28(3):423-5.
    8. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011 Jun;21(6):974-84.
    9. Milovan Suvakov, Arijit Panda, Colin Diesh, Ian Holmes, Alexej Abyzov, CNVpytor: a tool for copy number variation detection and analysis from read depth and allele imbalance in whole-genome sequencing, GigaScience, 2021 Nov;10(11):giab074.
    10. Xi R, Lee S, Xia Y, Kim TM, Park PJ. Copy number analysis of whole-genome data using BIC-seq2 and its application to detection of cancer susceptibility variants. Nucleic Acids Res. 2016 Jul 27;44(13):6274-86.
    11. Harewood L, Kishore K, Eldridge MD, Wingett S, Pearson D, Schoenfelder S, Collins VP, Fraser P. Hi-C as a tool for precise detection and characterisation of chromosomal rearrangements and copy number variation in human tumors. Genome Biol. 2017 Jun 27;18(1):125.
    12. Chakraborty A, Ay F. Identification of copy number variations and translocations in cancer cells from Hi-C data. Bioinformatics. 2018 Jan 15;34(2):338-345.
    13. Vidal E, le Dily F, Quilez J, Stadhouders R, Cuartero Y, Graf T, Marti-Renom MA, Beato M, Filion GJ. OneD: increasing reproducibility of HiC samples with abnormal karyotypes. Nucleic Acids Res. 2018 May 4;46(8):e49.
    14. Khalil AIS, Muzaki SRBM, Chattopadhyay A, Sanyal A. Identification and utilization of copy number information for correcting Hi-C contact map of cancer cell lines. BMC Bioinformatics. 2020 Nov 7;21(1):506.
    15. Wang, S., Lee, S., Chu, C. et al. HiNT: a computational method for detecting copy number variations and translocations from Hi-C data. Genome Biol 21, 73 (2020).
    16. Rao SS, Huntley MH, Durand NC, Stamenova EK et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014 Dec 18;159(7):1665-80.
    17. Razin SV, Gavrilov AA. Structural-Functional Domains of the Eukaryotic Genome. Biochemistry (Mosc). 2018 Apr;83(4):302-312.
    18. John D. Hunter. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. 2007 May;9(3):90–95.
    19. Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, Aiden EL. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016 Jul;3(1):99-101.
    20. Wood, S. N. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B). 2011;73(1):3–36.
    21. Arce, Gonzalo R. Nonlinear Signal Processing: A Statistical Approach. New Jersey, USA: Wiley. 2004 Nov. ISBN 0-471-67624-1.
    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    109753144
    Source: http://thesis.lib.nccu.edu.tw/record/#G0109753144
    Type: thesis
    DOI: 10.6814/NCCU202201407
    Appears in collections: [Department of Computer Science] Theses

    Files in this item:

    File          Size     Format      Views
    314401.pdf    6081Kb   Adobe PDF   2134


    All items in NCCUR are protected by their original copyright.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any digital content on the website infringes copyright, please notify the site maintainers (nccur@nccu.edu.tw), who will promptly take remedial action such as removing the work.
    DSpace Software Copyright © 2002-2004 MIT & Hewlett-Packard / Enhanced by NTU Library IR team