National Chengchi University Institutional Repository (NCCUR): Item 140.119/128110
    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/128110


    Title: CycleCoopNet: Image-to-Image Translation with Cooperative Learning Networks
    Author: Weng, Chien-Hao (翁健豪)
    Contributors: Yu, Fang (郁方)
    Weng, Chien-Hao (翁健豪)
    Keywords: Cooperative learning networks
    Image-to-Image Translation
    deep learning
    neural network
    Date: 2019
    Uploaded: 2020-01-03 15:53:39 (UTC+8)
    Abstract: This thesis proposes a new image-to-image translation method, CycleCoopNet. Image-to-image translation changes a picture from one style to another, which makes it possible to create novel pictures that do not exist. CycleCoopNet adopts the CoopNet framework, which has two main models: a generator and a descriptor. The generator produces pictures, and the descriptor revises them by MCMC (Markov chain Monte Carlo) sampling, so the generator is trained by supervised learning guided by the descriptor. The descriptor, in turn, learns from real data by modified contrastive divergence: it is adjusted so that it outputs the same vector for the revised data and the real data.
    Several previous works address image-to-image translation. CycleGAN, one of the best-known works similar to ours, applies the concept of the GAN (generative adversarial network) and performs well on this task. However, CycleGAN generates pictures by unsupervised learning; that is, the generator's outputs have no reference answer during training. CycleGAN uses only the discriminator to decide whether a result is correct or incorrect. Because every result only has to pass the discriminator's test, the generator only needs to learn how to pass that test, without trying to find the correct answer or a wider range of possible answers. We call this problem mode collapse: the results show less variability, that is, the generator keeps generating the same picture to fool the discriminator and obtain a better score.
    In our experiments, we use the edges2handbags dataset to observe how pictures change from sketches to bags. We found that our model generates more diverse results, and that these results can be stably recovered to the original picture by a second generator that works in the opposite direction. In another experiment, we use the vangogh2photo dataset to observe how pictures change from photos to Van Gogh-style pictures, and we show that our model achieves better variety.
    Our goal is to upgrade the CycleGAN network by replacing its discriminator with a descriptor. The descriptor model is adapted from CoopNet (Cooperative Neural Network). The idea is to change the output dimension of the discriminator (descriptor) convolutional network. Using the descriptor gives our generator labeled answers with which to adjust its model parameters, which turns the problem into a supervised learning problem. Using a descriptor also prevents mode collapse, so the generator does not always generate similar patterns.
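    The alternation between generator and descriptor described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: it assumes 2-D Gaussian data instead of images, a linear generator, and a descriptor with the linear score f(x) = theta . x on a standard normal reference, so that Langevin dynamics (one common choice of MCMC sampler) has a closed-form gradient. All names (langevin_revise, train_coop_toy, theta, A, b) and all hyperparameters are hypothetical.

```python
import numpy as np


def langevin_revise(x, theta, step=0.3, n_steps=10, rng=None):
    """Revise generated samples with Langevin dynamics on the toy descriptor.

    Assumed descriptor density: log p(x) = theta . x - |x|^2 / 2 + const,
    so the gradient of log p with respect to x is (theta - x).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x.copy()
    for _ in range(n_steps):
        grad = theta - x
        x = x + 0.5 * step**2 * grad + step * rng.standard_normal(x.shape)
    return x


def train_coop_toy(real, latent_dim=2, n_iters=500, batch=256,
                   lr_des=0.1, lr_gen=0.1, seed=0):
    """Alternate the three cooperative steps on toy data (illustration only)."""
    rng = np.random.default_rng(seed)
    dim = real.shape[1]
    theta = np.zeros(dim)                              # descriptor parameters
    A = 0.1 * rng.standard_normal((dim, latent_dim))   # generator weights
    b = np.zeros(dim)                                  # generator bias

    for _ in range(n_iters):
        # 1) The generator proposes samples from latent noise.
        z = rng.standard_normal((batch, latent_dim))
        x_gen = z @ A.T + b

        # 2) The descriptor revises the proposals by MCMC (Langevin) sampling.
        x_rev = langevin_revise(x_gen, theta, rng=rng)

        # 3a) Descriptor update in the style of contrastive divergence:
        #     for f(x) = theta . x the gradient is the difference between
        #     the mean of the real batch and the mean of the revised batch.
        x_real = real[rng.integers(0, len(real), size=batch)]
        theta += lr_des * (x_real.mean(axis=0) - x_rev.mean(axis=0))

        # 3b) Generator update: supervised regression of the revised
        #     samples on the latent codes that produced the proposals.
        resid = x_rev - x_gen
        A += lr_gen * (resid.T @ z) / batch
        b += lr_gen * resid.mean(axis=0)

    return theta, A, b


# Toy "real" data: a unit-variance Gaussian with a shifted mean.
data_rng = np.random.default_rng(1)
real = data_rng.standard_normal((2000, 2)) + np.array([2.0, -1.0])

theta, A, b = train_coop_toy(real)
print("descriptor theta:", theta)
print("generator bias b:", b)
print("data mean:       ", real.mean(axis=0))
```

    In this toy setting the descriptor can only capture the mean of the data, so theta and b should both drift toward the mean of real. The cycle structure with two opposite generators and the convolutional networks used in the thesis are not modeled here.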
    Code repository: C.-H. Weng, CycleCoopNet, https://github.com/howarder3/CycleCoopNet, 2019.
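    Description: Master's degree
    National Chengchi University
    Department of Management Information Systems
    106356034
    Source: http://thesis.lib.nccu.edu.tw/record/#G0106356034
    Type: thesis
    DOI: 10.6814/NCCU201901290
    Appears in collections: [Department of Management Information Systems] Theses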

    Files in this item:

    File        Size    Format     Views
    603401.pdf  7761Kb  Adobe PDF  2127

