Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/125644
|
Title: | 基於深度學習之行草中文古文辨識 Cursive Chinese Calligraphy Recognition For Historical Documents—A Deep Learning Approach |
Authors: | 戎諒 Jung, Liang |
Contributors: | 廖文宏 Liao, Wen-Hung 戎諒 Jung, Liang |
Keywords: | Cursive Chinese calligraphy; Text recognition; Deep learning |
Date: | 2019 |
Issue Date: | 2019-09-05 16:15:17 (UTC+8) |
Abstract: | Calligraphy was one of the most important writing tools, as well as an art form, in ancient China. Compared with other calligraphy styles, cursive script is the least restricted in rules and structure and often reveals the personality of the calligrapher. However, this style-oriented expression makes cursive script hard to recognize: even for trained humanities experts, digitizing historical texts remains a time-consuming task. Furthermore, optical character recognition (OCR) systems are designed for printed text and perform too poorly on cursive characters, whose structure is simplified and whose styles vary widely, to meet practical needs. This has created a need for auxiliary tools for cursive Chinese calligraphy recognition.
In this study, we take a deep learning-based approach to the recognition of cursive Chinese calligraphy. As there is currently no openly available dataset of cursive Chinese characters, we collected images from the web and curated them manually, producing a dataset of 42862 images covering 5301 distinct characters, which we use to train our neural networks.
Since there is little previous research on this topic, we treat cursive Chinese calligraphy recognition as a variant of offline handwritten Chinese character recognition and review the related work. Building on the M6 network, which has performed well on handwritten Chinese character recognition, we propose and investigate three architectures: Enhanced M6 (EM6), DenseNet-18, and a 3-way neural network. EM6 adds batch normalization and an additional fully connected layer to M6 to reduce the impact of overfitting. DenseNet-18 is a shallower simplification of DenseNet-121. The 3-way neural network is based on our observations of the characteristics of Chinese handwriting. These networks achieved similar accuracy during training, but EM6 achieved the highest test accuracy and therefore became our model of choice.
We evaluated the EM6 model on 18668 cursive Chinese calligraphy images extracted from the BiSouth model calligraphy (二南堂法帖) and achieved 64.3% Top-1 accuracy and 80.5% Top-5 accuracy. |
Description: | Master's thesis, National Chengchi University, Department of Computer Science, 106753027 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0106753027 |
Data Type: | thesis |
DOI: | 10.6814/NCCU201900798 |
Appears in Collections: | [Department of Computer Science] Theses
|
Files in This Item:
File | Size | Format |
302701.pdf | 7357Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|