Neural network method for character sequence generation for text images training dataset synthesis

P. K. Zlobin; Злобин Павел Константинович; Y. S. Chernyshova; Чернышова Юлия Сергеевна; A. V. Sheshkus; Шешкус Александр Владимирович; V. V. Arlazarov; Арлазаров Владимир Викторович

doi:10.14357/20790279230204

Authors: Zlobin P.K.¹,², Chernyshova Y.S.¹,², Sheshkus A.V.¹,², Arlazarov V.V.¹,²
Affiliations:
1. Academy of Sciences
2. Smart Engines Service, LLC
Issue: Volume 73, No. 2 (2023)
Pages: 40-49
Section: Data Mining and Pattern Recognition
URL: https://bakhtiniada.ru/2079-0279/article/view/286974
DOI: https://doi.org/10.14357/20790279230204
ID: 286974

Abstract

The size of the training sample is an important factor in solving optical character recognition (OCR) tasks. Most research focuses on increasing the variety of distributions that are applied to the images, yet the internal structure of the textual information also affects the accuracy of the resulting model. We propose a neural-network-based text generation method for creating a synthetic training dataset of annotated images. The method operates on groups of characters, called alphabetic clusters, and uses the sequence of clusters to predict the next character. This cluster approach lets us create sequences that retain the main properties of the target language without encoding a full language model. Because the method works with a small number of clusters, a small training set and a lightweight neural network are enough to generate text. Experiments on three open datasets of identity document images demonstrate the effectiveness of the method and show that it can improve on current results for the target fields.
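
The cluster idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the clusters or the network, so the four clusters below (vowel, consonant, digit, other) are hypothetical, and a simple count table stands in for the neural network. All function names (`cluster_of`, `fit`, `generate`) are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical clusters; the paper's actual alphabetic clusters are not given here.
VOWELS = set("AEIOUY")


def cluster_of(ch):
    """Map a character to a coarse, illustrative cluster label."""
    if ch.upper() in VOWELS:
        return "V"  # vowel
    if ch.isalpha():
        return "C"  # consonant
    if ch.isdigit():
        return "D"  # digit
    return "O"      # anything else


def fit(words, context=2):
    """Count which characters follow each context of cluster labels.

    Assumption: this count table replaces the light neural network
    mentioned in the abstract, purely to keep the sketch short.
    """
    table = defaultdict(Counter)
    for word in words:
        clusters = ["^"] * context          # start-of-sequence padding
        for ch in word + "$":               # "$" marks end of sequence
            table[tuple(clusters[-context:])][ch] += 1
            clusters.append(cluster_of(ch))
    return table


def generate(table, rng, context=2, max_len=12):
    """Sample a character sequence, conditioning only on previous clusters."""
    clusters = ["^"] * context
    out = []
    while len(out) < max_len:
        counts = table.get(tuple(clusters[-context:]))
        if not counts:
            break
        chars = sorted(counts)
        ch = rng.choices(chars, weights=[counts[c] for c in chars])[0]
        if ch == "$":
            break
        out.append(ch)
        clusters.append(cluster_of(ch))
    return "".join(out)


if __name__ == "__main__":
    table = fit(["ANNA", "IVAN", "MARIA", "OLEG"])
    rng = random.Random(0)
    print([generate(table, rng) for _ in range(5)])
```

Because the predictor sees only cluster labels rather than the characters themselves, the generated strings follow the rough vowel and consonant rhythm of the training words without reproducing a full character-level language model, which is the property the abstract describes.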

Keywords

training data, neural network, OCR, text generation, data synthesis

About the authors

P. Zlobin

Academy of Sciences; Smart Engines Service, LLC

Email: p.zlobin@smartengines.com

PhD student

Russia, 44/2 Vavilova str., Moscow, 119333; Moscow

Y. Chernyshova

Academy of Sciences; Smart Engines Service, LLC

Corresponding author
Email: chernyshova@smartengines.com

Mathematician

Russia, 44/2 Vavilova str., Moscow, 119333; Moscow

A. Sheshkus

Academy of Sciences; Smart Engines Service, LLC

Email: asheshkus@smartengines.com

Researcher

Russia, 44/2 Vavilova str., Moscow, 119333; Moscow

V. Arlazarov

Academy of Sciences; Smart Engines Service, LLC

Email: vva777@gmail.com

Head of the Department for the Federal Research Center, PhD

Russia, 44/2 Vavilova str., Moscow, 119333; Moscow
