Development of a graph neural network for processing text data
- Authors: Zakharova O.I.¹, Kuleshov S.V.²
Affiliations:
- ¹ Povolzhskiy State University of Telecommunications and Informatics
- ² St. Petersburg Federal Research Center of the Russian Academy of Sciences
- Issue: No. 4 (2024)
- Pages: 67-78
- Section: Analysis of Textual and Graphical Information
- URL: https://bakhtiniada.ru/2071-8594/article/view/278266
- DOI: https://doi.org/10.14357/20718594240406
- EDN: https://elibrary.ru/JFRHRU
- ID: 278266
Abstract
Graph-based modeling of complex data structures and machine learning on graph representations are currently among the main directions of information technology development. This article addresses graph modeling of text data using neural networks. The aim of the work is to develop a graph neural network that classifies and clusters texts by their semantic content. Texts are represented as graphs whose vertices are concepts and whose edges are links between those concepts. Publicly available text corpora in Russian and English were used. A new approach to text analysis is proposed in which texts are represented as directed weighted graphs and processed by graph neural networks; the network used has three graph convolution layers. The results show an accuracy above 90% for both topic-group classification and text clustering, outperforming RNN, CNN and doc2vec methods.
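The abstract does not describe how concepts are extracted or which graph convolution operator is used, so the following is only a minimal sketch under stated assumptions. Concepts are taken to be non-stop-word tokens; a directed edge u → v is weighted by how often v follows u within a two-token window; node features are hashed one-hot vectors; each of the three layers multiplies by a row-normalized weighted adjacency matrix with self-loops and applies ReLU; mean pooling and a linear layer then produce class scores. All names and parameters (`build_concept_graph`, `gcn_logits`, `window`, `feat_dim`, `hidden`, `STOP_WORDS`) are hypothetical, and the weights are random rather than trained.

```python
import math
import random
import zlib
from collections import defaultdict

# Hypothetical stop-word list: the abstract does not say how concepts are extracted.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "by", "as"}


def build_concept_graph(text, window=2):
    """Directed weighted graph: vertices are concept tokens; the edge u -> v is
    weighted by how often v follows u within `window` tokens (an assumption)."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    weights = defaultdict(float)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:  # self-loops are added during normalization instead
                weights[(index[u], index[v])] += 1.0
    return vocab, dict(weights)


def normalized_adjacency(n, weights):
    """Add self-loops and row-normalize, so each convolution averages a node
    with its out-neighbours in proportion to the edge weights."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (u, v), w in weights.items():
        adj[u][v] += w
    normalized = []
    for row in adj:
        total = sum(row)  # always >= 1 because of the self-loop
        normalized.append([x / total for x in row])
    return normalized


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, x) for x in row] for row in m]


def init_weights(rows, cols, rng):
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def node_features(vocab, dim):
    """Hashed one-hot features of fixed size, so documents with different
    vocabularies share the same weight shapes (an assumption)."""
    feats = [[0.0] * dim for _ in vocab]
    for i, word in enumerate(vocab):
        feats[i][zlib.crc32(word.encode("utf-8")) % dim] = 1.0
    return feats


def gcn_logits(text, num_classes=3, feat_dim=16, hidden=8, seed=0):
    """Three graph-convolution layers H <- ReLU(A_hat H W), mean-pool readout,
    then a linear classifier. Weights are random: training is omitted."""
    vocab, weights = build_concept_graph(text)
    if not vocab:
        raise ValueError("text contains no concepts")
    rng = random.Random(seed)
    a_hat = normalized_adjacency(len(vocab), weights)
    h = node_features(vocab, feat_dim)
    for in_dim in (feat_dim, hidden, hidden):  # three convolution layers
        h = relu(matmul(a_hat, matmul(h, init_weights(in_dim, hidden, rng))))
    pooled = [sum(col) / len(h) for col in zip(*h)]  # graph-level embedding
    w_out = init_weights(hidden, num_classes, rng)
    return [sum(p * w for p, w in zip(pooled, col)) for col in zip(*w_out)]


if __name__ == "__main__":
    doc = "Graph neural networks process graph data. Graph convolution aggregates neighbours."
    print(gcn_logits(doc))
```

Running the script prints untrained class scores for one sample sentence. In a real setup the weights would be learned, for example by minimizing cross-entropy over labelled topic groups, and the pooled graph embedding could serve as input to a clustering algorithm; neither step is shown here.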
About the authors
Oksana Zakharova
Povolzhskiy State University of Telecommunications and Informatics
Corresponding author
Email: o.zaharova@psuti.ru
Candidate of Technical Sciences, Docent, Deputy Head of the Research Laboratory of Artificial Intelligence, Associate Professor
Samara, Russia
Sergey Kuleshov
St. Petersburg Federal Research Center of the Russian Academy of Sciences
Email: kuleshov@iias.spb.su
Doctor of Technical Sciences, Chief Researcher, Deputy Director for Research
Saint Petersburg, Russia
