A Robust Control Algorithm for Single Input Single Output Dynamic Object Based on Table-Based Q-Method of Reinforcement Learning

M. Yu. Medvedev; V. Kh. Pshikhopov; I. D. Evdokimov

doi:10.15622/ia.24.3.1

Authors: Medvedev M.Y.¹, Pshikhopov V.K.¹, Evdokimov I.D.¹
Affiliations:
1. Southern Federal University (SFedU)
Issue: Vol 24, No 3 (2025)
Pages: 717-744
Section: Robotics, automation and control systems
URL: https://bakhtiniada.ru/2713-3192/article/view/350713
DOI: https://doi.org/10.15622/ia.24.3.1
ID: 350713

Abstract

The article reviews control systems for dynamic objects based on reinforcement learning and concludes from this analysis that developing such control methods remains a relevant problem. It proposes an intelligent algorithm for robust control of stable single-input single-output dynamic objects, based on the zero-order tabular Q-learning method. The algorithm stabilizes the output of the controlled object to within a specified error, provided that the object's parameters and external disturbances are unknown piecewise-constant quantities and the state vector is measurable. The novelty of the algorithm lies in a new incremental method of forming the control, which stabilizes the object using a set of only three possible actions. With this action set, the required accuracy of output stabilization is achieved by changing the amplitude of the control increment. The algorithm is computationally efficient: after training, computing the control reduces to calculating indexes from measurement results, reading data from memory at those indexes, and finding the maximum value in a small vector. For a discrete-time description of the controlled object, the convergence conditions of the learning algorithm and the boundedness of the control error are investigated. The algorithm is demonstrated by synthesizing a robust controller for a separately excited DC motor. Numerical simulation is used to examine the quality of the closed-loop system when the parameters and the control action change, and the simulation results support conclusions about the effectiveness of the synthesized algorithm. The article also presents the results of a physical experiment demonstrating the technical feasibility of the algorithm. This matters because the analysis of sources shows an almost complete lack of technical implementations of control systems for dynamic objects synthesized using reinforcement learning methods.
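The incremental three-action scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the first-order plant, the quantization of the output error into table rows, the reward, and every parameter value (a, b, d, n_bins, e_max, du, alpha, gamma, epsilon) are assumptions chosen for illustration, and the update is the generic one-step tabular Q-learning rule, which may differ from the zero-order variant studied in the article.

```python
import random

# Three possible actions: decrease, hold, or increase the control by one increment.
ACTIONS = (-1, 0, 1)


def error_index(error, e_max, n_bins):
    """Map a measured error to a table row, clipping it to [-e_max, e_max]."""
    clipped = max(-e_max, min(e_max, error))
    idx = int((clipped + e_max) / (2 * e_max) * n_bins)
    return min(idx, n_bins - 1)


def greedy_action(q_table, row):
    """Index of the largest entry in the three-element row (first one on ties)."""
    values = q_table[row]
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def plant_step(y, u, a=0.9, b=0.1, d=0.0):
    """Hypothetical stable first-order plant: y[k+1] = a*y[k] + b*u[k] + d."""
    return a * y + b * u + d


def train(episodes=200, steps=200, n_bins=21, e_max=2.0, du=0.05,
          alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy one-step tabular Q-learning on the assumed plant."""
    rng = random.Random(seed)
    q_table = [[0.0] * len(ACTIONS) for _ in range(n_bins)]
    for _ in range(episodes):
        ref = rng.uniform(-1.0, 1.0)      # random setpoint for this episode
        y, u = 0.0, 0.0
        row = error_index(ref - y, e_max, n_bins)
        for _ in range(steps):
            if rng.random() < epsilon:
                act = rng.randrange(len(ACTIONS))   # explore
            else:
                act = greedy_action(q_table, row)   # exploit
            u += ACTIONS[act] * du                  # incremental control
            y = plant_step(y, u)
            next_row = error_index(ref - y, e_max, n_bins)
            reward = -abs(ref - y)                  # assumed reward: small error is good
            target = reward + gamma * max(q_table[next_row])
            q_table[row][act] += alpha * (target - q_table[row][act])
            row = next_row
    return q_table


def control_step(q_table, u, error, e_max=2.0, du=0.05):
    """After training: compute the index, read the row, take the max entry."""
    row = error_index(error, e_max, len(q_table))
    return u + ACTIONS[greedy_action(q_table, row)] * du
```

In this sketch, control_step mirrors the post-training computation named in the abstract: an index is computed from the measurement, one row is read from memory, and the maximum of a three-element vector selects the increment. The increment amplitude du plays the role of the accuracy knob: a smaller du allows a finer steady-state control value at the cost of a slower response.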

Keywords

robust control, reinforcement learning, Q-learning algorithm, dynamic objects, uncertain parameters, convergence of the learning algorithm
