Application of Mathematical Programming for Selecting the Optimal Structures of Multivariate Linear Regressions
- Author: Bazilevskiy M.P.
- Affiliation:
- Irkutsk State Transport University
- Issue: No. 4 (2024)
- Pages: 32-45
- Section: Information processing and data analysis
- URL: https://bakhtiniada.ru/2071-8632/article/view/286407
- DOI: https://doi.org/10.14357/20718632240404
- EDN: https://elibrary.ru/BBFOVP
- ID: 286407
Abstract
This article formulates the problem of simultaneously selecting both responses and explanatory variables in multivariate linear regressions, referred to as «key responses and relevant features selection». The regressions are estimated by ordinary least squares. First, the problem of selecting a given number of key responses and relevant features by the criterion of the maximum sum of the regression coefficients of determination is reduced to a mixed 0-1 integer linear programming problem. Then, constraints on the signs of the estimates are introduced, which makes it possible to select optimal structures of multivariate regressions. After that, constraints on the absolute contributions of the regressors to the overall determination are added, which allows the number of explanatory variables to be controlled. In computational experiments on real data with a fixed number of key responses, the time required to construct multivariate models with the proposed method was approximately 67.3 times less than with the method of generating all subsets. Moreover, tightening the constraints on the absolute contributions of the regressors further reduced the solution time.
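To illustrate the kind of reduction described in the abstract, the sketch below shows in rough form how selecting a fixed number of regressors by the maximum coefficient of determination can be posed as a mixed 0-1 integer linear program. It is a minimal illustration in the spirit of the approach (close to the formulation of Konno and Yamamoto cited in the references), not the author's exact model: it handles a single response, assumes standardized data, and relies on an illustrative big-M bound and the PuLP solver; the function name and all parameters are placeholders.

```python
# Minimal sketch (not the author's exact formulation): selecting a fixed number
# of regressors maximizing R^2 via a mixed 0-1 integer linear program.
# Assumptions: standardized data, an illustrative big-M bound, PuLP + CBC solver.
import numpy as np
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary, PULP_CBC_CMD

def select_features_milp(X, y, n_selected, big_m=10.0):
    """Pick `n_selected` columns of X maximizing R^2 of the OLS fit.

    With standardized data, R^2 = sum_j r_j * beta_j, and the OLS estimates of
    the chosen subset satisfy the corresponding rows of the normal equations.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized regressors
    ys = (y - y.mean()) / y.std()               # standardized response
    R = Xs.T @ Xs / n                           # correlation matrix of regressors
    r = Xs.T @ ys / n                           # correlations with the response

    prob = LpProblem("subset_selection", LpMaximize)
    beta = [LpVariable(f"beta_{j}", -big_m, big_m) for j in range(p)]
    z = [LpVariable(f"z_{j}", cat=LpBinary) for j in range(p)]

    # Objective: R^2 = sum_j r_j * beta_j for standardized data.
    prob += lpSum(r[j] * beta[j] for j in range(p))

    prob += lpSum(z) == n_selected              # exactly n_selected regressors
    for j in range(p):
        # beta_j is forced to 0 whenever regressor j is excluded (z_j = 0).
        prob += beta[j] <= big_m * z[j]
        prob += beta[j] >= -big_m * z[j]
        # The j-th normal equation sum_k R_jk beta_k = r_j must hold whenever
        # regressor j is included; it is relaxed by big-M when z_j = 0.
        lhs = lpSum(R[j, k] * beta[k] for k in range(p))
        prob += lhs - r[j] <= big_m * (1 - z[j])
        prob += lhs - r[j] >= -big_m * (1 - z[j])

    prob.solve(PULP_CBC_CMD(msg=False))
    chosen = [j for j in range(p) if z[j].value() > 0.5]
    r2 = sum(r[j] * beta[j].value() for j in range(p))
    return chosen, r2
```

Extending the same pattern to several responses, as described in the abstract, amounts to adding binary selection variables for the responses and summing the per-response objectives; sign restrictions on the estimates and lower bounds on the contributions r_j * beta_j can likewise be expressed as additional linear constraints on the included variables.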
About the author
Mikhail Bazilevskiy
Irkutsk State Transport University
Author responsible for correspondence.
Email: mik2178@yandex.ru
Associate Professor, Candidate of technical sciences
Russian Federation, Irkutsk
References
- Joshi A., Raman B., Mohan C.K., Cenkeramaddi L.R. Application of a new machine learning model to improve earthquake ground motion predictions. Natural Hazards. 2024;120(1):729–753. doi: 10.1007/s11069-023-06230-4.
- Talukder M.A., Hasan K.F., Islam M.M., Uddin M.A., Akhter A., Yousuf M.A., Alharbi F., Moni M.A. A dependable hybrid machine learning model for network intrusion detection. Journal of Information Security and Applications. 2023;72:103405. doi: 10.1016/j.jisa.2022.103405.
- Amini M., Sharifani K., Rahmani A. Machine learning model towards evaluating data gathering methods in manufacturing and mechanical engineering. International Journal of Applied Science and Engineering Research. 2023;15(2023):349–362.
- Molnar C. Interpretable machine learning. Lulu.com; 2020.
- Tarasova Ju.A., Fevraleva E.S. Forecasting of bankruptcy: Evidence from insurance companies in Russia. Financial Journal. 2021;13(4):75–90 (In Russ.).
- Mokhtar A., Elbeltagi A., Gyasi-Agyei Y., Al-Ansari N., Abdel-Fattah M.K. Prediction of irrigation water quality indices based on machine learning and regression models. Applied Water Science. 2022;12(4):76. doi: 10.1007/s13201-022-01590-x.
- Wang S., Chen Y., Cui Z., Lin L., Zong Y. Diabetes Risk Analysis Based on Machine Learning LASSO Regression Model. Journal of Theory and Practice of Engineering Science. 2024;4(01):58–64. doi: 10.53469/jtpes.2024.04(01).08.
- Cai W., Wen X., Li C., Shao J., Xu J. Predicting the energy consumption in buildings using the optimized support vector regression model. Energy. 2023;273:127188. doi: 10.1016/j.energy.2023.127188.
- Aivazjan S.A., Mhitarjan V.S. Applied statistics and basics of econometrics. Moscow: YUNITI; 1998. 1005 p. (In Russ.).
- Miller A. Subset selection in regression. Chapman and Hall/CRC; 2002.
- Das A., Kempe D. Algorithms for subset selection in linear regression. Proceedings of the fortieth annual ACM symposium on Theory of computing. 2008:45–54. doi: 10.1145/1374376.1374384.
- Koch T., Berthold T., Pedersen J., Vanaret C. Progress in mathematical programming solvers from 2001 to 2020. EURO Journal on Computational Optimization. 2022;10:100031. doi: 10.1016/j.ejco.2022.100031.
- Konno H., Yamamoto R. Choosing the best set of variables in regression analysis using integer programming. Journal of Global Optimization. 2009;44:273–282. doi: 10.1007/s10898-008-9323-9.
- Miyashiro R., Takano Y. Mixed integer second-order cone programming formulations for variable selection in linear regression. European Journal of Operational Research. 2015;247(3):721–731. doi: 10.1016/j.ejor.2015.06.081.
- Tamura R., Kobayashi K., Takano Y., Miyashiro R., Nakata K., Matsui T. Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. Journal of Global Optimization. 2019;73:431–446. doi: 10.1007/s10898-018-0713-3.
- Park Y.W., Klabjan D. Subset selection for multiple linear regression via optimization. Journal of Global Optimization. 2020;77(3):543–574. doi: 10.1007/s10898-020-00876-1.
- Saishu H., Kudo K., Takano Y. Sparse Poisson regression via mixed-integer optimization. Plos one. 2021;16(4):e0249916. doi: 10.1371/journal.pone.0249916.
- Bazilevskiy M.P. Reduction the problem of selecting informative regressors when estimating a linear regression model by the method of least squares to the problem of partial-Boolean linear programming. Modeling, Optimization and Information Technology. 2018;6(1):108–117. (In Russ.).
- Bazilevskiy M.P. Subset selection in regression models with considering multicollinearity as a task of mixed 0-1 integer linear programming. Modeling, Optimization and Information Technology. 2018;6(2):104–118. (In Russ.).
- Bazilevskiy M.P. Selection an optimal number of variables in regression models using adjusted coefficient of determination as a mixed integer linear programming problem. Applied Mathematics and Control Sciences. 2020;(2):41–54. (In Russ.).
- Bazilevskiy M.P. Construction of quite interpretable linear regression models using the method of successive increase the absolute contributions of variables to the general determination. Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies. 2022;(2):5–16. (In Russ.). doi: 10.17308/sait/1995-5499/2022/2/5-16.
- Bazilevskiy M.P. Comparative analysis of the effectiveness of methods for constructing quite interpretable linear regression models. Modelling and Data Analysis. 2023;13(4):59–83. (In Russ.). doi: 10.17759/mda.2023130404.
- Shukla S., Jain P.K., Babu C.R., Pamula R. A multivariate regression model for identifying, analyzing and predicting crimes. Wireless Personal Communications. 2020;113(4):2447–2461. doi: 10.1007/s11277-020-07335-w.
- Langenbucher A., Szentmáry N., Cayless A., Weisensee J., Wendelstein J., Hoffmann P. Prediction of corneal back surface power–deep learning algorithm versus multivariate regression. Ophthalmic and Physiological Optics. 2022;42(1):185–194. doi: 10.1111/opo.12909.
- Ferster E., Rents B. Methods of correlation and regression analysis. Moscow: Finance and Statistics; 1983. 303 p. (In Russ.).
