Possibility of predicting the probability of thyroid cancer recurrence by machine learning methods
- Authors: Barulina M.A.¹, Bendik I.Y.¹, Kovalenko I.I.¹, Polidanov M.A.², Petrunkin R.P.², Kudashkin V.N.³, Volkov K.A.⁴, Kravchenya A.R.⁴, Maslyakov V.V.⁴,⁵, Kapralov S.V.⁴, Aslanov H.E.⁴, Losyakova Y.V.³, Obukhov I.S.³, Osina A.D.⁴, Kurmaeva A.K.⁴
- Affiliations:
  1. Perm State National Research University
  2. University «Reaviz»
  3. Samara State Medical University
  4. Saratov State Medical University named after V.I. Razumovsky
  5. Medical University «Reaviz»
- Issue: Vol 42, No 3 (2025)
- Pages: 130-143
- Section: Methods of diagnosis and technologies
- URL: https://bakhtiniada.ru/PMJ/article/view/312905
- DOI: https://doi.org/10.17816/pmj423130-143
- ID: 312905
Abstract
Objective. To develop a machine learning model for predicting recurrence in patients with thyroid cancer after surgery.
Materials and Methods. The case histories of 300 patients who had undergone surgery for thyroid cancer were analyzed. The mean age was 43.54 years. All patients underwent a comprehensive examination in accordance with the clinical guidelines on the diagnosis and treatment of thyroid cancer. Because the choice of model directly affects the accuracy and efficiency of prediction, several algorithms were compared on the same training sample using cross-validation. Each model was evaluated by its mean accuracy and standard deviation. The random forest model achieved the highest mean accuracy and was used in the subsequent work. The model was trained on a matrix of predefined features. A parameter grid was defined over hyperparameters such as the number of trees, the maximum depth, and the minimum number of samples required to split a node, and the RandomizedSearchCV method was used to select their values. During the hyperparameter search, the model was trained on the training data, which comprised 70% of the original dataset. For these data, the search yielded the following best hyperparameters for the random forest model: n_estimators = 161; min_samples_split = 5; max_leaf_nodes = 39; max_depth = 12; bootstrap = True.
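The selection procedure described in Materials and Methods can be sketched with scikit-learn as follows. This is a minimal illustration and not the authors' code: the data are synthetic (`make_classification`), logistic regression is an assumed comparison algorithm, and the search ranges, `n_iter`, and the number of folds are assumptions. Only the use of cross-validation, a random forest, RandomizedSearchCV, the 70% training share, and the names of the tuned hyperparameters come from the abstract.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RandomizedSearchCV,
    cross_val_score,
    train_test_split,
)

# Synthetic stand-in for a 300-patient feature matrix (NOT the study data).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)

# Train on 70% of the data, as stated in the abstract; keep 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0
)

# Step 1: compare candidate algorithms by cross-validated accuracy.
# The list of candidates is an assumption made for illustration.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}
cv_summary = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=3, scoring="accuracy")
    cv_summary[name] = (scores.mean(), scores.std())  # mean accuracy, std

# Step 2: randomized search over the hyperparameters named in the abstract.
# The ranges below are illustrative assumptions.
param_distributions = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(3, 20),
    "min_samples_split": randint(2, 10),
    "max_leaf_nodes": randint(10, 50),
    "bootstrap": [True, False],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy on the held-out 30%
```

On real data, `best_params` would play the role of the hyperparameter set reported in the abstract; the values found on synthetic data carry no clinical meaning.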
Results. The trained model predicted the target feature with high accuracy. It correctly identified 98% of all patients with postoperative recurrence and correctly classified 95% of all patients without recurrence as not being at risk of recurrence. This shows that the model handles classification based on medical parameters effectively, which may be particularly important for clinical decision making. The high accuracy indicates that the model is reliable and can identify cases of recurrence correctly, which may help improve diagnosis and treatment.
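The two proportions reported in the Results correspond to what are usually called sensitivity (share of recurrence cases detected) and specificity (share of recurrence-free cases recognized). A minimal sketch of how they are computed from true and predicted labels is given below; the label encoding (1 = recurrence, 0 = no recurrence) and the toy label lists are assumptions made for illustration, not study data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = recurrence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # recurrence, detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # recurrence, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no recurrence, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no recurrence, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both values matters because overall accuracy alone can look high when one class dominates the sample.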
Conclusions. In this study, a machine learning model was developed to predict a high probability of thyroid cancer recurrence from medical parameters. Development began with careful data preprocessing, a critical step in building reliable models. During preprocessing, outliers and columns containing monotonic values were removed to improve data quality and avoid distorting model training. Categorical variables were encoded so that machine learning algorithms could use them correctly, and correlated features were excluded to minimize multicollinearity and make the model easier to interpret.
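The preprocessing steps listed in the Conclusions can be sketched with pandas as follows. The abstract does not specify the exact rules, so every concrete choice here is an assumption: outliers are removed with the 1.5 × IQR rule, "columns containing monotonic values" is read as columns with a single repeated value, categorical variables are one-hot encoded, and features with absolute pairwise correlation above 0.9 are dropped. The function name `preprocess`, the column names, and the toy table are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df, numeric_cols, corr_threshold=0.9, iqr_factor=1.5):
    out = df.copy()

    # 1. Remove rows with outliers in the numeric columns (IQR rule, assumed).
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out = out[out[col].between(q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)]

    # 2. Drop columns that hold a single repeated value (assumed reading).
    constant = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant)

    # 3. One-hot encode the remaining categorical columns.
    out = pd.get_dummies(out, drop_first=True, dtype=float)

    # 4. Drop the later feature of every highly correlated pair.
    corr = out.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return out.drop(columns=to_drop)


# Toy table for illustration only (hypothetical columns, not study data).
df = pd.DataFrame({
    "age": [30, 35, 40, 45, 50, 55, 60, 200],             # 200 is an outlier
    "tumor_size_mm": [12, 25, 9, 30, 14, 22, 11, 18],
    "tumor_size_cm": [1.2, 2.5, 0.9, 3.0, 1.4, 2.2, 1.1, 1.8],  # duplicate info
    "country": ["RU"] * 8,                                 # single value
    "sex": ["F", "M", "F", "F", "M", "F", "M", "F"],
})
clean = preprocess(df, numeric_cols=["age", "tumor_size_mm"])
```

In this toy example the row with the implausible age, the single-valued `country` column, and the redundant `tumor_size_cm` column are removed, and `sex` becomes a numeric indicator column.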
About the authors
M. A. Barulina
Perm State National Research University
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0003-3867-648X
DSc (Physics and Mathematics), Director of the Institute of Physics and Mathematics
Russian Federation, Perm

I. Yu. Bendik
Perm State National Research University
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0009-0000-7851-9492
1st-year Master's Student of the Institute of Physics and Mathematics
Russian Federation, Perm

I. I. Kovalenko
Perm State National Research University
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0003-4450-1184
Head of the Center for Artificial Intelligence of the Institute of Physics and Mathematics
Russian Federation, Perm

M. A. Polidanov
University «Reaviz»
Author for correspondence.
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0001-7538-7412
Advisor to the Russian Academy of Natural Sciences (RANS), Research Department Specialist, Assistant of the Department of Biomedical Disciplines
Russian Federation, Saint Petersburg

R. P. Petrunkin
University «Reaviz»
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0009-0003-3206-7920
3rd-year Student of the Faculty of Medicine
Russian Federation, Saint Petersburg

V. N. Kudashkin
Samara State Medical University
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0001-9099-3517
Resident of the Department of Surgery with a Course in Cardiovascular Surgery of the Institute of Professional Education
Russian Federation, Samara

K. A. Volkov
Saratov State Medical University named after V.I. Razumovsky
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0002-3803-2644
3rd-year Student of the Institute of Clinical Medicine
Russian Federation, Saratov

A. R. Kravchenya
Saratov State Medical University named after V.I. Razumovsky
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0003-2738-4510
PhD (Medicine), Associate Professor, Associate Professor of the Department of Pediatric Diseases of the Faculty of Medicine
Russian Federation, Saratov

V. V. Maslyakov
Saratov State Medical University named after V.I. Razumovsky; Medical University «Reaviz»
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0001-6652-9140
DSc (Medicine), Professor, Professor of the Department of Mobilization Preparation of Healthcare and Disaster Medicine, Professor of the Department of Surgical Diseases
Russian Federation, Saratov; Saratov

S. V. Kapralov
Saratov State Medical University named after V.I. Razumovsky
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0000-0001-5859-7928
DSc (Medicine), Associate Professor, Head of the Department of Faculty Surgery and Oncology
Russian Federation, Saratov

H. E. Aslanov
Saratov State Medical University named after V.I. Razumovsky
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0009-0009-9497-5725
6th-year Student of the Institute of Clinical Medicine
Russian Federation, Saratov

Ye. V. Losyakova
Samara State Medical University
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0009-0003-8286-4266
6th-year Student of the Institute of Pediatrics
Russian Federation, Samara

I. S. Obukhov
Samara State Medical University
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0009-0007-5573-8431
6th-year Student of the Institute of Pediatrics
Russian Federation, Samara

A. D. Osina
Saratov State Medical University named after V.I. Razumovsky
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0009-0001-5294-3436
6th-year Student of the Institute of Clinical Medicine
Russian Federation, Saratov

A. K. Kurmaeva
Saratov State Medical University named after V.I. Razumovsky
Email: maksim.polidanoff@yandex.ru
ORCID iD: 0009-0002-0886-6290
6th-year Student of the Institute of Clinical Medicine
Russian Federation, Saratov
