Abstracts

A STUDY OF THE EFFECTIVENESS OF CLASSIFICATION METHODS FOR FORECASTING IN MACHINE LEARNING PROBLEMS

Author(s):

Калініна І. О., Гожий О. П.

Author(s) (English):

Kalinina Iryna, Gozhyj Alexander

Publication date:

24.05.2021

Abstract (Ukrainian):

The use of classification methods for predicting the aerodynamic properties of materials is considered. A methodology for classification with machine learning methods is proposed and investigated. The following classification methods were used: logistic regression (LR), K-nearest neighbors (KNN), decision trees (DT), and random forest (RF). The methodology consists of the following stages: data collection, exploratory data analysis, modeling, evaluation of model effectiveness, and improvement of model effectiveness. To implement the forecasting procedure, the data were preprocessed in two stages: data collection and exploratory data analysis. The next stage, modeling, consists of two parts: preparation and model selection. The accuracy of the forecasts was calculated. The analysis examined the forecasting results in terms of accuracy measures such as recall, F-measure, Kappa, and the receiver operating characteristic (ROC) value, and in terms of error rates measured by the mean absolute error (MAE) and the root mean square error (RMSE). An analysis of forecasting accuracy was carried out.

Abstract (Russian):

Abstract (English):

The article considers the use of classification methods to predict the aerodynamic properties of materials. A methodology for classification with machine learning methods is proposed and investigated. The following classification methods were used: logistic regression (LR), K-nearest neighbors (KNN), decision trees (DT), and random forest (RF). The methodology consists of the following stages: data collection, exploratory data analysis, modeling, evaluation of model efficiency, and improvement of model efficiency. To implement the forecasting procedure, the data were preprocessed in two stages: data collection and exploratory data analysis. The next stage, modeling, consists of two parts: preparation and model selection. The accuracy of the forecasts was calculated. The analysis examined the prediction results in terms of accuracy measures such as recall, F-measure, Kappa, and the receiver operating characteristic (ROC) value, and in terms of error rates measured by the mean absolute error (MAE) and the root mean square error (RMSE). An analysis of forecasting accuracy was carried out.
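The evaluation stage named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the measures were computed, so the code below assumes a binary task, a 0.5 decision threshold, ROC summarized as the area under the curve, and MAE and RMSE taken between the 0/1 label and the predicted probability of class 1. The function name `evaluate_binary` and its parameters are hypothetical.

```python
from math import sqrt


def evaluate_binary(y_true, y_prob, threshold=0.5):
    """Return recall, F-measure, Kappa, ROC AUC, MAE and RMSE.

    y_true: list of 0/1 labels.
    y_prob: predicted probability of class 1 for each sample.
    threshold: assumed cut-off for turning probabilities into labels.
    """
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    # Cohen's kappa: agreement beyond what chance alone would give.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = ((accuracy - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    # ROC AUC as the share of (positive, negative) pairs ranked correctly;
    # ties count as half a correct pair.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    roc_auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    # Error measures on the predicted probabilities (an assumption).
    errors = [t - p for t, p in zip(y_true, y_prob)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)

    return {"recall": recall, "f_measure": f_measure, "kappa": kappa,
            "roc_auc": roc_auc, "mae": mae, "rmse": rmse}


# Illustrative labels and scores, not data from the study.
metrics = evaluate_binary([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Applying the same function to the predictions of each classifier (LR, KNN, DT, RF) on a shared test set would give a side-by-side comparison of the kind the abstract describes.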
