РОЗРОБЛЕННЯ І ДОСЛІДЖЕННЯ ІНСТРУМЕНТАЛЬНИХ ЗАСОБІВ ДЛЯ НЕЙРОМЕРЕЖЕВОГО АНАЛІЗУ ГОЛОСУ СЛУХАЧІВ СИСТЕМИ ДИСТАНЦІЙНОГО НАВЧАННЯ

Title (Russian):

РАЗРАБОТКА И ИССЛЕДОВАНИЕ ИНСТРУМЕНТАЛЬНЫХ СРЕДСТВ ДЛЯ НЕЙРОСЕТЕВОГО АНАЛИЗА ГОЛОСА СЛУШАТЕЛЕЙ СИСТЕМЫ ДИСТАНЦИОННОГО ОБУЧЕНИЯ

Title (English):

DEVELOPMENT AND RESEARCH OF TOOLS FOR NEURAL NETWORK ANALYSIS OF THE VOICE OF LISTENERS OF THE DISTANCE LEARNING SYSTEM

Author(s):

Чернышев Д. О.

Михайленко В. М.

Терейковская Л. А.

Author(s) (English):

Chernyshev Denys

Mihaylenko Victor

Tereikovska Liudmyla

Keywords (Ukrainian):

нейронні мережі; розпізнавання емоцій; розпізнавання особи; дистанційне навчання; захист інформації

Keywords (Russian):

нейронные сети; распознавание эмоций; распознавание личности; дистанционное обучение; защита информации

Keywords (English):

neural networks; emotion recognition; person recognition; distance learning; information protection

Abstract (Ukrainian):

Обґрунтовано актуальність завдання впровадження в наявні системи дистанційного навчання інструментальних засобів розпізнавання особистості і емоцій слухачів на підставі аналізу їх голосу. Показана перспективність розробки програмних засобів нейромережевого аналізу голосу. Встановлено, що в сучасній науково-прикладній літературі недостатня увага приділяється розробці архітектури зазначених засобів нейромережевого аналізу. В результаті проведених досліджень в термінах мови моделювання UML розроблено опис архітектури модуля нейромережевого аналізу голосу слухачів системи дистанційного навчання, орієнтованого на розпізнавання особистості і емоцій слухача. Розроблено діаграми прецедентів, класів і компонентів. Також побудована структурна схема модуля нейромережевого аналізу. Особливістю запропонованих рішень є адаптація архітектури модуля до використання нейронної мережі для аналізу коефіцієнтів Фур'є відфільтрованого голосового сигналу з метою комплексного розпізнавання особистості і емоцій слухача. Доцільність використання запропонованих архітектурних рішень підтверджена за допомогою комп'ютерних експериментів, спрямованих на визначення ефективності розробленого модуля при його використанні для розпізнавання емоцій дикторів, записи голосових сигналів яких представлені в базі даних Toronto emotional speech set. Експерименти показали, що вже після 100 епох навчання точність розпізнавання емоційного забарвлення голосового сигналу для прикладів, які не ввійшли в навчальну вибірку, перебуває в діапазоні значень від 0,94 до 0,95. Отже, за досягнутими показниками точності і ресурсоємності розпізнавання емоцій розроблений модуль не поступається найбільш відомим рішенням в цій сфері. Визначено, що напрями подальших досліджень пов'язані з розробленням модулів нейромережевого аналізу таких біометричних параметрів, як зображення обличчя, райдужна оболонка ока і клавіатурний почерк, а також з інтеграцією таких модулів в єдину систему.

Abstract (Russian):

Обоснована актуальность задачи внедрения в существующие системы дистанционного обучения инструментальных средств распознавания личности и эмоций слушателей на основании анализа их голоса. Показана перспективность разработки программных средств нейросетевого анализа голоса. Установлено, что в современной научно-прикладной литературе недостаточное внимание уделяется разработке архитектуры указанных средств нейросетевого анализа. В результате проведенных исследований в терминах языка моделирования UML разработано описание архитектуры модуля нейросетевого анализа голоса слушателей системы дистанционного обучения, ориентированного на распознавание личности и эмоций слушателя. Разработаны диаграммы прецедентов, классов и компонентов. Также построена структурная схема модуля распознавания. Особенностью предложенных решений является адаптация архитектуры модуля к использованию нейронной сети для анализа коэффициентов Фурье отфильтрованного голосового сигнала с целью комплексного распознавания личности и эмоций слушателя. Целесообразность использования предложенных архитектурных решений подтверждена с помощью компьютерных экспериментов, направленных на определение эффективности разработанного модуля при его использовании для распознавания эмоций дикторов, записи голосовых сигналов которых представлены в базе данных Toronto emotional speech set. Эксперименты показали, что уже после 100 эпох обучения точность распознавания эмоциональной окраски голосового сигнала для примеров, которые не вошли в учебную выборку, находится в диапазоне значений от 0,94 до 0,95. Таким образом, по достигнутым показателям точности и ресурсоемкости распознавания эмоций разработанный модуль не уступает наиболее известным решениям в данной области. Определено, что направления дальнейших исследований связаны с разработкой модулей нейросетевого анализа таких биометрических параметров, как изображение лица, радужная оболочка глаза и клавиатурный почерк, а также с интеграцией таких модулей в единую систему.

Abstract (English):

The paper substantiates the relevance of introducing, into existing distance learning systems, tools that recognize the identity and emotions of students (listeners) from their voice, and shows that software for neural network analysis of voice is a promising direction. The current scientific and applied literature is found to pay insufficient attention to the architecture of such neural network analysis tools. As a result of the research, the architecture of a module for neural network analysis of the voice of distance learning students, aimed at recognizing a student's identity and emotions, is described in terms of the UML modeling language. Use case, class, and component diagrams are developed, and a block diagram of the recognition module is constructed. A distinctive feature of the proposed solutions is that the module architecture is adapted to using a neural network to analyze the Fourier coefficients of the filtered voice signal for joint recognition of the student's identity and emotions. The feasibility of the proposed architectural solutions is confirmed by computer experiments that evaluate the module on recognizing the emotions of speakers whose voice recordings are included in the Toronto emotional speech set database. The experiments show that after only 100 training epochs, the accuracy of recognizing the emotional coloring of a voice signal on examples not included in the training sample lies between 0.94 and 0.95. Thus, in terms of the achieved accuracy and resource intensity of emotion recognition, the developed module is not inferior to the best-known solutions in this area. Further research is directed at developing modules for neural network analysis of other biometric parameters, such as facial image, iris, and keystroke dynamics, and at integrating such modules into a single system.
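The feature-extraction idea named in the abstract, Fourier coefficients of a filtered voice signal used as neural network input, can be illustrated with a minimal sketch. The abstract does not specify the filter, the window, or the number of coefficients, so everything below is an assumption made for illustration: the function name `fourier_features`, the 80 to 4000 Hz pass band, the Hann window, the frequency-domain band selection used in place of a real filter, and the pooling into 64 values are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def fourier_features(signal, sample_rate, low_hz=80.0, high_hz=4000.0, n_coeffs=64):
    """Return a fixed-length vector of pooled Fourier magnitudes (illustrative only)."""
    signal = np.asarray(signal, dtype=float)

    # Window the signal to reduce spectral leakage, then take the real FFT.
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # Crude band-pass "filter": keep only bins inside the assumed voice band.
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    magnitudes = np.abs(spectrum[in_band])
    if magnitudes.size < n_coeffs:
        raise ValueError("signal too short for the requested number of coefficients")

    # Pool neighbouring bins so every recording yields the same input size.
    features = np.array([chunk.mean() for chunk in np.array_split(magnitudes, n_coeffs)])

    # Scale to [0, 1] so the vector is suitable as neural network input.
    peak = features.max()
    return features / peak if peak > 0 else features


# Usage: one second of a synthetic 440 Hz tone sampled at 8 kHz.
t = np.arange(8000) / 8000.0
vector = fourier_features(np.sin(2 * np.pi * 440.0 * t), sample_rate=8000)
print(vector.shape)
```

A fixed-length vector of this kind could then be fed to a classifier with one output group for speaker identity and one for emotion. How the actual module combines the two tasks is not stated in the abstract.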

Publisher:

Kyiv National University of Construction and Architecture

Journal title, issue number, year (Ukrainian):

Управління розвитком складних систем, номер 43, 2020

Journal title, issue number, year (Russian):

Управление развитием сложных систем, номер 43, 2020

Journal title, issue number, year (English):

Management of Development of Complex Systems, Number 43, 2020

Article language:

Russian

Document format:

application/pdf

Document:

20.pdf

Publication date:

02 September 2020

Collection number:

Section:

INFORMATION TECHNOLOGIES OF DESIGN

Author's university:

Kyiv National University of Construction and Architecture, Kyiv
