Abstracts

Optimization and adaptation of neural networks based on existing architectures: methods, challenges, and prospects

Author(s):

Рябчун Ю. В., Курінський О. В., Доля О. В., Фесан А. О.

Author(s) (English):

Riabchun Yu., Kurinsky O., Dolya E., Fesan A.

Publication date:

08.04.2025

Abstract (Ukrainian):

У статті представлено комплексне дослідження розробки нейронних мереж на основі наявних архітектур, спрямоване на створення потужних моделей, здатних вирішувати складні завдання машинного навчання, з метою підвищення їх ефективності у вирішенні різноманітних завдань штучного інтелекту. Актуальність дослідження обумовлена зростанням вимог до продуктивності моделей у зв’язку з обмеженнями апаратних ресурсів, необхідністю швидкого реагування на нові виклики й адаптації до специфічних умов використання в реальному часі. Проаналізовано сучасні підходи до модифікації попередньо навчених моделей з метою підвищення їх продуктивності, зменшення обчислювальних витрат та адаптації до нових завдань. Дослідження включає аналіз наявних підходів, виявлення основних викликів при інтеграції оптимізованих нейронних мереж у різні галузі застосування та розроблення рекомендацій щодо покращення їх продуктивності й адаптивності. Результати дослідження уможливлюють не лише класифікувати наявні методи, а й окреслити перспективні напрями розвитку технологій, що сприятимуть створенню більш ефективних та гнучких систем штучного інтелекту. Використовуючи різноманітні методи і підходи, включаючи перенесення навчання (transfer learning), тонке налаштування (fine-tuning) та ансамблеві методи (ensemble methods), ця робота розглядає оптимізацію процесу створення нових нейронних мереж шляхом використання попередньо навчених моделей як основи. Окрему увагу приділено методам перенесення навчання, компресії моделей, квантизації та пошуку нейромережевих архітектур (NAS). Центральним елементом підходу є розгортання нейронних мереж, побудованих на основі наявних моделей, що дає змогу суттєво скоротити час навчання і підвищити точність нових мереж. Методологія дослідження охоплює збір та підготовку набору даних, нормалізацію даних для забезпечення стабільності й ефективності навчання, а також використання оптимізатора Adam для швидкої та ефективної мінімізації функції втрат.
Практична цінність отриманих результатів проявляється в можливості їх застосування для розробки більш енергоефективних рішень у таких галузях, як автономні транспортні засоби, системи обробки природної мови, медична діагностика та інших сферах, де критично важлива швидка адаптація до змінних умов роботи.

Abstract (English):

The article presents a comprehensive study of neural network development based on existing architectures, with the aim of building powerful models that solve complex machine learning problems and perform more effectively across a range of artificial intelligence tasks. The study is relevant because requirements for model performance keep growing while hardware resources remain limited, and because models must respond quickly to new challenges and adapt to specific operating conditions in real time. It analyzes modern approaches to modifying pre-trained models in order to improve their performance, reduce computational costs, and adapt them to new tasks. The study covers an analysis of existing approaches, identification of the main challenges in integrating optimized neural networks into various application domains, and recommendations for improving their performance and adaptability. The results make it possible not only to classify existing methods but also to outline promising directions of development that will contribute to more efficient and flexible artificial intelligence systems. Drawing on transfer learning, fine-tuning, and ensemble methods, the paper examines how the creation of new neural networks can be optimized by using pre-trained models as a basis. Special attention is paid to transfer learning, model compression, quantization, and neural architecture search (NAS). The central element of the approach is the deployment of neural networks built on existing models, which can significantly reduce training time and improve the accuracy of new networks. The research methodology includes collecting and preparing the data set, normalizing the data to keep training stable and efficient, and using the Adam optimizer to minimize the loss function quickly and efficiently.
The practical value of the results lies in their applicability to developing more energy-efficient solutions in areas such as autonomous vehicles, natural language processing systems, and medical diagnostics, as well as other fields where rapid adaptation to changing operating conditions is critical.
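
The methodology named in the abstract (data normalization followed by loss minimization with the Adam optimizer) can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the one-feature linear model, the synthetic data, the hyperparameters, and the helper names `standardize`, `adam_fit`, and `mse` are all assumptions chosen only to show the two steps working together.

```python
import math
import random


def standardize(column):
    """Z-score a list of numbers; also return the mean and std that were used."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0  # guard against a constant column
    return [(v - mean) / std for v in column], mean, std


def mse(xs, ys, w, b):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def adam_fit(xs, ys, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Fit y = w * x + b by minimizing MSE with full-batch Adam updates."""
    params = [0.0, 0.0]  # [w, b]
    m = [0.0, 0.0]       # first-moment (mean of gradients) estimates
    v = [0.0, 0.0]       # second-moment (mean of squared gradients) estimates
    n = len(xs)
    for t in range(1, steps + 1):
        w, b = params
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grads = [
            2.0 * sum(e * x for e, x in zip(errors, xs)) / n,  # dL/dw
            2.0 * sum(errors) / n,                             # dL/db
        ]
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Synthetic data (an assumption for illustration): large raw feature scale.
random.seed(0)
raw_x = [random.uniform(0, 1000) for _ in range(200)]
raw_y = [3.0 * x + 50.0 + random.gauss(0, 10) for x in raw_x]

# Normalize inputs and targets so the loss surface is well conditioned.
x_norm, x_mean, x_std = standardize(raw_x)
y_norm, y_mean, y_std = standardize(raw_y)

initial_loss = mse(x_norm, y_norm, 0.0, 0.0)
w, b = adam_fit(x_norm, y_norm)
final_loss = mse(x_norm, y_norm, w, b)
```

In this toy setting the features span roughly 0 to 1000, so standardizing them first lets a single learning rate work for both the weight and the bias. In a real transfer-learning pipeline the same optimizer step would be applied to the trainable layers of a pre-trained network rather than to two scalar parameters.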
