Ottoni, Lara Toledo Cordeiro; https://orcid.org/0000-0003-3996-431X; http://lattes.cnpq.br/2944975805256817
Abstract:
The challenge of Human-Robot Interaction (HRI) is to build intelligent systems that adapt to changes in the user and the environment in order to enhance real-time interaction. An emerging approach in this regard is the use of emotions in HRI. Multimodal emotion recognition systems exist that classify emotions across various modalities (facial expression, gestures, speech, among others). However, despite existing studies on multimodal emotion recognition, these systems still have methodological limitations in emotion classification, often treating emotion as a binary problem and overlooking the range of emotions a user may express. Therefore, the aim of this work is to propose a multimodal and multiclass emotion recognition system for human-robot interaction. The system uses the facial expression and speech modalities, together with a fusion of the emotions inferred from each. The Speech Emotion Recognition Module (MREF) infers the user's emotion from speech using a deep learning classifier. The Facial Expression Emotion Recognition Module (MREEF) classifies the user's emotion from facial expressions using convolutional neural networks (CNNs). Finally, the emotions inferred by the two modules are fused using fuzzy systems. When the proposed system was evaluated on the MELD database, MREF achieved an accuracy of 73%, MREEF 78.06%, and the fusion of the two modules 78.94%, indicating that the multimodal system is more effective than either modality alone.
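The abstract states that fusion is performed with fuzzy systems but gives no implementation detail. The sketch below illustrates one plausible form such a fusion could take: a Mamdani fuzzy inference system, built here with the scikit-fuzzy library, that combines per-emotion confidence scores from the speech module (MREF) and the facial module (MREEF). The variable names, membership functions, and rules are illustrative assumptions, not the thesis's actual design.

    import numpy as np
    import skfuzzy as fuzz
    from skfuzzy import control as ctrl

    # Universe of discourse: per-class confidence scores in [0, 1].
    universe = np.arange(0.0, 1.01, 0.01)
    speech = ctrl.Antecedent(universe, 'speech_conf')  # score from MREF (assumed input)
    face = ctrl.Antecedent(universe, 'face_conf')      # score from MREEF (assumed input)
    fused = ctrl.Consequent(universe, 'fused_conf')    # combined score

    # Triangular membership functions -- an assumption; the thesis may use others.
    for var in (speech, face, fused):
        var['low'] = fuzz.trimf(var.universe, [0.0, 0.0, 0.5])
        var['medium'] = fuzz.trimf(var.universe, [0.2, 0.5, 0.8])
        var['high'] = fuzz.trimf(var.universe, [0.5, 1.0, 1.0])

    # Illustrative rules: agreement between modalities raises the fused confidence.
    rules = [
        ctrl.Rule(speech['high'] & face['high'], fused['high']),
        ctrl.Rule(speech['high'] & face['medium'], fused['high']),
        ctrl.Rule(speech['medium'] & face['medium'], fused['medium']),
        ctrl.Rule(speech['low'] & face['high'], fused['medium']),
        ctrl.Rule(speech['low'] & face['low'], fused['low']),
    ]

    system = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))

    def fuse(speech_score, face_score):
        """Fuse the two modules' confidences for one emotion class."""
        system.input['speech_conf'] = speech_score
        system.input['face_conf'] = face_score
        system.compute()
        return system.output['fused_conf']

    # Example: hypothetical per-class scores for a single emotion.
    print(fuse(0.73, 0.78))

In the multiclass setting described in the abstract, a fusion of this kind would be applied to each emotion class, with the class obtaining the highest fused confidence selected as the final prediction.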