Santos, Laila Pereira Mota; https://orcid.org/0009-0003-1849-0300; http://lattes.cnpq.br/5248712875633926
Resumo:
Lexical Semantic Change (LSC) in the Portuguese language over time, focusing on the use of contextual language models. LSM, which refers to the change in the meaning of words over time, is a complex and multifaceted phenomenon that reflects the cultural, social and technological changes in society (AITCHISON, 2002). Understanding LSM has applications in several areas, from historical linguistics to Natural Language Processing (NLP). The proposal highlights the challenges of detecting and interpreting LSM, such as polysemy (a word with multiple meanings) and the gradual and subtle nature of semantic change. To address these challenges, the research proposes the use of contextualized semantic spaces, generated by models such as BERT (DEVLIN et al., 2019), which capture the meaning of words in their specific contexts. The central hypothesis is that these contextualized semantic spaces can represent the changes in lexical units of the Portuguese language. However, the temporal aspect of these approaches is limited to the data and is not represented. Thus, this research project proposes the construction of a diachronic corpus of the Portuguese language with the aim of generating contextual embeddings that have temporal characteristics to detect, quantify and interpret lexical semantic change. The validation of the approach will explore different metrics and approaches. It is expected that this research will contribute to the advancement of knowledge in the area of MSL, providing a model for the analysis of lexical semantic change with temporal identification in the Portuguese language.