Bijos, Júlia Carolina Braz de Freitas; https://orcid.org/0000-0001-9151-4899; http://lattes.cnpq.br/1205668724465453
Abstract:
Wastewater treatment plants (WWTPs) are a significant pathway through which pharmaceuticals reach surface water, where they can pose risks to aquatic species and human health. Using data on the quantity of medicines sold, it is possible to estimate the concentrations of pharmaceuticals that reach surface waters after treatment in these plants. Machine learning methods, given their capacity to learn from data, can be a relevant tool for building reliable and robust models that predict the occurrence of medicines. This research proposed a classification modeling approach capable of predicting the concentration level of selected pharmaceuticals in WWTP effluents. To this end, the following data were used: drug sales for Brazilian municipalities from 2014 to 2020, population and influent flow of municipal wastewater treatment plants, and occurrence concentrations of antibiotics, anti-inflammatories, and psychiatric drugs in treatment plants. RStudio was used to process the drug sales data and obtain the annual mass sold, which was then used to calculate the influent concentration (ng·L⁻¹) of drugs entering treatment. Three Boosting classification methods were implemented in Python and run at three cutoff levels (100 ng·L⁻¹, 500 ng·L⁻¹, and 1000 ng·L⁻¹) to predict the effluent concentration class for Carbamazepine and for selected antibiotics. In both approaches, the XGBoost model achieved the highest performance at all levels. For the Carbamazepine approach, the XGBoost models achieved F1-scores of 63%, 91%, and 90%. For the antibiotic approach, the XGBoost model obtained F1-scores of 99%, 90%, and 85% for the cutoff limits of 1000 ng·L⁻¹, 500 ng·L⁻¹, and 100 ng·L⁻¹, respectively. For both approaches, it was possible to obtain models with satisfactory performance. However, the cutoff limit of 1000 ng·L⁻¹ proved challenging, as it generated samples with highly imbalanced classes, while the cutoff limits of 500 ng·L⁻¹ and 100 ng·L⁻¹ enabled the construction of more viable models. The models showed that Boosting classifiers have potential as an alternative for monitoring and controlling pharmaceuticals in sanitary sewage. The use of open-access data made available by ANVISA highlights both the potential and the limitations for future studies, since these data have not yet been broadly explored in research. Additionally, the database of drug occurrence in WWTPs built for this research may support the work of other researchers, whether by aggregating new data, testing other modeling techniques, or carrying out other types of analysis.
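The two data-preparation steps mentioned in the abstract, converting an annual mass sold into an influent concentration and turning a concentration into a class label at each cutoff, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not give the formula used, so the sketch simply dilutes the yearly mass into the yearly influent volume and ignores human excretion rates and removal in the plant. The function names, the input units (grams per year, litres per second), and the strict "greater than" rule for the class label are hypothetical.

```python
# Illustrative sketch only; not the thesis's actual procedure.
# Cutoff levels (ng/L) are the three named in the abstract.
CUTOFFS_NG_PER_L = (100.0, 500.0, 1000.0)

SECONDS_PER_YEAR = 365 * 24 * 3600
NG_PER_GRAM = 1e9


def influent_concentration_ng_per_l(annual_mass_g: float, flow_l_per_s: float) -> float:
    """Dilute a yearly mass sold into the yearly influent volume.

    Assumption: all of the mass sold reaches the plant unchanged.
    """
    if flow_l_per_s <= 0:
        raise ValueError("influent flow must be positive")
    annual_volume_l = flow_l_per_s * SECONDS_PER_YEAR
    return annual_mass_g * NG_PER_GRAM / annual_volume_l


def class_labels(concentration_ng_per_l: float) -> dict:
    """Return one binary label per cutoff: 1 if above the cutoff, else 0.

    Assumption: 'above' means strictly greater than the cutoff.
    """
    return {cutoff: int(concentration_ng_per_l > cutoff) for cutoff in CUTOFFS_NG_PER_L}


if __name__ == "__main__":
    # Hypothetical municipality: 31.536 kg sold per year, 1000 L/s influent flow.
    conc = influent_concentration_ng_per_l(annual_mass_g=31536.0, flow_l_per_s=1000.0)
    print(conc)
    # Labels for a hypothetical measured effluent concentration of 750 ng/L.
    print(class_labels(750.0))
```

In a setup like this, each cutoff yields a separate binary target, which is why a high cutoff such as 1000 ng·L⁻¹ can leave very few positive samples and produce the class imbalance the abstract reports.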