Silva, Bernardo Peters Menezes; https://orcid.org/0000-0002-8252-7230; http://lattes.cnpq.br/2843149830267243
Abstract:
Panoramic dental radiographs are highly versatile exams. They can be used to diagnose periodontal bone loss and lesions, cysts, and tumors, as well as to estimate a patient's age and biological sex. Studies that apply deep learning to determine these attributes in panoramic radiographs rely on supervised approaches that require manual annotation of each condition of interest. However, manually annotating these radiographs is demanding, as it requires skilled labor and is, consequently, costly. This study aimed to overcome this challenge by exploring the concept of Human-in-the-Loop. To achieve this goal, special emphasis was placed on teeth, as they are the main reference points for radiologists interpreting panoramic radiographs. As a result, a dataset of 4,000 panoramic radiographs with instance segmentation annotations of teeth was created: the O$^2$PR dataset. The other data used in this study include the Raw Panoramic Radiographs (RPR) dataset, with 4,795 raw-format radiographs, and the Textual Report Panoramic Radiographs (TRPR) dataset, containing 8,029 pairs of radiographic images and textual reports. Based on these datasets, we classified thirteen dental conditions present in the teeth or their surroundings. To do so, we adopted a holistic approach. First, we used the annotated radiographs from the O$^2$PR dataset to train an instance segmentation neural network to pseudo-label the teeth in unannotated radiographs. Next, we extracted crops of all teeth to enable the classification of dental conditions. Tooth crops from images without textual reports were used to pre-train Vision Transformers with the Masked Autoencoders technique; these pre-trained models were later used as classifiers of dental conditions. To reduce the need for manual labeling of dental conditions, the label extraction procedure employed a Large Language Model, GPT-4, to identify noun phrases in the textual reports that describe dental conditions. A heuristic then associated each tooth mentioned in a report sentence with all dental conditions present in the same sentence. We leveraged the pre-trained Vision Transformers to create multiple models for dental condition classification. Encouragingly, the results consistently met or exceeded the benchmark values of the Matthews correlation coefficient. A comparison of the proposed solution with human professionals, supported by statistical analysis, highlighted its effectiveness and limitations. Based on the degree of agreement among specialists, the solution demonstrated a level of accuracy comparable to that of a junior specialist.
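To make the sentence-level association heuristic concrete, the following is a minimal Python sketch of the idea described above: every tooth mentioned in a report sentence is paired with every dental condition found in that same sentence. The condition vocabulary, the tooth-number pattern, and the function name are illustrative assumptions, not the thesis's actual implementation (which relies on GPT-4 to identify the noun phrases).

```python
import re

# Hypothetical vocabulary and pattern; the real pipeline extracts condition
# noun phrases with GPT-4 rather than a fixed keyword list.
CONDITION_KEYWORDS = {"caries", "restoration", "root canal treatment", "implant"}
TOOTH_PATTERN = re.compile(r"\btooth\s+(\d{2})\b", re.IGNORECASE)


def extract_labels(report_text: str) -> dict[str, set[str]]:
    """Map each mentioned tooth (FDI number) to the conditions in its sentence."""
    labels: dict[str, set[str]] = {}
    for sentence in re.split(r"(?<=[.!?])\s+", report_text):
        teeth = TOOTH_PATTERN.findall(sentence)
        conditions = {c for c in CONDITION_KEYWORDS if c in sentence.lower()}
        for tooth in teeth:
            labels.setdefault(tooth, set()).update(conditions)
    return labels


if __name__ == "__main__":
    report = "Tooth 36 shows deep caries and a restoration. Tooth 11 has an implant."
    print(extract_labels(report))
    # -> {'36': {'caries', 'restoration'}, '11': {'implant'}} (set order may vary)
```

The design choice this illustrates is that labels are assigned at sentence granularity: a condition is never propagated beyond the sentence in which it co-occurs with a tooth mention.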
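The downstream classification step can likewise be sketched under stated assumptions: a ViT backbone from the timm library stands in for the MAE-pretrained encoder, and each tooth crop is scored against the thirteen conditions as a multi-label problem, which calls for a sigmoid/BCE objective rather than softmax. The model name, learning rate, and function names are assumptions for illustration, not the thesis's code.

```python
import timm
import torch
from torch import nn

NUM_CONDITIONS = 13  # thirteen dental conditions classified per tooth crop

# Placeholder backbone; in the study, the encoder pre-trained with Masked
# Autoencoders on unlabeled tooth crops would be loaded here instead.
model = timm.create_model(
    "vit_base_patch16_224",
    pretrained=True,
    num_classes=NUM_CONDITIONS,  # one logit per dental condition
)

# Multi-label setup: a single tooth crop may present several conditions at
# once, so each logit is trained independently with a BCE loss.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)


def train_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    """One optimization step over a batch of tooth crops and binary labels."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)             # shape: (batch, 13)
    loss = criterion(logits, targets)  # targets: (batch, 13) in {0.0, 1.0}
    loss.backward()
    optimizer.step()
    return loss.item()
```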