Introducing a self-supervised, superfeature-based network for video object segmentation

Santos, Marcelo Mendonça dos

DC field

Value

Language

dc.creator

Santos, Marcelo Mendonça dos

dc.date.accessioned

2023-10-06T12:45:48Z

dc.date.available

2023-10-06T12:45:48Z

dc.date.issued

2023-06-09

dc.identifier.uri

https://repositorio.ufba.br/handle/ri/37993

dc.description.abstract

Video object segmentation (VOS) is a complex computer vision task that involves identifying and separating the pixels of a video sequence into regions, which can be the background or foreground of the scene, or specific objects within it. The task must be accomplished consistently throughout the sequence, ensuring that the same object or region receives the same label in all frames. Recent advances in deep learning techniques and high-definition datasets have led to significant progress in VOS. Modern methods can handle complex video scenarios, including multiple objects moving over dynamic backgrounds. However, these methods rely heavily on manually annotated datasets, which are expensive and time-consuming to create. As an alternative, self-supervised methods have been proposed to eliminate the need for manual annotations during training. These methods exploit intrinsic properties of videos, such as the temporal coherence between frames, to generate a supervisory signal for training without human intervention. The downside is that self-supervised methods often demand extensive training data to learn the VOS task effectively. In this work, we propose Superfeatures in a Highly Compressed Latent Space (SHLS), a novel self-supervised VOS method that dispenses with manual annotations while substantially reducing the demand for training data. Using a metric learning approach, SHLS combines superpixels and deep learning features, enabling us to learn the VOS task from a small dataset of unlabeled still images. Our solution is built upon Iterative over-Segmentation via Edge Clustering (ISEC), our efficient superpixel method that provides the same level of segmentation accuracy as top-performing superpixel algorithms while generating significantly fewer superpixels. This is especially useful for processing videos, where the number of pixels increases over time.
SHLS embeds convolutional features from the frame pixels into the corresponding superpixel areas, resulting in ultra-compact image representations called superfeatures. The superfeatures form a latent space where object information is efficiently stored, retrieved, and classified throughout the frame sequence. We conduct a series of experiments on the most popular VOS datasets and observe competitive results. Compared to state-of-the-art self-supervised methods, SHLS achieves the best performance on the single-object segmentation test of the DAVIS-2016 dataset and ranks in the top five on the DAVIS-2017 multi-object test. Remarkably, our method was trained with only 10,000 still images, which sets it apart from the other self-supervised methods, which require much larger video-based datasets. Overall, our proposed method represents a significant advancement in self-supervised VOS, offering an efficient and effective alternative to manual annotations and significantly reducing the demand for training data.

pt_BR
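
The abstract describes superfeatures as convolutional features embedded into superpixel areas. One way to picture this is sketched below. The abstract does not say how the embedding is computed, so the sketch assumes plain mean pooling of per-pixel feature vectors inside each superpixel. The function name `pool_superfeatures` and the toy inputs are hypothetical and are not taken from the thesis.

```python
def pool_superfeatures(features, labels):
    """Average per-pixel feature vectors over each superpixel.

    features: H x W x C nested lists of floats (stand-in for conv features).
    labels:   H x W nested lists of ints (stand-in for superpixel ids).
    Returns a dict {superpixel_id: mean feature vector of length C}.
    Mean pooling is an assumption made for illustration only.
    """
    sums = {}    # superpixel id -> running per-channel sum
    counts = {}  # superpixel id -> number of pixels seen
    for feat_row, label_row in zip(features, labels):
        for vec, sp in zip(feat_row, label_row):
            if sp not in sums:
                sums[sp] = [0.0] * len(vec)
                counts[sp] = 0
            for c, value in enumerate(vec):
                sums[sp][c] += value
            counts[sp] += 1
    return {sp: [s / counts[sp] for s in acc] for sp, acc in sums.items()}


# Toy 2x2 frame with 2-channel features and two superpixels (ids 0 and 1).
features = [
    [[1.0, 0.0], [3.0, 2.0]],
    [[5.0, 4.0], [7.0, 6.0]],
]
labels = [
    [0, 0],
    [1, 1],
]
superfeatures = pool_superfeatures(features, labels)
```

In this toy case the four pixel vectors collapse into two superfeature vectors, one per superpixel, which illustrates why the representation shrinks as the superpixel count drops.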

dc.language

eng

pt_BR

dc.publisher

Universidade Federal da Bahia

pt_BR

dc.rights

Attribution-NonCommercial-NoDerivs 3.0 Brazil

dc.rights.uri

http://creativecommons.org/licenses/by-nc-nd/3.0/br/

dc.subject

Segmentação de objetos em vídeo

pt_BR

dc.subject

Segmentação por superpixels

pt_BR

dc.subject

Redes neurais convolucionais

pt_BR

dc.subject.other

Video object segmentation

pt_BR

dc.subject.other

Superpixel segmentation

pt_BR

dc.subject.other

Convolutional neural networks

pt_BR

dc.title

Introducing a self-supervised, superfeature-based network for video object segmentation

pt_BR

dc.type

Thesis

pt_BR

dc.publisher.program

Programa de Pós-Graduação em Mecatrônica da UFBA (PPGM)

pt_BR

dc.publisher.initials

UFBA

pt_BR

dc.publisher.country

Brazil

pt_BR

dc.subject.cnpq

CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO::ARQUITETURA DE SISTEMAS DE COMPUTACAO

pt_BR

dc.contributor.advisor1

Oliveira, Luciano Rebouças de

dc.contributor.advisor1ID

https://orcid.org/0000-0001-7183-8853

pt_BR

dc.contributor.advisor1Lattes

http://lattes.cnpq.br/0372650483087124

pt_BR

dc.contributor.referee1

Oliveira, Luciano Rebouças de

dc.contributor.referee1ID

https://orcid.org/0000-0001-7183-8853

pt_BR

dc.contributor.referee1Lattes