We have collected an instance-level logo sketch dataset containing 2,000 logo instances. For each logo, we collected three or more corresponding sketches to capture the variability of drawing ability and style, yielding 9,347 sketches in total. In addition, we provide a text label for each of the 2,000 instances, so that the dataset supports cross-modal retrieval. The following table compares our newly collected logo sketch dataset with existing sketch datasets. To our knowledge, ours is the first instance-level logo sketch dataset, lending itself to fine-grained and cross-modal logo sketch retrieval.
These 2,000 logo images come from all walks of life and fall into three categories: transportation, life services, and enterprise business, covering the needs of daily-life scenes. The following figure shows the category distribution of the logo images.
Since our dataset is built for instance-level retrieval, the logos should cover instance-level variability in visual appearance. To this end, we searched major websites extensively to collect logos from daily-life scenes. Because a logo sketch carries no color information, logos with the same shape but different colors are filtered out, and duplicates are removed so that only one copy remains. In total, 2,000 logo images are obtained.
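The color-invariant filtering described above can be sketched in code. The following is a minimal, dependency-free illustration using an average hash over grayscale thumbnails; the hash size, sampling scheme, and function names are our assumptions, not the actual curation pipeline:

```python
# Illustrative sketch of color-invariant logo deduplication.
# The 8x8 average hash and nearest-pixel sampling are assumptions,
# not the authors' actual pipeline.

def average_hash(gray, size=8):
    """Compute an average hash from a 2D grayscale image (list of row lists)."""
    h, w = len(gray), len(gray[0])
    # Downsample to size x size via nearest-pixel sampling.
    pixels = [gray[i * h // size][j * w // size]
              for i in range(size) for j in range(size)]
    mean = sum(pixels) / len(pixels)
    # Threshold each sample against the mean to get a binary signature.
    return tuple(1 if p > mean else 0 for p in pixels)

def deduplicate(images):
    """Keep one image per hash value.

    Logos with the same shape but different colors tend to produce the
    same shape-dominated signature once reduced to grayscale, so they
    collapse to a single kept entry.
    """
    seen, kept = set(), []
    for name, gray in images:
        sig = average_hash(gray)
        if sig not in seen:
            seen.add(sig)
            kept.append(name)
    return kept
```

For example, a dark-on-white logo rendered in red and the same logo rendered in blue map to different grayscale intensities but the same thresholded signature, so only the first copy is kept.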
For sketch collection, we use the collected logo images as references for drawing the corresponding sketches. To reflect real-world application scenarios and the variability of drawing ability and style, we use different input devices and collect multiple sketches for each logo image. Considering the diversity of sketching, the volunteers are divided into three groups. The first group have professional drawing skills and drew their sketches on a tablet PC. The second group, with average drawing skills, used a variety of equipment including mobile phones, white paper, and drawing boards. The third group, lacking sufficient drawing skills, drew incomplete or simplified logo sketches using different input devices. For each logo instance, three or more sketches are obtained from different volunteers under different settings, leading to a total of 9,347 sketches. Some representative logo examples and their corresponding sketches are shown in the following figure.
Because the logo sketches exhibit diverse drawing quality under different settings, they imply different levels of retrieval difficulty, and we accordingly divide the sketch dataset into three subsets: easy, medium, and hard. The following figure shows examples from the easy/medium/hard subsets.
In addition to sketch-photo paired labels, we also collect 2,000 text annotations as auxiliary information for the sketches, enabling cross-modal logo sketch retrieval. The text labels mainly characterize key attributes of the logos, such as color, shape, and quantity, to supplement the logo description. They not only lower the requirement on human drawing ability but also improve the retrieval accuracy of the model.
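To make the annotation idea concrete, an entry might pair a logo with its sketches, its category, and a text label describing color, shape, and quantity. The field names and JSON layout below are purely illustrative assumptions for readability, not the dataset's released schema:

```python
import json

# Hypothetical annotation record; all field names and paths are
# illustrative assumptions, not the dataset's actual format.
record = {
    "logo_id": "0001",
    "logo_image": "logos/0001.png",
    "sketches": ["sketches/0001_a.png",
                 "sketches/0001_b.png",
                 "sketches/0001_c.png"],
    "category": "transportation",
    "text_label": "a red circle containing three white arrows",
}

serialized = json.dumps(record)
```

A record like this carries both the sketch-photo pairing and the auxiliary text needed for cross-modal retrieval.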
We develop a triple-branch network based on a hybrid attention mechanism, termed LogoNet, for fine-grained logo sketch retrieval. The following figure illustrates the overall framework of LogoNet. In each branch of the triple-branch network, large-kernel convolutions are followed by a CNN backbone with the hybrid attention mechanism embedded.
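The exact attention design is not spelled out here; as a rough illustration of what a hybrid (channel plus spatial) attention module does to a feature map, consider this CBAM-style NumPy sketch. It is a stand-in under our own assumptions, not a reproduction of LogoNet's module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_attention(feat):
    """Apply channel attention then spatial attention to a (C, H, W) feature map.

    A simplified CBAM-style sketch: average-pool to form the attention
    descriptors and gate the features multiplicatively. The real LogoNet
    module is not reproduced here, only the general hybrid-attention idea.
    """
    # Channel attention: squeeze the spatial dims, gate each channel.
    chan = sigmoid(feat.mean(axis=(1, 2)))   # shape (C,)
    feat = feat * chan[:, None, None]
    # Spatial attention: squeeze the channel dim, gate each location.
    spat = sigmoid(feat.mean(axis=0))        # shape (H, W)
    return feat * spat[None, :, :]
```

Because both gates lie in (0, 1), the module reweights features rather than adding new ones, letting the network emphasize discriminative channels and spatial regions of a sketch.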
The model can distinguish visually similar trademark images, capturing both the overall appearance of an image and its fine-grained features. Even on the hard subset, the model can retrieve the corresponding positive samples from very few abstract strokes in a logo sketch. This means that for logos occluded in real life or blurred in memory, users can still input incomplete sketches or simple strokes, and the model can still retrieve the corresponding logo image from the gallery.
The following figure shows qualitative results of cross-modal retrieval on our dataset. Notably, when the added text label contains color information, the model tends to return the logo image of that color.
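The behavior above suggests that text similarity influences the final ranking alongside sketch similarity. One simple way to realize this is late fusion over embeddings, sketched below; the cosine metric, the weighting scheme, and all names are our assumptions, not LogoNet's published formulation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def fused_score(sketch_emb, text_emb, logo_emb, alpha=0.7):
    """Late fusion of sketch-photo and text-photo similarity.

    alpha weights the sketch branch; the value and the linear fusion rule
    are illustrative assumptions only.
    """
    return (alpha * cosine(sketch_emb, logo_emb)
            + (1.0 - alpha) * cosine(text_emb, logo_emb))

def retrieve(sketch_emb, text_emb, gallery):
    """Rank (name, embedding) gallery entries by fused similarity, best first."""
    scored = [(name, fused_score(sketch_emb, text_emb, emb))
              for name, emb in gallery]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Under such a scheme, a text label mentioning a color pulls gallery logos whose embeddings encode that color toward the top of the ranking, matching the qualitative behavior described above.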
@article{logoNet2023,
  title={LogoNet: a fine-grained network for instance-level logo sketch retrieval},
  author={Feng, Binbin and Li, Jun and Xu, Jianhua},
  journal={arXiv preprint arXiv:4827887},
  year={2023}
}