woctezuma/steam-filtered-image-data

Steam Filtered Image Data

This repository provides details about image data downloaded from Steam and about the filters applied to it.

A dataset, called Steam-OneFace, is shared in the Steam-OneFace dataset section.

Data

Data downloaded from Steam consists of:

  • PNG logos with transparency,
  • JPG vertical banners.

Filtered image data is shared on Google Drive.

The up-to-date list of appIDs for which I have tried to download image data is available in data/download-queries/app_ids.txt. Most of these ~48k appIDs do not feature any image data.

Here is an example of a vertical banner:

Example of a vertical Banner

Here is an example of a logo:

Example of a logo

Filtering

Filtering removes:

  • blank images, i.e. images for which grayscale image intensity extrema are equal,
  • images of uncommon resolution:
    • anything but 640x360 for logos,
    • anything but 300x450 for vertical banners,
  • images with uncommon bands:
    • anything but RGBA for logos,
    • anything but RGB for banners,
  • images for which the bounding box of the non-zero regions covers:
    • 100% of the image space for transparent logos,
    • strictly less than 100% of the image space for banners, which happens with vignetting.
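
The four filters above can be sketched as a single predicate. This is a minimal illustration, not the code used to build the shared data: the names `keep_image` and `describe` are hypothetical, and reading "non-zero regions" as non-zero alpha for RGBA logos is an assumption.

```python
LOGO_SIZE, LOGO_MODE = (640, 360), "RGBA"
BANNER_SIZE, BANNER_MODE = (300, 450), "RGB"


def keep_image(size, mode, gray_extrema, bbox, is_logo):
    """Return True if an image passes the four filters listed above."""
    # Blank image: the darkest and brightest grayscale values are equal.
    if gray_extrema[0] == gray_extrema[1]:
        return False
    # Uncommon resolution or uncommon bands.
    expected_size = LOGO_SIZE if is_logo else BANNER_SIZE
    expected_mode = LOGO_MODE if is_logo else BANNER_MODE
    if tuple(size) != expected_size or mode != expected_mode:
        return False
    # Does the bounding box of the non-zero region span the whole image?
    covers_everything = bbox is not None and tuple(bbox) == (0, 0, size[0], size[1])
    # Logos are dropped when the box covers everything,
    # banners are dropped when it does not.
    return (not covers_everything) if is_logo else covers_everything


def describe(img):
    """Collect the inputs of keep_image from a Pillow-style image object."""
    # Assumption: for RGBA logos, "non-zero" means non-zero alpha.
    target = img.getchannel("A") if img.mode == "RGBA" else img
    return {
        "size": img.size,
        "mode": img.mode,
        "gray_extrema": img.convert("L").getextrema(),
        "bbox": target.getbbox(),
    }
```

With Pillow installed, a logo could then be checked with `keep_image(**describe(Image.open(path)), is_logo=True)`, where `path` is a placeholder for a downloaded file.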

Suggestions of filtering

Suggestions of filtering include:

Applying these filters is left to the reader. Otherwise, it would be difficult to keep the filtered data up-to-date.

Steam-OneFace dataset

The notebook build_steam_oneface_dataset.ipynb shows an application of the filters suggested above.

There are three options for the face detector: face_alignment, retinaface and dlib.

With the face_alignment module

This makes it possible to build a dataset, called Steam-OneFace, of 1688 images, each of which should feature exactly one face.
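
The "exactly one face" criterion can be sketched independently of the detector. In the snippet below, `keep_single_face_images` and the `detect_faces` callable are hypothetical names: `detect_faces` stands for any wrapper around a face detector that returns one entry per detected face, or `None` when nothing is found. The notebook itself may proceed differently.

```python
def keep_single_face_images(image_paths, detect_faces):
    """Keep the paths for which the detector reports exactly one face."""
    kept = []
    for path in image_paths:
        detections = detect_faces(path)
        # Treat a missing result as zero faces.
        num_faces = 0 if detections is None else len(detections)
        if num_faces == 1:
            kept.append(path)
    return kept
```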

This dataset is shared on Google Drive.

Thumbnails of Steam OneFace

To use this dataset on Google Colab, run the following:

!gdown --id 1QptHrW9vloTtP--YJsxMY8PZWI2D8NJt
!tar xf steam-oneface-lr.tar.gz
import glob
from pathlib import Path

# Each image is named after its Steam appID: steam-oneface-lr/<appID>.jpg
file_names = glob.glob('steam-oneface-lr/*.jpg')
app_ids = [int(Path(fname).stem) for fname in file_names]

With the retinaface module

The dataset consists of 2472 images, shared on Google Drive.

To use this dataset on Google Colab, run the following:

!gdown --id 1-0Nk7H6Cn3Nt60EdHG_NWSA8ohi2oBqr
!tar xf steam-oneface-lr_with_retinaface.tar.gz

With the dlib module

The dataset consists of 305 images, shared on Google Drive.

To use this dataset on Google Colab, run the following:

!gdown --id 1-4RIn9G9Bee2JZ1bK1gkkgkLocHuWJJ4
!tar xf steam-oneface-lr_with_dlib.tar.gz

With several detection modules

The notebook trim_steam_oneface_dataset.ipynb trims the dataset by intersecting the results of different detectors.
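
Intersecting detector results amounts to keeping the appIDs found in every per-detector folder. The sketch below illustrates the idea and is not taken from the notebook: the function names are hypothetical, and it assumes each archive extracts to a folder of `<appID>.jpg` files, as `steam-oneface-lr` does.

```python
from pathlib import Path


def app_ids_in(folder):
    """Return the set of appIDs found in a folder of <appID>.jpg files."""
    return {int(path.stem) for path in Path(folder).glob("*.jpg")}


def common_app_ids(*folders):
    """Return the sorted appIDs present in every given folder."""
    id_sets = [app_ids_in(folder) for folder in folders]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))
```

Passing two folders keeps the images on which two detectors agree, and passing three gives a stricter, smaller selection.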

The trimmed datasets are Steam-OneFace-small and Steam-OneFace-tiny.

Steam-OneFace-small

To use this dataset on Google Colab, run the following:

!gdown --id 1-1V5fDhPo75iDtAbrD18rppV-lf51bPW
!tar xf steam-oneface-small-lr.tar.gz

Steam-OneFace-tiny

  • Steam-OneFace-tiny:
    • 168 images,
    • obtained with modules dlib, face_alignment and retinaface.

To use this dataset on Google Colab, run the following:

!gdown --id 1-2sCVgBUmu6LFug1pzBfmL8zNFFBq27F
!tar xf steam-oneface-tiny-lr.tar.gz

Thumbnails of the WHOLE dataset Steam OneFace Tiny
