UPDATE MAY 13, 2024: Hosting costs for multiple resolutions are unsustainable. Moving forward, only the resized videos will be available for download.
This is a collection of Quake 1 gameplay footage that has been preprocessed such that it is appropriate for use as a deep learning dataset.
There are no class labels or ground truth; this dataset is primarily intended for unsupervised learning.
A few videos containing weapon/enemy mods made their way into the dataset. Future efforts may be directed at "purifying" the data, for example by omitting footage that features these custom weapons.
Resolution | FPS | Size (GiB) | % Reduction | Download (.zip) |
---|---|---|---|---|
320x240 | 15 | 29 | 88 | Link |
640x480 | 15 | 87 | 63 | Link |
Source* | 30 | 233 | 0 (raw) | (Unavailable) |
* Most raw videos are at 1080p/720p but some are at lower resolutions
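The % Reduction column follows from the sizes in the table: each value is the share of the 233 GiB source footprint saved by the resized variant. A quick check in Python, using only the rounded GiB figures listed above (so the results are approximate):

```python
# Sizes in GiB, copied from the table above (rounded values).
SOURCE_GIB = 233
RESIZED_GIB = {"320x240": 29, "640x480": 87}


def percent_reduction(size_gib, source_gib=SOURCE_GIB):
    """Return the size reduction relative to the source, in percent."""
    return (1 - size_gib / source_gib) * 100


reductions = {res: round(percent_reduction(size))
              for res, size in RESIZED_GIB.items()}
for res, pct in reductions.items():
    print(f"{res}: {pct}% smaller than the source footage")
```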
The data can be downloaded with the AWS Command Line Interface or any client that speaks a compatible S3 API. Folders in the S3 bucket are named according to the resolution of the videos they contain. Because the bucket contains all resolutions in both .mp4 and .zip format, syncing the entire bucket is highly redundant and discouraged. s3 sync is the recommended download method for slow or interruptible connections, as it can be stopped and resumed without issue.
$ mkdir quake-gameplay-dataset
$ cd quake-gameplay-dataset
# The resolutions are available as both folders and zip files
# --no-sign-request allows use of awscli without credentials
$ aws s3 ls \
    --endpoint https://nyc3.digitaloceanspaces.com \
    --no-sign-request \
    s3://quake-gameplay-dataset/
# Sync only the folder with the resolution you want
$ aws s3 sync \
    --endpoint https://nyc3.digitaloceanspaces.com \
    --no-sign-request \
    s3://quake-gameplay-dataset/320x240 \
    320x240
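For scripted downloads, the same sync call can be assembled in Python and handed to the AWS CLI. The sketch below is only a thin wrapper around the command shown above. The helper names build_sync_command and sync_resolution are hypothetical, and the code assumes the aws executable is installed and on the PATH.

```python
import subprocess

# Endpoint and bucket are taken from the shell commands above.
ENDPOINT = "https://nyc3.digitaloceanspaces.com"
BUCKET = "quake-gameplay-dataset"


def build_sync_command(resolution, dest=None):
    """Build the argv list for syncing one resolution folder.

    Hypothetical helper, not part of the project's own code.
    """
    dest = dest or resolution
    return [
        "aws", "s3", "sync",
        "--endpoint", ENDPOINT,
        "--no-sign-request",  # anonymous access, no credentials needed
        f"s3://{BUCKET}/{resolution}",
        dest,
    ]


def sync_resolution(resolution, dest=None):
    """Run the sync; running it again continues an interrupted download."""
    subprocess.run(build_sync_command(resolution, dest), check=True)
```

Calling sync_resolution("320x240") from inside the quake-gameplay-dataset directory would mirror the second shell command.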
There are several existing Python solutions for loading frames from a directory of videos. decord is currently the most promising, given its narrow focus on machine learning. Generally, the API entails passing the loader a list of video files gathered from a directory:
import os
import torch
import decord
from decord import VideoLoader, cpu
# Configure decord to output torch.Tensor
# You can also do this for Tensorflow, etc...
decord.bridge.set_bridge('torch')
width = 320
height = 240
# Named `video_dir` to avoid shadowing the built-in `dir`
video_dir = f'/data/quake-gameplay-dataset/{width}x{height}'
video_files = [os.path.join(video_dir, f)
               for f in os.listdir(video_dir)
               if f.endswith('.mp4')]
num_frames = 1 # Likely (but not always) synonymous with batch_size
# decord expects the shape as (batch_size, height, width, channels)
batch_shape = (num_frames, height, width, 3)
vl = VideoLoader(video_files,
                 ctx=[cpu(0)],
                 shape=batch_shape,
                 interval=0,
                 skip=0,
                 shuffle=1)
frame_data, indices = vl.next()
# `frame_data` contains the decoded frames
assert isinstance(frame_data, torch.Tensor)
assert frame_data.shape == batch_shape
# `indices` is the (video_num, frame_num) for each frame
assert indices.shape == (num_frames, 2)
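Since indices pairs each frame with a (video_num, frame_num) tuple, it can be used to trace a frame back to its source file. The following is a minimal sketch that assumes video_num is the position of the file in the list passed to VideoLoader, which is worth verifying against the decord documentation. resolve_indices is a hypothetical helper that works on plain Python pairs, for example the result of indices.tolist() when indices is a torch.Tensor.

```python
import os


def resolve_indices(video_files, index_pairs):
    """Map (video_num, frame_num) pairs to (file name, frame_num).

    Hypothetical helper; assumes video_num indexes into `video_files`
    in the same order the list was given to the loader.
    """
    resolved = []
    for video_num, frame_num in index_pairs:
        path = video_files[int(video_num)]
        resolved.append((os.path.basename(path), int(frame_num)))
    return resolved


# Example with made-up file names and indices
example_files = ["/data/a.mp4", "/data/b.mp4"]
example = resolve_indices(example_files, [(1, 42), (0, 7)])
print(example)
```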
The code for this project is maintained over in the Doom Gameplay Dataset repository. It's much simpler to maintain only one repository for the compiler code.
Gameplay videos are sourced from YouTube with permission. Special thanks to the following creators for their contributions to the community. They are good folk.
If you would like to contribute, please open an issue or submit a pull request with links to YouTube videos or playlists. The complete list of videos and playlists is in raw/quake.txt.
All videos are property of their respective creators. Permission to transform and redistribute was granted in each case. This project makes no claims of ownership to the data.
This project's code is released under an MIT / Apache 2.0 dual license, which is extremely permissive.
- Doom Gameplay Dataset
- decord, for quickly loading frames
- thavlik portfolio