Skip to content

Latest commit

 

History

History
 
 

cadl

A dataset of CAD sketches

Overview

This repository contains the dataset used in "Computer-Aided Design as Language". We provide the following splits:

  • Training (4,656,607 sketches)
  • Validation (50,000 sketches)
  • Test (50,000 sketches)

Quickstart

First, download the dataset files:

bash download_dataset.sh

This will place the splits under data subfolder.

In order to read the data, you will need protocol buffer compiler and Tensorflow:

apt install -y protobuf-compiler
virtualenv --python=python3.6 "${ENV}"
${ENV}/bin/activate
pip install tensorflow

Next, you need to compile .proto files that define the layout of entries in the dataset:

protoc --python_out=. *.proto

Finally, you can use the generated classes to access the examples. The following python snippet reads and prints the first 5 elements from the training split:

import tensorflow as tf

import example_pb2

dataset = tf.data.TFRecordDataset("data/train.tfrecord")

for raw_record in dataset.take(5).as_numpy_iterator():
  example = example_pb2.Example()
  example.ParseFromString(raw_record)
  print(example, "\n")

Please refer to example.proto for details on the data layout.

Citation

If you use this dataset in your research, please cite:

@article{ganin2021computer,
  title={Computer-aided design as language},
  author={Ganin, Yaroslav and Bartunov, Sergey and Li, Yujia and Keller, Ethan and Saliceti, Stefano},
  journal={arXiv preprint arXiv:2105.02769},
  year={2021}
}

License

The code is licensed under the Apache 2.0 License. The dataset is licensed under a Creative Commons Attribution 4.0 International License.

Disclaimer

This is not an official Google product.