Added ADE20K and COCO2017 data conversion scripts #5

Merged: 3 commits merged into mosaicml:main on Sep 9, 2022

karan6181 (Collaborator) commented Sep 8, 2022

Description

  • Added ADE20K and COCO2017 data conversion scripts.
  • Currently, the scripts convert the datasets into MDS format.
  • Users can run each script standalone to convert their raw dataset into MDS format.

Linting

$ isort <file_name>
$ yapf -i -vv -p <file_name>
$ pyright

Testing

  • Downloaded the raw ADE20K dataset and ran the ade20k.py script locally to generate the sharded MDS files. A snapshot of the resulting directory structure is shown below:
.
├── train
│   ├── index.json
│   ├── shard.00000.mds
│   ├── shard.00001.mds
│   ├── shard.00002.mds
│   ├── shard.00003.mds
│   ├── shard.00004.mds
│   ├── shard.00005.mds
│   ├── shard.00006.mds
│   ............
│   ├── shard.00208.mds
│   ├── shard.00209.mds
│   └── shard.00210.mds
└── val
    ├── index.json
    ├── shard.00000.mds
    ├── shard.00001.mds
    ├── shard.00002.mds
     ............
    ├── shard.00018.mds
    ├── shard.00019.mds
    ├── shard.00020.mds
    └── shard.00021.mds

2 directories, 235 files
  • Downloaded the raw MSCOCO-2017 dataset and ran the coco.py script locally to generate the sharded MDS files. An abridged snapshot of the resulting directory structure is shown below:
.
├── train
│   ├── index.json
│   ├── shard.00000.mds
│   ├── shard.00001.mds
│   ├── shard.00002.mds
│   ├── shard.00003.mds
│   ├── shard.00004.mds
│   ├── shard.00005.mds
│   ├── shard.00006.mds
│   ├── shard.00007.mds
│   ├── shard.00008.mds
│   ├── shard.00009.mds
│   ├── shard.00010.mds
│   .........
│   ├── shard.00572.mds
│   └── shard.00573.mds
└── val
    ├── index.json
    ├── shard.00000.mds
    ├── shard.00001.mds
    ├── shard.00002.mds
    ├── shard.00003.mds
     .........
    ├── shard.00022.mds
    ├── shard.00023.mds
    └── shard.00024.mds

2 directories, 601 files
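
The file totals in the two listings can be cross-checked against the shard ranges. The following is a minimal sketch and not part of this PR: it assumes that each split holds one index.json plus shards named shard.NNNNN.mds numbered contiguously from zero, as the listings suggest. The helper names (expected_file_count, check_split, count_files) are hypothetical and use only the Python standard library.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming scheme, taken from the listings: shard.00000.mds, shard.00001.mds, ...
SHARD_RE = re.compile(r"^shard\.(\d{5})\.mds$")


def expected_file_count(last_shard_by_split):
    """Files expected when each split has shards 0..last plus one index.json."""
    return sum(last + 1 + 1 for last in last_shard_by_split.values())


def check_split(split_dir):
    """Return the number of shards in one split; raise if the layout looks incomplete."""
    split_dir = Path(split_dir)
    if not (split_dir / "index.json").is_file():
        raise FileNotFoundError(f"missing index.json in {split_dir}")
    ids = sorted(
        int(m.group(1))
        for p in split_dir.iterdir()
        if (m := SHARD_RE.match(p.name))
    )
    if ids != list(range(len(ids))):
        raise ValueError(f"shard numbering in {split_dir} is not contiguous from 0")
    return len(ids)


def count_files(root):
    """Total files under root/train and root/val (shards plus index.json per split)."""
    root = Path(root)
    return sum(check_split(root / split) + 1 for split in ("train", "val"))


if __name__ == "__main__":
    # Last shard numbers read off the two listings.
    print(expected_file_count({"train": 210, "val": 21}))  # ADE20K: 235
    print(expected_file_count({"train": 573, "val": 24}))  # COCO: 601
```

Both results match the "2 directories, N files" lines reported above. Calling count_files on an output directory performs the same check against files on disk.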

karan6181 merged commit c8dae86 into mosaicml:main on Sep 9, 2022
karan6181 deleted the convert_dataset_ade20k_coco branch on September 9, 2022 at 01:11