Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move dataloader initialize_object to factory methods #1510

Merged
merged 37 commits into from
Sep 27, 2022

Conversation

hanlint
Copy link
Contributor

@hanlint hanlint commented Sep 9, 2022

This PR continues our effort to remove logic from initialze_object in preparation of deprecating yahp from the codebase. We move most of the logic into factory methods, e.g.:

def build_ad20k_dataloader (...) -> DataSpec
   ...

def build_synthetic_ade20k_dataloader(..) -> DataSpec
  ...

TODO:

Need some help to complete the rest!

@hanlint hanlint requested review from knighton and a team as code owners September 9, 2022 03:23
@knighton
Copy link
Contributor

knighton commented Sep 9, 2022

ready for review or wait for all checks?

@hanlint
Copy link
Contributor Author

hanlint commented Sep 9, 2022

@knighton wait for all checks, thank you :)

@dblalock
Copy link
Contributor

dblalock commented Sep 13, 2022

Added CIFAR-10 dataloader factories. Also added tests for CIFAR10 and MNIST based on test_c4.py. There's a bunch of duplicate code across {CIFAR,MNIST} factory funcs and tests right now. Might be worth separating some code out into shared utilities once we have something working for all the datasets / know exactly what we'd like to DRY out.

Test code is especially redundant; only holding off on adding shared utils since maybe there's already a plan to replace test_dataset_registry.py.

@hanlint
Copy link
Contributor Author

hanlint commented Sep 13, 2022

Yeah we can DRY these out later.

@coryMosaicML since we are dropping BRATS, can you help test the imagenet/ade20k ones to make sure nothing broke with this PR? I think just kicking off a run and making sure it successfully starts training would be sufficient.

@A-Jacobson A-Jacobson requested a review from a team as a code owner September 13, 2022 23:40
@hanlint
Copy link
Contributor Author

hanlint commented Sep 15, 2022

@eracah @knighton this is ready for review!

Copy link
Contributor

@eracah eracah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @hanlint! This is super important for de-YAHPing because the dataset Hparams is where a lot of the "black magic" of hparams lives. Thanks for doing this. Looks pretty good to me, but I'll defer to James, who probably knows this stuff better than me

Copy link
Contributor

@A-Jacobson A-Jacobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted what i think is a missing param in ade20k_hparams. otherwise lgtm! good stuff.

@hanlint hanlint enabled auto-merge (squash) September 27, 2022 17:30
@hanlint hanlint disabled auto-merge September 27, 2022 18:11
@hanlint hanlint merged commit 85ce967 into mosaicml:dev Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants