Dedicated data project #227

alecgunny · 2022-11-21T23:38:28Z

Moves all data generation projects into a single datagen project, which carries all of the conda dependencies. Everything else can then be managed purely by Poetry, with cleaner dependency resolution. Removes the top-level environment.yaml and Poetry info to avoid confusion about how these are expected to be used. Closes #191, closes #203. Other salient changes:

injection library moved into new datagen
trainer library carries Torch GPU dependencies

Other changes to make in future PRs for simplification:

Once the WhiteningTransform and bbhnet.data.distributions submodules make it into ml4gw, the remaining dataloader and injector subclasses, as well as the glitch sampler, should be moved into projects/sandbox/train, and the bbhnet.data project should be removed altogether. bbhnet.architectures should then import the WhiteningTransform from ml4gw.
I think we can get rid of bbhnet.parallelize altogether. We're much more conservative about how we submit futures now, so we don't need the default cancel_futures functionality, which is the main thing the AsyncExecutor class provides. The dict-based as_completed function can be dropped in favor of a dictionary that uses each Future as the key and maps it to the corresponding metadata. We can then call a regular wait on futures.keys() and pop each key that gets returned as done.
For datagen.utils.injection.injection_waveforms, we might consider dropping the tuple background argument and the time array altogether in favor of accepting a sample_rate argument, under the assumption that all of the signal_times have been normalized such that t=0 represents the start of background. Alternatively, we could even drop sample_rate and accept something like a signal_offset argument that has already been converted to a number of samples.
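A minimal sketch of the Future-keyed dictionary idea from the bbhnet.parallelize item above, using only the standard library's concurrent.futures. The names square_after and run_jobs and the job tuples are hypothetical placeholders for illustration, not existing bbhnet code:

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def square_after(x, delay):
    """Hypothetical stand-in for a data generation task."""
    time.sleep(delay)
    return x * x


def run_jobs(jobs):
    """Run (label, x, delay) jobs and return a {label: result} dict."""
    results = {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each Future is the dictionary key; the value is that job's metadata.
        futures = {
            pool.submit(square_after, x, delay): label
            for label, x, delay in jobs
        }
        while futures:
            # Plain wait on the keys, returning as soon as anything finishes.
            done, _ = wait(futures.keys(), return_when=FIRST_COMPLETED)
            for future in done:
                # Popping the finished Future recovers its metadata.
                label = futures.pop(future)
                results[label] = future.result()
    return results
```

For example, `run_jobs([("a", 2, 0.01), ("b", 3, 0.0)])` collects both results keyed by label regardless of completion order. Whether this covers every use of the current as_completed helper would need to be checked against its call sites.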

alecgunny added 6 commits November 22, 2022 11:19

aggregating all data generation projects into one

e34c59a

updating environments

6f4e835

adding tests

f594853

hanning -> hann

b8b4af9

getting datagen tests working

55dcd25

cherry-picking in changes from old timeslide injections

045f308

alecgunny force-pushed the dedicated-data-project branch from 640de50 to 045f308 November 22, 2022 19:42

alecgunny added 7 commits November 22, 2022 13:23

specifying torch cpu source

25bba0c

updating lockfile

70046f8

updating lockfile

8797ad3

adding conda prefix echo for testing

f27f88b

adding conda prefix to environment

874ab7b

updating export to use CPU torch

0b7b86d

building in separate workflow step

d7950a0

wbenoit26 approved these changes Nov 25, 2022

EthanMarx merged commit c64c118 into ML4GW:main Nov 27, 2022

alecgunny deleted the dedicated-data-project branch November 28, 2022 01:36
