This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

Dedicated data project #227

Merged: 13 commits merged into ML4GW:main on Nov 27, 2022

Conversation

@alecgunny (Collaborator) commented Nov 21, 2022

Moves all data generation projects into a single datagen project that carries all conda dependencies. Everything else can then be managed purely by Poetry, with cleaner dependency resolution. Removes the top-level environment.yaml and Poetry info to avoid confusion about how these are expected to be used. Closes #191, closes #203. Other salient changes:

  • injection library moved into new datagen
  • trainer library carries Torch GPU dependencies

Other changes to make in future PRs for simplification:

  • Once the WhiteningTransform and bbhnet.data.distributions submodules make it into ml4gw, the remaining dataloader and injector subclasses, as well as the glitch sampler, should be moved into projects/sandbox/train, and the bbhnet.data project should be removed altogether. bbhnet.architectures should then import the WhiteningTransform from ml4gw.
  • I think we can get rid of bbhnet.parallelize altogether. We're much more conservative about how we submit futures now, so we no longer need the default cancel_futures functionality, which is the main purpose of the AsyncExecutor class. The dict-based as_completed function can be dropped in favor of a dictionary that uses each Future as the key and maps it to the corresponding metadata. We can then call a regular wait on futures.keys() and pop each key that gets returned as done.
  • For datagen.utils.injection.injection_waveforms, we might consider dropping the tuple background argument and the time array altogether in favor of a sample_rate argument, under the assumption that all of the signal_times have been normalized such that t=0 represents the start of background. Alternatively, we could even drop sample_rate and accept something like a signal_offset argument for which the conversion to number of samples has already been done.
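The Future-keyed dictionary suggested for replacing bbhnet.parallelize could look roughly like the sketch below. It uses only the standard library's concurrent.futures; the `square` worker, the `job-…` metadata strings, and the `run_with_metadata` helper are hypothetical stand-ins, not code from this repository.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def square(x):
    # Hypothetical stand-in for the real work submitted to the executor.
    return x * x


def run_with_metadata(inputs):
    """Collect results keyed by the metadata attached to each future."""
    results = {}
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Each Future is the dictionary key; the value is its metadata.
        futures = {executor.submit(square, x): f"job-{x}" for x in inputs}
        while futures:
            # Block until at least one pending future has finished.
            done, _ = wait(futures.keys(), return_when=FIRST_COMPLETED)
            for future in done:
                # Popping the key retrieves the metadata and shrinks
                # the set of futures we still have to wait on.
                name = futures.pop(future)
                results[name] = future.result()
    return results
```

Calling `run_with_metadata([1, 2, 3])` returns a dictionary mapping each hypothetical job name to its result, without needing a custom as_completed wrapper.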
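For the injection_waveforms suggestion, the conversion from normalized signal times to sample offsets can be sketched as follows. This is only an illustration under stated assumptions: `signal_offsets` and `inject` are hypothetical helpers, signal times are taken to mark the first sample of each waveform, and times are non-negative seconds measured from the start of background. The actual function in datagen may treat signal times differently.

```python
import numpy as np


def signal_offsets(signal_times, sample_rate):
    """Convert times (seconds since start of background) to sample indices."""
    return np.round(np.asarray(signal_times) * sample_rate).astype(int)


def inject(background, waveforms, offsets):
    """Add each waveform into a copy of background at its sample offset.

    Assumption: each offset marks the first sample of its waveform.
    Waveforms that start outside the background are skipped, and
    waveforms that run past the end are truncated.
    """
    out = background.copy()
    for waveform, offset in zip(waveforms, offsets):
        if offset < 0 or offset >= len(out):
            continue
        stop = min(offset + len(waveform), len(out))
        out[offset:stop] += waveform[: stop - offset]
    return out
```

With the offsets precomputed, the injection step no longer needs the time array or the sample rate, which is the simplification the second alternative describes.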

@alecgunny force-pushed the dedicated-data-project branch from 640de50 to 045f308 on November 22, 2022 19:42
@EthanMarx merged commit c64c118 into ML4GW:main on Nov 27, 2022
@alecgunny deleted the dedicated-data-project branch on November 28, 2022 01:36
Development

Successfully merging this pull request may close these issues.

  • Specify main branch of typeo
  • CUDA environment issues on Ampere GPUs
3 participants