
Release 0.7 #132

Merged: 6 commits merged into FluxML:master on Apr 4, 2022
Conversation

@darsnack (Member) commented Mar 11, 2022

This should wait on #125 before merging and releasing (which I will review tomorrow). @theabhirath any other PRs with API changes that you think should go in this release?

Last-minute changes:

  • added the ability to customize the activation function in the MobileNet variants, as sketched below (this came up in a colleague's research project, and we already allow it for e.g. ResNet)
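A minimal, self-contained sketch of the idea, not Metalhead's actual code; the builder name and keyword are illustrative only:

```julia
using Flux

# Hypothetical builder (not Metalhead's API): the user-chosen activation
# is threaded through every conv/batchnorm pair in the network.
function tiny_mobilenet(; activation = relu, inchannels = 3, nclasses = 10)
    return Chain(Conv((3, 3), inchannels => 16, pad = 1), BatchNorm(16, activation),
                 Conv((3, 3), 16 => 32, pad = 1), BatchNorm(32, activation),
                 GlobalMeanPool(), Flux.flatten, Dense(32, nclasses))
end

m = tiny_mobilenet(activation = hardswish)  # e.g. swap relu for hardswish
```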

@theabhirath (Member)
Nope, I think this is good, but maybe there's a way to exclude the ViTs from the release for now? I'm working on a refactor for those, and there may be breaking API changes when that finalises (which would probably make releases more complicated). If it's not a biggie then that's fine; a lot of the models will probably require minor API tweaks to support more general implementations in the future, though, so I'm a little unsure how this will play out.

@darsnack (Member, author)
That's fine, we can always make a breaking release for this package when needed. I just want to avoid releasing a new API and immediately changing it within days.

I commented out the ViT code from this release. When you submit the refactor, just comment it back in.

@theabhirath (Member)
Maybe it's worth trying to expose the activation function to the user at a uniform level in the API before the release? It's not much of a pain, but some models do it (ResNet, and now MobileNet) and others don't (DenseNet, for example). One idea I had was that the function for returning the layers could expose this, with a documentation note that the original model can be used from the constructor, while additional customisation can be added by taking the layers and wrapping them (this is implicit right now but isn't documented; also, the docs build keeps failing, and I'm not sure what that's about).
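A hedged sketch of that idea, with hypothetical names (not Metalhead's actual functions): the lowercase layer builder exposes the activation, and extra customisation comes from wrapping the returned layers:

```julia
using Flux

# Hypothetical lowercase builder that exposes the activation; the exported
# constructor would keep the paper defaults.
densenet_layers(; activation = relu) = Chain(Dense(4, 8, activation), Dense(8, 2))

# Additional customisation by wrapping the returned layers in a new Chain:
model = Chain(densenet_layers(activation = gelu), softmax)
```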

Also, another quite minor change (but one that could potentially help clean up the code a little): conv_bn currently returns an array, which means it needs to be splatted every time it's called. Maybe we could just return a Chain instead.
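To make the trade-off concrete, here is a stand-in with the same shape as conv_bn (not the actual Metalhead definition):

```julia
using Flux

# Stand-in for conv_bn: returning a Vector forces a splat at every call site.
conv_bn(k, ch, act) = [Conv(k, ch, pad = 1), BatchNorm(last(ch), act)]
m1 = Chain(conv_bn((3, 3), 3 => 16, relu)..., MaxPool((2, 2)))

# Returning a Chain removes the splat but nests the layers instead:
conv_bn_chain(k, ch, act) = Chain(Conv(k, ch, pad = 1), BatchNorm(last(ch), act))
m2 = Chain(conv_bn_chain((3, 3), 3 => 16, relu), MaxPool((2, 2)))
```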

@darsnack (Member, author)
> One idea I had was that the function for returning the layers could expose this

You mean like densenet(..., activation = relu)? I definitely agree we should consistently expose this for all networks, but I'm not sure the API will be the same. For resnet, it's wrapped up inside the connection argument, so you can implicitly specify it. For mobilenetv[X], we require a different activation for some layers vs. others, so it's part of the config. It would be nice to have the same API but maybe that can be saved for a later release.
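Illustrating the two existing styles with self-contained stand-ins (the names here are illustrative, not Metalhead's internals):

```julia
using Flux

# ResNet style: the activation rides along inside the `connection` closure,
# so it is specified implicitly rather than as its own keyword.
block = SkipConnection(Dense(8, 8), (mx, x) -> relu.(mx .+ x))

# MobileNet style: different stages need different activations, so the
# activation is part of each per-stage config entry instead.
configs = [(kernel = 3, channels = 16, activation = relu),
           (kernel = 3, channels = 24, activation = hardswish)]
```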

> conv_bn currently returns an array, meaning that it needs to be splatted every time it's called in functions. Maybe we could just return a Chain

Taking a look over all the places it is used, it is almost always part of a larger Chain. From a user's perspective, breaking the sequence into nested 2-layer Chains seems like a lot of visual/structural noise.

@theabhirath (Member)
> You mean like densenet(..., activation = relu)? I definitely agree we should consistently expose this for all networks, but I'm not sure the API will be the same. For resnet, it's wrapped up inside the connection argument, so you can implicitly specify it. For mobilenetv[X], we require a different activation for some layers vs. others, so it's part of the config. It would be nice to have the same API but maybe that can be saved for a later release.

Yeah true, this probably needs quite some thinking to make it somewhat similar across all the networks; it probably doesn't need to hold up v0.7.

> Taking a look over all the places it is used, it is almost always part of a larger Chain. From a user's perspective, breaking the sequence into 2-layer Chains seems like a lot of visual/structural noise.

Hmmm, true. I also see that Chain on Flux master works even with arrays inside it, so pretty soon we can do away with the splats anyway, I reckon.

@ToucheSir (Member)
Chain with a Vector is currently an experimental option for models which would suffer from a ton of compilation latency otherwise. I'm not sure if any Metalhead ones fall into that bucket.
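For reference, a small demonstration of that option (a sketch; the latency benefit only matters for very long chains):

```julia
using Flux

# Chain also accepts a Vector of layers; this trades some compile-time type
# information for much lower compilation latency on very deep models.
layers = [Dense(10, 10, relu) for _ in 1:100]
m = Chain(layers)              # no splatting required
y = m(rand(Float32, 10, 4))
```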

Review comment on src/Metalhead.jl (outdated, resolved)
@darsnack force-pushed the release-0.7 branch 2 times, most recently from 1055055 to d162d9e on Mar 28, 2022
@darsnack (Member, author) commented Mar 31, 2022

I tested the Chain(::Vector) option out, and it appears to make no difference for us. I don't think there's anything left before releasing (we just need approval).

@theabhirath (Member)
The ViT tests are still commented out; is that intentional until we find a fix for the OOMs?

@darsnack (Member, author) commented Apr 1, 2022

It almost always OOMs on all systems, right?

@theabhirath (Member) commented Apr 1, 2022

Yep, only macOS is spared because those runners have more memory. One solution could be to remove the tests for the giant and gigantic configurations; I think the tiny and base models should work fine.
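A hedged sketch of what trimming the test matrix might look like; the constructor form, config names, and input size here are assumptions, not the actual test code:

```julia
using Metalhead, Test

@testset "ViT" begin
    # Skip the largest configs to avoid OOM on memory-constrained CI runners.
    for config in (:tiny, :small, :base)
        m = ViT(config)  # hypothetical constructor form
        @test size(m(rand(Float32, 256, 256, 3, 1))) == (1000, 1)
    end
end
```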

@darsnack (Member, author) commented Apr 1, 2022

Let's see what happens.

@darsnack (Member, author) commented Apr 1, 2022

Okay, the Flux v0.13 release seems imminent, so let's just wait for that so we can bump our compat.
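For reference, such a compat bump is just the usual Project.toml entry (the bounds shown are illustrative):

```toml
[compat]
Flux = "0.13"
julia = "1.6"
```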

@darsnack mentioned this pull request on Apr 2, 2022
@darsnack merged commit d51e3a4 into FluxML:master on Apr 4, 2022
@darsnack mentioned this pull request on Apr 4, 2022