Improve PCA and spectral inits #225
Conversation
A lot of tests fail because we would need to fix random seeds over there. I'm not doing it for now -- let's discuss first.
I think adding a tiny amount of noise to each initialization is fine if it solves all these issues. I'm used to calling this jitter, so that's what I'll call it. However, I think the user should be able to turn this off. For instance, we could add a general parameter to control it, together with a helper along these lines:

```python
import numpy as np

def jitter(x, random_state, scale=0.01):
    # Noise std is a small fraction of the spread of the first embedding dimension
    target_std = np.std(x[:, 0]) * scale
    return x + random_state.normal(0, target_std, x.shape)
```

Then we can call this on each initialization. What do you think?
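For concreteness, a minimal usage sketch of that helper, with a fixed seed so repeated runs stay reproducible (the `pca_init` array below is just a random stand-in, not real PCA output):

```python
import numpy as np

random_state = np.random.RandomState(42)
pca_init = random_state.randn(1000, 2)      # stand-in for an actual PCA initialization
jittered = jitter(pca_init, random_state)   # adds noise with std = 0.01 * std of column 0
assert jittered.shape == pca_init.shape
```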
Sure, let me move the jitter code out into a separate function. I also agree that there should be a way to switch it off. What about adding an argument for it to the initialization functions? I am not so sure about having a global parameter for this on the TSNE object itself.
Yes, this is exactly what I had in mind. We should add this to both the PCA and spectral initializations.
So you'd want to have jitter on by default for both initializations? It might be fine, though. The TSNE constructor has enough parameters as it is, and nobody will notice this in practice, since the added noise should be very small. Okay, I think this is fine for now then, and we'll add the parameter later if we find any need for it.
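For illustration, a possible shape for such an opt-out switch; the function below and the `add_jitter` argument name are hypothetical sketches, not code from this PR:

```python
import numpy as np

def pca_initialization(X, n_components=2, add_jitter=True, random_state=None):
    """Hypothetical PCA init with an opt-out jitter switch (illustrative only)."""
    if not isinstance(random_state, np.random.RandomState):
        random_state = np.random.RandomState(random_state)
    # Plain PCA via SVD of the centered data
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    init = U[:, :n_components] * S[:n_components]
    if add_jitter:
        init = jitter(init, random_state)  # jitter helper from the earlier comment
    return init
```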
Agreed. I refactored the jitter code and also added a test that checks that our spectral init coincides with sklearn's spectral embedding. Luckily, it passes :-) I also had to fix random seeds in some existing tests that used the PCA/spectral initializations.
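Roughly, such a comparison test could look like the sketch below; `our_spectral_init` is a hypothetical stand-in for the project's spectral initialization (the PR's actual test is not reproduced here), and the check allows for the per-column sign ambiguity inherent to eigenvector-based embeddings:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import kneighbors_graph

def assert_equal_up_to_sign(a, b, atol=1e-6):
    # Eigenvector-based embeddings are only determined up to a sign flip per column
    for col_a, col_b in zip(a.T, b.T):
        assert np.allclose(col_a, col_b, atol=atol) or np.allclose(col_a, -col_b, atol=atol)

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
A = kneighbors_graph(X, n_neighbors=10, include_self=False)
A = 0.5 * (A + A.T)  # symmetrize the kNN graph into an affinity matrix

reference = SpectralEmbedding(
    n_components=2, affinity="precomputed", random_state=0
).fit_transform(A.toarray())

# ours = our_spectral_init(A, n_components=2, random_state=0)  # hypothetical call
# assert_equal_up_to_sign(ours, reference)
```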
Great, thanks! I'll close the related issues.
A tiny PR that would fix #224, #223, and #180. I am not insisting on any of these changes, but thought I'd make the PR to make it easier to discuss.