[BUG] t-SNE is not deterministic even with random_state #2980
Thank you for opening the issue. Maybe @danielhanchen can give a hint on this?
Not 100% sure, but I remember this was a known issue (for me at least). For example, FAISS has some randomness inside it, and I don't know whether a random seed is used there. Likewise, due to GPU parallelization, t-SNE will output vastly different results, since the tree-search methods terminate at random intervals for each thread (i.e., with say 3 threads A, B, C: A terminates in trial 1, but B in trial 2, etc.). Further discussion: CannyLab/tsne-cuda#44

One way I found to alleviate this issue was to forcefully use a random embedding close to the top 2 PCA components. That way, at least the initial random embedding is a good preconditioner, so the final output won't diverge too much from run to run.

Another solution, which is strongly not advised, is to turn tree searching into a locking construct, i.e. synchronize the entire tree algorithm, which will clearly slow things down.

Sadly I know my comments won't be of much help.
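A minimal sketch of that PCA-seeded initialization idea (the function name, jitter scale, and normalization are my own illustrative assumptions, not cuML's actual code): project the data onto its top 2 principal components, then add a small seeded perturbation so runs start from nearly the same point.

```python
import numpy as np

def pca_init_embedding(X, seed=42, jitter=1e-4):
    """Initialize a 2-D t-SNE embedding near the top-2 PCA projection.

    The PCA projection acts as a preconditioner so repeated runs start
    from (almost) the same point; `jitter` adds a small seeded random
    perturbation around it.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # center the data
    # SVD of the centered data: rows of Vt are principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:2].T                            # project onto top 2 PCs
    Y /= np.std(Y[:, 0]) + 1e-12                 # scale for stable early steps
    return Y + jitter * rng.standard_normal(Y.shape)

X = np.random.default_rng(0).normal(size=(100, 10))
Y1 = pca_init_embedding(X)
Y2 = pca_init_embedding(X)
assert np.allclose(Y1, Y2)  # same seed -> identical starting embedding
```

With a fixed seed the initialization is bitwise reproducible, so any remaining run-to-run variance comes from the optimization itself rather than the starting point.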
Oh, regarding the plots where t-SNE suddenly diverges — I actually noticed this issue before. (By the way, nice plots! :) ) To alleviate wild divergences, I leveraged 2 approaches:
(2) In IntegrationKernel, if the components exceed some MAX_BOUNDS measure (usually say 100 or so, or even 1000), reset the momentum vector and gains so as not to cause t-SNE to continue diverging:
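A rough NumPy sketch of that guard (the real kernel is CUDA; `MAX_BOUNDS`, the hyper-parameter defaults, and the optional clamp at the end are illustrative assumptions, not cuML's actual code):

```python
import numpy as np

MAX_BOUNDS = 100.0  # illustrative threshold; the comment above suggests 100-1000

def integration_step(Y, grad, velocity, gains, lr=200.0, momentum=0.8):
    """One momentum gradient-descent update with a divergence guard.

    If any embedding coordinate escapes [-MAX_BOUNDS, MAX_BOUNDS], reset
    that point's momentum (velocity) and gains so the blow-up does not
    feed on itself in later iterations.
    """
    # standard t-SNE adaptive gains update
    gains = np.where(np.sign(grad) != np.sign(velocity), gains + 0.2, gains * 0.8)
    gains = np.maximum(gains, 0.01)
    velocity = momentum * velocity - lr * gains * grad
    Y = Y + velocity

    diverged = np.abs(Y).max(axis=1) > MAX_BOUNDS  # per-point check
    velocity[diverged] = 0.0                       # reset momentum
    gains[diverged] = 1.0                          # reset gains
    Y = np.clip(Y, -MAX_BOUNDS, MAX_BOUNDS)        # (optional) pull points back inside
    return Y, velocity, gains

Y, v, g = integration_step(np.array([[150.0, 0.0]]), np.zeros((1, 2)),
                           np.array([[5.0, 0.0]]), np.ones((1, 2)))
assert np.all(v == 0.0)  # runaway point had its momentum reset
```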
Thanks @danielhanchen. Started landing those changes with #3018.
I have a version working with FFT. It gives higher embedding quality, solves the cluster-spreading issue, and is more consistent between runs. However, there's still something upstream of the main loop (e.g. in the perplexity search) that, compared to CannyLab's impl, is causing less spacing between blobs, more variability between runs, lower embedding quality, and occasional "pinpoint" results. See below:

[Image: CannyLab has better spacing]

[Image: FFT is more consistent than BH and resolves spreading. CannyLab is more consistent between runs. cuML-FFT has occasional (~1 in 15 runs) pinpointing or similar math errors (4th run).]

[Image: CannyLab ends with 1.8x better grad norm and converges much faster]

This was a bigger project than I intended to take on, and I don't know if it would even be accepted in cuML. (For my own [work] purposes I'm likely going to use an optimized fork of CannyLab's.) I'm running out of steam for the moment to keep digging into the remaining cuML-FFT issues, but in addition to the above wins, the FFT version ... so I think it's worth finishing. The current B-H version doesn't seem usable. @cjnolet @dantegd @drobison00 thoughts?
@zbjornson This looks like a pretty significant change, and I do think it's worth investigating more closely as a potential alternative for cuML. Please bear with me (and anyone else who might be interested) while I/we get spun up on the details of this approach. Unfortunately, today I found what appears to be a regression in 0.16, and I'm concerned it was introduced in recent changes: #3057. I believe there's a possibility this regression could be affecting your comparisons as well, and could also be the reason why the current version appears unusable. We might need some more comparisons across versions, but I don't recall seeing artifacts and outliers in cuML's t-SNE such as those in your recent images. What do you think about bringing over the FFT implementation and adding it as an additional option for now using
I've found that different implementations of the algorithm cause the perplexity/n_neighbors (and other hyper-params) to have different effects on the resulting embedding. This makes me wonder whether the distances between the resulting clusters in the FFT version could be reproduced by adjusting the hyper-params, and whether the two implementations can be directly compared using the same hyper-param settings.
About to open a PR for this.
Argh... I'll bisect if I have time this weekend.
Section 4 of https://arxiv.org/abs/1712.09005 had an answer:
CannyLab's impl doesn't do this, so it's a bit odd that this impl needs it. Anyway, that's one way to resolve the spacing issue. I haven't resolved the occasional "pinpointing" issue. I'm not using any randomization, so I think that means it's due to some atomic operation that will be miserable to find and correct. It's likely outside of the main loop, though, given that this issue doesn't occur in the CannyLab impl and the loops are nearly identical. (The reason they don't need late exaggeration >1 is probably related.) Unfortunately, this occurs more frequently with more rows.
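For readers unfamiliar with late exaggeration: the attractive term of the t-SNE gradient is scaled by a factor > 1 early in the optimization, drops back to 1, and (per Section 4 of the paper above) can be raised again near the end to increase inter-cluster spacing. A small sketch of such a schedule (the factors and phase lengths here are illustrative assumptions, not the paper's or cuML's exact defaults):

```python
def exaggeration(iteration, n_iter=1000,
                 early=12.0, early_phase=250,
                 late=2.0, late_phase=200):
    """Multiplier applied to the attractive forces at a given iteration.

    Illustrative three-phase schedule: early exaggeration, normal phase,
    then late exaggeration over the final `late_phase` iterations.
    """
    if iteration < early_phase:
        return early        # early exaggeration: forms tight, separated clusters
    if iteration >= n_iter - late_phase:
        return late         # late exaggeration (>1): increases spacing between blobs
    return 1.0              # normal phase

assert exaggeration(0) == 12.0
assert exaggeration(500) == 1.0
assert exaggeration(900) == 2.0
```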
ref #3058
#3084 resolved the artifacts, and the remaining non-determinism appears to be from floating-point atomics. I don't think there's anything left to do (without major perf impacts).
Opening this back up to track continued investigation of potential data races in floating-point additions.
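For context on why concurrent floating-point additions are non-deterministic even when each individual atomic is correct: floating-point addition is not associative, so the order in which threads' contributions commit changes the rounded result, and t-SNE's gradient loop amplifies those tiny differences over thousands of iterations. A tiny illustration:

```python
# Summing the same values in a different order (as concurrent atomicAdds
# effectively do) can change the low-order bits of the result.
x, y, z = 0.1, 0.2, 0.3

left = (x + y) + z   # one "commit ordering"
right = x + (y + z)  # another "commit ordering"

assert left != right  # 0.6000000000000001 vs 0.6
```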
Tagging @avantikalal
Hi, may I ask what's the current status of this? I would like to look into it after UMAP. ;-)
@trivialfis, the new (experimental) FFT implementation has lower variance and better numerical stability, but it's not completely deterministic. It would be great if you were able to make this deterministic as well.
Thanks for sharing the information. It will be on my to-do list.
Describe the bug
cuML's t-SNE outputs vary from run to run, even when random_state is used or initial embeddings are provided (and #2549 is fixed).

Steps/Code to reproduce bug
Expected behavior
The output should be the same between runs, as with scikit-learn.
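For comparison, a minimal sketch of the expected behavior using scikit-learn (the dataset and parameters here are my own, chosen only to keep the run small; perplexity must be less than the number of samples): two fits with the same random_state produce identical embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Small synthetic dataset
X = np.random.default_rng(0).normal(size=(30, 5))

def embed(seed):
    # init="random" so reproducibility comes purely from random_state
    return TSNE(n_components=2, perplexity=5.0, init="random",
                random_state=seed).fit_transform(X)

Y1, Y2 = embed(42), embed(42)
assert np.allclose(Y1, Y2)  # scikit-learn is deterministic given random_state
```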
Environment details (please complete the following information):