
[BUG] t-SNE is not deterministic even with random_state #2980

Open · Tracked by #4139
zbjornson opened this issue Oct 14, 2020 · 18 comments

Labels: bug (Something isn't working), CUDA / C++ (CUDA issue)
@zbjornson (Contributor)

Describe the bug
cuML's t-SNE outputs vary from run to run, even when random_state is used or initial embeddings are provided (and #2549 is fixed).

Steps/Code to reproduce bug

from cuml.manifold import TSNE
from sklearn.datasets import make_blobs  # samples_generator is deprecated/removed in newer scikit-learn
from matplotlib import pyplot as plt
import numpy as np

X, y = make_blobs(n_samples=10000, centers=8, n_features=4, random_state=0)
X = X.astype(np.float32)
tsne = TSNE(random_state=0)

Y1 = tsne.fit_transform(X)
plt.scatter(Y1[:, 0], Y1[:, 1])
plt.show()

Y2 = tsne.fit_transform(X)
plt.scatter(Y2[:, 0], Y2[:, 1])
plt.show()

[Screenshots: the embeddings from the two runs differ]

Expected behavior
Output should be the same between runs, as with scikit-learn.
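
(A minimal way to check this, continuing the repro above and assuming fit_transform returns a host array, as it does there where the result is indexed with Y1[:, 0]:)

    import numpy as np

    Y1 = np.asarray(tsne.fit_transform(X))
    Y2 = np.asarray(tsne.fit_transform(X))

    # With a fixed random_state these should match exactly, or at least very closely.
    print("identical:", np.array_equal(Y1, Y2))
    print("max abs difference:", float(np.abs(Y1 - Y2).max()))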

Environment details (please complete the following information):

  • Environment location: GCE
  • Linux Distro/Architecture: Ubuntu 20.04 amd64
  • GPU Model/Driver: V100 and driver 440.33.01
  • CUDA: 10.2
  • Method of cuDF & cuML install: source, cmake 3.16.3, gcc 8.4.0, fbe6272
@zbjornson added the "? - Needs Triage (Need team to review and classify)" and "bug (Something isn't working)" labels Oct 14, 2020
@viclafargue removed the "? - Needs Triage (Need team to review and classify)" label Oct 15, 2020
@viclafargue (Contributor)

Thank you for opening the issue. Maybe @danielhanchen can give a hint on this?

@danielhanchen (Contributor)

Not 100% sure, but I remember this being a known issue (for me at least).
The issue is that although the starting embedding is initialized the same with the same random seed, other components inside TSNE are not deterministic (well, not yet set deterministically).

E.g. FAISS has some randomness inside it, and I don't know whether a random seed is used there. Likewise, due to GPU parallelization, TSNE will output vastly different results, since the tree search methods terminate at random intervals for each thread. [I.e., say there are 3 threads A, B, C: A terminates in trial 1, but B in trial 2, etc.]

Further discussion: CannyLab/tsne-cuda#44

One way I found to alleviate this issue was to force the use of a random embedding close to the top 2 PCA components. This way, the initial embedding is at least a good preconditioner, so the final output won't diverge too much from run to run.
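
A minimal sketch of that kind of PCA-seeded initialization, using scikit-learn for the PCA step; cuML's API for supplying a custom initial embedding may differ by version, so the usage lines show scikit-learn's TSNE, which accepts an ndarray init:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE  # sklearn shown here; cuML's custom-init API may differ by version

    def pca_init(X, seed=0, jitter=1e-4):
        """Top-2 PCA projection plus a little noise, to use as the initial embedding."""
        rng = np.random.default_rng(seed)
        Y0 = PCA(n_components=2, random_state=seed).fit_transform(X).astype(np.float32)
        return Y0 + jitter * rng.standard_normal(Y0.shape).astype(np.float32)

    # Y0 = pca_init(X)
    # emb = TSNE(init=Y0, random_state=0).fit_transform(X)  # scikit-learn accepts an ndarray init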

Another solution, which I strongly advise against, is to turn the tree search into a locking construct, i.e. synchronize the entire tree algorithm, which will clearly slow things down.

Sadly I know my comments won't be of much help.

@zbjornson (Contributor, Author)

Thanks for the tips. FAISS as used here is deterministic. It looks like the variation is due to atomics and parallelization.
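
To illustrate why parallel atomics cause this (not from the thread itself): floating-point addition is not associative, so any sum whose accumulation order changes between runs, as atomicAdd ordering does on the GPU, can change in the last bits, and those differences compound over t-SNE iterations. A small NumPy sketch of the order effect:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000).astype(np.float32)

    def accumulate(values):
        s = np.float32(0.0)
        for v in values:
            s += v  # sequential float32 accumulation
        return s

    # Same numbers, different accumulation orders -> typically slightly different sums,
    # which is what reordered atomicAdds produce on the GPU from run to run.
    print(float(accumulate(x)), float(accumulate(x[::-1])), float(accumulate(np.sort(x))))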

I tried PCA initialization, but was surprised by the extent of variation between runs:
[Screenshots: five runs with PCA init]
Runs 1,3,5 are similar. Runs 2,4 are similar. Run 2 looks like there's some legit FP error happening, but it's nearly impossible to debug without deterministic behavior.

Five runs with random init but the same seed:
[Screenshot: five runs with random init, same seed]

Five runs with random init, random seeds:
[Screenshot: five runs with random init, random seeds]


UMAP also appears non-deterministic, despite spectral init, although it's less pronounced. Local structure is very well preserved, and global structure is well preserved with a higher number of neighbors.
k=15: [image]
k=50: [image]
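
For reference, a minimal sketch of the comparison behind these plots, assuming cuML's UMAP with NumPy input/output (X is the blob data from the repro above):

    from cuml.manifold import UMAP
    import numpy as np

    # Repeat each setting twice with the same seed; on a fully deterministic build
    # the two runs would be identical.
    for k in (15, 50):
        emb_a = np.asarray(UMAP(n_neighbors=k, random_state=0).fit_transform(X))
        emb_b = np.asarray(UMAP(n_neighbors=k, random_state=0).fit_transform(X))
        print(k, np.abs(emb_a - emb_b).max())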

@danielhanchen (Contributor)

Oh, in regards to the plots where TSNE suddenly diverges: I actually noticed this issue before. [By the way, nice plots :) ]
#1383 had many changes I made a while back, but they weren't incorporated since I left NVIDIA.

To alleviate wild divergences, I leveraged two approaches:
(1) Mean-centre the components after IntegrationKernel every time during an epoch. This should be very fast, since you can accumulate the sums with atomicAdd inside IntegrationKernel.

    // Scale the accumulated coordinate sums by div_N to form the per-dimension means
    MLCommon::LinAlg::unaryOp(
      sums, sums, 2, [div_N] __device__(float x) { return x * div_N; }, stream);
    // Subtract the means from both embedding dimensions (stored at YY and YY + NNODES + 1)
    TSNE::mean_centre<<<MLCommon::ceildiv(n, 1024), 1024, 0, stream>>>(
      YY, YY + NNODES + 1, sums, n);
    CUDA_CHECK(cudaPeekAtLastError());

(2) In IntegrationKernel, if the components exceed some MAX_BOUNDS measure (say 100 or so, or even 1000), clamp them and reset the momentum vector and gains so that TSNE does not continue diverging:

    // Confirm Y1 and Y2 are within bounds (-MAX_BOUNDS, MAX_BOUNDS)
    // These are arbitrary but can help with outliers
    // Also reset both gains and old forces
    if (Y1[i] < -MAX_BOUNDS) {
      Y1[i] = -MAX_BOUNDS;
      gains1[i] = 1;
      old_forces1[i] = 0;
    }
    else if (Y1[i] > MAX_BOUNDS) {
      Y1[i] = MAX_BOUNDS;
      gains1[i] = 1;
      old_forces1[i] = 0;
    }
    if (Y2[i] < -MAX_BOUNDS) {
      Y2[i] = -MAX_BOUNDS;
      gains2[i] = 1;
      old_forces2[i] = 0;
    }
    else if (Y2[i] > MAX_BOUNDS) {
      Y2[i] = MAX_BOUNDS;
      gains2[i] = 1;
      old_forces2[i] = 0;
    }

@zbjornson (Contributor, Author)

Thanks @danielhanchen. Started landing those changes with #3018.

@zbjornson (Contributor, Author)

zbjornson commented Oct 20, 2020

Hrm, even with mean-centering and constraining to a bounding box, roughly 1 in 4 runs still give wonky results, and no runs give great results. Here are a few examples of the same dataset run several times with PCA init:

Exact:     [exact]   All 8 clusters clearly visible
Okay B-H:  [good-1]  All 8 clusters clearly visible, but some spreading.
Bad B-H 1: [bad-1]   Purple and green clusters tiny. Lots of stray points.
Bad B-H 2: [bad-2]   Green cluster tiny. Stray points.
Bad B-H 3: [bad-4]   Clumping in teal cluster. Lots of spreading.

I'm looking at incorporating Martin Burtscher's latest changes to the B-H kernels in hopes that numerical stability will be improved/FP errors reduced. Also looking at the newer FFT kernel approach. CannyLab's version is reproducible and gives consistently good results:

[Image: CannyLab result]

@zbjornson reopened this Oct 20, 2020
@zbjornson changed the title from "[BUG] t-SNE is not deterministic even with random_state" to "[BUG] t-SNE is not deterministic even with random_state; gives odd results" Oct 20, 2020
@zbjornson (Contributor, Author)

I have a version working with FFT. It gives higher embedding quality, solves the cluster spreading issue and is more consistent between runs. However, there's still something upstream of the main loop (e.g. in the perplexity search) that, compared to CannyLab's impl, is causing less spacing between blobs, more variability between runs and lower embedding quality, as well as occasional "pinpoint" results. See below:

CannyLab has better spacing

cuML-FFT: [cuml-fft-100k]    CannyLab: [canny-fft-100k]

FFT is more consistent than BH and resolves spreading. CannyLab is more consistent between runs. cuML-FFT has occasional (~1 in 15 runs) pinpointing or similar math errors (4th run).

Run:       1         2         3         4
cuML-BH:   [good-1]  [bad-1]   [bad-4]   [bad-2]
cuML-FFT:  [cuml-1]  [cuml-2]  [cuml-3]  [cuml-5]
CannyLab:  [cl-1]    [cl-2]    [cl-3]    [cl-4]

CannyLab ends with 1.8x better grad norm and converges much faster

[Plot: gradient norm vs. iteration for cuML and CannyLab]


This was a bigger project than I intended to take on, and I don't know if it would even be accepted into cuML. (For my own [work] purposes I'm likely going to use an optimized fork of CannyLab's.) I'm running out of steam for the moment to keep digging into the remaining cuML-FFT issues, but in addition to the above wins, the FFT version ...

  • is 20-30% faster
  • is significantly shorter and simpler
  • adds early stopping based on gradient norm (sketched after this list)
  • has more acceptable licensing than Burtscher's Barnes-Hut (Note that v4.5 has a more permissive license, but cuML is based on v3.1, which has a commercial-restrictive license. v4 uses a 3D tree.)

so I think it's worth finishing. The current B-H version doesn't seem usable.
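
For illustration, the early-stopping item above roughly amounts to something like this; the names (step, min_iter, tol) are hypothetical, not cuML's actual symbols:

    # Hypothetical sketch (illustrative names, not cuML's code) of gradient-norm-based
    # early stopping inside the optimization loop.
    def optimize(step, n_iter=1000, min_iter=250, tol=1e-7):
        prev_norm = float("inf")
        it = 0
        for it in range(n_iter):
            grad_norm = step(it)  # one gradient/update step; returns the gradient norm
            if it >= min_iter and abs(prev_norm - grad_norm) < tol:
                break  # converged: the gradient norm has stopped changing
            prev_norm = grad_norm
        return it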

@cjnolet @dantegd @drobison00 thoughts?

@cjnolet (Member)

cjnolet commented Oct 24, 2020

@zbjornson This looks like a pretty significant change and I do think it's worth investigating more closely as a potential alternative for cuML. Please bear with me (and anyone else who might be interested) while I/we get spun up on the details of this approach.

Unfortunately, today I found what appears to be a regression in 0.16, and I'm concerned it was introduced in recent changes: #3057. I believe this regression could be affecting your comparisons as well, and could also be the reason why the current version appears unusable. We might need some more comparisons across versions, but I don't recall seeing artifacts and outliers in the cuML t-SNE like those in your recent images.

What do you think about bringing over the FFT implementation and adding it as an additional option for now using algorithm='fft'?

However, there's still something upstream of the main loop (e.g. in the perplexity search) that, compared to CannyLab's impl, is causing less spacing between blobs, more variability between runs and lower embedding quality,

I've found that different implementations of the algorithm cause the perplexity/n_neighbors (and other hyper params) to have different effects on the resulting embedding. This makes me wonder whether the distances between the resulting clusters in the FFT version could be reproduced by adjusting the hyper-params and whether they can be directly compared using the same hyper-param settings.

@zbjornson (Contributor, Author)

What do you think about bringing over the FFT implementation and adding it as an additional option for now using algorithm='fft'

About to open a PR for this.

Unfortunately, today I found what appears to be a regression in 0.16, and I'm concerned it was introduced in recent changes: #3057.

Argh... I'll bisect if I have time this weekend.

This makes me wonder whether the distances between the resulting clusters in the FFT version could be reproduced by adjusting the hyper-params and whether they can be directly compared using the same hyper-param settings.

Section 4 of https://arxiv.org/abs/1712.09005 had an answer:

We present a novel variation called "late exaggeration," which refers to setting α > 1 for the last several hundred iterations. This approach seems to produce more easily interpretable visualizations: one recurring issue with t-SNE outputs (see Fig. 5) is that the arising structure, while clustered, has its clusters close to each other and does not allow for an easy identification of segments. By increasing α (and thus the attraction between points in the same cluster), the clusters contract and are more easily distinguishable.

CannyLab's impl doesn't do this, so it's a bit odd that this impl needs it. Anyway, that's one way to resolve the spacing issue.
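
For illustration (not the cuML or CannyLab code), an exaggeration schedule along the lines the paper describes could look like this; all constants are made-up examples:

    # Toy schedule: alpha multiplies the attractive, P-based term of the t-SNE gradient,
    # so alpha > 1 late in the run pulls clusters tighter and increases spacing between them.
    def exaggeration(it, n_iter, early=12.0, late=2.5, early_iters=250, late_iters=300):
        if it < early_iters:
            return early  # standard "early exaggeration"
        if it >= n_iter - late_iters:
            return late   # "late exaggeration" (alpha > 1), per Linderman et al.
        return 1.0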

I haven't resolved the occasional "pinpointing" issue. I'm not using any randomization, so I think that means it's due to some atomic operation that will be miserable to find and correct. It's likely outside of the main loop though given that this issue doesn't occur in the CannyLab impl and the loops are nearly identical. (The reason they don't need to use late exaggeration >1 is probably related.) Unfortunately this occurs more frequently with more rows.

@cjnolet (Member)

cjnolet commented Oct 28, 2020

ref #3058

@zbjornson (Contributor, Author)

#3084 resolved the artifacts and the remaining non-determinism appears to be from FP atomics. Don't think there's anything left to do (without major perf impacts).

@cjnolet reopened this Nov 12, 2020
@cjnolet (Member)

cjnolet commented Nov 12, 2020

Opening this back up to track continued investigation of potential data races in floating-point additions.

@cjnolet (Member)

cjnolet commented Nov 12, 2020

Tagging @avantikalal

@zbjornson changed the title from "[BUG] t-SNE is not deterministic even with random_state; gives odd results" back to "[BUG] t-SNE is not deterministic even with random_state" Nov 15, 2020
@trivialfis (Member)

Hi, may I ask what's the current status of this? I would like to look into it after UMAP. ;-)

@cjnolet (Member)

cjnolet commented May 13, 2021

@trivialfis, the new (experimental) FFT implementation has lower variance and better numerical stability but it's not completely deterministic. It would be great if you were able to make this deterministic as well.
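
For anyone picking this up, a hedged sketch of selecting the experimental FFT path; the parameter spelling has varied in this discussion (algorithm vs. method), so check the TSNE docs for the cuML release you are running:

    from cuml.manifold import TSNE

    # Select the FFT-based implementation discussed in this thread; the exact parameter
    # name may depend on the cuML version.
    tsne_fft = TSNE(method="fft", random_state=0)
    emb = tsne_fft.fit_transform(X)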

@trivialfis (Member)

Thanks for sharing the information. It will be on my to-do list.
