Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unordered_map order dependency in _EncodeIntegers causes non-deterministic USDC generation. #830

Closed
jdwilder-google opened this issue Apr 25, 2019 · 8 comments

Comments

@jdwilder-google
Copy link

In _EncodeIntegers (integerCoding.cpp), it iterates over unordered_map to find the highest element. This causes non-deterministic output for USDC files depending on build environment because the order of an unordered_map is unspecified.

I was able to fix this locally by tracking the highest element as they are counted instead of iterating over the unordered_map.

The original code:

for (Int const *cur = begin, *end = begin + numInts;
     cur != end; ++cur) {
    SInt val = _Signed(*cur) - prevVal;
    ++counts[val];
    prevVal = _Signed(*cur);
}
SInt commonValue=0;
{
    size_t commonCount=0;
    for (auto const &p: counts) {
        if (p.second > commonCount) {
            commonCount = p.second;
            commonValue = p.first;
        }
    }
}

I replaced with:

SInt commonValue = 0;
size_t commonCount = 0;
for (Int const *cur = begin, *end = begin + numInts;
     cur != end; ++cur) {
    SInt val = _Signed(*cur) - prevVal;
    const size_t count = ++counts[val];
    if (count > commonCount) {
        commonValue = val;
        commonCount = count;
    }
    prevVal = _Signed(*cur);
}
@gitamohr
Copy link
Contributor

This seems a fine change to me, but just to make sure we're on the same page here, this doesn't affect correctness, it simply removes one source of nondeterminism in usdc output, right? The result is that we break ties in the most common value by taking the "first".

For what it's worth I believe there are other sources of nondeterminism in usdc output, and we may not be willing to remove all of them, so I'd advise not relying on deterministic usdc output if possible. Is there a particular reason it's important to you?

@jdwilder-google
Copy link
Author

Yes, that's all it does. Previously in the case of a tie it would choose whatever the implementation of unordered_map happened to order first, but with this change it chooses the first in the input.

This is useful for our unit tests which compare built files vs known-good golden files. The library is otherwise deterministic insofar as how we're using it, but a recent change to our standard libraries broke this by (intentionally) changing hash table ordering to find order dependencies.

@gitamohr
Copy link
Contributor

Cool -- just to let you know I believe there are other sources of nondeterminism in usdc output so you may be bitten by this again. One option might be to use usddiff to compare your files rather than or in addition to looking at the literal bytes.

@jdwilder-google
Copy link
Author

Good to know. I'll investigate using usddiff (or at least add a TODO for the next time it breaks).

Thanks!

@spitzak
Copy link

spitzak commented Apr 25, 2019 via email

@jdwilder-google
Copy link
Author

I went with the simplest solution I could think of, but anything that's deterministic is fine with me.

@gitamohr
Copy link
Contributor

Thanks Bill -- that's a good idea. Using the largest value is the best thing to do in case of a tie. I'll make that change.

@jilliene
Copy link

Filed as internal issue #USD-5230

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants