Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect I32Key Index Ordering #489

Closed
Ruborcalor opened this issue Oct 13, 2021 · 5 comments · Fixed by #582
Closed

Incorrect I32Key Index Ordering #489

Ruborcalor opened this issue Oct 13, 2021 · 5 comments · Fixed by #582
Assignees
Milestone

Comments

@Ruborcalor
Copy link

When using an I32Key as the primary key for an index, the ordering is done incorrectly. Instead of sorting signed integers as you would expect them to be sorted, all negative numbers are considered to be "greater" than all positive numbers.

IntKey new implementation:

impl<T: Endian> IntKey<T> {
    pub fn new(val: T) -> Self {
        IntKey {
            wrapped: val.to_be_bytes().into(),
            data: PhantomData,
        }
    }
}

The new implementation converts the i32 number to big endian bytes which doesn't preserve the sorting order.

My workaround was to restrict my range query with a max of the most negative number:

    let zero_winnings_key = leaderboard()
        .idx
        .winnings
        .index_key((I32Key::from(0), b"".to_vec()));
    let most_negative_key = leaderboard()
        .idx
        .winnings
        .index_key((I32Key::from(-2147483648), b"".to_vec()));

    let res = leaderboard()
        .idx
        .winnings
        .range(
            deps.storage,
            Some(Bound::inclusive(zero_winnings_key)),
            Some(Bound::exclusive(most_negative_key)),
            Order::Descending,
        )
        .take(limit)
        .collect::<StdResult<Vec<_>>>()?;

This fits my use case because I don't need the negative values when querying, but is quite counterintuitive and took me a long time to figure out what was going on.

@maurolacy
Copy link
Contributor

maurolacy commented Oct 13, 2021

Thanks for the issue.

Instead of Some(Bound::exclusive(most_negative_key)), I suggest Some(Bound::inclusive(most_positive_key)) for clarity.

most_positive_key is then built from (I32Key::new(i32::MAX), "").

The problem is that we're using the internal representation to define the order, and the internal representation of negative numbers is based on two's complement, in which the most significant bit is used as sign bit. So that, lexicographically, negative numbers come after positive numbers.

We can fix this by switching to another representation. Need to think about it.

@ethanfrey
Copy link
Member

The problem is that we're using the internal representation to define the order, and the internal representation of negative numbers is based on two's complement, in which the most significant bit is used as sign bit. So that, lexicographically, negative numbers come after positive numbers.

There is no way to easily scan besides byte order. I think a new byte representation for signed ints could be interesting.

I believe if we toggle the first bit on big endian, this will actually provide the ordering we want.

let mut be = value.to_be_bytes();
be[0] = be[0] ^ 0xf0;

Is that correct?

If so, we can use IntKey and UintKey as two types with different serialization/parsing logic and everyone will be happy.

@ethanfrey ethanfrey added this to the v0.11.0 milestone Oct 14, 2021
@maurolacy
Copy link
Contributor

maurolacy commented Oct 14, 2021

The problem is that we're using the internal representation to define the order, and the internal representation of negative numbers is based on two's complement, in which the most significant bit is used as sign bit. So that, lexicographically, negative numbers come after positive numbers.

There is no way to easily scan besides byte order. I think a new byte representation for signed ints could be interesting.

I believe if we toggle the first bit on big endian, this will actually provide the ordering we want.

let mut be = value.to_be_bytes();
be[0] = be[0] ^ 0xf0;

Is that correct?

Almost. It's

be[0] ^= 0x80;

But yes, this is a good idea, and will in fact solve the ordering issue(!).

When we deserialize, we can easily fix / revert this, and restore the original value. So, this will require a deserializing range to work properly. If not, raw negative and positive ints will be exchanged, and you would be able to access them because they leak from the raw range.

So, we either deprecate raw range entirely (which doesn't sound like a bad idea). Or, we can provide our own, sanctioned version of from_be_bytes() or similar, which includes the msb flip.

If so, we can use IntKey and UintKey as two types with different serialization/parsing logic and everyone will be happy.

@ethanfrey
Copy link
Member

So, we either deprecate raw range entirely (which doesn't sound like a bad idea).

I am fine deprecating in 0.11 and removing in 0.12. But I would like that to be independent of this issue.

Or, we can provide our own, sanctioned version of from_be_bytes() or similar, which includes the msb flip.

Hmmm... we do not promise what our encoding is. But we should provide some sanctioned function to get the eg. u32 from IntKey. No need to call it from_be_bytes()

@maurolacy
Copy link
Contributor

No need to call it from_be_bytes()

Right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants