Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: cache candidates seperate from their sorting #251

Merged
merged 7 commits into from
Jul 7, 2023

Conversation

baszalmstra
Copy link
Collaborator

@baszalmstra baszalmstra commented Jul 6, 2023

Improve performance of the solver by caching the candidates separately from the sorted candidates. The candidates were also computed to compare different solvables.

With these improvements libsolv-rs beats the C impl in every case. (on my machine..)

libsolv libsolv-rs
python=3.9 13.267ms 8.60ms
xtensor, xsimd 9.7439ms 4.15ms
tensorflow 1.3159s 0.753s
quetz 3.09s 2.38s
tensorboard=2.1.1, grpc-cpp=1.39.1 1.5399s 0.241s

@wolfv
Copy link
Contributor

wolfv commented Jul 6, 2023

libsolv libsolv-rs
python=3.9 6.9860ms 3.7800ms
xtensor, xsimd 5.5767ms 4.29ms
tensorflow 635.80ms 621.56ms
quetz 1.2691s 1.6378s
tensorboard=2.1.1, grpc-cpp=1.39.1 468.90ms 216.13ms

@ruben-arts
Copy link
Collaborator

ruben-arts commented Jul 6, 2023

req libsolv libsolv-rs
python=3.9 8.9680 ms 4.1932 ms
xtensor, xsimd 7.0295 ms 2.3236 9ms
tensorflow 931.89 ms 529.56 ms
quetz 1.8615 s 1.6781 s
tensorboard=2.1.1, grpc-cpp=1.39.1 698.74 ms 202.47 ms

@wolfv
Copy link
Contributor

wolfv commented Jul 7, 2023

Can you explain why AHashMap is faster? I looked at the crate briefly but it wasn't obvious to me why there would be a performance improvement.

Everything else looks great!

@wolfv
Copy link
Contributor

wolfv commented Jul 7, 2023

One other thing I noticed (and maybe that's related) is the order of the packages printed to-be-installed became non-deterministic.

I am not sure that it's what we want. We might even want the opposite and order the matchspecs initially to get results independent of spec-ordering.

@baszalmstra
Copy link
Collaborator Author

Can you explain why AHashMap is faster? I looked at the crate briefly but it wasn't obvious to me why there would be a performance improvement.

AHashMap should be faster for almost all cases but for me the results have been not been enormous. See https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#fxhash why we should use ahash.

One other thing I noticed (and maybe that's related) is the order of the packages printed to-be-installed became non-deterministic.

I think it should be left up to the user to determine how they want to sort the packages. Given that it is now (especially with your changes) relatively easy to generate a topological sort from the result I say we leave this as is.

I am not sure that it's what we want. We might even want the opposite and order the matchspecs initially to get results independent of spec-ordering.

Yeah, I also find it not very user friendly that a b might resolve differently than b a. However, Im not sure if this is something we should implement in the solver or whatever thats a useful artifact of the solver. Everyone using rattler is of course free to sort the input specs if they want.

@wolfv
Copy link
Contributor

wolfv commented Jul 7, 2023

The thing is that the order of evaluation of the packages influences the result (e.g. when you ask for xtensor xsimd you get something else from xsimd xtensor.

If I understood the AHashMap correctly, then it uses a source of randomness for the hashing. I guess I am wondering if that source of randomness could alter the resolutions.

@baszalmstra
Copy link
Collaborator Author

If I understood the AHashMap correctly, then it uses a source of randomness for the hashing. I guess I am wondering if that source of randomness could alter the resolutions.

As far as I know, the only place we use this HashMap is to lookup keys to values. I don't think we are iterating over the hashmaps anywhere which could indeed cause that behavior.

Do you see different results on every run?

The thing is that the order of evaluation of the packages influences the result (e.g. when you ask for xtensor xsimd you get something else from xsimd xtensor.

That was already the case right? Or does this PR change that?

@wolfv
Copy link
Contributor

wolfv commented Jul 7, 2023

The order of prints is already different every time in main. So let's forget about it for now.

@wolfv
Copy link
Contributor

wolfv commented Jul 7, 2023

I merge after the tests are done!

@baszalmstra baszalmstra merged commit 23df2a4 into conda:main Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants