Is there a way to force Quantities to a specific unit per dimension on creation of the Quantity? #1750
Comments
Hi there. Most (all?) conversions are cached within the registry after they are used the first time. Speeding things up is always a goal. Can you benchmark your specific use and let us know where the bottlenecks are? And by the way, have you looked at this: https://pint.readthedocs.io/en/stable/advanced/performance.html
@hgrecco I have done profiling of Pint in our application. I can't reveal the specifics of some of it, but the yellow (gold) is the time spent within click, which is what kicks off our app; the purplish is the system that click is kicking off; and green is the library that contains the calls to Pint, with everything you can still read below that being Pint. We did take a look at the performance page, and unfortunately we found that most of the suggestions, such as using wraps, decreased performance instead of improving it. The main thing we are using Pint for is to ensure we have the appropriate units before performing some math, and the math being performed seems to be fast enough that it does not show up on the chart.
Thanks for the benchmark info. I do think there is room for improvement, but can you provide a little more information? Are you using multiplicative units or non-multiplicative units?
@hgrecco it is a mix of dimensionless Quantities and non-temperature units, e.g. ("A", "V", "kW", "second").
So a minor update related to another issue I had opened (#1735): when I updated how we create all our Quantities to use the parser as you suggested, along with that setting, the one big change I noticed is that we are no longer spending as much time in that call. The stuff inside the red box is all Pint, with most of the stuff outside being numpy:
Performance will be addressed in the next version. Would you like to help develop or improve the benchmark?
I could definitely help out with that. We are using Pint in an application, so I am trying to weigh spending my time rewriting something similar that meets our smaller use case against seeing if I could assist. Do you have a timeframe for release and an initial assessment of how much improvement could be expected? Also, have you seen https://codspeed.io/ ? It could be very useful for your effort and is free for open source.
Some time ago I added https://github.com/hgrecco/pint/tree/master/benchmarks, which uses airspeed velocity. I have used it locally, but I would love to see it migrated to pytest-benchmark and used within codspeed.io or similar. My plan would be:
Regarding rewriting: my experience is that making a faster unit package is easy. What is hard is making it feature rich. Some Pint features obviously drag down performance, but most (all?) of that can be mitigated. For example
100% agree with you, and we would in no way be looking for feature parity. My preference would be to help you out if at all possible instead of making a one-off bespoke package that will eventually run into issues. Our use case is narrow, but it ham-strings a lot of the functionality provided by Pint:

In most cases, I think what we are looking for is the ability to force a Quantity to an agreed-upon base unit per dimension (e.g. km/s for speed) and then do our best to avoid creating Quantities on the fly. We already front-load the creation of all the Quantities; it is the repeated calls to convert to all sorts of different units that I think is killing us. Basically we want all the niceties that Pint provides, but only at a point in time when users are pre-conditioned to be patient (system load, like a video game loading screen). We also want the niceties on the backend, so we know what the unit is for every component and can be sure of it along the way.

What I am trying to get at (sorry for the rambling) is something like a "performance-focused" mode: if I create a Quantity at the start to be in km/s but then pass in parsec/decade, it would throw an error saying "do the conversion yourself; I am meant for speed, so I won't convert parsec/decade for you." The Quantity basically just checks whether the unit is right: if so, allow the operation; if not, reject it.

To take one last stab at it: I want to be able to tell Pint to be performance-optimized, accepting that I lose a lot of the niceties and easy conversions it provides. This way I can also run in either mode: when I see the "do it yourself" exception, I can debug and check what Pint is doing, because if I am getting that error I probably have a dimensional-analysis problem in how I am approaching the calculation.
I get that Pint will solve that problem for me, but I don't think it can provide both enhanced performance and on-the-fly dimensional analysis, because those are competing goals. Sorry for restating the same thing a few times; I just want to ensure I eventually get the point across. Is there a Slack or Discord channel for Pint where a more collaborative discussion could be had?
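The "performance-focused mode" described above can be sketched outside of Pint. This hypothetical `StrictQuantity` (all names invented for illustration; this is not Pint's API) checks units instead of converting:

```python
class StrictQuantity:
    """Hypothetical sketch of a "performance-focused" quantity.

    The magnitude is stored in a fixed, agreed-upon unit. Arithmetic
    refuses to convert: a unit mismatch raises instead, signalling
    "do the conversion yourself".
    """

    def __init__(self, magnitude, unit):
        self.magnitude = magnitude
        self.unit = unit

    def __add__(self, other):
        if self.unit != other.unit:
            # Refuse rather than convert: fast path stays a plain comparison.
            raise ValueError(
                f"unit mismatch: {self.unit!r} != {other.unit!r}; convert first"
            )
        return StrictQuantity(self.magnitude + other.magnitude, self.unit)


a = StrictQuantity(3.0, "km/s")
b = StrictQuantity(4.0, "km/s")
print((a + b).magnitude)  # 7.0
```

Running the same code path in a "debug" mode with real Pint Quantities would then surface where the refused conversions (and likely dimensional-analysis mistakes) occur.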
(We do not have a Discord channel. Maybe we should? I am not sure it would be used very much. In any case, I am open to conversation via Discord, either direct or in a channel.) Going back to your case, I think some of these things will be identified by the benchmark. A multiplicative unit conversion that has been reused could become:
that's all. Right now it is more lengthy because of the way the cache is implemented, but we can do better. If an online benchmark becomes available, I would be very happy to test the ideas that I have in mind.
@hgrecco I threw together a quick proof of concept; any chance I could get a branch that I can push to? Something like:

-------------------------------------------------------------------------------------------- benchmark: 3 tests -------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_unit_repr 714.5500 (1.0) 2,855.6999 (1.0) 742.1944 (1.0) 98.4562 (1.0) 727.0501 (1.0) 4.5002 (1.0) 1714;3182 1,347.3560 (1.0) 66752 20
test_deepcopy 860.0000 (1.20) 22,741.9987 (7.96) 921.2502 (1.24) 264.7437 (2.69) 899.9996 (1.24) 19.9980 (4.44) 3734;5563 1,085.4814 (0.81) 184502 1
test_creation 1,148.9974 (1.61) 22,710.9995 (7.95) 1,221.0247 (1.65) 348.8560 (3.54) 1,180.0003 (1.62) 10.0044 (2.22) 1333;8264 818.9842 (0.61) 60824 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
@matroscoe just fork the repo, push it to your master branch, and make a pull request. We can discuss it over there. |
I have signed up for codspeed.io, which supports benchmarking Python code on GitHub.
Awesome. I wasn't converting the existing benchmarks, as they were a bit difficult for me to follow; instead I was adding benchmarking to some of the existing pytest functions that cover more commonly used functionality.
@hgrecco starting to get the first results. I am just tweaking the unit testing so it runs a much smaller subset and I can check results faster, but mid-way down are the pytest-benchmark results.
I think it is better to write specific benchmarks instead of updating the current tests to also benchmark. This would make things much cleaner, and if you put them in a specific folder it would allow running the benchmarks independently. Also, it would leave the current tests untouched, which is a good thing.
We have been approved for the CodSpeed beta |
@hgrecco started adding tests that are solely for benchmarking and that try to reflect our usage of the library as much as possible (test_pybenchmark.py). I did notice a pretty large discrepancy depending on whether Quantities are created from a single string or with the value and unit separated. GitHub CI (blurred out are the previous tests I have yet to revert):
That makes sense to me. Using a string involves parsing it, which is a rather expensive operation: tokenizing, building a tree, and then assembling the quantity. There might be a way to speed it up.
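The discrepancy can be illustrated with a rough self-contained comparison of the two creation paths. This toy split is far cheaper than Pint's real tokenizer-and-tree parser, so the true gap is larger:

```python
import timeit

def from_string(text):
    # String path: the input must be parsed before a quantity exists.
    mag, unit = text.split()
    return float(mag), unit

def from_parts(mag, unit):
    # Separated path: nothing to parse.
    return float(mag), unit

# Both paths produce the same (magnitude, unit) pair.
assert from_string("5 km") == from_parts(5, "km")

slow = timeit.timeit(lambda: from_string("5 km"), number=100_000)
fast = timeit.timeit(lambda: from_parts(5, "km"), number=100_000)
print(f"string: {slow:.4f}s  separated: {fast:.4f}s")
```

Even with this trivial "parser", the string path consistently measures slower, which matches the benchmark results above.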
Please take a look also at the benchmarks in the benchmarks folder. There we time all sorts of things, though some are still missing, like unit math and parsing (without the registry overhead).
I am slowly working through our primary use cases, but it sounds like there are parts of the system you are suspicious of. Are these the ones listed above? If not, could you provide a list of the things you suspect could be improved so I can target those? I am not as worried about start-up work, because I can front-load that in our application and users expect "loading" to take some time. For us it is run-time work that is the biggest concern.
There are a few parts that we know are slow (e.g. parsing a string expression), and it would be nice to make them faster. But I also think there are certain things (conversion-factor lookups after first-time use) that can be improved. I think having a complete benchmark would allow us to dissect such things.
I have been trying to copy over the stuff that is already in benchmarks, and I am having a very hard time making heads or tails of what is happening; I also cannot find instructions on how to run them. https://github.com/hgrecco/pint/tree/master/benchmarks/benchmarks
Those benchmarks were using airspeed velocity: https://asv.readthedocs.io/en/stable/writing_benchmarks.html Briefly, each function whose name starts with `time_` is discovered and timed.
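For reference, an asv benchmark module relies on naming conventions: methods prefixed with `time_` are timed, and `setup` runs before each timing run. A minimal hypothetical file (the class and names below are invented for illustration):

```python
# benchmarks/bench_creation.py -- hypothetical asv benchmark module.
# asv discovers members by prefix: time_* methods are timed and
# setup() runs beforehand, outside the measured region.

class TimeQuantityCreation:
    def setup(self):
        # Stand-in for building a UnitRegistry once, so registry
        # construction does not pollute the measurement.
        self.factors = {"km": 1000.0}

    def time_creation(self):
        # Only this body is timed by asv.
        return 5 * self.factors["km"]
```

The same shape translates almost directly to pytest-benchmark: the `setup` work moves into a fixture, and the `time_*` body becomes the callable passed to the `benchmark` fixture.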
@matroscoe please ping me when there is news. I would be happy to jump into the Pint code once the benchmark is more complete.
Definitely will. I am just working on figuring out what I want to benchmark and what was being done in airspeed velocity so I can replicate it, while getting side-tracked with my regular work.
Let me know if you need any guidance or help making a list of the usual suspects.
I pushed some benchmarks. The results are at https://codspeed.io/hgrecco/pint/benchmarks. There are still things to polish, but it is a good start.
See #1820 |
Pint is super useful, but it is proving too slow in our application of the tool, and I was hoping I could do some up-front work to avoid so much time being spent inside the library.
My idea was to see if it is possible to set a unit per dimension, e.g. for length define that everything should be done in meters. Then whenever I create a Quantity with a dimensionality of "length", it would auto-convert whatever unit was given down to the base representation I want (e.g. Quantity(5, km) -> Quantity(5000, m)). This would allow me to avoid a lot of the conversion I see happening continually, while still leveraging the ease of use the library provides. It also lets us move the more time-consuming act of calculating all the conversions to the start of the application, when we load all the Quantity objects and users are more tolerant of slow operations (think of a loading screen in a video game: you tolerate the first one, but every one you see after that annoys you a little more each time).
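The per-dimension normalization described in the issue can be sketched without Pint. `PREFERRED` and the hand-written factors below are assumptions for illustration; Pint would derive them from its registry at application start-up:

```python
# Hypothetical normalizer: one preferred unit per dimension, with
# hand-written factors to that unit. All data here is a stand-in for
# what a real unit registry would provide.
PREFERRED = {"length": "m"}
TO_PREFERRED = {"km": ("length", 1000.0), "m": ("length", 1.0)}

def normalize(magnitude, unit):
    # Convert once, at creation time, to the dimension's preferred unit.
    dimension, factor = TO_PREFERRED[unit]
    return magnitude * factor, PREFERRED[dimension]

print(normalize(5, "km"))  # (5000.0, 'm')
```

With every quantity normalized at creation, later arithmetic never needs a conversion: all lengths are already in meters, so operations reduce to plain numeric math.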