-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort allocate x2 more memory in master #47715
Comments
Ah interesting, can reproduce on M1 mac julia> using BenchmarkTools
julia> x = rand(10^8);
julia> @btime sort($x);
7.402 s (2 allocations: 762.94 MiB) 1.10.0-DEV.49:
While it allocates more, the new version is also quite a bit faster, so not entirely sure whether to tag with regression |
Ah, looks like the in-place version remains non-allocating on both versions (while again being quite a bit faster on nightly)
1.10.0-DEV.49:
|
We chose to prioritize runtime over allocations but if these allocations pose a problem for your use case then we may have to reconsider. Do those allocations pose a problem in your use case?
The benchmarks for the in-place version are misleading. The warmup sorts You can get more truthful benchmarks by ensuring that 1.8.3 julia> @btime sort!($x) setup=(rand!($x)) evals=1;
12.421 s (0 allocations: 0 bytes)
julia> @btime sort!(rand!($x));
15.917 s (0 allocations: 0 bytes) 1.10.0-DEV.27 julia> @btime sort!(rand!($x));
4.609 s (4 allocations: 762.95 MiB)
julia> @btime sort!($x) setup=(rand!($x)) evals=1;
4.213 s (4 allocations: 762.95 MiB) |
I can confirm this is a problem, though probably acceptable (master not as scalable, I can sort almost twice larger arrays on 1.8.3, and I can allocate 2.48x larger arrays on master than I can sort). What's showed is the cumulative memory allocated, but even with it larger, it might not have been, with at any point in time not that much allocated, potentially lower than on 1.8.3. So I tried to find out, how much memory is allocated max for sort and sort! the in-place version more likely to be used. I'm on Linux, with 32 GB RAM (the largest I could sort with master):
I though "Maximum resident set size" was most interesting, and something useful to track, not sure about the rest. Master really uses 95% more memory, and that can be a problem with huge sorts. I think master is faster all the way up to its limit, is it important to be able to go higher, which 1.8 can do, with sort in Base? You can always use some other algorithm (even the older one if you can find it in some package). The largest I could allocate without error (sometimes, seemingly right at the edge, and the edge likely moves over time), possibly a bit larger for master or I got lucky:
|
At best I can allocate only slightly more (i.e. 2% more succeeded) on 1.8.3 than on master (those are both amounts that sometimes succeed):
Note, RAM is a fixed amount, and I tried to not do anything else with the system, the random factor explaining why a certain amount sometimes fails isn't because of Julia (or rand, that just works on the memory after it's allocated and can't fail on its own). Sort however isn't deterministic. It uses randomized quicksort by now (and scratch space, why you get allocations, I believe you can pass it in if you want none), so since its non-deterministic, and thus its timing, I suppose too its allocations could be. Note, this is misleading:
no allocations, since then the array was already sorted, and sort is much quicker (not a fast-path per se, at least with the usual quicksort I know, just the behavior of quicksort on presorted data, while here it might be use a different code-path avoiding potential allocations, likely always guaranteed to not allocate with presorted), that's what happens with
I guess BenchmarkTools.jl intentionally ignores the first run, as compilation often triggers allocs and GC. Maybe an issue should be filed for it. Let it do compilation, and not count its allocations (likely already works that way), but if you then get 4 allocations and not any in next 43 samples, i.e. 4/ = 0.09 per sample, not show as "allocs estimate: 0", but possibly "allocs estimate: 0+"? |
Not mine. I was just looking into since I saw this issue, and I see you answered, but not before I posted, so it's at least partly redundant. It's going to be obvious that sort, even sort! allocates if you check the right way, with |
Seems like this is intended and worth the tradeoff then, but we can always revisit if it becomes blocking for anyone. There are also packages that may be useful, including (if you really want no allocations and are willing to pay the time performance cost):
|
For me it is an issue, because my system does not have a lot of memory. At least is it possible to keep the behaviour of 1.8.3 , i.e. via a flag or something? Some of you might want more speed some like me prefer a system that consume less memory especially when I do not have the lastest type of hardware some of you have.... |
What's the application where you are sorting arrays with a size of order more than half your RAM? Did you notice this by running out of RAM on with an application on master vs 1.8? |
I work in finance and i have an application that computes risk measures that I translated to julia from c++. The c++ version consumes much less memory and cpu than the julia version. When i tested with master i noticed a slow down versus 1.8.3 because it was using much more memory and "starving"... after investigation i noticed that the sort function was consuming twice much memory than before... |
@LilithHafner would it be worth adding a completely non-allocating version of |
I first noticed the memory consumption in a discussion with @LilithHafner here: |
It seems like a fully in-place version or variant of |
@gitboy16, thank you for reporting this issue. Could you please share a MWE that demonstrates the behavior you describe? I.e. a piece of code that runs faster on 1.8.3 than on master. |
Maybe we can keep the old QuickSort type and add a StableQuickSort instead which becomes the default? Then anyone who needs the really low allocation behavior can explicitly ask for QuickSort. I don't think the new sort can properly be called quicksort anyway, it seems to have different enough properties: stable, needs a buffer. |
@LilithHafner , can't provide the code and it will depends on your hardware but all you need is to run close to the memory limit of your machine. With 1.8.3 it stays slightly under the limit but with master it exceeds it and everything starts to slowdown... |
I'll try to fix this before the 1.9 release. In this case, we are using radix sort for 1.10 and the original QuickSort for 1.8. 1.8 also has partial quicksort which has been merged with 1.8's QuickSort into the single implementation of 1.10's stable and allocating PartialQuickSort (of which a special case is a stable and allocating QuickSort). Do folks think we also need to restore the separate slower and non-allocating 1.8 PartialQuickSort? |
If you're trying to simulate an environment with little memory and you're on a systemd based system, you can use this when launching julia to prevent it from using too much physical memory at the same time:
Quick demo: [sukera@tempman ~]$ systemd-run --user --scope -p MemoryMax=1G -p MemorySw
apMax=0 julia -q
Running scope as unit: run-r28284510d81b4f7ab451813b057f3500.scope
julia> Sys.free_memory() |> Int
930631680
julia> (2e10 |> Int) / 10^C
julia> Sys.free_memory() / 1024 / 1024 / 1024
0.8667106628417969
julia> rand(Int(2e9))^C
julia> 2e9 / 1024 / 1024 / 1024
1.862645149230957
julia> rand(Int(2e9))
Killed ## This is due to the OOM-killer - maybe julia should detect/prevent this itself? As long as GC is checking the free memory instead of the free physical memory, this should work (loading |
Nice! Is systemd equivalent to launchd on macos? |
Probably? I think the only thing they have in common is that they're both init systems for their respective OS. I'm not personally familiar with |
As a result of JuliaLang/julia#47715, I noticed this repository and thought: why not add a Rust comparison? For this addition to work, one would need to have `rustc` and `cargo` installed.
In master how can I use the same sorting algorithm as 1.8.3 to avoid allocating too much memory? |
|
Hi,
It seems the sort function allocates twice more memory in master vs 1.8.3:
In 1.8.3 I get 2 allocations 762 MiB vs 6 allocations 1.49GiB in master.
Is that considered a regression and will it be addressed?
I see that the time spent is improving from 10.2s to 6.2s.
Thank you
PS: I am on Windows and have asked the above question on discourse but did not get any feedback.
The text was updated successfully, but these errors were encountered: