Avoid non-API calls when truncating overallocated vectors #362
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Closes #355
Closes #358
Description
In cpp11 we used 3 non-API calls:
SETLENGTH()
SET_TRUELENGTH()
SET_GROWABLE_BIT()
The combination of these 3 allowed us to implement efficient growable
r_vector
s. See:cpp11/inst/include/cpp11/r_vector.hpp
Lines 898 to 926 in aeb67b3
We absolutely cannot use these functions anymore. However, growable vectors are still pretty nice. The only viable alternative for cpp11 (as a header only package) seems to be to use
Rf_xlengthgets()
instead, so that is what this PR does.Using
Rf_xlengthgets()
means we can still use growable vectors as normal, but when we need to return aSEXP
back to R, if:writable::
vectorpush_back()
on it (thereby "growing" it and overallocating for it)Then we are going to have to pay the allocation and copy cost of truncating that vector down to its
length_
.The implicit conversion to
SEXP
is a pretty "hot" code path, so I've added some benchmarks below. In particular this operator is called:as_sexp()
is called on anr_vector
. When we return acpp11::writable::doubles
to the R side, the auto generated code incpp11.cpp
calls this.r_vector
.The benchmarks definitely show a noticeable performance penalty. But remember that this is only for writable vectors where
push_back()
has been used. In practice, for many (but not all) algorithms that I've written in C / C++, you can determine the required vector size up front, so you don't ever needpush_back()
. It might be that we try and recommend that people write their code with pre-allocated vectors if they are having issues. Note that we are still much much more efficient than repeatedpush_back()
calls in Rcpp though. Using growable vectors and then paying a 1 time truncation cost at the end is typically way more performant than reallocation on every push.Implementation notes and gotchas
In theory we could just switch to the code path covered by
R_VERSION < R_Version(3, 4, 0)
, i.e. just callRf_xlengthgets()
, but I think there are actually a few bugs and safety issues with this approach that people just haven't seen because everyone is on a newer version of R.Issue 1 - Protection
Using
SETLENGTH()
and friends is nice because there is no new allocation, meaning no new protection is required. That is not the case withRf_xlengthgets()
. If we blindly switch tothen no one is protecting
data_
and we have a serious protection issue. In theory this was happening on R < 3.4.0. If someone calls theSEXP
conversion operator after apush_back()
and then tries to continue to use their cpp11 writable vector, it will likely get gc'd out from under them.My solution is to switch to
resize()
, which does thepreserved.insert()
/preserved.release()
dance that is required to protect the newly allocated truncated data.Issue 2 - Internal state updates
In current cpp11, I think there is another bug with the current approach. After truncation, the following internal variables need updates and are not currently being updated:
capacity_
is_altrep_
data_p_
protect_
length_
stays the same anddata_
is updated)This seems pretty bad? If you request the
SEXP
and that truncates and allocates on old R versions, and then you try and use the originalwritable::doubles
object, then I think you can get into a pretty bad state if you try and pull elements out of it or push to it because these internal variables aren't right.My solution is the same as above, to switch to
resize()
. In addition to performing protection, it also updates these internal variables (with the exception ofis_altrep_
, but we can fix that in a follow up PR).Note 3 - Names handling
If we unconditionally use
Rf_xlengthgets()
, then names attribute truncation is automatically handled for us, so that's nice. I've removed the code that was trying to specially truncate names after truncating the original vector, because it is not required now. There is a list test that tests for this specifically.Note 4 - On updating
data_
in the conversion operatorThe current behavior of the implicit conversion to
SEXP
operator is to actually modify the original object even though it is labeled asconst
. This is really cheating a bit, and feels kind of wrong. It is the reason for theconst_cast<>
call there to throw away constness.We assume that the idea is that the
SEXP
returned to the caller should correspond to the internaldata_
stored by thewritable::
vector (at least until the nextpush_back()
when a reallocation could changedata_
). People are probably relying on this fact (it seems like it would be easy to do), so we probably can't change this at this point. It seems like it enables behavior like this:The implicit conversion operator is called, returning a
SEXP
, thatSEXP
is immediately modified, but the modification reflects inx
too. With the current API of cpp11, I imagine a lot of people are doing this.Note that this would pay the truncation cost if it was something like
This is certainly a tricky part of cpp11
Benchmarks
First, lets look at the
motivations.Rmd
benchmark forpush_back()
. This is the most relevant benchmark here, because it callspush_back()
many times and then has to convert toSEXP
to return the result to R. A pretty typical use case.It was this benchmark, running the
grow_()
andrcpp_grow_()
functions in cpp11test:When comparing
Main
toThis PR
, we see that the 10,000,000push_back()
case goes from 49.02ms -> 62.99ms with memory allocations of 256MB -> 332.3MB. This is the new extra 1 time truncation we pay at the end after all the pushes. I personally think this is okay.The absolute "worst case scenario" that I can think of is that a user allocates a vector of size
size
, then pushes exactly 1 item to the back of it (doubling the capacity), and then returns theSEXP
from that (truncating it and throwing away the extra capacity).I came up with a benchmark for that:
This shows that Rcpp is faster in this specific case because we only do 1 push. If we do repeated pushes (like above) we see that change quickly.
In the
Main
vsThis PR
cases for cpp11, both cases double the capacity after that onepush_back()
, but in this PR we also have to pay the truncation allocation, making it slower. If this is the worst case scenario then I think I'm okay with it.Same benchmark as above, but with a very large vector (10,000,000 elements). Here you really see the memory allocation differences between the two cpp11 approaches.