Deep copy a VectorSchemaRoot? #465
Comments
You should use VectorLoader/VectorUnloader to "move" the contents of the reader's root into your own root.
That seems to be exactly what the body of this loop does:
while (reader.loadNextBatch()) {
  final VectorSchemaRoot source = reader.getVectorSchemaRoot();
  final VectorUnloader unloader = new VectorUnloader(source);
  final VectorSchemaRoot copy = VectorSchemaRoot.create(source.getSchema(), allocator);
  final VectorLoader loader = new VectorLoader(copy);
  loader.load(unloader.getRecordBatch());
  batches.add(copy);
}
That is the intended usage. What is the problem? (Note that you can also just keep an array of the batches from the unloader, and load/stream them through a root as necessary.)
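The alternative mentioned in the note above could be sketched roughly as follows. `BatchCache` is a hypothetical helper, not part of Arrow; it keeps the `Schema` next to the unloaded batches because an `ArrowRecordBatch` does not carry one, and it closes the batches it holds. The sketch assumes `arrow-vector` and an `arrow-memory` implementation are on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.Schema;

/** Hypothetical helper: holds unloaded batches plus the schema they belong to. */
public class BatchCache implements AutoCloseable {
  private final Schema schema;
  private final List<ArrowRecordBatch> batches = new ArrayList<>();

  public BatchCache(Schema schema) {
    this.schema = schema;
  }

  /** Snapshots the root's current buffers; the batch keeps its own references to them. */
  public void add(VectorSchemaRoot root) {
    batches.add(new VectorUnloader(root).getRecordBatch());
  }

  /** Loads the batch at the given index into the caller's root, replacing its contents. */
  public void loadInto(int index, VectorSchemaRoot root) {
    new VectorLoader(root).load(batches.get(index));
  }

  public int size() {
    return batches.size();
  }

  public Schema schema() {
    return schema;
  }

  /** Releases the buffer references held by every cached batch. */
  @Override
  public void close() {
    batches.forEach(ArrowRecordBatch::close);
    batches.clear();
  }

  public static void main(String[] args) {
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot producer = VectorSchemaRoot.of(new IntVector("n", allocator));
         BatchCache cache = new BatchCache(producer.getSchema())) {
      IntVector ints = (IntVector) producer.getVector("n");
      for (int b = 0; b < 2; b++) {
        ints.allocateNew(3); // fresh buffers, so earlier snapshots are not overwritten
        for (int i = 0; i < 3; i++) {
          ints.set(i, b * 10 + i);
        }
        producer.setRowCount(3);
        cache.add(producer);
      }
      // Stream the cached batches through a single reusable root.
      try (VectorSchemaRoot view = VectorSchemaRoot.create(cache.schema(), allocator)) {
        for (int i = 0; i < cache.size(); i++) {
          cache.loadInto(i, view);
          System.out.println(view.contentToTSVString());
        }
      }
    }
  }
}
```

The cached batches still account their memory to the allocator that produced the buffers, so the cache has to be closed before that allocator is.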
OK, thanks. Yes, it seems a list of ArrowRecordBatch owns its buffers and doesn't need to be tied to the allocator's lifecycle.
Hmm, no. The ArrowRecordBatch's buffers are still bound to the allocator, and the batch carries no schema information, so the schema has to be stored elsewhere.
Yes, there isn't really any way of untying things from an allocator (this is intentional). There are APIs to transfer memory between allocators (or you can just keep a single allocator across different contexts).
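One way to move a root's memory to another allocator is a per-vector `TransferPair`. The sketch below is only an illustration: `transferTo`, `AllocatorTransfer`, and the allocator names "owner" and "scratch" are made up for the example, and both allocators are assumed to be children of the same root allocator.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.util.TransferPair;

public class AllocatorTransfer {
  /**
   * Hypothetical helper: moves every vector of source into vectors backed by target.
   * The source vectors are left empty afterwards; the caller still closes source.
   */
  public static VectorSchemaRoot transferTo(VectorSchemaRoot source, BufferAllocator target) {
    int rowCount = source.getRowCount();
    List<FieldVector> moved = new ArrayList<>();
    for (FieldVector vector : source.getFieldVectors()) {
      TransferPair pair = vector.getTransferPair(target);
      pair.transfer(); // hands the buffers over to the target allocator
      moved.add((FieldVector) pair.getTo());
    }
    return new VectorSchemaRoot(source.getSchema(), moved, rowCount);
  }

  public static void main(String[] args) {
    try (BufferAllocator parent = new RootAllocator();
         BufferAllocator owner = parent.newChildAllocator("owner", 0, Long.MAX_VALUE)) {
      VectorSchemaRoot moved;
      // Short-lived allocator, standing in for the one a reader would use.
      try (BufferAllocator scratch = parent.newChildAllocator("scratch", 0, Long.MAX_VALUE)) {
        IntVector ints = new IntVector("n", scratch);
        ints.allocateNew(3);
        for (int i = 0; i < 3; i++) {
          ints.set(i, i);
        }
        try (VectorSchemaRoot source = VectorSchemaRoot.of(ints)) {
          source.setRowCount(3);
          moved = transferTo(source, owner);
        }
      }
      // The data is still readable here because it now belongs to "owner".
      try (VectorSchemaRoot result = moved) {
        System.out.println(result.contentToTSVString());
      }
    }
  }
}
```

Keeping a single allocator for both the reader and the returned roots avoids the transfer step entirely, at the cost of tying their lifetimes together.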
@lidavidm Thanks for the information! Are there any docs or a cookbook entry for copying a VectorSchemaRoot? It seems challenging to keep the lifetimes of the data and the allocator aligned, and I suppose some demo code would help a lot.
For example, when I wrote:
while (reader.loadNextBatch()) {
  final VectorSchemaRoot source = reader.getVectorSchemaRoot();
  final VectorSchemaRoot copy = VectorSchemaRoot.create(source.getSchema(), allocator);
  new VectorLoader(copy).load(new VectorUnloader(source).getRecordBatch());
  batches.add(copy);
}
it seems the intermediate ArrowRecordBatch should be closed, but it's very easy to get this wrong and hit a runtime exception.
Unfortunately not. You should do something like:
try (var batch = unloader.getRecordBatch()) {
  loader.load(batch);
}
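Putting the pieces of this thread together, a complete read-and-copy loop might look like the sketch below. `CopyBatches`, `readAll`, and `sampleStream` are illustrative names, and the two-batch integer stream is made-up test data; the loader/unloader pattern and the try-with-resources around the intermediate batch are the ones suggested above. The copies hold their own references to the buffers, so they stay valid after the reader is closed, but each one must be closed before the allocator.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;

public class CopyBatches {
  /** Reads every batch of an Arrow IPC stream into its own VectorSchemaRoot. */
  public static List<VectorSchemaRoot> readAll(byte[] ipcStream, BufferAllocator allocator)
      throws IOException {
    List<VectorSchemaRoot> batches = new ArrayList<>();
    try (ArrowStreamReader reader =
             new ArrowStreamReader(new ByteArrayInputStream(ipcStream), allocator)) {
      // The reader reuses this one root and refills it on every loadNextBatch().
      VectorSchemaRoot source = reader.getVectorSchemaRoot();
      VectorUnloader unloader = new VectorUnloader(source);
      while (reader.loadNextBatch()) {
        VectorSchemaRoot copy = VectorSchemaRoot.create(source.getSchema(), allocator);
        // Close the intermediate batch so its buffer references are released.
        try (ArrowRecordBatch batch = unloader.getRecordBatch()) {
          new VectorLoader(copy).load(batch);
        }
        batches.add(copy);
      }
    }
    return batches;
  }

  /** Builds a small two-batch IPC stream to exercise readAll (made-up data). */
  public static byte[] sampleStream(BufferAllocator allocator) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    IntVector ints = new IntVector("n", allocator);
    try (VectorSchemaRoot root = VectorSchemaRoot.of(ints);
         ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
      writer.start();
      for (int b = 0; b < 2; b++) {
        ints.allocateNew(3);
        for (int i = 0; i < 3; i++) {
          ints.set(i, b * 10 + i);
        }
        root.setRowCount(3);
        writer.writeBatch();
      }
      writer.end();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Closing the allocator fails if anything was leaked, which makes mistakes visible.
    try (BufferAllocator allocator = new RootAllocator()) {
      List<VectorSchemaRoot> batches = readAll(sampleStream(allocator), allocator);
      for (VectorSchemaRoot batch : batches) {
        System.out.println(batch.contentToTSVString());
        batch.close(); // the caller owns each copy
      }
    }
  }
}
```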
Describe the enhancement requested
I'm writing a converter method that converts a base64-encoded byte array into Arrow batches and returns them to the user.
Since ArrowStreamReader replaces the batch referred to by getVectorSchemaRoot in each iteration, I have to deep-copy the VectorSchemaRoot every time.
Currently, I use Table's method as a workaround, but I wonder whether VectorSchemaRoot deserves a copy method, or whether I'm implementing such a typical use case the wrong way.