This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

DESIGN: Cheaper DataFrame.append #53

Open
wesm opened this issue Oct 18, 2016 · 3 comments

Comments

wesm commented Oct 18, 2016

I'm thinking we can come up with a plan to yield a better .append implementation that defers stitching together arrays until it's actually needed for computations.

We can do this by having a virtual pandas::Table interface that consolidates fragmented columns only when they are requested. Will think some more about this.
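To make the deferred-consolidation idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `LazyColumn`, `num_chunks`, and `consolidated` are hypothetical names, not part of pandas or of the proposed `pandas::Table` interface, and the standard-library `array` type stands in for a real column buffer.

```python
from array import array


class LazyColumn:
    """Hypothetical column that keeps appended chunks separate until needed."""

    def __init__(self, values, typecode="q"):
        self._typecode = typecode
        self._chunks = [array(typecode, values)]

    def append(self, values):
        # Only record the new fragment; existing data is not copied.
        self._chunks.append(array(self._typecode, values))

    def __len__(self):
        # The length is available without stitching the fragments together.
        return sum(len(chunk) for chunk in self._chunks)

    @property
    def num_chunks(self):
        return len(self._chunks)

    def consolidated(self):
        # Stitch fragments into one contiguous array only on request.
        if len(self._chunks) > 1:
            merged = array(self._typecode)
            for chunk in self._chunks:
                merged.extend(chunk)
            self._chunks = [merged]
        return self._chunks[0]


col = LazyColumn([1, 2, 3])
col.append([4, 5])
col.append([6])
fragments_before = col.num_chunks    # fragments are still separate here
contiguous = col.consolidated()      # first request triggers the single copy
fragments_after = col.num_chunks
```

In this sketch the cost of stitching is paid once, at the first call that needs contiguous data, rather than on every append.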

shoyer commented Oct 18, 2016

The obvious alternative is to allow pandas objects to be backed by dynamic arrays. This is possible now that we require arrays to be 1D and contiguous.

This has the advantage of still using eager evaluation, so you don't need to build machinery for deferred evaluation. Also, you still get predictable performance, even if you inspect the array in between appends. I would guess looking at DataFrames being appended piece-by-piece is pretty common, even if only to check the size.

The downside is that this wouldn't really work with the current interface, because such appends need to be in-place. Also, dynamic arrays reduce speed and increase memory requirements by small constant multiples.

Maybe it would make sense to deprecate DataFrame.append and instead make an alternative DynamicDataFrame (sub?)class that does an in-place append?
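To put rough numbers on the trade-off described above, the following Python sketch counts element writes for two strategies: allocating a fresh array on every append (roughly what a non-mutating append implies) versus appending in place into a geometrically growing buffer. The function names and the simplified cost model are assumptions for illustration, not measurements of pandas.

```python
def writes_with_copy_on_append(n_appends, chunk):
    """Element writes if every append allocates a fresh array."""
    written = 0
    size = 0
    for _ in range(n_appends):
        # Old contents plus the new chunk are written into the new array.
        written += size + chunk
        size += chunk
    return written


def writes_with_dynamic_array(n_appends, chunk, factor=2.0):
    """Element writes for in-place appends into a geometrically growing buffer."""
    written = 0
    size = 0
    capacity = 0
    for _ in range(n_appends):
        needed = size + chunk
        if needed > capacity:
            # Reallocate: existing elements move to the larger buffer.
            capacity = max(needed, int(capacity * factor))
            written += size
        # The new chunk is written into spare capacity.
        written += chunk
        size = needed
    return written


copy_cost = writes_with_copy_on_append(1000, 1)
dynamic_cost = writes_with_dynamic_array(1000, 1)
```

Under this model the copy-on-append total grows quadratically with the number of appends, while the dynamic-array total stays within a small constant multiple of the final size, which is the "small constant multiples" overhead mentioned above.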

wesm commented Oct 20, 2016

We could definitely have a mutating append and write into resizable buffers (with growth factor 1.5 or 2). Something we can experiment with.
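A mutating append over a resizable buffer could look roughly like the Python sketch below. `GrowableBuffer` and its methods are hypothetical names chosen for illustration, the buffer holds float64 values only, and the standard-library `array` type stands in for whatever buffer type an actual implementation would use.

```python
from array import array


class GrowableBuffer:
    """Hypothetical float64 buffer with a mutating append and geometric growth."""

    def __init__(self, growth_factor=1.5, initial_capacity=4):
        if growth_factor <= 1.0:
            raise ValueError("growth_factor must be greater than 1")
        self._growth_factor = growth_factor
        # Preallocate zero-filled storage; only the first _size slots are valid.
        self._data = array("d", [0.0]) * initial_capacity
        self._size = 0

    @property
    def capacity(self):
        return len(self._data)

    def __len__(self):
        return self._size

    def _reserve(self, needed):
        if needed <= self.capacity:
            return
        # Grow by the chosen factor, but never to less than what is needed.
        new_capacity = max(needed, int(self.capacity * self._growth_factor) + 1)
        grown = array("d", [0.0]) * new_capacity
        grown[: self._size] = self._data[: self._size]
        self._data = grown

    def append(self, values):
        chunk = array("d", values)
        end = self._size + len(chunk)
        self._reserve(end)
        # Write the new values in place into the spare capacity.
        self._data[self._size:end] = chunk
        self._size = end

    def to_list(self):
        return self._data[: self._size].tolist()


buf = GrowableBuffer(growth_factor=2.0, initial_capacity=2)
buf.append([1.0, 2.0])
buf.append([3.0])  # exceeds capacity, so the buffer is reallocated once
```

The length stays cheap to inspect between appends, and the spare capacity is the memory overhead that the growth factor trades against reallocation frequency.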

jreback commented Jan 26, 2017

Related to this, I think enlargement via an indexer is a bit too magical / non-transparent / non-performant, unless you have a growable buffer.

Better to be explicit at a small loss of convenience in syntax.
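As one possible reading of "explicit" here, the Python sketch below shows a container where assignment through an indexer never enlarges the data and growth only happens through a named method. `StrictSeries` is a hypothetical class for illustration and does not reflect current or planned pandas behavior.

```python
class StrictSeries:
    """Hypothetical labeled container: indexer assignment never enlarges it."""

    def __init__(self):
        self._positions = {}
        self._values = []

    def __len__(self):
        return len(self._values)

    def __getitem__(self, label):
        return self._values[self._positions[label]]

    def __setitem__(self, label, value):
        # Assigning to a missing label is an error rather than a silent resize.
        if label not in self._positions:
            raise KeyError(f"{label!r} not found; use append() to add it")
        self._values[self._positions[label]] = value

    def append(self, label, value):
        # Growth is only possible through this explicit call.
        if label in self._positions:
            raise ValueError(f"{label!r} already exists")
        self._positions[label] = len(self._values)
        self._values.append(value)


s = StrictSeries()
s.append("a", 1)
s["a"] = 2          # overwriting an existing label is allowed
try:
    s["b"] = 3      # enlargement via the indexer is rejected
    enlarged = True
except KeyError:
    enlarged = False
```

The cost is one extra method call in user code; the benefit is that every resize is visible at the call site.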
