DESIGN: Cheaper DataFrame.append #53

wesm · 2016-10-18T17:48:45Z

I think we can come up with a plan for a better .append implementation that defers stitching arrays together until the result is actually needed for a computation.

We can do this with a virtual pandas::Table interface that consolidates fragmented columns only when they are requested. I will think some more about this.
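Here is a minimal sketch of the deferred-consolidation idea, written in Python with only the standard library. `LazyTable`, `num_chunks`, and `column` are hypothetical names. They are not part of pandas or any actual pandas::Table API. Each `append` records a new fragment per column without touching existing data, and a column is stitched into one contiguous list only the first time it is requested.

```python
from typing import Dict, List


class LazyTable:
    """Hypothetical table that defers concatenating appended fragments."""

    def __init__(self, columns: Dict[str, List[float]]):
        # Each column is stored as a list of fragments (chunks).
        self._chunks: Dict[str, List[List[float]]] = {
            name: [list(values)] for name, values in columns.items()
        }

    def append(self, other: Dict[str, List[float]]) -> None:
        # Record the new fragment only; previously stored rows are not copied.
        if set(other) != set(self._chunks):
            raise ValueError("column names must match")
        for name, values in other.items():
            self._chunks[name].append(list(values))

    def num_chunks(self, name: str) -> int:
        return len(self._chunks[name])

    def column(self, name: str) -> List[float]:
        # Consolidate only when this column is requested, then cache the result.
        chunks = self._chunks[name]
        if len(chunks) > 1:
            merged: List[float] = []
            for chunk in chunks:
                merged.extend(chunk)
            self._chunks[name] = [merged]
        return self._chunks[name][0]
```

Because consolidation is per column, columns that are never requested stay fragmented and cost nothing beyond the appends themselves.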


shoyer · 2016-10-18T19:49:19Z

The obvious alternative is to allow pandas objects to be backed by dynamic arrays. This is possible now that we require arrays to be 1D and contiguous.

This has the advantage of still using eager evaluation, so you don't need to build machinery for deferred evaluation. You also still get predictable performance, even if you inspect the array in between appends. I would guess that looking at DataFrames while they are being appended piece by piece is pretty common, even if only to check the size.
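To make the predictability point concrete, the hypothetical helpers below count element copies for n equal-sized appends under two assumed models. In the first, a deferred-consolidation table is inspected after every append, so each inspection re-stitches all accumulated rows. In the second, a dynamic array is updated eagerly and grows geometrically. These are simplified cost models and do not measure pandas itself.

```python
def copies_deferred(num_appends: int, chunk_len: int) -> int:
    """Copies when a deferred table is consolidated after every append.

    Each consolidation writes the cached block plus the new fragment into a
    fresh array, i.e. every row accumulated so far.
    """
    copies = 0
    rows = 0
    for _ in range(num_appends):
        rows += chunk_len
        copies += rows  # consolidation rewrites all rows
    return copies


def copies_dynamic(num_appends: int, chunk_len: int, growth: float = 2.0) -> int:
    """Copies for an eagerly updated dynamic array with geometric growth."""
    copies = 0
    rows = 0
    capacity = 0
    for _ in range(num_appends):
        needed = rows + chunk_len
        if needed > capacity:
            # Reallocation: existing rows move into the larger buffer once.
            copies += rows
            capacity = max(needed, int(capacity * growth))
        copies += chunk_len  # the new rows are written once
        rows = needed
    return copies
```

With 1,000 appends of 10 rows, the first model performs 5,005,000 copies (quadratic in the number of appends), while the second performs 20,230, a small constant multiple of the 10,000 rows written.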

The downside is that this wouldn't really work with the current interface, because such appends need to happen in place. Also, dynamic arrays reduce speed and increase memory requirements by small constant factors.

Maybe it would make sense to deprecate DataFrame.append and instead make an alternative DynamicDataFrame (sub?)class that does an in-place append?
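A rough sketch of the interface difference, assuming a hypothetical `DynamicFrame` class (not an existing pandas type). `append` mutates in place and returns `None`, while `appended` mimics the existing `DataFrame.append` contract of returning a new object, which copies every existing row. The columns are plain Python lists, which CPython implements as over-allocating dynamic arrays, so a mutating row append is amortized O(1).

```python
from typing import Dict, List


class DynamicFrame:
    """Hypothetical DataFrame-like container with a mutating append."""

    def __init__(self, columns: Dict[str, List[object]]):
        if len({len(v) for v in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        # Copy the inputs so the frame owns its column storage.
        self._columns = {name: list(values) for name, values in columns.items()}

    def __len__(self) -> int:
        return len(next(iter(self._columns.values()), []))

    def append(self, row: Dict[str, object]) -> None:
        # In-place append: existing rows are not copied, nothing is returned.
        if set(row) != set(self._columns):
            raise KeyError("row keys must match the frame's columns")
        for name, value in row.items():
            self._columns[name].append(value)

    def appended(self, row: Dict[str, object]) -> "DynamicFrame":
        # Copying append: builds a new frame, O(len(self)) work per call.
        new = DynamicFrame(self._columns)
        new.append(row)
        return new

    def column(self, name: str) -> List[object]:
        return list(self._columns[name])
```

Returning `None` from the mutating method follows Python's convention for in-place operations (as with `list.append`), which makes it hard to confuse with the copying variant.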

wesm · 2016-10-20T21:52:20Z

We could definitely have a mutating append and write into resizable buffers (with a growth factor of 1.5 or 2). That is something we can experiment with.
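Here is a sketch of such a resizable buffer with a configurable growth factor. It assumes a float column and uses a preallocated Python list as a stand-in for a raw memory buffer. `GrowableBuffer`, `extend`, `view`, and `reallocations` are hypothetical names used only for illustration.

```python
from typing import Iterable, List


class GrowableBuffer:
    """Hypothetical resizable column buffer with a configurable growth factor."""

    def __init__(self, growth_factor: float = 2.0, initial_capacity: int = 4):
        if growth_factor <= 1.0:
            raise ValueError("growth_factor must be > 1")
        self.growth_factor = growth_factor
        self._data: List[float] = [0.0] * initial_capacity  # preallocated slots
        self._size = 0
        self.reallocations = 0

    def __len__(self) -> int:
        return self._size

    @property
    def capacity(self) -> int:
        return len(self._data)

    def _reserve(self, needed: int) -> None:
        # Grow geometrically so n appends cost O(n) copies in total (amortized).
        if needed <= self.capacity:
            return
        new_cap = max(needed, int(self.capacity * self.growth_factor) + 1)
        new_data = [0.0] * new_cap
        new_data[: self._size] = self._data[: self._size]  # move live values once
        self._data = new_data
        self.reallocations += 1

    def extend(self, values: Iterable[float]) -> None:
        # Mutating append: reuse spare capacity until it runs out.
        values = list(values)
        self._reserve(self._size + len(values))
        self._data[self._size : self._size + len(values)] = values
        self._size += len(values)

    def view(self) -> List[float]:
        # Copy of the live values; unused capacity is excluded.
        return self._data[: self._size]
```

A factor of 2 reallocates less often but can leave roughly half of the buffer unused right after a resize. A factor of 1.5 reallocates more often in exchange for less slack memory.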

jreback · 2017-01-26T00:35:42Z

Related to this, I think enlargement via an indexer is a bit too magical / non-transparent / non-performant unless you have a growable buffer.

Better to be explicit at a small loss of convenience in syntax.

jreback mentioned this issue on Mar 20, 2017 in pandas-dev/pandas#15747, "Proposal to change behaviour with .loc and missing keys" (closed).
