Deprecate Series / DataFrame.append #35407
Comments
+1 from me (though I will usually be +1 on deprecating things generally), these are a footgun
+1, it's better to have one method, which is `concat`
Strong +1 from me! Just look at all the (bad) answers to this StackOverflow question:
We should deprecate expansion indexing as well (which is an implicit append)
+1 from me
How do you expand a dataframe by a single row without having to create a whole dataframe then?
I'd recommend thinking about why you need to expand by a single row. Can those updates be batched before adding to the DataFrame? If you know the label you want to set it at, then you can use `df.loc`.
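A minimal sketch of the advice above (the variable names and the incoming records are illustrative assumptions, not from the thread): batch updates in a plain Python list and build the DataFrame once, or set a single row by label with `.loc`.

```python
import pandas as pd

# Hypothetical stream of incoming records (in practice they might arrive one at a time).
incoming = [{"A": 1, "B": 2}, {"A": 3, "B": 4}, {"A": 5, "B": 6}]

# Batch the updates in a plain Python list first...
rows = []
for record in incoming:
    rows.append(record)

# ...then build the DataFrame once.
df = pd.DataFrame(rows)

# If you already know the label you want to set, a single row can be
# written with .loc, which enlarges the frame in one step.
df.loc[3] = [7, 8]
```

The list accumulation is cheap because `list.append` is amortized O(1); only the final constructor call pays the cost of building the frame.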
Disagree. Appending a single row is useful functionality and very common. Yes, we understand it's inefficient; but as TomAugspurger himself said, this is the 10th most commonly referenced page on the help, so clearly lots of people have this use case of adding a single row to the end. We can tell ourselves we're removing the method to "encourage good design", but people still want this functionality, so they'll just use the workaround of creating a new DataFrame with a single row and concat'ing. That just requires the user to write even more code to still get the exact same performance hit, so how have we made anyone's life better?
Not being able to add rows to a data structure makes no sense. It's one thing not to add the inplace argument, but deprecating the feature is nuts.
@TomAugspurger using `df.loc[]` requires me to know the length of the dataframe, and creates code like this: `df.loc[len(df)] = <new row>`. This feels like overly complex syntax for an API that makes data operations simple. Internally, why not take a page from lists, the way `list.append` over-allocates?
You could perhaps suggest that to NumPy. I don't think it would work in practice given the NumPy data model.
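The point about the NumPy data model can be demonstrated directly: a NumPy array is a fixed-size contiguous buffer, so `np.append` can never grow it in place; it always allocates a new array and copies.

```python
import numpy as np

a = np.arange(3)
b = np.append(a, 4)  # always allocates a brand-new array

assert b.tolist() == [0, 1, 2, 4]
assert a.tolist() == [0, 1, 2]        # the original is untouched
assert not np.shares_memory(a, b)     # no in-place growth happened
```

This is why a list-style over-allocating `append` does not map onto pandas, whose columns are backed by NumPy arrays.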
Is NumPy deprecating `np.append`? NumPy doc: https://numpy.org/doc/stable/reference/generated/numpy.append.html
Shall we make this happen and get a deprecation warning in for 1.4 so these can be removed in 2.0? If there are no objections, I'll make a PR later (or anyone following along can, that's probably the fastest way to move the conversation forward)
@MarcoGorelli my question still stands, why is this being done?
Yes, why are we doing this? It seems like we're removing a VERY popular feature (the 10th most visited help page, according to the OP) just because that feature is slow. But if we remove the feature, people will still want this functionality, so they'll just end up implementing it manually anyway; how are we improving anything by removing this?
There is a ton of discussion, please read it in full. This has long been planned, as inplace operations make the code base inordinately complex and offer very little benefit.
@jreback I don't see tons of discussion in this issue; please point me to the discussion so that I might be better informed. What I see is a community asking you not to do this.
There's a long discussion here on deprecating inplace: #16529
I'd argue that this is still an improvement, because then it would be clearer to users that this is a slow feature. With the status quo, people are likely to think it's analogous to `list.append`. What's your use-case for `append`?
take
This is exactly the reason append is super problematic, and you have proved the point of why append is a terrible idea: it's not about readability, but how easy it is to fall into traps that are non-obvious at first glance.
If only there were a well-known algorithm which was not an exponential copy.
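The trap being alluded to can be made concrete. A sketch (small sizes chosen for illustration): growing a frame one row at a time re-copies everything accumulated so far on each step, roughly O(N²) element copies in total, while accumulating pieces in a list and concatenating once copies each element only once.

```python
import pandas as pd

rows = [{"x": i} for i in range(100)]

# Anti-pattern: each concat copies everything accumulated so far,
# so building N rows costs on the order of N^2 element copies.
df_slow = pd.DataFrame([rows[0]])
for r in rows[1:]:
    df_slow = pd.concat([df_slow, pd.DataFrame([r])], ignore_index=True)

# Recommended: accumulate the pieces in a list and copy once at the end.
df_fast = pd.concat([pd.DataFrame([r]) for r in rows], ignore_index=True)

assert df_slow["x"].tolist() == df_fast["x"].tolist() == list(range(100))
```

Both produce identical results; only the copy count differs, which is the non-obvious part at first glance.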
You did not read, or did not understand, what I was saying. The version with append is the one that WORKS; the one with concat at the end runs into memory issues. And yes, you can be smarter, for example …
Thanks @behrenhoff, that's a nice example. Though can't you still batch the concats? Say, read 10 files at a time, concat them, drop duplicates, repeat... This seems like a perfect summary of the issue anyway:
At some point we should lock the issue; this is taking a lot of attention away from a lot of people, there have been off-topic comments, and no compelling use-case for keeping `append` has been presented.
Yes, that would work. So would a million other solutions. In practice, I could even exploit more about the date ordering inside the files (all files here have a rather long overlapping history, but newer files can overwrite (fix) data in older files, so it is of course a drop_duplicates with a subset and keep='last'). My point is: this is a non-issue, because the operation is done once per 6 months or so; the daily operation just adds exactly one file. There is no point in optimizing this further as long as it works. That is the whole point I was trying to make: you force people to optimize / change code where the old code just works and there is no need to modify it. And the real gains in this example are not in append vs concat but in exploiting knowledge of the input files and reading them in a different order or in groups. Note that I am not saying this is a use case that can only be done with `append`. Anyway, end of discussion for me. I already did the work and got rid of all my appends. I just fear that many people will not upgrade if their code breaks. You are also making it harder for new users.
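The batched approach discussed above can be sketched as follows. The per-file DataFrames and the "newer files overwrite older ones" rule are simulated in memory (a real script would build `frames` with `pd.read_csv` or similar; all names here are illustrative assumptions).

```python
import pandas as pd

# Stand-ins for per-file DataFrames: each "file" covers two keys, and
# later files re-state (fix) a key from the previous one.
frames = [pd.DataFrame({"key": [i, i + 1], "val": [i, i]}) for i in range(10)]

batch_size = 4
result = None
for start in range(0, len(frames), batch_size):
    chunk = pd.concat(frames[start:start + batch_size], ignore_index=True)
    result = chunk if result is None else pd.concat([result, chunk], ignore_index=True)
    # keep='last' means newer files win, mirroring the overwrite rule above.
    result = result.drop_duplicates(subset="key", keep="last")
```

Deduplicating after each batch keeps the working set small, which addresses the memory-spike concern without a row-by-row append.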
Hi, minimal reproducer that was totally broken:

Before:

```python
a = pd.DataFrame({"A": 1, "B": 2}, index=[0])
b = pd.DataFrame({"A": 3}, index=[0])
for rowIndex, row in b.iterrows():
    print(a.append(row))
# Output:
#    A    B
# 0  1  2.0
# 0  3  NaN
```

After:

```python
a = pd.DataFrame({"A": 1, "B": 2}, index=[0])
b = pd.DataFrame({"A": 3}, index=[0])
for rowIndex, row in b.iterrows():
    print(pd.concat([a, row]))
# Output:
#      A    B    0
# 0  1.0  2.0  NaN
# A  NaN  NaN  3.0
```

Also, please note that if you add a deprecation warning to such a popular method, one that is used widely and called many times per second, the message will be spammed a lot, leading to much bigger overhead than you have with allocations and memory copying. So it is beneficial to print such a message only on the first call.
What are you trying to do? It would be way more efficient to call …
Edit: Or was it on purpose to put …?
I know, this is just an illustration. I was iterating over rows and, if a row is OK, adding it to another table. I believe that there are much better ways via masking and concatenation taking such masks into account, but I wanted to keep the code as simple as possible.
Thanks for your response. It is important for us to see use cases that cannot be done more efficiently in another way. You are right, checking data can be done way more efficiently via masking and then concatenating the result.
How can I concat such a row?
With `pd.concat([a, row.to_frame().T], ignore_index=True)`
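A runnable sketch of that suggestion (the frame and row contents are made up for illustration): a bare Series is column-shaped, so `concat` would add it as a new column; `.to_frame().T` reshapes it into a one-row DataFrame first.

```python
import pandas as pd

a = pd.DataFrame({"A": [1], "B": [2]})
row = pd.Series({"A": 3, "B": 4})

# .to_frame() turns the Series into a one-column frame; .T transposes it
# into a one-row frame, so concat stacks it as a row, not a column.
out = pd.concat([a, row.to_frame().T], ignore_index=True)

assert out.shape == (2, 2)
assert out.loc[1, "A"] == 3 and out.loc[1, "B"] == 4
```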
You can simply do:
Just change the `> 3` to a condition that suits your needs. This avoids the iterating-over-the-rows step. If you have to iterate for some reason, you can use the example from @MarcoGorelli
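The masking approach described in this exchange might look like the following sketch (the column name and the `> 3` threshold are taken from the surrounding comments; the data is invented for illustration).

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 5, 2, 7, 3]})

# Instead of iterating over rows and appending the ones that pass a check,
# express the check as a boolean mask and select in a single vectorised step.
kept = df[df["value"] > 3].reset_index(drop=True)

assert kept["value"].tolist() == [5, 7]
```

This does one allocation for the result instead of one per accepted row.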
Not every condition and not every piece of logic is readable as such a single-line expression. For people who, like me, just want to get rid of the warnings:

```python
import pandas as pd

def pandas_append(df, row, ignore_index=False):
    if isinstance(row, pd.DataFrame):
        result = pd.concat([df, row], ignore_index=ignore_index)
    elif isinstance(row, pd.Series):
        result = pd.concat([df, row.to_frame().T], ignore_index=ignore_index)
    elif isinstance(row, dict):
        result = pd.concat([df, pd.DataFrame(row, index=[0], columns=df.columns)])
    else:
        raise RuntimeError("pandas_append: unsupported row type - {}".format(type(row)))
    return result
```
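The three branches of that helper correspond to three direct `pd.concat` spellings, which can be checked against each other; a minimal sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2]})

# DataFrame input: concat directly.
out1 = pd.concat([df, pd.DataFrame({"A": [3], "B": [4]})], ignore_index=True)

# Series input: transpose into a one-row frame first.
out2 = pd.concat([df, pd.Series({"A": 3, "B": 4}).to_frame().T], ignore_index=True)

# dict input: wrap in a one-row DataFrame.
out3 = pd.concat([df, pd.DataFrame({"A": 3, "B": 4}, index=[0])], ignore_index=True)

assert out1.equals(out2) and out2.equals(out3)
```

All three produce the same two-row result, which is why a single helper can hide the shape differences.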
Here is a use case for `append`. I have a data frame with numeric values, such as

```python
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
```

and I append a single row with all the column sums:

```python
totals = df.sum()
totals.name = 'totals'
df_append = df.append(totals)
```

Simple enough. Now, using concat instead:

```python
df_concat_bad = pd.concat([df, totals])
```

which produces something different. Apparently, with `concat`, the Series is treated as a new column rather than a new row. Fortunately, in a comment above, the implementation of `append` in terms of `concat` was given:

```python
df_concat_good = pd.concat([df, totals.to_frame().T])
```

which yields the desired result.
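Putting that use case together as one runnable sketch (the `.loc` enlargement at the end is an additional alternative, not something proposed in the comment above):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
totals = df.sum()
totals.name = 'totals'

# Transposing the sums into a one-row frame makes concat append a row
# labelled 'totals' (the Series name becomes the row label).
df_good = pd.concat([df, totals.to_frame().T])

assert df_good.loc['totals', 'A'] == 4
assert df_good.loc['totals', 'B'] == 6

# An alternative that avoids concat entirely: enlargement via .loc
df2 = df.copy()
df2.loc['totals'] = df2.sum()
assert df2.loc['totals'].tolist() == [4, 6]
```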
I think users need to be aware of such subtleties. I also posted this on StackOverflow.
This was brought up in #35407 (comment) , and some other comments in this thread, and would/should be part of the transition docs (see #46825) |
Worst idea I've seen. Why complicate something so easy? I think it's better to have more options/ways to do something than just one strict way. DataFrame.append() was a very easy way for newbies to add data to a dataframe.
"[...] around the 10th most visited page in our API docs" and they go ahead and deprecate it. |
This seems to be decided, but in the future I would argue against doing these sorts of things to improve users' code (and requesting proof of why they can't use pd.concat when they disagree). If it improves maintainability, or makes things easier for devs, go for it. But if something is popular and not "correct", let people do what they want to do. The only valid point I've seen here is for removing the 'inplace' argument; everything else resembles nannying.
Thanks all for your comments. This is becoming draining: some comments are off-topic, no new arguments are being presented, and some are not particularly respectful. Locking for now, then. If anyone has any new arguments and wants to make them in a respectful manner, there are no objections to opening a new issue.

It's understandable that some people are unhappy with this decision and have to rewrite some code, but for newbies, getting them to write their code in a better way to begin with will be better for them in the long run. If the docs on how to use concat are unclear, pull requests are welcome.
I think that we should deprecate `Series.append` and `DataFrame.append`. They're making an analogy to `list.append`, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result.

These are also apparently popular methods. `DataFrame.append` is around the 10th most visited page in our API docs.

Unless I'm mistaken, users are always better off building up a list of values and passing them to the constructor, or building up a list of NDFrames followed by a single `concat`.
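The two recommended patterns from the paragraph above can be sketched as follows (the records and column names are invented for illustration):

```python
import pandas as pd

# Pattern 1: build up a list of plain records, then one constructor call.
records = [{"A": i, "B": i * 2} for i in range(3)]
df = pd.DataFrame(records)

# Pattern 2: build up a list of NDFrames, then a single concat.
pieces = [pd.DataFrame({"A": [i], "B": [i * 2]}) for i in range(3)]
df2 = pd.concat(pieces, ignore_index=True)

assert df.equals(df2)
```

Either way, the copy into the final frame happens exactly once, which is the efficiency argument behind the deprecation.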